This article explains what to do when an ASM disk group dismount takes down one node of a RAC cluster.
ASM alert log, located in:
/u01/app/grid/diag/asm/+asm/+ASM1/trace
Thu Jul 30 02:10:46 2015
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Thu Jul 30 02:10:47 2015
NOTE: process _b000_+asm1 (38695) initiating offline of disk 0.3915941304 (DATA2_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 1.3915941302 (DATA2_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (38695) initiating offline of disk 2.3915941303 (DATA2_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 28, osid 38695
ERROR: no read quorum in group: required 2, found 0 disks
Dirty Detach Reconfiguration complete
Thu Jul 30 02:10:47 2015
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xB368755B (DATA2) <-- dismounted on its own
SQL> alter diskgroup DATA2 dismount force /* ASM SERVER:3009967451 */
Thu Jul 30 02:11:24 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA2 was mounted <-- and then mounted itself again
SUCCESS: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:31:15779} */
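To spot this pattern quickly in a long alert log, a small script can pull out the PST heartbeat warnings and the dismount/mount events. The following is a minimal Python sketch, assuming the messages have exactly the format of the lines quoted above; the function name scan_alert_log and the regular expressions are illustrative, not part of any Oracle tool.

```python
import re

# Patterns modelled on the alert log lines quoted above (assumed format).
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)\."
)
DISMOUNT = re.compile(r"NOTE: cache dismounted group (\d+)/\S+ \((\w+)\)")
MOUNTED = re.compile(r"SUCCESS: diskgroup (\w+) was mounted")


def scan_alert_log(lines):
    """Collect PST heartbeat waits and dismount/mount events from alert log lines.

    Returns (pst_waits, events):
      pst_waits: list of (group, disk, seconds), one per warning line
      events:    list of ("dismount" | "mount", diskgroup_name) in log order
    """
    pst_waits = []
    events = []
    for line in lines:
        m = PST_WAIT.search(line)
        if m:
            secs, disk, group = (int(g) for g in m.groups())
            pst_waits.append((group, disk, secs))
            continue
        m = DISMOUNT.search(line)
        if m:
            events.append(("dismount", m.group(2)))
            continue
        m = MOUNTED.search(line)
        if m:
            events.append(("mount", m.group(1)))
    return pst_waits, events
```

Fed the excerpt above, it reports six PST waits of 15 seconds in group 1, then a dismount of DATA2 followed by a mount of DATA2.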
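The alert log shows the ASM disk group being dismounted after the warning "Waited 15 secs for write IO to PST". This is ASM's own heartbeat timeout check:
the ASM instance periodically verifies that every ASM disk still responds normally.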
Generally, messages of this kind appear in the ASM alert log in the following situation:
delayed ASM PST heartbeats on ASM disks in a normal or high redundancy disk group cause the ASM instance to dismount the disk group. By default, the timeout is 15 seconds.
Heartbeat delays are largely ignored for external redundancy disk groups:
the ASM instance stops issuing further PST heartbeats until PST revalidation succeeds,
but the heartbeat delays do not directly dismount an external redundancy disk group.
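In other words:
1. The ASM instance periodically checks whether the disks in each disk group are still communicating normally.
2. Only normal and high redundancy disk groups are dismounted by this check; heartbeat delays do not directly dismount an external redundancy disk group, so it does not hit this error.
3. The default timeout is 15 seconds: if a disk does not respond to the ASM instance's heartbeat check within 15 seconds, ASM considers it inaccessible and the disk group is dismounted. Problems in the storage network can trigger this error.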
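The points above, combined with the sequence in the log (all three disks offlined, then "no read quorum in group: required 2, found 0 disks"), can be expressed as a simplified decision model. This is a hypothetical Python sketch, not Oracle's implementation: it assumes that every disk whose PST heartbeat write waits at least the timeout is offlined, and that a strict majority of PST copies must remain readable (which yields the "required 2" seen for three PST disks). The names DiskGroup, pst_read_quorum and check_heartbeat are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Default heartbeat timeout quoted in the note above (15 seconds).
DEFAULT_TIMEOUT_SECS = 15


@dataclass
class DiskGroup:
    name: str
    redundancy: str        # "EXTERNAL", "NORMAL" or "HIGH"
    pst_disks: List[int]   # disk numbers assumed to hold a PST copy


def pst_read_quorum(n_copies: int) -> int:
    # Assumption: a strict majority of PST copies must stay readable.
    return n_copies // 2 + 1


def check_heartbeat(
    dg: DiskGroup, wait_secs: Dict[int, int], timeout: int = DEFAULT_TIMEOUT_SECS
) -> Tuple[List[int], bool]:
    """Return (disks offlined, whether the group is dismounted) in this simplified model."""
    if dg.redundancy == "EXTERNAL":
        # Per the note: heartbeat delays do not directly dismount external redundancy groups.
        return [], False
    # A disk whose PST heartbeat write waited at least `timeout` seconds is offlined.
    offlined = [d for d in dg.pst_disks if wait_secs.get(d, 0) >= timeout]
    remaining = len(dg.pst_disks) - len(offlined)
    # Dismount once the remaining PST copies fall below the read quorum.
    return offlined, remaining < pst_read_quorum(len(dg.pst_disks))
```

Replaying the incident, DiskGroup("DATA2", "NORMAL", [0, 1, 2]) with a 15-second wait on every disk gives all three disks offlined and the group dismounted, matching the log; a single slow disk would only be offlined while the group stays mounted.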
In this case, 02:10 in the morning was exactly when the full database backup runs, so the heavy write load most likely slowed I/O response.
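Matching the warning timestamps against the backup schedule is what points to the backup as the cause. A minimal Python sketch, assuming the alert-log timestamp format shown above (English day and month names) and a hypothetical backup window of 02:00 to 04:00 that should be replaced with the real schedule; in_window is an illustrative name.

```python
from datetime import datetime, time

# Alert-log timestamp format, e.g. "Thu Jul 30 02:10:46 2015" (English locale assumed).
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def in_window(ts_text: str, start: time, end: time) -> bool:
    """Return True if the alert-log timestamp's time of day falls in [start, end).

    Also handles windows that cross midnight (e.g. 23:00 to 01:00).
    """
    t = datetime.strptime(ts_text.strip(), ALERT_TS_FORMAT).time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


# Hypothetical backup window; adjust to the site's actual schedule.
BACKUP_START, BACKUP_END = time(2, 0), time(4, 0)
```

For example, in_window("Thu Jul 30 02:10:46 2015", BACKUP_START, BACKUP_END) is True for the first PST warning in the log above.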
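The parameter that controls this timeout only exists from 11.2.0.3.0 onward; in other words, the ASM instance's disk heartbeat timeout detection was only introduced in 11.2.0.3. The related hidden parameters can be listed with: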
-- Run in the ASM instance while connected as SYS (x$ fixed tables are visible only to SYS)
set pages 9999
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV('Instance')
AND y.inst_id = USERENV('Instance')
AND x.indx = y.indx
AND upper(x.ksppinm) like '%ASM_H%';
The output (NAME and DESCRIB columns) is:
NAME                    DESCRIB
_asm_hbeatiowait        number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum   quantum used to compute time-to-wait for a PST Hbeat check
If the storage network is not very reliable, the check timeout can be raised. In fact, in 12.1.0.2 the default is already 120 seconds:
alter system set "_asm_hbeatiowait"=120 scope=spfile;
Because the change is written with scope=spfile, restart ASM for it to take effect, then keep monitoring.
Original article by 奋斗. If reposting, please credit the source: https://blog.ytso.com/203836.html