Chongqing Sizhuang (重庆思庄) Oracle and Redhat Certification Study Forum

[Oracle] Database alert log errors: ORA-00202, ORA-15081, ORA-27072

Posted on 2021-11-2 12:34:51
Analysis:
1. The database went down. The alert log shows the following I/O errors on the control file:
Thu Apr 11 06:40:14 2019
WARNING: Read Failed. group:2 disk:1 AU:675 offset:16384 size:16384
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 260 in group [2.3852408873] from disk DATA_0001 allocation unit 675 reason error; if possible, will try another mirror side
Errors in file /u01/app/oracle/diag/rdbms/jsswgsjk/jsswgsjk1/trace/jsswgsjk1_ckpt_93628.trc:
ORA-00202: control file: '+DATA/jsswgsjk/controlfile/current.260.998936297'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 1382432
Additional information: -1
Thu Apr 11 06:40:15 2019
WARNING: Read Failed. group:2 disk:1 AU:675 offset:65536 size:16384
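To see at a glance which disk the failed reads hit, the "WARNING: Read Failed." lines can be pulled apart with a small script. This is only a sketch: the function name `parse_read_failures` and the field names are my own, and the pattern assumes the line layout is exactly the one shown in the log above.

```python
import re

# Matches the "WARNING: Read Failed." lines as they appear in the alert log above.
READ_FAILED = re.compile(
    r"WARNING: Read Failed\. group:(\d+) disk:(\d+) AU:(\d+) offset:(\d+) size:(\d+)"
)

def parse_read_failures(log_text):
    """Return one dict per 'Read Failed' warning found in log_text."""
    fields = ("group", "disk", "au", "offset", "size")
    return [
        dict(zip(fields, map(int, match.groups())))
        for match in READ_FAILED.finditer(log_text)
    ]

# The two warnings quoted from the alert log.
sample = """\
WARNING: Read Failed. group:2 disk:1 AU:675 offset:16384 size:16384
WARNING: Read Failed. group:2 disk:1 AU:675 offset:65536 size:16384
"""

failures = parse_read_failures(sample)
# Distinct (group, disk) pairs that reported failed reads.
affected = {(f["group"], f["disk"]) for f in failures}
print(affected)
```

Here both warnings point at the same group and disk (group 2, disk 1, i.e. DATA_0001 in the log), which is why the next step is to look at the ASM side.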
2. Check the ASM alert log
------- A disk timeout occurred and ASM began to dismount the OCR disk group
Thu Apr 11 06:39:29 2019
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 0.3671375779 (OCR_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 1.3671375780 (OCR_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (31654636) initiating offline of disk 2.3671375781 (OCR_0002) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 13 for pid 67, osid 31654636
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 3 done.
NOTE: initiating PST update: grp = 3, dsk = 0/0xdad4bfa3, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xdad4bfa4, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xdad4bfa5, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 14 for pid 67, osid 31654636
ERROR: no read quorum in group: required 2, found 0 disks  <<<< 0 disks are accessible.
Thu Apr 11 06:39:29 2019
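The two logs can be lined up with a short script: extract the "no read quorum" errors from the ASM log and compute how long after the ASM event the database reported its control file error. A minimal sketch, assuming the timestamp layout shown above and an English/C locale for the day and month names; `quorum_failures` and `seconds_between` are names I made up for illustration.

```python
import re
from datetime import datetime

# Timestamp layout used in both alert logs, e.g. "Thu Apr 11 06:39:29 2019".
# Assumption: day and month names are in English (C locale).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

QUORUM = re.compile(
    r"ERROR: no read quorum in group: required (\d+), found (\d+) disks"
)

def quorum_failures(log_text):
    """Return (required, found) pairs for every 'no read quorum' error."""
    return [(int(req), int(found)) for req, found in QUORUM.findall(log_text)]

def seconds_between(earlier, later):
    """Seconds elapsed between two alert-log timestamp lines."""
    t0 = datetime.strptime(earlier, TS_FORMAT)
    t1 = datetime.strptime(later, TS_FORMAT)
    return int((t1 - t0).total_seconds())

asm_log = """\
Thu Apr 11 06:39:29 2019
ERROR: no read quorum in group: required 2, found 0 disks
"""

print(quorum_failures(asm_log))
# ASM offline event (06:39:29) versus the database control file error (06:40:14).
print(seconds_between("Thu Apr 11 06:39:29 2019", "Thu Apr 11 06:40:14 2019"))
```

The ASM timestamp is earlier than the database one, which fits the reading that the storage-side timeout came first and the database errors followed.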

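Solution:
1. Putting the information above together, the fault analysis is summarized as follows:
Oracle RAC ASM has its own heartbeat monitoring for the disks in the disk groups it manages, the 'ASM PST heartbeat'. According to this analysis, the monitoring appeared after Oracle 11.2.0.3 with a default setting of 15s, and from 12.1.0.2 onward Oracle changed the default to 120s.
The PST heartbeat problem usually shows up during I/O interruptions, heavy I/O load, or heavy CPU load. When PST detects that the synchronization delay exceeds the value of "_asm_hbeatiowait", it notifies the Oracle ASM instance to dismount the disk group, which takes the disk group offline in the ASM instance. Under the Normal Redundancy or High Redundancy policy, a RAC split-brain generally results when more than half of a disk group is offline.
During any of our upgrades, when the I/O paths are switched over, PP typically holds I/O for about 15 seconds before resuming, which is very likely to trigger the timeout described above. We strongly recommend changing this parameter to 120 before upgrading.
Step 2 shows how to check this parameter. After changing it to 120s, the CRS services must be restarted for the setting to take effect.
2. Check the value of the parameter "_asm_hbeatiowait" (the check returned 15):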
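The reasoning about the 15s versus 120s setting comes down to simple arithmetic, sketched below. This is a deliberately simplified model, not how ASM actually measures the delay: it just subtracts the I/O hold time from the heartbeat timeout. The function name `headroom_seconds` is hypothetical, and the 15-second hold is the approximate figure quoted above for a path switchover.

```python
def headroom_seconds(io_hold, hbeat_iowait):
    """Seconds of slack left between an I/O stall and the heartbeat timeout.

    Simplified model: zero or negative slack means the stall can reach the
    timeout, so the disk group is at risk of being dismounted.
    """
    return hbeat_iowait - io_hold

IO_HOLD = 15  # approximate hold time during a path switchover, per the post

for timeout in (15, 120):  # the old default and the recommended value
    slack = headroom_seconds(IO_HOLD, timeout)
    status = "at risk" if slack <= 0 else "ok"
    print(f"_asm_hbeatiowait={timeout}: slack {slack}s ({status})")
```

With the 15s setting there is no slack at all for a stall of about 15 seconds, while 120s leaves well over a minute, which is the motivation for changing the value before the upgrade.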
select ksppinm as "hidden parameter", ksppstvl as "value"
  from x$ksppi
  join x$ksppcv
using (indx)
where ksppinm like '\_%' escape '\'
   and ksppinm like '%asm_hb%'
order by ksppinm;
3. Fix: adjust the parameter in the ASM instance
alter system set "_asm_hbeatiowait"=120 scope=spfile;
Note: ASM or CRS must be restarted afterwards.
