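Symptoms:
On version 12.2.0.1, in a clustered environment,
running a truncate on a large table produces the following errors in the instance alert log, and the instance then crashes.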
SQL> truncate table xxxxx;
Alert log of the affected database instance (<oracle base>/diag/rdbms/<db name>/<sid name>/trace/alert_<sid name>.log):
2018-03-20T08:09:21.705048+08:00
LCK1 (ospid: 99265) waits for event 'libcache interrupt action by LCK' for 72 secs.
2018-03-20T08:09:21.705135+08:00
LCK1 (ospid: 99265) is hung in an acceptable location (libcache 0x41.02).
2018-03-20T08:11:41.552652+08:00
LCK1 (ospid: 99265) waits for event 'libcache interrupt action by LCK' for 212 secs.
2018-03-20T08:11:41.552746+08:00
LCK1 (ospid: 99265) is hung in an acceptable location (libcache 0x41.02).
2018-03-20T08:13:31.818425+08:00
IPC Receiver dump detected. Sender instance 1 Receiver pnum 91 ospid 99437 [oracle@rac2 (LCK0)], pser 1
2018-03-20T08:13:31.866524+08:00
Errors in file /u01/app/oracle/diag/rdbms/rac/rac2/trace/rac2_lck0_99437.trc:
2018-03-20T08:14:20.720213+08:00
Detected an inconsistent instance membership by instance 1
Errors in file /u01/app/oracle/diag/rdbms/rac/rac2/trace/rac2_lmon_99188.trc (incident=811697):
ORA-29740: evicted by instance number 1, group incarnation 6
Incident details in: /u01/app/oracle/diag/rdbms/rac/rac2/incident/incdir_811697/rac2_lmon_99188_i811697.trc
2018-03-20T08:14:22.580678+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2018-03-20T08:14:22.593568+08:00
Errors in file /u01/app/oracle/diag/rdbms/rac/rac2/trace/rac2_lmon_99188.trc:
ORA-29740: evicted by instance number 1, group incarnation 6
Errors in file /u01/app/oracle/diag/rdbms/rac/rac2/trace/rac2_lmon_99188.trc (incident=811698):
ORA-29740 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/rac/rac2/incident/incdir_811698/rac2_lmon_99188_i811698.trc
2018-03-20T08:14:22.774895+08:00
CJQ0 (ospid: 108083): terminating the instance due to error 481
2018-03-20T08:14:25.187706+08:00
License high water mark = 61
2018-03-20T08:14:33.896168+08:00
Instance terminated by CJQ0, pid = 108083
2018-03-20T08:14:33.919170+08:00
Warning: 2 processes are still attach to shmid 622610:
(size: 143360 bytes, creator pid: 63799, last attach/detach pid: 99159)
2018-03-20T08:14:34.330367+08:00
USER (ospid: 92815): terminating the instance
2018-03-20T08:14:34.335851+08:00
Instance terminated by USER, pid = 92815
2018-03-20T08:14:57.912601+08:00
Starting ORACLE instance (normal) (OS id: 99357)
Alert log of the remote instance (<oracle base>/diag/rdbms/<db name>/<sid name>/trace/alert_<sid name>.log):
Evicting instance 2 from cluster
Waiting for instances to leave: 2
2018-03-20T08:14:20.935803+08:00
IPC Send timeout to 2.8 inc 4 for msg type 65521 from opid 32
......
2018-03-20T08:14:20.954301+08:00
IPC Send timeout to 2.2 inc 4 for msg type 65521 from opid 26
......
2018-03-20T08:14:21.568723+08:00
Errors in file /u01/app/oracle/diag/rdbms/rac/rac1/trace/rac1_dia0_21859_base_1.trc:
ORA-27508: IPC error sending a message
2018-03-20T08:14:34.973773+08:00
......
2018-03-20T08:14:40.850954+08:00
Remote instance kill is issued with system inc 6
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
2018-03-20T08:14:50.838054+08:00
Waiting for instances to leave: 2
2018-03-20T08:14:53.920473+08:00
Reconfiguration started (old inc 4, new inc 8)
List of instances (total 1) :
1
Dead instances (total 1) :
2
My inst 1
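To check whether another alert log shows the same sequence of messages, the excerpts above can be turned into a small scanner. This is a minimal sketch and not an Oracle tool: the names `SIGNATURES`, `scan_alert_log` and `matches_symptom` are hypothetical, the patterns are copied from the log lines quoted in this note, and a match only suggests a similar symptom. ORA-29740 evictions have other causes as well, so a match does not confirm bug 28111583.

```python
import re
from typing import Dict, Iterable, List

# Message fragments taken from the alert log excerpts in this note.
# The dictionary keys are illustrative labels, not Oracle terminology.
SIGNATURES = {
    "lck_wait": re.compile(r"waits for event 'libcache interrupt action by LCK'"),
    "eviction": re.compile(r"ORA-29740: evicted by instance number \d+"),
    "ipc_send_timeout": re.compile(r"IPC Send timeout to \d+\.\d+"),
    "instance_terminated": re.compile(r"terminating the instance due to error 481"),
}


def scan_alert_log(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return {signature label: [1-based line numbers]} for matching lines."""
    hits: Dict[str, List[int]] = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


def matches_symptom(hits: Dict[str, List[int]]) -> bool:
    """Heuristic (an assumption of this sketch): an LCK wait plus an eviction."""
    return bool(hits["lck_wait"]) and bool(hits["eviction"])
```

A typical use would be to open the alert log with `open(path, errors="replace")`, pass the file object to `scan_alert_log`, and print the returned line numbers so the surrounding messages can be reviewed by hand.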
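Cause:
This issue was investigated in bug 28111583.
When the truncate starts, it generates very heavy traffic on the private interconnect and prevents the LMS processes from making progress, which is why instance 2 crashed many times.
The root cause is that IPC sends have no flow control, so the receiving side is pushed past its limit and packets are dropped.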
Solution:
This bug is fixed in 12.2.0.1.190115DBRU; applying that DBRU resolves the issue.
Alternatively, apply patch 28111583 if it is available for your version and platform.