Symptoms:
1. A disk is dropped from a disk group:
SQL> ALTER DISKGROUP OCR drop DISK OCR_000n rebalance POWER n;
2. The following errors are reported during the rebalance in the RBAL trace file +ASM2_rbal_xxxx.trc:
*** 2023-10-24T13:28:49.980843+08:00
RBAL processing file +OCR.258.1098786551 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.259.1151017003 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.260.1150037575 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.261.1150930583 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.262.1150642517 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.263.1139565669 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.264.1151060211 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.266.1151031403 HARDCHECK on:0 flags:0x0
RBAL processing file +OCR.267.1151045807 HARDCHECK on:0 flags:0x0
KFGB remirror; pri=0
kfgbScheduleExpel: REBAL gn: 3 xplsta 0 xpcnt 0
NOTE: stopping process ARB0
NOTE: stopping process ARBA
waiting for ARB0 rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:28:52.180445+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:28:55.188585+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:28:58.196504+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:29:01.204473+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:29:04.212519+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:29:07.221553+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:29:10.228604+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:29:13.236640+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
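The repeating pattern in the excerpt above can be spotted mechanically. The following is a minimal sketch, not an Oracle tool: the function name `count_acquire_failures` and the regular expression are assumptions written only against the message format shown in this trace. It counts how often each "failed to acquire" message repeats, grouped by the resource name, function name, and group number as printed.

```python
import re

# Regex written against the message format in the RBAL excerpt above
# (an assumption; other versions may format the line differently).
FAIL_RE = re.compile(
    r"kfgbTryFn: failed to acquire (?P<resource>\S+) in \d+ "
    r"for (?P<fn>\w+) \(of group (?P<group>\d+)/"
)


def count_acquire_failures(trace_text: str) -> dict:
    """Count repeated 'failed to acquire' messages per (resource, fn, group)."""
    counts = {}
    for line in trace_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            key = (m["resource"], m["fn"], int(m["group"]))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Two retry cycles copied from the excerpt above.
sample = """\
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:28:55.188585+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
"""

if __name__ == "__main__":
    for key, n in count_acquire_failures(sample).items():
        print(key, "repeated", n, "times")
```

A count that keeps growing for the same key at roughly 3-second intervals matches the symptom shown in this trace.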
3. Querying gv$asm_operation shows ORA-59048 in the ERROR_CODE column:
select * from gv$asm_operation;
   INST_ID GROUP_NUMBER OPERA PASS      STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
---------- ------------ ----- --------- ---- ----- ------ ----- -------- -------- ----------- ----------
         2            5 REBAL REBALANCE ERRS     5                                            ORA-59048
If the operation involves voting disks, ocssd may also terminate with the following errors, after which the node may be rebooted by the cssdmonitor process:
OCSSD.trc:
2023-10-24 13:28:50.936 : CLSF:3847264000: Opened hdl:0x7fff880a79a0 for dev:/dev/votedisk_location:
2023-10-24 13:28:50.936 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: name /dev/votedisk_location blocksz 512
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: clguidoff_volinfo =44, siteoff_volinfo = 77
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: site guid during discovery = 00112233445566778899aabbccddeeff
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmFindVF: found VF by vdin in the configured queue
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmFindVF: Duplicate voting file found in the queue of previously configured disks queued(/dev/votedisk_location|[a7963586-f5f74ffd-bfbafe78-03f2a195]), found(/dev/votedisk_location|[a7963586-f5f74ffd-bfbafe78-03f2a195]), is not corrupted
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: Found a duplicate voting file with same file name and UID as the newly discovered disk /dev/votedisk_location. Rejecting the newly discovered disk.
2023-10-24 13:28:50.937 : default:3847264000: clssnmvDiskCreate:destroy_vdisk->vdisk: dump of 0x0x7fff8853e110, len 14728
2023-10-24 13:28:50.937 : default:3847264000: 0x0x7fff8853e110 10 e1 53 88 ff 7f 00 00 - 10 e1 53 88 ff 7f 00 00 ..S.......S.....
.....
2023-10-24 13:28:52.050 : CSSD:1931962112: [ INFO] clssnmHandleUpdate: sync[544556712] src[1], msgvers 4 icin 544556690
CLSB:1931962112: [ INFO] Oracle Clusterware infrastructure error in OCSSD (OS PID 4110914): Fatal signal 11 has occurred in program ocssd thread 1931962112; nested signal count is 1
Trace file /u01/grid/diag/crs/xxxx/crs/trace/ocssd.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.13.0.0.0 Copyright 1996, 2021 Oracle. All rights reserved.
DDE: Flood control is not active
2023-10-24T13:28:52.055781+08:00
ohasd_cssdmonitor_root.trc:
2023-10-24 13:29:05.145 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 50% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 13871 milliseconds ago based on invariant clock 706642797; now polling at 100 ms
...
2023-10-24 13:29:11.951 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 75% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 20681 milliseconds ago based on invariant clock 706642797; now polling at 100 ms
...
2023-10-24 13:29:16.156 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 90% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 24881 milliseconds ago based on invariant clock 706642797; now polling at 100 ms
2023-10-24 13:29:16.156 : USRTHRD:3221223168: [ INFO] clsnwork_queue: posting worker thread
2023-10-24 13:29:16.156 : USRTHRD:3221223168: [ INFO] clsnwork_queue: posting worker thread
Trace file /u01/grid/diag/crs/xxx/crs/trace/ohasd_cssdmonitor_root.trc
...
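The cssdmonitor messages above report the age of the last CSSD heartbeat against a limit of 27560 ms. As a rough check of that arithmetic, the sketch below parses one such line and recomputes the percentage. The helper name `heartbeat_progress` and the regular expression are assumptions based only on the line format shown in this excerpt.

```python
import re

# Regex written against the clsnproc_reboot lines in the excerpt above
# (an assumption; it is not derived from any Oracle documentation).
REBOOT_RE = re.compile(
    r"Impending reboot at (?P<pct>\d+)% of limit (?P<limit>\d+);"
    r".*?, (?P<age>\d+) milliseconds ago"
)


def heartbeat_progress(line: str):
    """Return (reported_pct, age_ms, limit_ms, computed_pct) or None."""
    m = REBOOT_RE.search(line)
    if m is None:
        return None
    limit_ms = int(m["limit"])
    age_ms = int(m["age"])
    computed_pct = round(100 * age_ms / limit_ms, 1)
    return int(m["pct"]), age_ms, limit_ms, computed_pct


# Message body copied from the 13:29:05.145 entry above.
line = (
    "(:CLSN00121:)clsnproc_reboot: Impending reboot at 50% of limit 27560; "
    "disk timeout 27560, network timeout 27500, last heartbeat from CSSD at "
    "epoch seconds 1698125331.273, 13871 milliseconds ago based on invariant "
    "clock 706642797; now polling at 100 ms"
)

if __name__ == "__main__":
    print(heartbeat_progress(line))
```

For the three entries in the excerpt, the heartbeat ages of 13871, 20681, and 24881 ms work out to roughly 50.3%, 75.0%, and 90.3% of the limit, consistent with the reported 50%, 75%, and 90% thresholds.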
Changes:
Dropping a disk from a disk group.
Cause:
Bug 32382021
Solution:
1. As a workaround, set the following event:
SQL> alter system set events '15441 trace name context forever, level 0x8000000';
Then run the rebalance again to expel the disk being dropped.
Afterwards, unset the event:
SQL> alter system set events '15441 trace name context off';
2. Alternatively, download and apply the fix from the link below:
https://updates.oracle.com/download/32382021.html
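For the workaround, the order of the statements matters: set the event, rerun the rebalance, then unset the event. The sketch below only assembles those statements as text so they can be reviewed before being run by hand. The helper name `workaround_statements` is hypothetical, and the disk group name and power value are placeholders to be replaced with the real ones. The two event statements are copied from the note; the rebalance statement uses the standard `ALTER DISKGROUP ... REBALANCE POWER n` form.

```python
def workaround_statements(diskgroup: str, power: int) -> list:
    """Return the workaround statements in the order described in this note.

    `diskgroup` and `power` are placeholders supplied by the caller; nothing
    here connects to a database or executes anything.
    """
    if not diskgroup.isidentifier():
        raise ValueError("unexpected disk group name: %r" % diskgroup)
    if power < 1:
        raise ValueError("power must be a positive integer")
    return [
        # Step 1: set the event (statement taken from the note).
        "alter system set events "
        "'15441 trace name context forever, level 0x8000000';",
        # Step 2: rerun the rebalance so the dropping disk is expelled.
        "alter diskgroup %s rebalance power %d;" % (diskgroup, power),
        # Step 3: unset the event (statement taken from the note).
        "alter system set events '15441 trace name context off';",
    ]


if __name__ == "__main__":
    # "OCR" is the disk group from the example above; 4 is an arbitrary power.
    for stmt in workaround_statements("OCR", 4):
        print("SQL> " + stmt)
```

Keeping the unset statement in the same list as the set statement makes it harder to leave the event enabled by accident.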