Title: Rebalance Operation in RECO Diskgroup is Failing With ORA-59048
Author: 刘泽宇  Posted: 2025-10-26 18:17

Symptoms:
1. A disk is being dropped from a diskgroup:
SQL> ALTER DISKGROUP OCR DROP DISK OCR_000n REBALANCE POWER n;
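Before dropping a disk, it can help to confirm its current state and path. A minimal sketch, assuming the diskgroup name 'OCR' from the command above and the standard V$ASM_DISK / V$ASM_DISKGROUP views:

```sql
-- Sketch: list the disks of the diskgroup being modified.
-- 'OCR' matches the diskgroup in the ALTER DISKGROUP command above.
SELECT g.name AS diskgroup,
       d.name AS disk,
       d.path,
       d.mode_status,   -- e.g. ONLINE / OFFLINE
       d.state          -- e.g. NORMAL / DROPPING
  FROM v$asm_disk d
  JOIN v$asm_diskgroup g
    ON g.group_number = d.group_number
 WHERE g.name = 'OCR';
```

A disk already in DROPPING state indicates a rebalance is pending or in progress for that drop.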
2. The following errors are reported during the rebalance in the RBAL trace file (+ASM2_rbal_xxxx.trc):
*** 2023-10-24T13:28:52.180445+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** 2023-10-24T13:28:55.188585+08:00
waiting for expel rundown on group 3
kfgbTryFn: failed to acquire HD.3.0 in 6 for kfgbVotingFileRelocateNow (of group 3/0xf916c2fe)
*** (the same "waiting for expel rundown" / kfgbTryFn messages repeat every three seconds until 13:29:13)
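The repeated kfgbTryFn failures above come from kfgbVotingFileRelocateNow, so it is worth checking which ASM disks currently hold a voting file. A sketch using the standard VOTING_FILE column of V$ASM_DISK:

```sql
-- Sketch: ASM disks that carry a CSS voting file copy (VOTING_FILE = 'Y')
SELECT group_number, name, path, voting_file
  FROM v$asm_disk
 WHERE voting_file = 'Y';
```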
3. Checking gv$asm_operation shows ORA-59048 in the ERROR_CODE column:
select * from gv$asm_operation;
INST_ID GROUP_NUMBER OPERA PASS      STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------- ------------ ----- --------- ---- ----- ------ ----- -------- -------- ----------- ----------
      2            5 REBAL REBALANCE ERRS     5                                            ORA-59048
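To watch only the operations that have actually hit an error, the same view can be filtered on ERROR_CODE. A minimal sketch against the standard gv$asm_operation columns:

```sql
-- Sketch: show only rebalance operations that have recorded an error
SELECT inst_id, group_number, operation, pass, state, power, error_code
  FROM gv$asm_operation
 WHERE error_code IS NOT NULL;
```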
The node may also be rebooted by the ocssd monitor process (cssdmonitor), as ocssd terminates with the following errors when the operation involves the voting disks:
OCSSD.trc:
2023-10-24 13:28:50.936 : CLSF:3847264000: Opened hdl:0x7fff880a79a0 for dev:/dev/votedisk_location:
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: site guid during discovery = 00112233445566778899aabbccddeeff
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmFindVF: found VF by vdin in the configured queue
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmFindVF: Duplicate voting file found in the queue of previously configured disks queued(/dev/votedisk_location|[a7963586-f5f74ffd-bfbafe78-03f2a195]), found(/dev/votedisk_location|[a7963586-f5f74ffd-bfbafe78-03f2a195]), is not corrupted
2023-10-24 13:28:50.937 : CSSD:3847264000: [ INFO] clssnmvDiskCreate: Found a duplicate voting file with same file name and UID as the newly discovered disk /dev/votedisk_location. Rejecting the newly discovered disk.
2023-10-24 13:28:50.937 : default:3847264000: clssnmvDiskCreate:destroy_vdisk->vdisk: dump of 0x0x7fff8853e110, len 14728
CLSB:1931962112: [ INFO] Oracle Clusterware infrastructure error in OCSSD (OS PID 4110914): Fatal signal 11 has occurred in program ocssd thread 1931962112; nested signal count is 1
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.13.0.0.0 Copyright 1996, 2021 Oracle. All rights reserved.
DDE: Flood control is not active
2023-10-24T13:28:52.055781+08:00
ohasd_cssdmonitor_root.trc:
2023-10-24 13:29:05.145 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 50% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 13871 milliseconds ago based on invariant clock 706642797; now polling at 100 ms
...
2023-10-24 13:29:11.951 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 75% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 20681 milliseconds ago based on invariant clock 706642797; now polling at 100 ms
...
2023-10-24 13:29:16.156 : USRTHRD:3221223168: [ INFO] (:CLSN00121:)clsnproc_reboot: Impending reboot at 90% of limit 27560; disk timeout 27560, network timeout 27500, last heartbeat from CSSD at epoch seconds 1698125331.273, 24881 milliseconds ago based on invariant clock 706642797; now polling at 100 ms