Chongqing Sizhuang Oracle, KingBase, PostgreSQL, Redhat Certification Study Forum

[Oracle] Instance Hang with Events 'log file parallel write' 'LGWR any worker group'

Posted on 2025-5-25 10:54:11
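Symptoms:
One instance of a Real Application Clusters (RAC) database hung, with the log writer processes waiting on 'log file parallel write', 'LGWR worker group ordering', and related events. The following messages were reported: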

2025-02-06T10:26:33.245053+08:00
LG00 (ospid: 32135) waits for event 'log file parallel write' for 79 secs.
2025-02-06T10:26:35.686384+08:00
LG01 (ospid: 32150) waits for event 'LGWR worker group ordering' for 81 secs.
2025-02-06T10:26:40.966712+08:00
LGWR (ospid: 32131) waits for event 'LGWR any worker group' for 79 secs.
2025-02-06T10:26:50.044534+08:00
LGWR (ospid: 32131) waits for event 'LGWR any worker group' for 88 secs.

...

2025-02-06T10:31:22.155761+08:00
LG00 (ospid: 32135) waits for event 'log file parallel write' for 176 secs.
2025-02-06T10:31:24.796786+08:00
LGWR (ospid: 32131) waits for event 'LGWR all worker groups' for 179 secs.
2025-02-06T10:31:33.260291+08:00
LGWR (ospid: 32131) waits for event 'LGWR all worker groups' for 187 secs.
2025-02-06T10:31:42.212506+08:00
LGWR (ospid: 32131) waits for event 'LGWR all worker groups' for 196 secs.
2025-02-06T10:31:47.138127+08:00
LG00 (ospid: 32135) waits for event 'log file parallel write' for 201 secs.
2025-02-06T10:31:51.260376+08:00
LGWR (ospid: 32131) waits for event 'LGWR all worker groups' for 205 secs.

...

2025-02-06T10:55:44.022923+08:00
LGWR (ospid: 32131) waits for event 'log file parallel write' for 198 secs.
2025-02-06T10:55:54.062963+08:00
LGWR (ospid: 32131) waits for event 'log file parallel write' for 208 secs.
2025-02-06T10:56:03.942882+08:00
LGWR (ospid: 32131) waits for event 'log file parallel write' for 218 secs.
2025-02-06T10:56:13.974936+08:00
LGWR (ospid: 32131) waits for event 'log file parallel write' for 228 secs.
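When the excerpt is long, it can help to reduce these messages to the longest wait reported per process and event. The following is a minimal sketch, assuming the messages keep exactly the format shown above; the names `WAIT_RE` and `longest_waits` are hypothetical helpers, not part of any Oracle tooling.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   LG00 (ospid: 32135) waits for event 'log file parallel write' for 79 secs.
# The pattern is an assumption based only on the excerpt above.
WAIT_RE = re.compile(
    r"^(?P<proc>\S+) \(ospid: (?P<ospid>\d+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) secs\.$"
)

def longest_waits(lines):
    """Return {(process, event): longest reported wait in seconds}."""
    longest = defaultdict(int)
    for line in lines:
        m = WAIT_RE.match(line.strip())
        if m:  # timestamp lines and "..." simply do not match
            key = (m["proc"], m["event"])
            longest[key] = max(longest[key], int(m["secs"]))
    return dict(longest)

# A few lines copied from the excerpt above
sample = [
    "2025-02-06T10:26:33.245053+08:00",
    "LG00 (ospid: 32135) waits for event 'log file parallel write' for 79 secs.",
    "2025-02-06T10:26:40.966712+08:00",
    "LGWR (ospid: 32131) waits for event 'LGWR any worker group' for 79 secs.",
    "2025-02-06T10:26:50.044534+08:00",
    "LGWR (ospid: 32131) waits for event 'LGWR any worker group' for 88 secs.",
]

for (proc, event), secs in sorted(longest_waits(sample).items()):
    print(f"{proc:5} {event:30} {secs:>4} s")
```

A steadily growing wait on 'log file parallel write' for the same ospid, as seen here, suggests a single write that never completes rather than many slow writes.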

The following messages were reported in the OS log:

Feb 6 10:24:57 <Host Name> kernel: qla2xxx [0000:3b:00.1]-801c:16: Abort command issued nexus=16:0:3 -- 1 2002.
Feb 6 10:24:57 <Host Name> kernel: qla2xxx [0000:3b:00.1]-8009:16: DEVICE RESET ISSUED nexus=16:0:0 cmd=ffff8df3a124e548.
...
Feb 6 10:25:44 <Host Name> kernel: qla2xxx [0000:3b:00.1]-8009:16: DEVICE RESET ISSUED nexus=16:0:0 cmd=ffff8df3aad27548.
Feb 6 10:25:44 <Host Name> kernel: qla2xxx [0000:3b:00.1]-800e:16: DEVICE RESET SUCCEEDED nexus:16:0:0 cmd=ffff8df3aad27548.
Feb 6 10:25:44 <Host Name> kernel: qla2xxx [0000:3b:00.1]-8009:16: DEVICE RESET ISSUED nexus=16:0:1 cmd=ffff8ddf73eacd48.
Feb 6 10:25:44 <Host Name> kernel: qla2xxx [0000:3b:00.1]-800e:16: DEVICE RESET SUCCEEDED nexus:16:0:1 cmd=ffff8ddf73eacd48.
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:0:1: [sdc] tag#1 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:0:1: [sdc] tag#1 CDB: Write(16) 8a 00 00 00 00 00 00 0b a3 90 00 00 00 80 00 00
Feb 6 10:25:44 <Host Name> kernel: print_req_error: I/O error, dev sdc, sector 762768
Feb 6 10:25:44 <Host Name> kernel: device-mapper: multipath: Failing path 8:32.
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:0:0: alua: port group 3e8 state A non-preferred supports TolUsNA
Feb 6 10:25:44 <Host Name> multipathd: sdc: mark as failed
Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: remaining active paths: 3
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:0:1: alua: port group 3e8 state A non-preferred supports TolUsNA
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:1:1: alua: port group 3e9 state N non-preferred supports TolUsNA
Feb 6 10:25:44 <Host Name> kernel: sd 16:0:1:0: alua: port group 3e9 state N non-preferred supports TolUsNA
Feb 6 10:25:44 <Host Name> kernel: sd 15:0:0:0: alua: port group 3e8 state A non-preferred supports TolUsNA
Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: sdc - tur checker reports path is up
Feb 6 10:25:44 <Host Name> kernel: device-mapper: multipath: Reinstating path 8:32.
Feb 6 10:25:44 <Host Name> multipathd: 8:32: reinstated
Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: remaining active paths: 4
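To see the path flapping at a glance, the multipathd messages can be pulled out of the OS log in order. This is a rough sketch, assuming the three message shapes shown above ("mark as failed", "reinstated", "remaining active paths"); the regular expressions and the function name `multipath_events` are illustrative assumptions, not a multipath-tools interface.

```python
import re

# Patterns derived only from the multipathd lines in the excerpt above
FAILED_RE = re.compile(r"multipathd: (?P<dev>\w+): mark as failed")
REINSTATED_RE = re.compile(r"multipathd: (?P<majmin>\d+:\d+): reinstated")
REMAINING_RE = re.compile(
    r"multipathd: (?P<wwid>\w+): remaining active paths: (?P<n>\d+)"
)

def multipath_events(lines):
    """Return a list of (kind, detail) tuples in log order."""
    events = []
    for line in lines:
        if m := FAILED_RE.search(line):
            events.append(("failed", m["dev"]))
        elif m := REINSTATED_RE.search(line):
            events.append(("reinstated", m["majmin"]))
        elif m := REMAINING_RE.search(line):
            events.append(("remaining", (m["wwid"], int(m["n"]))))
    return events

# Lines copied from the excerpt above
sample = [
    "Feb 6 10:25:44 <Host Name> multipathd: sdc: mark as failed",
    "Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: remaining active paths: 3",
    "Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: sdc - tur checker reports path is up",
    "Feb 6 10:25:44 <Host Name> multipathd: 8:32: reinstated",
    "Feb 6 10:25:44 <Host Name> multipathd: 3600a09803831465875245171634e6143: remaining active paths: 4",
]

for kind, detail in multipath_events(sample):
    print(kind, detail)
```

In this excerpt the path is failed and reinstated within the same second, which is consistent with an intermittent fault on the host side rather than a permanently lost path.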





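Cause:
'log file parallel write' is the last wait event on the database side: once the log writer has issued the redo write, the I/O subsystem is responsible for completing the request. Long waits on this event therefore point to the storage stack rather than to the database itself.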

In this case the cause was a faulty module on the machine's HBA card, which affected the multipath shared disks used by the RAC cluster.

Correspondingly, the trace file of background process LG00 reported:

*** 2025-02-06T10:31:22.723141+08:00 (CDB$ROOT(1))
Warning: log write elapsed time 27423ms, size 1KB

*** 2025-02-06T10:32:25.697906+08:00 (CDB$ROOT(1))
Warning: log write elapsed time 30628ms, size 6KB

*** 2025-02-06T10:35:33.090846+08:00 (CDB$ROOT(1))
Warning: log write elapsed time 31122ms, size 1KB

...
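The warnings above show writes of only a few KB taking around 30 seconds, so the delay is latency in the I/O path, not a matter of write volume. A small sketch for extracting these entries from an LG00 trace follows; it assumes the two-line layout shown above (a `***` timestamp line followed by the warning), and `log_write_warnings` is a hypothetical helper name.

```python
import re

# Patterns based only on the trace excerpt above
STAMP_RE = re.compile(r"^\*\*\* (?P<ts>\S+) ")
WRITE_RE = re.compile(
    r"^Warning: log write elapsed time (?P<ms>\d+)ms, size (?P<kb>\d+)KB"
)

def log_write_warnings(lines):
    """Return a list of (timestamp, elapsed_ms, size_kb) tuples."""
    warnings, last_ts = [], None
    for line in lines:
        line = line.strip()
        if m := STAMP_RE.match(line):
            last_ts = m["ts"]  # remember the most recent *** timestamp
        elif m := WRITE_RE.match(line):
            warnings.append((last_ts, int(m["ms"]), int(m["kb"])))
    return warnings

# Lines copied from the excerpt above
sample = [
    "*** 2025-02-06T10:31:22.723141+08:00 (CDB$ROOT(1))",
    "Warning: log write elapsed time 27423ms, size 1KB",
    "",
    "*** 2025-02-06T10:32:25.697906+08:00 (CDB$ROOT(1))",
    "Warning: log write elapsed time 30628ms, size 6KB",
    "",
    "*** 2025-02-06T10:35:33.090846+08:00 (CDB$ROOT(1))",
    "Warning: log write elapsed time 31122ms, size 1KB",
]

for ts, ms, kb in log_write_warnings(sample):
    print(f"{ts}  {ms / 1000:.1f} s for {kb} KB")
```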



Solution:
1. Stop the database instance and Grid Infrastructure (GI) on the affected node.

2. Replace the faulty HBA module.

3. Start GI, then the database instance.
