Exadata: Node or Instance evictions due to RDS connectivity issues after the DB node upgrade to 12.2.1.1.0 or higher (Doc ID 2270319.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.1 [Release 12.1]
Oracle Exadata Storage Server Software - Version 12.2.1.1.0 to 12.2.1.1.1 [Release 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
SYMPTOMS
Node or database instance evictions occur due to RDS connectivity issues after the Exadata storage software is upgraded to 12.2.1.1.0 or newer.
diskmon.trc:
2017-04-19 15:23:23.985215*: osswait failed: context 0x7ff1d03d97d0 childctx 0x7ff1d03d97d0 timeout 5000 errorcode 38
2017-04-19 15:23:24.343159 : oss_wait done, no request to return. rcode=Process timedout when waiting for I/O completions (46)
2017-04-19 15:23:24.343192 : oss_wait called for request: 0x7ff1d03fa780
2017-04-19 15:24:24.343510 : ossnet_wait_all: WAITED TOO LONG for network request completion: 60000. init_timeout: 4294967295 remaining_timeout: 4294907295
Box 0x7ff1d03f9f80 my_box_refid: 0 source_id: 82127097 (box inc: 10)
Request Flags - 0000000c Callback Context - 0x7ff1c4040e80 Reconnect Time - 0 msec Num. Reconnects - 0
.
Message 0x7ff1c40413d8 with flags 0000000c
RQ_Tag_82127097_7249: RefId - 0, Last Reply Frag - 0
Reply PTR - (nil), expected size - 0, actual size - 0
Message has not been reaped
Command 0x7ff1c4041418 with flags 80000001
Payload Ptr - 0x7ff17c022d00, payload size - 32
RQ_Tag_82127097_7249: Command name IOCTL, refid 7249
Ioctl arguments fd 0 opcode 184 size 32
Reply 0x7ff1c40414d0 with flags 80000000
Payload Ptr - (nil), payload size - 0
Invalid command: Command name UNINITIALIZED COMMAND CODE, refid 0
Number of pollfds - 0
Poll list is not dirty
QOS level requested = 0
QOS support is available
Num pending netmsg - 1
ossnet_setup_connection failed - 0
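To check a node's diskmon.trc for the timeout pattern shown above, the trace can be scanned for the "osswait failed" and "WAITED TOO LONG" lines. The following is a minimal sketch only: the regular expressions are built from the excerpt above and may not match the wording of other releases, and the function name scan_diskmon is hypothetical, not an Oracle tool.

```python
import re
from typing import Iterable, List, Tuple

# Patterns derived from the diskmon.trc excerpt in this note (assumption: other
# releases may format these lines differently).
WAITED_TOO_LONG = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*:\s*ossnet_wait_all: "
    r"WAITED TOO LONG for network request completion: (?P<ms>\d+)"
)
OSSWAIT_FAILED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\*?\s*:\s*osswait failed:"
    r".*errorcode (?P<code>\d+)"
)


def scan_diskmon(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, kind, value) for each timeout-related trace line.

    kind is 'waited_too_long_ms' (value = wait in ms) or
    'osswait_errorcode' (value = the reported error code).
    """
    hits: List[Tuple[str, str, int]] = []
    for line in lines:
        m = WAITED_TOO_LONG.match(line)
        if m:
            hits.append((m.group("ts"), "waited_too_long_ms", int(m.group("ms"))))
            continue
        m = OSSWAIT_FAILED.match(line)
        if m:
            hits.append((m.group("ts"), "osswait_errorcode", int(m.group("code"))))
    return hits

# Usage (path is site-specific and not assumed here):
#   with open(path_to_diskmon_trc, errors="replace") as f:
#       for hit in scan_diskmon(f):
#           print(hit)
```

Matching only these two line types keeps the sketch narrow; repeated hits close together in time are the pattern described in this note.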
On the OS side, the /var/log/messages file shows several RDS disconnect and reconnect messages with vendor error 0xd7 or 0x8a.
/var/log/messages (DB node):
Apr 19 15:23:19 dbnode01 kernel: [478147.056379] RDS/IB: send completion <10.nnn.nnn.49,10.nnn.nnn.85,4> status 9 vendor_err 0x8a, disconnecting and reconnecting
Apr 19 15:23:19 dbnode01 kernel: [478147.056567] RDS/IB: connection <10.nnn.nnn.49,10.nnn.nnn.85,4> dropped due to 'DISCONNECTED event'
/var/log/messages (cell):
Apr 19 15:23:19 cel01 kernel: [1062478.077722] RDS/IB: recv completion <10.nnn.nnn.85,10.nnn.nnn.49,4> had status 1 vendor_err 0xd7, disconnecting and reconnecting
Apr 19 15:23:19 cel01 kernel: [1062478.077730] RDS/IB: connection <10.nnn.nnn.85,10.nnn.nnn.49,4> dropped due to 'recv completion error'
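The RDS/IB messages above can be counted per host and vendor error to see which DB nodes and cells are affected. This is a minimal sketch assuming the standard syslog line layout shown in the excerpts; the function name count_rds_vendor_errors is hypothetical, and collecting the log files from each node is not shown.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches the RDS/IB send/recv completion errors shown in this note, e.g.
# "Apr 19 15:23:19 dbnode01 kernel: [...] RDS/IB: send completion <...> status 9 vendor_err 0x8a, ..."
RDS_COMPLETION_ERR = re.compile(
    r"^\w{3}\s+\d+ \d{2}:\d{2}:\d{2} (?P<host>\S+) kernel: \[[^\]]*\] "
    r"RDS/IB: (?:send|recv) completion <[^>]+>.*?vendor_err (?P<err>0x[0-9a-fA-F]+)"
)


def count_rds_vendor_errors(
    lines: Iterable[str], codes: Tuple[str, ...] = ("0xd7", "0x8a")
) -> Counter:
    """Count RDS/IB completion errors per (host, vendor_err) for the given codes."""
    wanted = {c.lower() for c in codes}
    counts: Counter = Counter()
    for line in lines:
        m = RDS_COMPLETION_ERR.match(line)
        if m and m.group("err").lower() in wanted:
            counts[(m.group("host"), m.group("err").lower())] += 1
    return counts
```

The "connection ... dropped" lines are deliberately not counted, so each disconnect/reconnect event is counted once via its completion error.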
Other symptoms include nodes not rejoining the cluster after CRS restarts.
CHANGES
Upgrade of the Exadata storage software to version 12.2.1.1.0 or newer from a previous release.
CAUSE
Bug 25920916
This has been identified as an issue with rolling upgrades to 12.2.1.1.0 or newer, which ships with the UEK4 kernel.
The UEK4 kernel uses a 16 KB fragment size for RDS connections, whereas the UEK2 kernel uses a 4 KB fragment size. During a rolling upgrade, nodes on both kernels communicate with each other, and a 4 KB buffer can end up in the RDS 16 KB fragment cache, which causes the RDS connection issues.
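Because the problem arises when nodes with different RDS fragment sizes talk to each other, one way to spot the risky window is to compare kernel releases across nodes. The sketch below assumes the kernel release strings (as printed by `uname -r`) have already been collected into a dictionary; the host names and release strings in the test are hypothetical, and it assumes UEK R2 kernels report a 2.6.39 base release and UEK R4 kernels a 4.1.12 base release. The fragment sizes are the ones stated in this note.

```python
from typing import Dict, Optional

# RDS fragment sizes per kernel generation, as stated in this note.
UEK_FRAG_KB = {"UEK2": 4, "UEK4": 16}


def uek_generation(release: str) -> Optional[str]:
    """Map a `uname -r` string to 'UEK2', 'UEK4', or None if unrecognized.

    Assumption: UEK R2 = 2.6.39 base release, UEK R4 = 4.1.12 base release.
    """
    if "uek" not in release:
        return None
    if release.startswith("2.6.39"):
        return "UEK2"
    if release.startswith("4.1.12"):
        return "UEK4"
    return None


def mixed_frag_sizes(releases: Dict[str, str]) -> bool:
    """True if the nodes run recognized kernels with different RDS fragment sizes."""
    sizes = {
        UEK_FRAG_KB[gen]
        for gen in map(uek_generation, releases.values())
        if gen is not None
    }
    return len(sizes) > 1
```

Unrecognized kernels are ignored rather than guessed at, so a result of False only covers the nodes the function could classify.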
SOLUTION
The kernel fix is included in Exadata storage software maintenance releases 12.2.1.1.1.170605 and 12.2.1.1.2.
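Given an image version string (for example as reported by the imageinfo utility), a simple numeric comparison can indicate whether the fix is present. This is a sketch under a stated assumption: it treats every version that sorts at or above 12.2.1.1.1.170605 as containing the fix, while this note only names 12.2.1.1.1.170605 and 12.2.1.1.2, so other releases should be confirmed against Oracle's documentation. The function name is hypothetical.

```python
from typing import Tuple

# First release named in this note as containing the fix for Bug 25920916.
FIXED_IN: Tuple[int, ...] = (12, 2, 1, 1, 1, 170605)


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted Exadata version such as '12.2.1.1.1.170605' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def includes_rds_frag_fix(version: str) -> bool:
    """Assumption: any version >= 12.2.1.1.1.170605 carries the fix.

    A bare '12.2.1.1.1' (no build date) sorts below FIXED_IN and is treated
    as not fixed, which is the conservative reading.
    """
    return parse_version(version) >= FIXED_IN
```

Tuple comparison orders the fields left to right, so 12.2.1.1.2 sorts above 12.2.1.1.1.170605 without needing a build date.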
A potential workaround is to reboot all affected database nodes.