Chongqing Sizhuang Oracle and Redhat Certification Study Forum

Title: Intermittent ORA-25408 and TNS-12535 Errors When no Failover Has Occurred (Do...

Author: 刘泽宇    Time: 2023-4-2 17:52

SYMPTOMS
Applies to: version 11.2.0.4.0, JDBC OCI driver

Intermittently, all application batch jobs running against an Oracle 11.2.0.4 database start failing with the following error, even though no failover has occurred:


ERROR:DATASERVICES JDBCService.restore Error: ORA-25408: can not safely replay call
ERROR:root ORA-25408: can not safely replay call

At the same time, the database alert log shows the following errors:

Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for Linux: Version 11.2.0.4.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 04-FEB-2016 23:58:16
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 110
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=<HOSTNAME>)(PORT=<PORT>))


CAUSE
The issue is caused by connect timers expiring, which leads the client to assume that a failover has occurred.

The database server host was unreachable, or the connection took longer to establish than the default TCP connect timeout (60 seconds). When this happens, "ORA-12170: TNS:Connect timeout occurred" errors are expected.
This is explained in the Database Net Services Reference Guide.
In this case, the alert log reports 12170 errors, which confirms a timing issue between the client and the server.
The TNS-12535 error is a timeout error commonly associated with firewalls or slow networks.

ORA-25408 can be raised on the client when TAF is enabled, even though there has been no failover. This is discussed in Document 1985471.1, "ORA-25408 generated intermittently on Client but Database has not "failed" or "switched" over".
That document describes the scenario seen here. The connection could not be established within the time allowed by SQLNET.INBOUND_CONNECT_TIMEOUT on the database, which caused the client to incorrectly assume that a failover had occurred and, in turn, to raise the ORA-25408 error.
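
On the application side, it can help to recognize this error explicitly so that it can be logged and correlated with the TNS-12535 entries in the alert log. The following is a minimal sketch, not part of the original note. It assumes the JDBC driver exposes the ORA- number through SQLException.getErrorCode(); the class and method names are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical helper, for illustration only.
final class TafErrorSketch {
    // Vendor error code for ORA-25408 ("can not safely replay call").
    static final int ORA_25408 = 25408;

    /** Returns true if the exception, or any SQLException chained to it, carries ORA-25408. */
    static boolean isReplayFailure(SQLException ex) {
        for (SQLException e = ex; e != null; e = e.getNextException()) {
            if (e.getErrorCode() == ORA_25408) {
                return true;
            }
        }
        return false;
    }
}
```

A batch job could call isReplayFailure in its catch block and record the timestamp. What to do next (for example, re-running the failed call) is an application decision that the note does not prescribe.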


SOLUTION
1. Add or edit the following line in the sqlnet.ora file of the Oracle Client used with JDBC OCI:
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 10

The value is in seconds. Testing will show whether it needs to be increased further.

2. When dealing with a firewall or network communication issues, implement DCD (Dead Connection Detection).
DCD works by having the server send a probe packet to the client when inactivity is detected (after the SQLNET.EXPIRE_TIME timeout). The JDBC connection clause "(ENABLE=BROKEN)" ensures that the JDBC driver responds to the probe. This confirms to the server that the client is alive and also creates traffic on the socket, which resets the socket's inactivity timer and should prevent the firewall from closing it prematurely.
See: JDBC Developer's Guide, section D Troubleshooting - Using JDBC with Firewalls.

Set the ENABLE=BROKEN clause in the connect descriptor as follows:

jdbc:oracle:oci:@(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=tcp)(PORT=<PORT>)(HOST=<HOST NAME>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE NAME>)))

In addition, set SQLNET.EXPIRE_TIME=10 (the value is in minutes) in the sqlnet.ora file on the database server. This enables Dead Connection Detection.
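
The connect string above can also be assembled in application code. The following is a minimal sketch, assuming the Oracle JDBC driver and an Oracle Client installation are available at run time. The class name, method names, and parameter values are hypothetical; only the URL layout comes from the note.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, for illustration only.
final class OciUrlSketch {
    /** Builds a JDBC OCI URL whose connect descriptor carries (ENABLE=BROKEN). */
    static String buildUrl(String host, int port, String serviceName) {
        return "jdbc:oracle:oci:@(DESCRIPTION=(ENABLE=BROKEN)"
                + "(ADDRESS=(PROTOCOL=tcp)(PORT=" + port + ")(HOST=" + host + "))"
                + "(CONNECT_DATA=(SERVICE_NAME=" + serviceName + ")))";
    }

    /** Opens a connection with the generated URL; fails if no suitable driver is installed. */
    static Connection open(String host, int port, String serviceName,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl(host, port, serviceName), user, password);
    }
}
```

Building the descriptor in one place makes it harder to leave out the ENABLE=BROKEN clause when several batch jobs create their own connections.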





