11.2.0.4 [ins-41112] specified network interface doesnt maintain connectivity

郑全 · posted 2013-10-7 13:05:39

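While installing 11.2.0.4 Grid Infrastructure on Linux 6.4, the installer reported the following error at the network interface confirmation step: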

[ins-41112] specified network interface doesnt maintain connectivity across cluster nodes

I checked /etc/hosts and found nothing wrong, and the nodes could ping each other without any problem:

[root@szrac1 raw]# cat /etc/hosts
127.0.0.1 localhost
192.168.0.201 szrac1
192.168.0.202 szrac2

10.0.0.201 szrac1-priv
10.0.0.202 szrac2-priv

192.168.0.203 szrac1-vip
192.168.0.204 szrac2-vip

192.168.0.205 scan-ip

However, when I connected to the nodes over ssh, it prompted for input instead of logging in silently:

[grid@szrac2 ~]$ ssh szrac1-priv date
The authenticity of host 'szrac1-priv (10.0.0.201)' can't be established.
RSA key fingerprint is 40:1b:01:3c:98:05:83:b8:39:b8:95:09:2f:62:8c:67.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'szrac1-priv,10.0.0.201' (RSA) to the list of known hosts.
Mon Oct 7 11:30:55 CST 2013
[grid@szrac2 ~]$ ssh szrac2-priv date
The authenticity of host 'szrac2-priv (10.0.0.202)' can't be established.
RSA key fingerprint is 40:1b:01:3c:98:05:83:b8:39:b8:95:09:2f:62:8c:67.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'szrac2-priv,10.0.0.202' (RSA) to the list of known hosts.
Mon Oct 7 11:31:04 CST 2013

After doing this, the problem remained.
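A quick way to confirm that no node name still prompts is to run ssh in batch mode, where it fails instead of asking for a password or a host-key confirmation. The following is only a minimal sketch: the function name `check_ssh_equivalence` and the `SSH_CMD` override variable are hypothetical, introduced here for illustration.

```shell
#!/usr/bin/env bash
# Report whether ssh to each given host name works without any prompt.
# BatchMode=yes makes ssh fail rather than ask for a password or host-key confirmation.
# SSH_CMD is a hypothetical override (defaults to ssh), useful for dry runs.
check_ssh_equivalence() {
    local host rc=0
    for host in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" date >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return "$rc"
}

# Example (run as the grid user on each node):
#   check_ssh_equivalence szrac1 szrac2 szrac1-priv szrac2-priv
```

A FAIL line only says that a prompt or a connection error occurred; as this thread shows, passing the ssh check does not by itself clear INS-41112.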

Later I came across this note:

[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. [ID 1427202.1]

--------------------------------------------------------------------------------

  Modified 05-MAR-2012    Type REFERENCE    Status MODERATED

In this Document
  Purpose
  [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

--------------------------------------------------------------------------------

This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.

Purpose
The note lists problems, solutions or workarounds that are related to the following 11gR2 GI OUI error:

[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.

[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

[INS-41112] is a high-level error number; the workarounds and solutions depend on the error code from the lower layer. However, [INS-41112] does tell which interface is having the issue:

CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.

## >> in this case, it's eth1 that's having the connectivity issue

To find out the lower-layer error code, execute the following as the grid user:

runcluvfy.sh comp nodecon -i <network-interface> -n <racnode1>,<racnode2>,<racnode3> -verbose

Refer to the following once CVU reports the real error code:

• PRVF-7617
Refer to note 1335136.1 for details.

• PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
Refer to note 1429104.1 for details.
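Since the next step depends on which PRVF code CVU reports, it can help to pull the codes out of a saved cluvfy log. This is a small sketch, not part of the Oracle note; the function name `extract_prvf_codes` and the log file name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Print the distinct PRVF-nnnn codes found in cluvfy output read from stdin.
extract_prvf_codes() {
    grep -oE 'PRVF-[0-9]+' | sort -u
}

# Example, assuming the verbose output was saved to nodecon.log:
#   ./runcluvfy.sh comp nodecon -i eth0 -n szrac1,szrac2 -verbose > nodecon.log 2>&1
#   extract_prvf_codes < nodecon.log
```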

郑全 · posted 2013-10-7 13:08:21

I immediately ran the node connectivity check:

[grid@szrac1 grid]$ ./runcluvfy.sh comp nodecon -i eth0 -n szrac1,szrac2 -verbose

Verifying node connectivity

Checking node connectivity...

Checking hosts config file...
Node Name                             Status
------------------------------------ ------------------------
szrac2                                passed
szrac1                                passed

Verification of the hosts config file successful

Interface information for node "szrac2"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
eth0   192.168.0.202   192.168.0.0     0.0.0.0         192.168.0.1     08:00:27:AC:2E:55 1500
eth1   10.0.0.202      10.0.0.0        0.0.0.0         192.168.0.1     08:00:27:47:85:E9 1500

Interface information for node "szrac1"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
eth0   192.168.0.201   192.168.0.0     0.0.0.0         192.168.0.1     08:00:27:AC:2E:55 1500
eth1   10.0.0.201      10.0.0.0        0.0.0.0         192.168.0.1     08:00:27:47:85:E9 1500

Check: Node connectivity for interface "eth0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
szrac2[192.168.0.202] szrac1[192.168.0.201] yes
Result: Node connectivity passed for interface "eth0"

Check: TCP connectivity of subnet "192.168.0.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
szrac1:192.168.0.201 szrac2:192.168.0.202 failed

ERROR:
PRVF-7617 : Node connectivity between "szrac1 : 192.168.0.201" and "szrac2 : 192.168.0.202" failed
Result: TCP connectivity check failed for subnet "192.168.0.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed for subnet "10.0.0.0".
Subnet mask consistency check passed.

Result: Node connectivity check failed

Verification of node connectivity was unsuccessful on all the specified nodes.

Following note ID 1427202.1, we look at:

Refer to note 1335136.1 for details.

郑全 · posted 2013-10-7 13:10:41

In this Document
Purpose
Details
Known Issues
To verify manually

When to ignore the error?
References
--------------------------------------------------------------------------------

Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Purpose

The note lists problems, solutions or workarounds that are related to the following error:

PRVF-7617: TCP connectivity check failed for subnet

OR

PRVF-7617 : Node connectivity between "racnode1 : 10.10.10.148" and "racnode2 : 10.10.10.149" failed

TCP connectivity check failed for subnet "10.10.10.0"

OR

PRVF-7616 : Node connectivity failed for subnet "10.10.16.0" between "racnode1 - eth5 : 10.10.16.109" and "racnode2 - eth5 : 10.10.16.121"

Result: Node connectivity failed for subnet "10.10.16.0"

When the error happens, OUI will likely report:

[INS-41110] Specified network interface doesnt maintain connectivity across cluster nodes.
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

Details

Known Issues

• bug 12849377 - CVU should check only selected network interfaces (ignore "do not use")

CVU checks network interfaces that are marked "do not use"; fixed in 11.2.0.3 GI PSU1.

• bug 9952812 - CVU SHOULD RETURN WARNING INSTEAD OF FATAL ERROR FOR VIRBR0

Happens on Linux if the network adapter virbr0 exists; fixed in 11.2.0.3.

The fix introduces a new CVU parameter (-networks) to check only the specified networks:

runcluvfy.sh stage -pre crsinst -n <racnode1>,<racnode2> -networks "eth*" -verbose

• bug 11903488 - affects Solaris only, fixed in 11.2.0.3

As Solaris does not support the socket option SO_RCVTIMEO, the TCP server fails to start:

In this example, racnode1 is nodename and 10.1.0.11 is the IP to test connectivity:

/tmp/CVU_<version>_<user>/exectask.sh -runTCPserver racnode1 10.1.0.11

<CV_ERR>location:prvnconss1 opname:free port unavailable category:0 DepInfo: 99</CV_ERR>
<CV_LOG>Exectask:runTCPServer failed</CV_LOG>
..
<CV_ERR>Error running TCP server</CV_ERR>

Bug 11903488 also removes the port range of 49900-50000, so that the first available port is used:
exectask.sh -chkTCPclient <server> <server-IP> <server-port> <client> <client-IP>

• bug 12353524 - affects HP-UX only, fixed in 11.2.0.3

<CV_ERR>location:prvnconcc3 opname:client to server connection fail
category:0 otherInfo: Client to server connection failed, errno: 227
DepInfo: 227</CV_ERR> <CV_VAL>-1</CV_VAL>
<CV_LOG>Exectask:chkTCPClient failed</CV_LOG> <CV_VRES>1</CV_VRES>
<CV_ERR>Error checking TCP communication</CV_ERR>
<CV_ERES>1</CV_ERES>

• bug 12608083 - affects Windows only, fixed in 11.2.0.3

When more than one network interface is on the same subnet, it is possible that the wrong interface is used to verify TCP connectivity.

• bug 10106374 - affects Windows only, fixed in 11.2.0.2

Refer to note 1286394.1 for details.

• bug 16953470 - affects Solaris only, happens when "hostmodel" is set to strong

CVU trace:
[7041@racnode1] [Thread-408] [ 2013-06-13 12:41:17.772 GMT+04:00 ] [StreamReader.run:65] OUTPUT><CV_CMD>/usr/sbin/ping -i 192.168.169.2 192.168.169.2 3 </CV_CMD><CV_VAL>/usr/sbin/ping: sendto Network is unreachable

Running the "ping -i" command manually returns the same error.

To find out current "hostmodel":

# ipadm show-prop -p hostmodel ip
PROTO PROPERTY PERM CURRENT PERSISTENT DEFAULT POSSIBLE
ipv6 hostmodel rw weak weak weak strong,src-priority,weak
ipv4 hostmodel rw weak weak weak strong,src-priority,weak

To change hostmodel:

ipadm set-prop -p hostmodel=weak ipv4
ipadm set-prop -p hostmodel=weak ipv6

The workaround is to set hostmodel to weak.
In addition, Solaris bug 16827053 is open to fix this at the OS level.

• bug 17043435

The bug is closed as a duplicate of internal bug 17070860, which is fixed in 11.2.0.4.

To verify manually

Repeat the following for each interface as grid user:

runcluvfy.sh comp nodecon -i <network-interface> -n <racnode1>,<racnode2>,<racnode3> -verbose
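To repeat the check for several interfaces, the command lines can be generated in a loop. The sketch below only prints the commands (a dry run) and assumes it is used from the directory that contains runcluvfy.sh; the function name `nodecon_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Print one "comp nodecon" command line per interface; nothing is executed.
# Usage: nodecon_commands <comma-separated-nodes> <interface>...
nodecon_commands() {
    local nodes=$1
    shift
    local ifc
    for ifc in "$@"; do
        printf './runcluvfy.sh comp nodecon -i %s -n %s -verbose\n' "$ifc" "$nodes"
    done
}

# Example for the two-node cluster in this thread:
#   nodecon_commands szrac1,szrac2 eth0 eth1
```

Printing first and running afterwards keeps each interface's verbose output easy to save in its own file.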

When to ignore the error?

If the error happens on a network that is not related to Oracle Clusterware, it can be ignored. For example, if it happens on an administrative network and does not affect anything, it can be ignored.

郑全 · posted 2013-10-7 13:15:10

This note did not reveal the problem either, but then I came across another note:

郑全 · posted 2013-10-7 13:15:37

PRVF-7617 Cluster Verify Fails For Private Network if Firewall Exists (Doc ID 1357657.1)

In this Document
Symptoms
Cause
Solution
References
--------------------------------------------------------------------------------

Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Symptoms

During Cluster Verification, which is part of the cluster installation, the connectivity check between nodes may fail with the following errors:

Check: TCP connectivity of subnet "10.0.0.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
racnode01:10.0.0.1 racnode02:10.0.0.2 failed

ERROR:
PRVF-7617 : Node connectivity between "racnode01 : 10.0.0.1" and "racnode02 : 10.0.0.2" failed
Result: TCP connectivity check failed for subnet "10.0.0.0"

This may occur on any of the interconnects.

Cause

iptables (a Linux firewall) is active between the nodes, blocking network traffic on the cluster interconnect network.

Solution

A temporary solution is to disable iptables. A more permanent solution, if iptables is required, is to configure iptables so that it does not block interconnect traffic (no firewall should exist between cluster nodes).

To disable iptables, use the following commands as root:

For IPV4:

service iptables save
service iptables stop
chkconfig iptables off

For IPV6:

service ip6tables save
service ip6tables stop
chkconfig ip6tables off

Note: IPV6 is not supported with Oracle Clusterware/RAC 11gR2
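For the more permanent option, where iptables stays enabled, one possible approach is to accept everything that arrives on a cluster interface from the cluster subnet. The sketch below only prints the commands so they can be reviewed before being run as root on every node. The function name `allow_node_traffic_rules` is hypothetical, and the /24 mask in the example is an assumption (the cluvfy output in this thread shows only the subnet 10.0.0.0, not its mask).

```shell
#!/usr/bin/env bash
# Print iptables commands that accept all traffic arriving on one interface
# from one subnet, then persist the rules. Nothing is executed here.
# Usage: allow_node_traffic_rules <interface> <subnet/cidr>
allow_node_traffic_rules() {
    local ifc=$1 subnet=$2
    # Insert at the top of INPUT so the rule is evaluated before any REJECT/DROP rule.
    printf 'iptables -I INPUT -i %s -s %s -j ACCEPT\n' "$ifc" "$subnet"
    # Save the running rules so they survive a restart of the iptables service.
    printf 'service iptables save\n'
}

# Example for the private interconnect in this thread (mask assumed to be /24):
#   allow_node_traffic_rules eth1 10.0.0.0/24
```

In this thread the failing TCP check was actually on eth0 (subnet 192.168.0.0), so a comparable rule would be needed for every network that CVU checks, not only for the interconnect.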

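郑全 · posted 2013-10-7 13:18:32

Reading this, I suddenly realized that my firewall was still on: I had only disabled SELinux, not the firewall. I turned the firewall off right away and retested network connectivity.

The check succeeded, and back in the installer the problem was gone as well.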

[grid@szrac1 grid]$ ./runcluvfy.sh comp nodecon -i eth0 -n szrac1,szrac2 -verbose

Verifying node connectivity

Checking node connectivity...

Checking hosts config file...
Node Name                             Status
------------------------------------ ------------------------
szrac2                                passed
szrac1                                passed

Verification of the hosts config file successful

Interface information for node "szrac2"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
eth0   192.168.0.202   192.168.0.0     0.0.0.0         192.168.0.1     08:00:27:AC:2E:55 1500
eth1   10.0.0.202      10.0.0.0        0.0.0.0         192.168.0.1     08:00:27:47:85:E9 1500

Interface information for node "szrac1"
Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
eth0   192.168.0.201   192.168.0.0     0.0.0.0         192.168.0.1     08:00:27:AC:2E:55 1500
eth1   10.0.0.201      10.0.0.0        0.0.0.0         192.168.0.1     08:00:27:47:85:E9 1500

Check: Node connectivity for interface "eth0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
szrac2[192.168.0.202] szrac1[192.168.0.201] yes
Result: Node connectivity passed for interface "eth0"

Check: TCP connectivity of subnet "192.168.0.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
szrac1:192.168.0.201 szrac2:192.168.0.202 passed
Result: TCP connectivity check passed for subnet "192.168.0.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.0.0".
Subnet mask consistency check passed for subnet "10.0.0.0".
Subnet mask consistency check passed.

Result: Node connectivity check passed

Verification of node connectivity was successful.

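郑全 · posted 2013-10-7 13:25:05

With that, the problem is solved.

This problem occurred only because I installed without following any document and without any preparation.

Before installing RAC, be sure to prepare thoroughly, ideally with a reference installation document at hand, so that errors do not keep appearing during the installation.

杨芳超 · posted 2013-10-7 13:55:39

Definitely bumping this...

郑全 · posted 2013-10-7 16:44:50

One last point: the firewall does not have to be turned off. It can stay on, as long as multicast traffic on the private network is allowed through.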

