郑全 posted on 2020-6-23 15:52
Starting ASM manually also fails.
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)
Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.

Purpose
This note lists common causes of ASM startup failure with the following error on a non-first node (second or others):
lmon registered with NM - instance number 2 (internal mem no 1)

If ASM on the non-first node was running previously, the following will likely be in alert.log from when it originally failed:
..
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
LMON (ospid: 15986) detects hung instances during IMR reconfiguration

If the issue happens while running the root script (root.sh or rootupgrade.sh) as part of the Grid Infrastructure installation/upgrade process, the following symptoms will be present:
Start of resource "ora.asm" failed
2011-11-29 15:56:48: Executing cmd: /g01/app/11.2.0.3/bin/crsctl start resource ora.asm -init
Scope
Details
Case1: link local IP (169.254.x.x) is being used by another adapter/network
Symptoms:
[/ocw/grid/bin/orarootagent.bin(4813)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.x.x / 255.255.255.0 / 0.0.0.0 / usb0
Dec 6 06:11:14 racnode1 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67
..
Solution:
Link local IP must not be used by any other network on cluster nodes. In this case, a USB network device gets IP 169.254.x.x from the DHCP server, which disrupts HAIP routing; the solution is to blacklist the device in udev so that it is not activated automatically. The Dell iDRAC service module may use link local; engage Dell to change the subnet. On Sun T series, ILOM (adapter name usbecm0) uses link local by default; engage Oracle Support for advice.

Case2: firewall exists between nodes on private network (iptables etc)
No firewall is allowed on the private network (cluster_interconnect) between nodes, including software firewalls such as iptables, ipmon etc.

Case3: HAIP is up on some nodes but not on all
Symptoms:
Cluster communication is configured to use the following interface(s) for this instance
Cluster communication is configured to use the following interface(s) for this instance
Solution:
The solution is to bring up HAIP on all nodes.
To find out HAIP status, execute the following on all nodes:
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init
If it's offline, try to bring it up as root:
$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init
If HAIP fails to start, refer to Note 1210883.1 for known issues.
Once HAIP is restarted, ASM/DB instances need to be restarted to use HAIP; if OCR is on an ASM disk group, GI needs to be restarted.
If the "up node" is not using HAIP and no outage is allowed, the workaround is to set the init.ora/spfile parameter cluster_interconnects to the private IP of each node, which allows ASM/DB to come up on the "down node". Once a maintenance window is planned, the parameter must be removed to allow HAIP to work.
The following article may assist in determining why HAIP failed to start:
Note 1640865.1 - Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
If the issue happened in the middle of a GI upgrade, refer to:
Note 2063676.1 - rootupgrade.sh fails on node1 as HAIP was not starting from old home but starting from new home

Case4: HAIP is up on all nodes but some do not have route info
Symptoms:
Cluster communication is configured to use the following interface(s) for this instance
netstat -rn
Solution:
The solution is to manually add HAIP route info on the nodes where it is missing:
4.1. Execute "netstat -rn" on any node that has HAIP route info and locate the following:
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
Note: the first field is the HAIP subnet ID and will start with 169.254.xxx.xxx, the third field is the HAIP subnet netmask, and the last field is the private network adapter name.
4.2. Execute the following as root on the node that's missing the HAIP route:
# route add -net <HAIP subnet ID> netmask <HAIP subnet netmask> dev <private network adapter>
e.g.
# route add -net 169.254.x.x netmask 255.255.0.0 dev bond2
4.3. Start ora.crsd as root on the node that's partially up:
# $GRID_HOME/bin/crsctl start res ora.crsd -init
The other workaround is to restart GI on the node that's missing the HAIP route with the "crsctl stop crs -f" and "crsctl start crs" commands as root.

Case5: HAIP is up on all nodes and route info is present but HAIP is not pingable
Symptom:
HAIP is present on both nodes and route information is also present, but neither node can ping or traceroute to the other node's HAIP.

[oracle@racnode2 script]$ netstat -r
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.x.x * 255.255.255.0 U 0 0 0 eth2
192.168.x.x * 255.255.255.0 U 0 0 0 eth1
192.168.x.x * 255.255.255.0 U 0 0 0 eth0
link-local * 255.255.0.0 U 0 0 0 eth2
default 192.168.x.x 0.0.0.0 UG 0 0 0 eth0

[oracle@racnode1 trace]$ netstat -r
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.x.x * 255.255.255.0 U 0 0 0 eth2
192.168.x.x * 255.255.255.0 U 0 0 0 eth1
192.168.x.x * 255.255.255.0 U 0 0 0 eth0
link-local * 255.255.0.0 U 0 0 0 eth2
default 192.168.x.x 0.0.0.0 UG 0 0 0 eth0

[oracle@racnode2 script]$ ping 169.254.x.x
PING 169.254.x.x (169.254.x.x) 56(84) bytes of data.
^C
--- 169.254.x.x ping statistics ---
39 packets transmitted, 0 received, 100% packet loss, time 38841ms

[oracle@racnode1 trace]$ ping 169.254.x.x
PING 169.254.x.x (169.254.x.x) 56(84) bytes of data.
^C
--- 169.254.x.x ping statistics ---
35 packets transmitted, 0 received, 100% packet loss, time 34555ms

Solution:
For an OpenStack cloud implementation, engage the system admin to create another neutron port to map link-local traffic. For other environments, engage the SysAdmin/NetworkAdmin to review the routing/network setup.
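For the Case4 check, the route lookup and the matching "route add" command can be sketched in shell. This is only an illustrative sketch, not part of the Oracle note: the function names haip_routes and haip_route_cmd are made up here, the sample text is canned rather than taken from a live system, and the parsing assumes the Linux "netstat -rn" column layout (Destination, Gateway, Genmask, ..., Iface). It prints the command instead of running it, so nothing is changed on the node.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn` output on stdin and print
# "<subnet ID> <netmask> <adapter>" for every link-local (169.254.x.x) route.
# Assumes numeric output (-n); without -n the destination shows as "link-local".
haip_routes() {
    awk '$1 ~ /^169\.254\./ { print $1, $3, $NF }'
}

# Hypothetical helper: build (but do not execute) the command from step 4.2.
# $1 = HAIP subnet ID, $2 = HAIP subnet netmask, $3 = private network adapter
haip_route_cmd() {
    printf 'route add -net %s netmask %s dev %s\n' "$1" "$2" "$3"
}

# Canned sample standing in for output from a node that has the HAIP route.
sample='Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.x.x 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2'

# Print the command to review and run as root on the node missing the route.
printf '%s\n' "$sample" | haip_routes | while read -r subnet mask dev; do
    haip_route_cmd "$subnet" "$mask" "$dev"
done
```

On a real node, the sample would be replaced with "netstat -rn | haip_routes" on a healthy node; an empty result on the problem node would suggest the HAIP route is missing there.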