重庆思庄 Oracle, KingBase, PostgreSQL, Redhat certification study forum

[Reference document] Grid Infrastructure (GI) fails to start in a Flex ASM environment because crsd cannot start

Posted on 2026-1-18 11:32:35
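Summary
In a Flex ASM environment, Grid Infrastructure (GI) fails to start on one node while GI is running on one or more other nodes,
and the output of "crsctl stat res -t -init" shows every resource up except ora.crsd.
At this point ora.crsd is in the OFFLINE or INTERMEDIATE state.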
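
To confirm the state of ora.crsd without reading the whole table, the single resource can be queried on its own. The following is a minimal sketch, not a supported tool: the helper name crsd_state is hypothetical, and it assumes the usual NAME=/TYPE=/TARGET=/STATE= key=value output that crsctl prints when the -t option is omitted.

```shell
# Hypothetical helper: read the key=value output of
# "crsctl stat res ora.crsd -init" on stdin and print only the state word
# (for example OFFLINE, INTERMEDIATE or ONLINE).
crsd_state() {
    # A STATE line may look like "STATE=INTERMEDIATE on node1";
    # keep only the first word after the "=" sign.
    awk -F= '$1 == "STATE" { split($2, w, " "); print w[1]; exit }'
}

# Example usage on the affected node (not executed here):
#   crsctl stat res ora.crsd -init | crsd_state
```

A result of OFFLINE or INTERMEDIATE while the other -init resources are up matches the situation described above.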

The cluster alert.log reports the following errors:
2018-04-05 15:16:53.918 [CRSD(2697)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 2697
2018-04-05 15:17:00.608 [CRSD(2697)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/<name>/crs/trace/crsd.trc.
2018-04-05 15:17:00.615 [CRSD(2697)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]]. Details at (:CRSD00111:) in /u01/app/grid/diag/crs/<name>/crs/trace/crsd.trc.



The trace file crsd.trc reports the following errors:
2018-04-05 15:17:01.732 : CLSCRED:2919112768: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2018-04-05 15:17:01.757 : OCRRAW:2919112768: 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

2018-04-05 15:17:01.761 : OCRRAW:2919112768: 8033 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS

2018-04-05 15:17:01.798 : CLSCRED:2919112768: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.076fa97b2ac84f70ff7035254e98f38d.root not found
2018-04-05 15:17:01.798 : OCRRAW:2919112768: 7755 Error 4 opening dom root in 0x4d37e30

2018-04-05 15:17:01.816 : OCRRAW:2919112768: kgfnConnect2: kgfnGetBeqData failed

2018-04-05 15:17:01.816*:kgfn.c@4933: kgfnConnect2: kgfnGetBeqData failed
2018-04-05 15:17:01.816 : CSSCLNT:2919112768: clsssinit: initialized context: (0x4f23d50) flags 0x104
2018-04-05 15:17:01.821 : CSSCLNT:2919112768: clsssterm: terminating context (0x4f23d50)
2018-04-05 15:17:01.862 : OCRRAW:2919112768: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=nn.nn.255.13))(PORT=1526)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))

2018-04-05 15:17:01.862*:kgfn.c@6685: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(TRANSPORT_CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=nn.nn.255.13)(PORT=1526)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2018-04-05 15:17:01.862*:kgfn.c@6853: kgfnConnect2Int: OCISessionBegin failed
2018-04-05 15:17:03.139 : OCRRAW:2919112768: kgfnRecordErr 1017 OCI error:
ORA-01017: invalid username/password; logon denied

2018-04-05 15:17:03.139*:kgfn.c@1707: kgfnRecordErrPriv: 1017 error=ORA-01017: invalid username/password; logon denied

2018-04-05 15:17:03.140 : default:2919112768: clsCredDomClose: Credctx deleted 0x4d45890
2018-04-05 15:17:03.140 : OCRRAW:2919112768: kgfnConnect2: failed to connect

2018-04-05 15:17:03.140*:kgfn.c@5253: kgfnConnect2: failed to connect
2018-04-05 15:17:03.140 : OCRRAW:2919112768: kgfnConnect2Retry: failed to connect connect after 1 attempts, 143s elapsed

2018-04-05 15:17:03.140 : OCRRAW:2919112768: kgfo_kge2slos error stack at kgfoAl06: ORA-01017: invalid username/password; logon denied
ORA-27300: OS system dependent operation:sslssunreghdlr failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: sskgpreset1
ORA-15077: could not locate ASM instance serving a required diskgroup
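
The combination that identifies this problem in the trace is ORA-01017 together with ORA-15077. A small sketch for checking a trace file for both is given below; the function name crsd_trace_has_signature is hypothetical, and treating these two error codes as the signature is an assumption drawn from the excerpt above.

```shell
# Hypothetical helper: succeed (exit status 0) only if the given trace file
# contains both ORA-01017 and ORA-15077, as in the excerpt above.
crsd_trace_has_signature() {
    trace_file=$1
    grep -q 'ORA-01017' "$trace_file" && grep -q 'ORA-15077' "$trace_file"
}

# Example usage (pass the crsd.trc path named in the CRS-1013 message):
#   if crsd_trace_has_signature "$TRACE"; then echo "signature found"; fi
```

The crsd.trc location is the one reported in the CRS-1013 and CRS-0804 messages of the alert.log.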


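Cause
The usual causes are:
1) sqlnet.ora contains
SQLNET.AUTHENTICATION_SERVICES=none
This setting disables any OS authentication that crsd needs in order to connect to a remote ASM instance on another node.

2) The ASM password is wrong.

3) The ASM listener subnet does not match the subnet configured for the private interconnect.
Running "oifcfg getif" shows the subnet used by the private interconnect (cluster interconnect).
The errors described here also occur when SQLNET.AUTHENTICATION_SERVICES=all is set.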
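
The first cause can be checked by looking for the setting in the Grid home sqlnet.ora. The sketch below is only an illustration: the function name sqlnet_auth_services is hypothetical, and it assumes that the parameter and its value are written on a single line.

```shell
# Hypothetical helper: print the value of SQLNET.AUTHENTICATION_SERVICES
# found in the given sqlnet.ora, lowercased and without parentheses.
# Prints nothing if the parameter is not set.
sqlnet_auth_services() {
    # Skip comment lines, then match the parameter name case-insensitively.
    grep -v '^[[:space:]]*#' "$1" |
        grep -i '^[[:space:]]*SQLNET\.AUTHENTICATION_SERVICES[[:space:]]*=' |
        sed -e 's/^[^=]*=[[:space:]]*//' -e 's/[()[:space:]]//g' |
        tr '[:upper:]' '[:lower:]'
}

# Example usage, with ORACLE_HOME pointing at the Grid home:
#   sqlnet_auth_services "$ORACLE_HOME/network/admin/sqlnet.ora"
```

An output of none or all points to the first cause; empty output means the parameter is not set in that file.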


Solution

1) If sqlnet.ora contains SQLNET.AUTHENTICATION_SERVICES=none or SQLNET.AUTHENTICATION_SERVICES=all

   1) Remove "SQLNET.AUTHENTICATION_SERVICES=none" or "SQLNET.AUTHENTICATION_SERVICES=all" from the Grid home sqlnet.ora file (located in $ORACLE_HOME/network/admin).

   2) Force a restart of CRS:

   crsctl stop crs -f
   crsctl start crs

See "Unable to startup CRS as ASM failed to startup with 'ORA-01017: invalid username/password; logon denied'" (KB145388).
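
The sqlnet.ora change can be prepared as a reviewed copy instead of an in-place edit. The sketch below is an assumption-laden illustration: the function name comment_out_auth_services is hypothetical, it comments the line out rather than deleting it (a variation chosen here so that the change is easy to revert), and it only handles a setting written on a single line.

```shell
# Hypothetical helper: print the given sqlnet.ora to stdout with any
# SQLNET.AUTHENTICATION_SERVICES line commented out. The input file is
# not modified. A value continued on following lines is NOT handled.
comment_out_auth_services() {
    awk '{
        if (toupper($0) ~ /^[ \t]*SQLNET\.AUTHENTICATION_SERVICES[ \t]*=/)
            print "# " $0
        else
            print
    }' "$1"
}

# Example usage, with ORACLE_HOME pointing at the Grid home:
#   comment_out_auth_services "$ORACLE_HOME/network/admin/sqlnet.ora" > /tmp/sqlnet.ora.new
#   diff "$ORACLE_HOME/network/admin/sqlnet.ora" /tmp/sqlnet.ora.new
```

After reviewing the diff, the new file can replace the original (keeping a backup) before CRS is restarted.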

2) If the ASM password is wrong (a possible cause when the sqlnet.ora file is fine)

   1) For 12.2 and earlier, recreate the ASM password file as described in the MOS article "How to recreate shared ASM password file in 12c GI cluster" (KB147540).
      For 19.x and later, see "How to Recreate Shared ASM Password File in 19c Grid Infrastructure (GI)" (Doc ID KB790238).

   2) Force a restart of CRS:

   crsctl stop crs -f
   crsctl start crs

3) If the ASM listener subnet does not match the configured interconnect

Recreate the ASM listener by following step 3 of the section "C. For 12c and 18c Oracle Clusterware with Flex ASM" in article KB109184.
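
Before recreating the listener, it helps to note which subnets "oifcfg getif" reports for the interconnect, since the ASM listener subnet has to match one of them. The following is a minimal sketch: the function name interconnect_subnets is hypothetical, and it assumes the usual four-column output (interface, subnet, scope, type list) of oifcfg getif.

```shell
# Hypothetical helper: read "oifcfg getif" output on stdin and print the
# subnets whose type column includes cluster_interconnect or asm.
interconnect_subnets() {
    awk '$4 ~ /cluster_interconnect|asm/ { print $2 }' | sort -u
}

# Example usage (not executed here):
#   oifcfg getif | interconnect_subnets
```

The printed subnets are then compared by hand with the subnet of the ASM listener.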


A quick workaround is to start ASM manually on the local node with sqlplus.
If ora.crsd is still not online a few minutes after ASM has been started manually, run "crsctl start res ora.crsd -init" as root.




