Chongqing Sizhuang (重庆思庄) Oracle and Redhat Certification Study Forum

[Discussion] RHEL 7.2 can crash Oracle ASM and Oracle Database instances

Original poster, posted 2016-9-14 19:56:15
On RHEL 7.2 with RemoveIPC=yes set, Oracle ASM and Oracle Database instances can crash. The same problem affects any application that uses shared memory segments (SHM) or semaphores (SEM).
Source:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)
Applies to:
Oracle Database - Standard Edition
Oracle Database - Enterprise Edition
Linux x86-64
Linux x86

Description:
In RHEL 7.2, the systemd-logind service introduced a new feature: when a user fully logs out of the OS, all IPC objects belonging to that user are removed.
The feature is controlled by the RemoveIPC option in /etc/systemd/logind.conf. See man logind.conf(5) for details.
In RHEL 7.2, RemoveIPC defaults to yes.
As a result, when the last session of the oracle or Grid user logs out, the operating system removes that user's shared memory segments and semaphores.
Because Oracle ASM and the database use shared memory segments, removing them crashes the Oracle ASM and database instances.
See Redhat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533

OCCURRENCE
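The problem affects every application that uses shared memory segments and semaphores, so both Oracle ASM instances and Oracle Database instances are affected.
Oracle Linux 7.2 avoids the problem by explicitly setting RemoveIPC=no in /etc/systemd/logind.conf.
However, if /etc/systemd/logind.conf was modified before the OS upgrade, yum update writes the correct configuration (RemoveIPC=no) to a file named logind.conf.rpmnew instead. If the original configuration file stays in use, the failures described here will occur.
To avoid the problem, always edit logind.conf after an OS upgrade and set RemoveIPC=no. This is documented in the Oracle Linux 7.2 release notes.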
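
The check described above (is RemoveIPC set explicitly, and to what?) can be sketched in Python. This is a minimal sketch, assuming only the main logind.conf file is relevant and ignoring any drop-in directories; `explicit_remove_ipc` and `is_enabled` are hypothetical helper names, and the accepted boolean spellings are my understanding of how systemd parses booleans.

```python
# Hypothetical helpers for inspecting logind.conf text; not part of systemd.

TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}
FALSE_WORDS = {"0", "no", "n", "false", "f", "off"}


def explicit_remove_ipc(conf_text):
    """Return the RemoveIPC value set in [Login], or None if it is not set.

    Commented-out lines such as '#RemoveIPC=no' only document the
    compile-time default, so they are ignored.
    """
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Login" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "RemoveIPC":
                value = val.strip()  # the last assignment wins
    return value


def is_enabled(value):
    """Map a boolean spelling to True/False; None if unset or unrecognised."""
    if value is None:
        return None
    word = value.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None
```

Running `explicit_remove_ipc` on the contents of both /etc/systemd/logind.conf and logind.conf.rpmnew would show whether the file in use lacks the explicit RemoveIPC=no that the .rpmnew file carries. A result of None means the compile-time default applies.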

Symptoms:

  • 1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
  • 2) Upgrading to 11.2 and 12c GI/CRS fails.
  • 3) After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.

systemd-logind can remove IPC objects at any time, so the failures can look very different from case to case. Some examples follow.


  • The most common error is the following in the ASM or database alert.log:
      ORA-27157: OS post/wait facility removed
      ORA-27300: OS system dependent operation:semop failed with status: 43
      ORA-27301: OS failure message: Identifier removed
      ORA-27302: failure occurred at: sskgpwwait1
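
The alert.log signature above can be searched for mechanically. The following is a minimal sketch, assuming the log text has already been read into a string; `ipc_removal_hits` and `SIGNATURES` are hypothetical names, and the patterns are taken only from the messages quoted in this post. Status 43 is errno EIDRM ("Identifier removed") on Linux, which is what a semop call returns once its semaphore set has been deleted.

```python
import re

# Patterns built from the error lines quoted in the post.
SIGNATURES = {
    "ORA-27157": re.compile(r"ORA-27157: OS post/wait facility removed"),
    "ORA-27300": re.compile(r"ORA-27300: .*semop failed with status: 43"),
    "ORA-27301": re.compile(r"ORA-27301: .*Identifier removed"),
}


def ipc_removal_hits(log_text):
    """Return the sorted list of signature codes that appear in log_text."""
    found = set()
    for line in log_text.splitlines():
        for code, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(code)
    return sorted(found)
```

Seeing all three codes together is a strong hint that the semaphores were removed underneath the instance; a single hit on its own is weaker evidence.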

  • The second observed error occurs during installation and upgrade, when asmca fails with the following error:
      KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
      KFOD-00105: Could not open pfile 'init@.ora'


  • The third observed error occurred during installation and upgrade:
      Creation of ASM password file failed. Following error occurred: Error in Process: /u01/app/12.1.0/grid/bin/orapwd
      Enter password for SYS:
      OPW-00009: Could not establish connection to Automatic Storage Management instance
      2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
      2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM

  • The fourth observed error is the following message in /var/log/messages around the time the ASM or database instance crashed:
      Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]

Workaround:
1) Set RemoveIPC=no in /etc/systemd/logind.conf
2) Reboot the server or restart systemd-logind as follows:
     # systemctl daemon-reload
     # systemctl restart systemd-logind
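
Step 1 of the workaround can also be applied as a text transformation rather than by hand. This is a minimal sketch, assuming the file content is passed in as a string and written back by the caller; `set_remove_ipc_no` is a hypothetical helper, not a systemd tool. It replaces any existing RemoveIPC line in the [Login] section (commented or not) with a single explicit RemoveIPC=no.

```python
import re

# Matches 'RemoveIPC=...' with an optional leading comment marker.
_REMOVE_IPC = re.compile(r"^\s*[#;]?\s*RemoveIPC\s*=.*$")


def set_remove_ipc_no(conf_text):
    """Return conf_text with exactly one explicit RemoveIPC=no in [Login]."""
    out = []
    in_login = False
    written = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [Login] without having written the setting: add it now.
            if in_login and not written:
                out.append("RemoveIPC=no")
                written = True
            in_login = stripped == "[Login]"
            out.append(line)
            continue
        if in_login and _REMOVE_IPC.match(line):
            if not written:
                out.append("RemoveIPC=no")
                written = True
            continue  # drop the old or duplicate line
        out.append(line)
    if in_login and not written:
        out.append("RemoveIPC=no")
        written = True
    if not written:
        # No [Login] section at all: create one.
        out.extend(["[Login]", "RemoveIPC=no"])
    return "\n".join(out) + "\n"
```

The change still only takes effect after step 2, that is, after a reboot or a restart of systemd-logind.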

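Patches:
Migrating from RHEL 7.2 to Oracle Linux 7.2 resolves the problem.
If migrating to Oracle Linux 7.2 is not possible, use the workaround above.

History:
This article was created on 23 November 2015.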

Reply 1
Original poster, posted 2016-9-14 20:00:24
I happened to have a CentOS 7 (1511) system installed, and its /etc/systemd/logind.conf reads as follows:
[root@localhost ~]# cat  /etc/systemd/logind.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See logind.conf(5) for details.

[Login]
#NAutoVTs=6
#ReserveVT=6
#KillUserProcesses=no
#KillOnlyUsers=
#KillExcludeUsers=root
#InhibitDelayMaxSec=5
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore
#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
#IdleAction=ignore
#IdleActionSec=30min
#RuntimeDirectorySize=10%
#RemoveIPC=no
[root@localhost ~]#

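So on CentOS 7 (1511) RemoveIPC already defaults to no, which is why logging out of the oracle user many times never triggered the problem.
After I changed the setting to RemoveIPC=yes, the problem did occur. It appears that CentOS 7 (1511) and OEL 7.2 had already identified the issue and dealt with it at release time.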
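
An experiment like this one can be made more visible by counting the user's System V IPC objects before and after logging out. The following is a minimal sketch, assuming the Linux files /proc/sysvipc/shm and /proc/sysvipc/sem are readable and that their header row names a `uid` column; `count_ipc_objects` is a hypothetical helper.

```python
def count_ipc_objects(proc_text, uid):
    """Count rows in a /proc/sysvipc/* table whose uid column equals uid.

    The column position is looked up from the header row rather than
    hard-coded, since shm, sem and msg tables have different layouts.
    """
    lines = proc_text.splitlines()
    if not lines:
        return 0
    header = lines[0].split()
    uid_col = header.index("uid")
    count = 0
    for row in lines[1:]:
        fields = row.split()
        if len(fields) > uid_col and fields[uid_col] == str(uid):
            count += 1
    return count
```

For example, passing the text of /proc/sysvipc/sem together with the numeric uid of the oracle user (from `pwd.getpwnam("oracle").pw_uid`) gives the number of semaphore sets that user owns. If the count drops to zero right after the last oracle session ends while the instance is still supposed to be running, RemoveIPC is in effect.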

Reply 2
Original poster, posted 2017-12-25 20:09:02
CentOS 7.4 also defaults to RemoveIPC=no.

Reply 3
Original poster, posted 2017-12-25 20:10:40
[root@hisdb2 setup]# man logind.conf |grep -A 5 RemoveIPC
       RemoveIPC=
           Controls whether System V and POSIX IPC objects belonging to the user shall be removed when the user fully logs out. Takes a boolean argument. If enabled, the
           user may not consume IPC resources after the last of the user's sessions terminated. This covers System V semaphores, shared memory and message queues, as well
           as POSIX shared memory and message queues. Note that IPC objects of the root user are excluded from the effect of this setting. Defaults to "no".

SEE ALSO
[root@hisdb2 setup]#