Applies to: Linux OS - Version Oracle Linux 6.0 and later
Oracle Database - Enterprise Edition
Linux x86-64
Symptoms- Database Alert log captures below:
mtype: 61 process 14152 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /oracle/admin/diag/rdbms/nadb29/NADB29-7/trace/NADB29-7_ora_14152.trc (incident=249601):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /oracle/admin/diag/rdbms/nadb29/NADB29-7/incident/incdir_249601/NADB29-7_ora_14152_i249601.trc
- Network communication issue and High Memory consumption would be observed during this time.
- Server is running with UEK3 kernel.
Cause This happens due to less space available for network buffer reservation.
Solution1. On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers reducing the probability of a low-buffer-space conditions.
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1048576 ***
On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.
On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here 'n' is the number of NUMA nodes.
2. Additionally, the MTU value should be modified as below
#ifconfig lo mtu 16436
To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :
MTU=16436
Save the file and restart the network service to load the changes
#service network restart
Note : While making the changes in CRS nodes, if network is restarted while CRS is up, it can hung CRS. So cluster services should be stopped prior to the network restart.