SYMPTOMS
2 node RAC, intermittently instance is slow then go into 'hang' status without accepting any new connections. There is very high wait of 'cursor: mutex S'.
AWR report during slow/hanging period shows:
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
cursor: mutex S 5,895,761 19,387 3 32.29 Concurrency
DB CPU 16,926 28.19
library cache lock 122,326 9,803 80 16.33 Concurrency
db file sequential read 524,864 1,332 3 2.22 User I/O
enq: TX - row lock contention 7 380 54317 0.63 Application
CHANGES
CAUSE
This is caused by unpublished Bug 7352775 Many child cursors when PARALLEL_INSTANCE_GROUP set wrong.
From ASH report, the SQL waiting for 'cursor: mutex S' is:
SQL ID Planhash Sampled # of Executions % Activity Event % Event Top Row Source % RwSrc SQL Text
1v717nvrhgbn9 0 78.33 cursor: mutex S 42.52 ** Row Source Not Available ** 42.52 SELECT USER FROM SYS.DUAL
0 78.33 library cache lock 20.67 ** Row Source Not Available ** 20.67 SELECT USER FROM SYS.DUAL
0 78.33 CPU + Wait for CPU 14.93 ** Row Source Not Available ** 14.93 SELECT USER FROM SYS.DUAL
1546270724 0 6.50 CPU + Wait for CPU 2.93 ** Row Source Not Available ** 2.93 SELECT USER FROM SYS.DUAL
The version count of this SQL reached 4575 in one of AWR report. It is likely the cause of contention and high CPU usage.
Check cursor sharing:
select * from v$sql_shared_cursor where sql_id = '1v717nvrhgbn9';
SQL_ID ADDRESS CHILD_ADDRESS CHILD_NUMBER U S O O S L F E B P I S T A B D L T B I I R L I O E M U T N F A I T D L D B P C S C P T M B M R O P M F L P L A F L R L H P B
------------- ---------------- ---------------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1v717nvrhgbn9 070000033F0840D0 070000033E7CF808 0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N
1v717nvrhgbn9 070000033F0840D0 070000033E790A70 1 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N
...
1v717nvrhgbn9 070000033F0840D0 070000030918C788 2602 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N
Those cursors are not shared because:
PX_MISMATCH
Check parameter setting:
<RAC1>.parallel_instance_group <RAC1>
<RAC2>.parallel_instance_group <RAC2>
There is no instance_group setting as it is deprecated in 11.1.
This incorrect setting matches Bug 7352775.
SOLUTION
Remove parallel_instance_group parameter.
alter system set parallel_instance_group='' scope=both sid='*';
If parallel_instance_group is needed, please config service_name instead.
After remove the parallel_instance_group setting, the version count for the SQL is remaining low and instance is no longer hanging intermittently.
|