VKTM Time Drift Messages in the Alert Log

jiawang · posted 2020-11-2 09:46:57

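Time is a key element of many foundational information systems, databases included. For middleware components running on top of the operating system, obtaining an accurate, continuous, and consistent time is very important, especially in multi-node environments. Without a unified time management mechanism, the cluster components built on top would find it extremely difficult to work.
This post covers the bug behind the warnings raised by Oracle's VKTM time background process.
1. Starting with the 11g VKTM process
For Oracle Database, the direction of development has long been to avoid calling operating-system-level time functions and to maintain a unified, stable time system. In 11g, a dedicated background process, vktm, was added to the instance architecture.
VKTM stands for “Virtual Keeper of Time Process”. It provides a unified time service for database operation and for interval measurement.
The official description is: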
VKTM acts as a time publisher for an Oracle instance. VKTM publishes two sets of time: a wall clock time using a seconds interval and a higher resolution time (which is not wall clock time) for interval measurements. The VKTM timer service centralizes time tracking and offloads multiple timer calls from other clients.
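In versions before 11g, whenever a database instance (including ASM and RAC instances) needed the current time, it called an operating-system time function (for example, gettimeofday()). From 11g on, VKTM handles this centrally and keeps the time inside the process. Other processes that need the time obtain it indirectly through VKTM. The most direct effect of a dedicated time background process is fewer interactions with the operating system kernel, which improves performance.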
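The idea of a time publisher can be illustrated with a small toy model. This is only a sketch of the general pattern (one refresher samples the OS clock, many readers reuse the published value); the class name TimePublisher and its methods are hypothetical and say nothing about how Oracle actually implements VKTM.

```python
import time


class TimePublisher:
    """Toy time publisher (hypothetical, not Oracle's implementation)."""

    def __init__(self, clock=time.time):
        self._clock = clock        # injected so a fake clock can be used in tests
        self.os_clock_calls = 0    # how many times the OS clock was really sampled
        self._published = 0.0
        self.refresh()

    def refresh(self):
        """Sample the OS clock once and publish the result."""
        self.os_clock_calls += 1
        self._published = self._clock()

    def now(self):
        """Return the last published time without touching the OS clock."""
        return self._published


publisher = TimePublisher()
readings = [publisher.now() for _ in range(1000)]  # 1000 client reads
print(publisher.os_clock_calls)  # 1: only the initial refresh sampled the OS clock
```

In this model the readers trade a little freshness for far fewer clock calls, which is the trade-off the paragraph above describes.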

2. The error in the alert log:

[Screenshot: alert log excerpt, 209565f9f5ca36dc09.png, uploaded 2020-11-2 09:10]

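The alert log is the most direct way to spot problems in a running database. Starting with 11g, some predictive and diagnostic messages are also written to the alert log, with the aim of diagnosing database faults in advance.
According to the log, the database experienced two VKTM Time Drift events at two points in time. Drift here means the time shifting or floating. VKTM, the background process that maintains time, appears to have jumped quickly forward and backward at those two points.
Using the timestamps in the log, we locate the trace file written when the fault occurred.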
[oracle@strong trace]$ ls -l | grep vktm
-rw-r-----. 1 oracle dba  1520 Nov  2 09:30 orcl_vktm_64390.trc
-rw-r-----. 1 oracle dba 104 Nov  2 09:30 orcl_vktm_64390.trm
-rw-r-----. 1 oracle dba  1199 Jul 31 15:48 orcl_vktm_64268.trc
-rw-r-----. 1 oracle dba 88 Jul 31 15:48 orcl_vktm_64268.trm
-rw-r-----. 1 oracle dba  1170 Jul 31 15:48 orcl_vktm_64094.trc
-rw-r-----. 1 oracle dba 88 Jul 31 15:48 orcl_vktm_64094.trm
-rw-r-----. 1 oracle dba  1199 Jul 31 15:47 orcl_vktm_63999.trc
-rw-r-----. 1 oracle dba 88 Jul 31 15:47 orcl_vktm_63999.trm
Looking at the corresponding orcl_vktm_64390.trc trace file:
*** 2020-11-02 09:30:33.834
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (8098558638874)usec at (1604280633834134) whereas (1000000) is allowed
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time jumped forward by (8098558)usec at (1604280633) whereas (5000000) is allowed
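When a trace file contains many such lines, it can help to extract the numbers mechanically. The following is a minimal sketch that assumes every line has the same layout as the two lines above; the function name parse_drift and the regular expression are illustrative, not an Oracle tool, and the direction field is matched loosely because only the "forward" wording appears in this trace.

```python
import re

# Pattern modelled on the two trace lines shown above (an assumption about the format).
DRIFT_RE = re.compile(
    r"kstmchkdrift \((?P<keeper>\w+):(?P<resolution>\w+)\): "
    r"Time jumped (?P<direction>\w+) by \((?P<drift>\d+)\)usec "
    r"at \((?P<at>\d+)\) whereas \((?P<allowed>\d+)\) is allowed"
)


def parse_drift(line):
    """Return the fields of one kstmchkdrift line, or None if it does not match."""
    match = DRIFT_RE.search(line)
    if match is None:
        return None
    drift = int(match.group("drift"))
    allowed = int(match.group("allowed"))
    return {
        "resolution": match.group("resolution"),
        "direction": match.group("direction"),
        "drift": drift,
        "allowed": allowed,
        "exceeds_allowed": drift > allowed,  # True when the jump is beyond the reported limit
    }


highres = (
    "kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by "
    "(8098558638874)usec at (1604280633834134) whereas (1000000) is allowed"
)
lowres = (
    "kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time jumped forward by "
    "(8098558)usec at (1604280633) whereas (5000000) is allowed"
)

for line in (highres, lowres):
    print(parse_drift(line))
```

Both sample lines report a jump larger than the allowed value, so exceeds_allowed is True for each.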
In the trace output we see one drift event. The values are reported in usec (microseconds); 1 usec is 0.000001 second.
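To get a feel for the size of the highres value, the microsecond count can be converted into days and hours. This is a small arithmetic sketch using the number from the trace above; the variable names are illustrative.

```python
from datetime import timedelta

drift_usec = 8098558638874  # highres drift value taken from the trace line above
drift = timedelta(microseconds=drift_usec)

print(drift)  # 93 days, 17:35:58.638874
```

So the reported highres jump is roughly 93.7 days, far beyond the 1000000 usec (1 second) that the same line lists as allowed.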

On MOS, we found a note about the Time Drift message in the alert log, titled: Time Drift Detected. Please Check VKTM Trace File for More Details, Doc ID 1347586.1.

Time Drift Detected. Please Check VKTM Trace File for More Details (Doc ID 1347586.1)


In this Document

Goal

Solution

References

Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Oracle Database - Standard Edition - Version 11.2.0.4 to 11.2.0.4 [Release 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
Goal
Below message keeps repeating in Alert-log.
Time drift detected. Please check VKTM trace file for more details.
Warning: VKTM detected a time drift

This note explains the possible cause and if the issue needs to be addressed or not.
Solution
There are multiple bugs filed for this issue, for example:
Bug 12601857 - TIME DRIFT DETECTED. PLEASE CHECK VKTM TRACE FILE FOR MORE DETAILS.
Bug 12374867 - TIME DRIFT DETECTED. PLEASE CHECK VKTM TRACE FILE FOR MORE DETAILS.

Which were closed as duplicates of:
Bug 11837095 "TIME DRIFT DETECTED" APPEARS INTERMITTENTLY IN ALERT LOG, THO' EVENT 10795 SET.

To fix the issue, please download and apply patch 11837095 if it is available for your release/platform. This is available for 11.2.0.2 and above.
NOTE:
Patch 11837095 extends to functionality of event 10795 (set to level 2) for VKTM tracing and limits alert log entries.
After applying the patch, enable event 10795 at level 2:
alter system set event="10795 trace name context forever, level 2" scope=spfile;
Then bounce the instance (shutdown then startup) to make the change take effect.
The permanent fix is included in the 11.2.0.4 and 12.1 releases.
For version 11.2.0.1, apply patch for Bug 9843304. This Bug is already fixed in 11.2.0.2
Meaning and technical Impact of the error:
If the time drifts occurs less than 1sec and 5 sec for forward and backward respectively, then it is permissible and OK.
If the traces are emitting time drifts of amount beyond these ranges, then it needs to be analyzed.
Most of the times, during high loads, there would be issues with underlying OS due to virtual memory, network time protocol improper configuration etc.

In general VKTM process need to be scheduled in every 10ms, if due to above reasons this is not happening, occurs less than 1sec and 5 sec then it is permissible.
Eventually, this probably would cause the resource manager to take improper decisions and can lead to a hang in worst case.

VKTM process trace file can be found under bdump, however in this case the trace file doesn't contain useful information, which makes the message ambiguous.
There are bugs for the VKTM process malfunction
Unpublished Bug 18390507 : EXADATA: VKTM CONSUMING HUGE CPU AFTER PATCHING 11.2.3.3.0 AND 11.2.0.4.3
This Bug is closed as duplicate of Unpublished Bug 18499306
Unpublished Bug 18499306 : DBMX28 DATABASE VKTM PROCESSES CONSUME LARGE AMOUNT OF CPU, UNSTABLIZE MACHINE
This Bug is fixed in 12.1.0.2

So please download & apply patch 18499306, if it is available on top of your database version for your platform (If the patch is not available on your version/platform below 12.1.0.2, you may want to raise an SR to request it).
