Chongqing Sizhuang Oracle, KingBase, PostgreSQL, Redhat Certification Study Forum

Title: Resolving InnoDB: page_cleaner: 1000ms intended loop

Author: jiawang  Time: 2024-7-30 14:44

Alert log:
2024-07-29T03:30:01.870780+08:00 441415 [Note] Aborted connection 441415 to db: 'financeproduct' user: 'DBywuser' host: '10.100.225.51' (Got timeout reading communication packets)

2024-07-29T09:34:26.291175+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4709ms. The settings might not be optimal. (flushed=364 and evicted=0, during the time.)

2024-07-29T11:37:11.081633+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4639ms. The settings might not be optimal. (flushed=537 and evicted=0, during the time.)

2024-07-30T00:15:11.777868+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7271ms. The settings might not be optimal. (flushed=604 and evicted=0, during the time.)

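1. Main cause of the problem
page_cleaner_thread: the dirty-page cleaner thread, responsible for writing dirty pages from memory to disk.
Why it happens: the message above means there were many dirty pages to flush. In theory one pass should finish within 1 s, but in practice it took 4.6 to 7.3 s (per the log above) to flush the dirty pages to disk. The thread is handed far more dirty pages than it can process per second. To avoid this, reduce how deep each per-second loop searches for dirty pages (innodb_lru_scan_depth).
If the database has many buffer pool instances, that causes more flushing work, so increasing innodb_buffer_pool_instances without lowering innodb_lru_scan_depth can easily create a performance bottleneck.
If the problem is caused only by a temporary bulk import, it can be ignored. If the database constantly has heavy updates that create dirty pages quickly and the message keeps appearing, it needs attention.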

Below is the official documentation's explanation of the innodb_lru_scan_depth parameter, followed by my own reading of it. Corrections are welcome.
A setting smaller than the default is generally suitable for most workloads. A value that is much higher than necessary may impact performance. Only consider increasing the value if you have spare I/O capacity under a typical workload. Conversely, if a write-intensive workload saturates your I/O capacity, decrease the value, especially in the case of a large buffer pool.
When tuning innodb_lru_scan_depth, start with a low value and configure the setting upward with the goal of rarely seeing zero free pages. Also, consider adjusting innodb_lru_scan_depth when changing the number of buffer pool instances, since innodb_lru_scan_depth * innodb_buffer_pool_instances defines the amount of work performed by the page cleaner thread each second.
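In short: a value below the default suits most workloads, and a value much higher than necessary may hurt performance. Only consider raising it when there is spare disk I/O capacity under typical load. Conversely, if a write-intensive workload already saturates I/O capacity, lower the value, especially when the buffer pool is large.
When tuning, start from a low value and raise it until zero free pages are rarely seen. When changing the number of buffer pool instances, also consider adjusting innodb_lru_scan_depth, because innodb_lru_scan_depth * innodb_buffer_pool_instances determines the amount of work the page cleaner thread does each second.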
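
To make the product from the documentation concrete, the following is a minimal sketch that reads both variables on a running server and multiplies them. The column aliases are my own labels, not MySQL names, and the product is only a rough upper bound on the LRU scan work per second. The two status counters afterwards are one way to watch for the "zero free pages" condition the documentation mentions.

```sql
-- Current scan depth, number of buffer pool instances, and their product.
-- The aliases are illustrative labels chosen for this example.
SELECT @@GLOBAL.innodb_lru_scan_depth        AS lru_scan_depth,
       @@GLOBAL.innodb_buffer_pool_instances AS buffer_pool_instances,
       @@GLOBAL.innodb_lru_scan_depth * @@GLOBAL.innodb_buffer_pool_instances
                                             AS lru_pages_scanned_per_loop;

-- Free pages currently available in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';

-- Cumulative count of waits for a free page; a steadily growing
-- value suggests the scan depth may be too low for the workload.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';
```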

2. Solution
SET GLOBAL innodb_lru_scan_depth=256;
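The default value of this parameter is 1024.
This is only a database-level adjustment. When this problem appears, also check the server hardware, for example whether disk I/O has hit a bottleneck.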
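
A value set with SET GLOBAL is lost when the server restarts, so the change usually needs to be made permanent as well. The sketch below checks the value before and after the change. The SET PERSIST line is left commented out because it assumes MySQL 8.0 or later, and the log format above looks like an older release. On older versions the usual alternative is to add innodb_lru_scan_depth = 256 under the [mysqld] section of the option file.

```sql
-- Value before the change.
SHOW GLOBAL VARIABLES LIKE 'innodb_lru_scan_depth';

-- Takes effect immediately, but only until the next restart.
SET GLOBAL innodb_lru_scan_depth = 256;

-- Assumption: MySQL 8.0 or later. Also writes the setting to
-- mysqld-auto.cnf so that it survives a restart.
-- SET PERSIST innodb_lru_scan_depth = 256;

-- Confirm the new value.
SHOW GLOBAL VARIABLES LIKE 'innodb_lru_scan_depth';
```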

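Below is a more detailed explanation of the internals that lead to this warning.

Once per second, the page cleaner scans the buffer pool for dirty pages to flush from the buffer pool to disk. The warning shows that it had many dirty pages to flush and needed more than 4 seconds to flush one batch to disk, when it should finish that work within 1 second. In other words, it is biting off more than it can chew.

Reducing innodb_lru_scan_depth from 1024 to 256 adjusts this. It shortens how far into the buffer pool the page cleaner thread searches for dirty pages during each once-per-second loop. You are asking it to take smaller bites.

Note that having many buffer pool instances makes flushing do more work, because the page cleaner takes on innodb_lru_scan_depth worth of work for each buffer pool instance. So you may have caused this bottleneck inadvertently by increasing the number of buffer pool instances without lowering the scan depth.

The documentation for innodb_lru_scan_depth says "A setting smaller than the default is generally suitable for most workloads." It sounds as if the default they gave this option is too high.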

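You can limit the IOPS used by background flushing with the innodb_io_capacity and innodb_io_capacity_max options. The first option is a soft limit on the I/O throughput InnoDB will request. This limit is flexible: if flushing falls behind the rate at which new dirty pages are created, InnoDB dynamically raises the flushing rate beyond it. The second option sets a stricter limit on how far InnoDB may raise the flushing rate.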
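
As a sketch of how these two options can be inspected and changed at runtime: the numbers 1000 and 2000 below are hypothetical placeholders, not recommendations, and should be replaced with values that match what the storage can actually sustain. The maximum is set first so that the soft limit never ends up above it.

```sql
-- Show the current soft limit and hard limit.
SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';

-- Hypothetical values for illustration only.
SET GLOBAL innodb_io_capacity_max = 2000;  -- ceiling for the flushing rate
SET GLOBAL innodb_io_capacity     = 1000;  -- soft limit for background flushing
```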

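If the flushing rate keeps up with the average rate at which new dirty pages are created, you are fine. But if you consistently create dirty pages faster than they can be flushed, the buffer pool will eventually fill with dirty pages until they exceed the innodb_max_dirty_pages_pct share of the buffer pool. At that point the flushing rate increases automatically, which may again cause page_cleaner to emit the warning.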
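
One rough way to see whether flushing is keeping up is to compare the number of dirty pages with the total number of pages and with the configured threshold. This is a minimal sketch using standard status counters. The actual variable name is innodb_max_dirty_pages_pct. Run the status queries a few times under normal load and see whether the dirty-page count keeps growing.

```sql
-- Configured dirty-page threshold (percent of the buffer pool),
-- plus the related low-water mark variable.
SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct%';

-- Dirty pages right now versus total pages in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';

-- Cumulative flush requests; the difference between two samples
-- divided by the interval gives an approximate flush rate.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
```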

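Another solution is to put MySQL on a server with faster disks. You need an I/O system that can handle the throughput page flushing requires.

If you see this warning all the time under average traffic, you may be trying to run too many write queries on this MySQL server. It may be time to scale out and split the writes across multiple MySQL instances, each with its own disk system.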

Welcome to the Chongqing Sizhuang Oracle, KingBase, PostgreSQL, Redhat Certification Study Forum (http://bbs.cqsztech.com/)