问题概述
分区表在执行DDL的时候,pg_dump执行导出任意表都会等待
问题复现
会话1开启一个事务,查询分区表test_list
postgres=# begin;
BEGIN
postgres=*# select * from test_list limit 1;
id | name
----+------
1 | 1
(1 row)
postgres=*#
会话2执行truncate分区表test_list
postgres=# truncate test_list;
会话3执行pg_dump导出a表定义,执行后并未输出任何信息
[postgres@mydb1a src]$ /app/pg12_7/bin/pg_dump -p 5431 -t a -s
会话4查看活动会话和锁信息,可以看到pid:1496559(pg_dump) -> 等待pid:1496329(truncate) -> 等待pid:1494903(select)
postgres=# select pid,application_name,xact_start,state,wait_event,wait_event_type,query from pg_stat_activity where state<>'idle' order by xact_start;
pid | application_name | xact_start | state | wait_event | wait_event_type |query
--------------------------
1494903 | psql | 2024-08-14 14:26:49.79773+08 | idle in transaction | ClientRead | Client | select * from test_list limit 1;
1496329 | psql | 2024-08-14 14:29:13.82338+08 | active | relation | Lock | truncate test_list;
1496559 | pg_dump | 2024-08-14 14:29:21.555196+08 | active | relation | Lock | SELECT c.tableoid, c.oid, c.relname, (SELECT
pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = 'S' THEN
's' ELSE 'r' END::"char",c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog
.acldefault(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalo
g.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = 'S' THEN 's'
ELSE 'r' END::"char",c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefau
lt(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL
as initrrelacl, c.relkin
postgres=# select relation::regclass,pid,pg_blocking_pids(pid),mode from pg_locks where granted='f';
relation | pid | pg_blocking_pids | mode
-----------+---------+------------------+---------------------
test_list | 1496559 | {1496329} | AccessShareLock
test_list | 1496329 | {1494903} | AccessExclusiveLock
(2 rows)
问题分析
此时test_list表是有锁的,这毫无疑问,但是pg_dump导出a表时为什么也要等待?
首先要拿到pg_dump执行的这个完整SQL是什么,设置参数log_statement=all,再次重复测试,在日志中找到完整SQL
只展示部分SQL
SELECT c.tableoid
......
pg_get_partkeydef(c.oid) AS partkeydef, c.relispartition AS ispartition, pg_get_expr(c.relpartbound, c.oid) AS partbound
FROM pg_class c
LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN ('a', 'i'))
LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> 'p')
LEFT JOIN pg_am am ON (c.relam = am.oid)
LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = 'pg_class'::regclass AND pip.objsubid = 0)
WHERE c.relkind in ('r', 'S', 'v', 'c', 'm', 'f', 'p') ORDER BY c.oid
可以看到SQL中的两个函数pg_get_partkeydef、pg_get_expr,问题就在这,表test_list正在执行DDL,表上会加最高级别锁AccessExclusiveLock,此时表test_list是不可访问,分区函数也不可以执行,pg_dump就会在此处等待。
测试pg15及之后的版本已不存在该问题
pg15源码"bin/pg_dump/pg_dump.c"注释中可以看到相关说明
/*
* Find all the tables and table-like objects.
*
* We must fetch all tables in this phase because otherwise we cannot
* correctly identify inherited columns, owned sequences, etc.
*
* We include system catalogs, so that we can work if a user table is
* defined to inherit from a system catalog (pretty weird, but...)
*
* Note: in this phase we should collect only a minimal amount of
* information about each table, basically just enough to decide if it is
* interesting. In particular, since we do not yet have lock on any user
* table, we MUST NOT invoke any server-side data collection functions
* (for instance, pg_get_partkeydef()). Those are likely to fail or give
* wrong answers if any concurrent DDL is happening.
*/
复制
问题总结
1.分区表正在执行DDL时,执行pg_dump导出任意表都会卡住,pg15及之后版本已优化该问题可以正常执行
2.普通表a正在执行DDL时,执行pg_dump导出其他表没问题
导出a表卡住,可以在pg_dump时使用选项–lock-wait-timeout=TIMEOUT ,设置超时时间,超时后pg_dump退出
|