本帖最后由 mahan 于 2024-3-17 15:02 编辑
近期处理一个服务器磁盘爆满事件,发现是一台postgre的数据库wal日志过大导致
磁盘爆满排查,在centos,直接就命令du去定位大目录
/dev/vda1 40G 11G 27g 29%
/dev/vdb1 296G 232G 49g 83%
## 查看磁盘大小,按目录输出
du -sh *
最终发现是postgres的目录下的pg_wal,一个文件256M,有600+个
接下来,我们来零基础入门postgres,粗略了解这台服务器上的数据库基本情况
1. 确定postgres部署的目录
[postgres@test01 pg_data]$ whereis psql
psql: /var/postgresql/soft/pg12.8/bin/psql
2. 连接及查看帮助
[postgres@test01 ~]$ psql
psql (12.8)
Type "help" for help.
postgres=# help
You are using psql, the command-line interface to PostgreSQL.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
postgres=# \h
3. 查看pg的状态,为 active(running)
[postgre@test01 ~]$ sudo systemctl status postgresql
● postgresql.service - PostgreSQL 12 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2023-12-16 17:02:36 CST; 2 months 7 days ago
Main PID: 1143 (postgres)
Tasks: 9
Memory: 512.8M
CGroup: /system.slice/postgresql.service
├─1143 /var/postgresql/soft/pg12.8/bin/postgres -D /mnt/data/postgres/pg_data
├─1848 postgres: logger
├─3099 postgres: checkpointer
├─3100 postgres: background writer
├─3101 postgres: walwriter
├─3102 postgres: autovacuum launcher
├─3103 postgres: archiver failed on 000000010000000000000001
├─3104 postgres: stats collector
└─3105 postgres: logical replication launcher
Dec 16 17:02:21 test01 systemd[1]: Starting PostgreSQL 12 database server...
Dec 16 17:02:21 test01 pg_ctl[1098]: pg_ctl: another server might be running; trying to start server anyway
Dec 16 17:02:21 test01 pg_ctl[1098]: waiting for server to start....2023-12-16 17:02:21.642 CST [1143] LOG: starting PostgreSQL 12.8 on x86_64-pc-linux-gnu, compiled by g...-44), 64-bit
Dec 16 17:02:21 test01 pg_ctl[1098]: 2023-12-16 17:02:21.642 CST [1143] LOG: listening on IPv4 address "0.0.0.0", port 5432
Dec 16 17:02:21 test01 pg_ctl[1098]: 2023-12-16 17:02:21.646 CST [1143] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
Dec 16 17:02:22 test01 pg_ctl[1098]: 2023-12-16 17:02:22.140 CST [1143] LOG: redirecting log output to logging collector process
Dec 16 17:02:22 test01 pg_ctl[1098]: 2023-12-16 17:02:22.140 CST [1143] HINT: Future log output will appear in directory "log".
Dec 16 17:02:36 test01 systemd[1]: Started PostgreSQL 12 database server.
Hint: Some lines were ellipsized, use -l to show in full.
4. 查询数据库的版本
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
5. 查询配置参数,若是有备份会提示备份目录
[postgres@test01 pg_data]$ pg_config --configure
'--prefix=/var/postgresql/soft/pg12.8_8k_16g_openssl' '--with-libxml' '--with-libxslt' '--with-uuid=ossp' '--with-segsize=16' '--with-openssl'
由此可见,此处为涉及
6. 查询有哪些数据库
postgres=# select datname from pg_database;
datname
--------------------------
postgres
template1
template0
pgtest1
pgtest2
pgtest3
pgtest4
pgtest5
pgtest6
pgtest7
pgtest8
pgtest9
pgtest10
(13 rows)
7. 查询数据库大小
postgres=# SELECT datname AS "Database Name", pg_size_pretty(pg_database_size(datname)) AS "Size"
postgres-# FROM pg_database;
Database Name | Size
--------------------------+---------
postgres | 8049 kB
template1 | 7809 kB
template0 | 7809 kB
pgtest1 | 355 MB
pgtest2 | 9609 kB
pgtest3 | 1172 MB
pgtest4 | 91 MB
pgtest5 | 166 MB
pgtest6 | 457 MB
pgtest7 | 283 MB
pgtest8 | 47 MB
pgtest9 | 796 MB
pgtest10 | 8553 kB
(13 rows)
8. 查看当前归档命令及位置
postgres=# SHOW archive_command;
archive_command
-------------------------------------------------------------------------------------------------------------------
test ! -f /mnt/data/postgres/archivelog/%f && cp %p /mnt/data/postgres/archivelog/%f
(1 row)
test命令,由此可见,并未对wal日志进行归档
9. 查和显示控制信息
[postgres@test01 pg_wal]$ pg_controldata /mnt/data/postgres/pg_data/
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 7218821847280610498
Database cluster state: in production
pg_control last modified: Thu 22 Feb 2024 10:08:56 AM CST
Latest checkpoint location: 27/80000060
Latest checkpoint's REDO location: 27/80000028
Latest checkpoint's REDO WAL file: 000000010000002700000008
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:964212
Latest checkpoint's NextOID: 606208
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 479
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 964212
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Thu 22 Feb 2024 10:08:55 AM CST
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 5000
max_worker_processes setting: 128
max_wal_senders setting: 6
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 2097152
WAL block size: 8192
Bytes per WAL segment: 268435456
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: ffa8422268d5d89b2e2b7a3c6ccf0e3c77337ad8c0c9d785d76d92e3b2df4093
pg_controldata参数说明:
用于检查和显示PostgreSQL数据库控制文件信息的命令行工具。它可以提供有关数据库实例的一些关键信息,包括:
PostgreSQL数据库版本信息
数据目录的位置和大小
数据目录的设置和配置
系统标识符(System Identifier)和数据库标识符
最新检查点的位置和相关信息
通过运行pg_controldata命令,您可以快速了解有关您的PostgreSQL数据库实例的关键信息,并在需要时进行故障诊断或数据库管理操作。
10.查看归档日志情况(是否归档)
postgres=# select * from pg_stat_archiver;
archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset
----------------+-------------------+--------------------+--------------+--------------------------+-------------------------------+-------------------------------
0 | | | 284598 | 000000010000000000000001 | 2024-02-22 16:31:23.149986+08 | 2023-12-16 17:02:36.208542+08
(1 row)
postgres=# exit
[postgres@test01 ~]$ ls /mnt/data/postgres/pg_data/pg_wal/
000000010000000000000001 00000001000000050000000C 000000010000000B00000007 000000010000001100000002 00000001000000160000000D 000000010000001C00000008 000000010000002200000003
000000010000000000000002 00000001000000050000000D 000000010000000B00000008 000000010000001100000003 00000001000000160000000E 000000010000001C00000009 000000010000002200000004
000000010000000000000003 00000001000000050000000E 000000010000000B00000009 000000010000001100000004 00000001000000160000000F 000000010000001C0000000A 000000010000002200000005
000000010000000000000004 。。。。(省略600+个文件)
登录后复制
发现failed_count=284598 次数特别高,且不断增加,而last_failed_wal=000000010000000000000001,此为1号文件,再查pg_wal/archive_status/ 目录下,发现都是以.ready 结尾的
说明数据建立之初,wal日志一直就未成功归档过
|