PostgreSQL 恢复一例

今天在将pg_dump压缩过的dump文件,通过pg_restore 导入到测试库时,中途异常中断,造成测试库宕机,而且之后数据无法启动。

数据库宕机的 Csvlog

1
2
3
4
5
6
7
8
2010-09-17 17:28:03.943 CST,"mydb","mydb",23936,"192.168.1.25:58855",4c9334a3.5d80,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,  
2010-09-17 17:28:03.944 CST,"mydb","mydb",23937,"192.168.1.25:58856",4c9334a3.5d81,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,
2010-09-17 17:28:03.954 CST,"mydb","mydb",23938,"192.168.1.25:58857",4c9334a3.5d82,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,
2010-09-17 17:28:03.955 CST,"mydb","mydb",23939,"192.168.1.25:58858",4c9334a3.5d83,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,
2010-09-17 17:28:03.956 CST,"mydb","mydb",23940,"192.168.1.25:58859",4c9334a3.5d84,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,
2010-09-17 17:28:04.032 CST,,,32240,,4c932f7d.7df0,9,,2010-09-17 17:06:05 CST,,0,FATAL,53100,"could not write to file ""pg_xlog/xlogtemp.32240"": No space left on device",,,,,,,,
2010-09-17 17:28:04.098 CST,,,946,,4c9321d4.3b2,5,,2010-09-17 16:07:48 CST,,0,LOG,00000,"startup process (PID 32240) exited with exit code 1",,,,,,,,
2010-09-17 17:28:04.098 CST,,,946,,4c9321d4.3b2,6,,2010-09-17 16:07:48 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,

从数据库down机前的 csvlog日志来看,down机前数据库SERVER 处于 recovery mode , 说明pg_restore时 SERVER此时有异常,之后尝试重新起动Server,命令如下

1
2
[postgres@PG1 bin]$ pg_ctl -D $PGDATA start  
server starting

虽然显示’server starting’, 但数据库并没有真正起来,因为这时数据库根本无法连接,接着查看 csvlog

数据库启动异常时 Csvlog

1
2
3
4
5
6
7
2010-09-17 17:36:36.660 CST,"mydb","mydb",24245,"192.168.169.42:42566",4c9336a4.5eb5,1,"/opt/pgsql/bin/postgres",2010-09-17 17:36:36 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,  
2010-09-17 17:36:36.662 CST,"mydb","mydb",24247,"192.168.169.42:42567",4c9336a4.5eb7,1,"/opt/pgsql/bin/postgres",2010-09-17 17:36:36 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,
2010-09-17 17:36:36.663 CST,"mydb","mydb",24246,"192.168.1.25:33223",4c9336a4.5eb6,1,"/opt/pgsql/bin/postgres",2010-09-17 17:36:36 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,
2010-09-17 17:36:36.663 CST,"mydb","mydb",24248,"192.168.169.42:42568",4c9336a4.5eb8,1,"/opt/pgsql/bin/postgres",2010-09-17 17:36:36 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,
2010-09-17 17:36:36.665 CST,"mydb","mydb",24249,"192.168.169.42:42569",4c9336a4.5eb9,1,"/opt/pgsql/bin/postgres",2010-09-17 17:36:36 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,
2010-09-17 17:36:36.666 CST,,,24163,,4c9336a3.5e63,1,,2010-09-17 17:36:35 CST,,0,LOG,00000,"startup process (PID 24165) exited with exit code 1",,,,,,,,
2010-09-17 17:36:36.666 CST,,,24163,,4c9336a3.5e63,2,,2010-09-17 17:36:35 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,

从上面日志来看,可以看出Server 数据库正在启动(the database system is starting up”), 但是到后面就异常中上了; 而且也没有多余的信息,由于事情紧迫,随即向师傅请教,师傅说数据库需要恢复,可以恢复到具体的时间点。以下是解决过程。

解决过程

pg_controldata 查看PG SERVER 详细信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
[postgres@PG1 bin]$ pg_controldata $PGDATA  
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5509641090052341117
Database cluster state: shut down
pg_control last modified: Fri 17 Sep 2010 05:28:03 PM CST
Latest checkpoint location: 137/BFFFF68
Prior checkpoint location: 131/9D90C818
Latest checkpoint's REDO location: 137/BFFFF68
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/64282
Latest checkpoint's NextOID: 215491390
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Fri 17 Sep 2010 05:26:05 PM CST
Minimum recovery ending location: 0/0
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 1048576
WAL block size: 65536
Bytes per WAL segment: 67108864
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

重要的信息:Latest checkpoint's NextXID: 0/64282, Latest checkpoint's NextXID 是指最近一次安全的checkpoints的下一个事务ID,我们可以将数据库恢复到这一时刻。

通过pg_resetxlog 将数据库恢复到事务 64282 时刻

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[postgres@PG1 bin]$ pg_resetxlog --help  
pg_resetxlog resets the PostgreSQL transaction log.
Usage:
pg_resetxlog [OPTION]... DATADIR
Options:
-e XIDEPOCH set next transaction ID epoch
-f force update to be done
-l TLI,FILE,SEG force minimum WAL starting location for new transaction log
-m XID set next multitransaction ID
-n no update, just show extracted control values (for testing)
-o OID set next OID
-O OFFSET set next multitransaction offset
-x XID set next transaction ID
--help show this help, then exit
--version output version information, then exit
Report bugs to <[pgsql-bugs@postgresql.org](mailto:pgsql-bugs@postgresql.org)>.
[postgres@PG1 bin]$ pg_resetxlog -x 64282 $PGDATA
Transaction log reset

再次启动 PG SERVER,正常,此时数据库已恢复

1
2
3
4
5
6
7
8
[postgres@PG1 bin]$ pg_ctl -D $PGDATA start  
server starting
[postgres@PG1 bin]$ ps -ef | grep post
postgres 25297 1 14 18:08 pts/1 00:00:00 /opt/pgsql/bin/postgres -D /opt/pgdata/pg_root
postgres 25298 25297 0 18:08 ? 00:00:00 postgres: logger process
postgres 25829 25297 0 18:08 ? 00:00:00 postgres: writer process
postgres 25830 25297 0 18:08 ? 00:00:00 postgres: wal writer process
postgres 25831 25297 0 18:08 ? 00:00:00 postgres: stats collector process

到了这里,数据库PG SERVER 终于可以启来了,这里顺便说一下,之前的 pg_restore 中途异常是因为脚本中加了 -j 参数,同时跑多个 pg_restore 线程, 造成 pg_restore 子线程连接丢失,pg_restore 脚本中去掉 -j 参数时,数据库顺利导入。

Pg_controldata 官网文档介绍

Name
pg_controldata ― display control information of a PostgreSQL database cluster
Synopsis
pg_controldata [datadir]
Description
pg_controldata prints information initialized during initdb, such as the catalog version. It also shows information about write-ahead logging and checkpoint processing. This information is cluster-wide, and not specific to any one database.
This utility can only be run by the user who initialized the cluster because it requires read access to the data directory. You can specify the data directory on the command line, or use the environment variable PGDATA.
Environment
PGDATA
Default data directory location

Pg_resetxlog 官网文档介绍

pg_resetxlog ― reset the write-ahead log and other control information of a PostgreSQL database cluster

原创文章,作者:Maggie-Hunter,如若转载,请注明出处:https://blog.ytso.com/236329.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论