今天一数据库流复制环境备库由于硬件原因导致主机宕机，PostgreSQL 版本为 9.2.1，持续时间为 5 小时左右，可能很多人会认为，备库挂了，可能要重做了，庆幸的是， pg 在这方面非常省心，在这种情况下，理论上只要主库的保持的 WAL 足够多，备库就能赶上主库并恢复，简单记录下。

备库数据库日志

备库系统启动后，数据库日志如下：

2012-10-29 20:07:50.193 CST,,,11520,,508e7196.2d00,1,,2012-10-29 20:07:50 CST,,0,LOG,00000,"database system was interrupted while in recovery at log time 2012-10-29 15:12:23 CST",,"If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.",,,,,,,""  
2012-10-29 20:07:50.226 CST,,,11520,,508e7196.2d00,2,,2012-10-29 20:07:50 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""  
2012-10-29 20:07:50.280 CST,,,11520,,508e7196.2d00,3,,2012-10-29 20:07:50 CST,1/0,0,LOG,00000,"redo starts at DC/DAAC3460",,,,,,,,,""  
2012-10-29 20:08:42.461 CST,,,11521,,508e7196.2d01,1,,2012-10-29 20:07:50 CST,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""  
2012-10-29 20:08:57.384 CST,,,11520,,508e7196.2d00,4,,2012-10-29 20:07:50 CST,1/0,0,LOG,00000,"consistent recovery state reached at DD/CED97FE0",,,,,,,,,""  
2012-10-29 20:08:57.385 CST,,,11518,,508e7195.2cfe,1,,2012-10-29 20:07:49 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""  
2012-10-29 20:08:57.403 CST,,,11520,,508e7196.2d00,5,,2012-10-29 20:07:50 CST,1/0,0,LOG,00000,"unexpected pageaddr D8/FAEC8000 in log file 221, segment 206, offset 15499264",,,,,,,,,""  
2012-10-29 20:08:57.442 CST,,,11607,,508e71d9.2d57,1,,2012-10-29 20:08:57 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""  
2012-10-29 20:09:41.259 CST,,,11642,"",508e7205.2d7a,1,"",2012-10-29 20:09:41 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=13025",,,,,,,,,""  
2012-10-29 20:09:51.146 CST,,,11643,"",508e720f.2d7b,1,"",2012-10-29 20:09:51 CST,,0,LOG,00000,"connection received: host=192.168.1.38 port=53984",,,,,,,,,""  
2012-10-29 20:09:51.146 CST,,,11643,"xxx.xxx.xxx.xxx:xxxxx",508e720f.2d7b,2,"",2012-10-29 20:09:51 CST,,0,LOG,08P01,"incomplete startup packet",,,,,,,,,""  
2012-10-29 20:09:54.177 CST,,,11521,,508e7196.2d01,2,,2012-10-29 20:07:50 CST,,0,LOG,00000,"restartpoint complete: wrote 81492 buffers (31.1%); 0 transaction log file(s) added, 120 removed, 257 recycled; write=56.140 s, sync=14.822 s, total=71.715 s; sync files=66, longest=5.080 s, average=0.224 s",,,,,,,,,""  
2012-10-29 20:09:54.177 CST,,,11521,,508e7196.2d01,3,,2012-10-29 20:07:50 CST,,0,LOG,00000,"recovery restart point at DD/5B013708","last completed transaction was at log time 2012-10-29 15:26:04.08085+08",,,,,,,,""

备注：上面是备库系统宕机后，重新启动 PostgreSQL 时的日志，在备库恢复过程中还看不到 wal 接收进程，过会之后能看到如下进程。

检查备库 WAL 接收进程

1
2
3

[postgres@db](mailto:postgres@db)> ps -ef | grep wal  
postgres 11607 11518 1 20:08 ? 00:00:43 postgres: wal receiver process streaming E1/1D11F308  
postgres 16693 11871 0 20:59 pts/2 00:00:00 grep wal

主库 Wal_keep_segments 参数

postgres=> show wal_keep_segments ;  
wal_keep_segments  
-------------------  
1024  
(1 row)

备注：主库设置的 wal_keep_segments 参数为 1024，一个较大的 wal_keep_segments 设置，允许备库在宕机较长的时间内依然能够重新追上主库，当然这与主库的繁忙程度有关，主库越忙，产生的 WAL 日志越多，之前的 WAL 日志越容易被覆盖。

总结

此文仅简单记录了 PostgreSQL 流复制环境，standby 由于系统挂掉后恢复的情况，从以上看出，备库挂了之后，重新恢复是非常容易的，前提是在硬盘空间允许的范围内，主库尽量设置较大的 wal_keep_segments 参数，从而保证较多的 WAL 日志文件。

原创文章，作者：kepupublish，如若转载，请注明出处：https://blog.ytso.com/tech/database/237904.html

PostgreSQL流复制：备库主机宕机一例

备库数据库日志

检查备库 WAL 接收进程

主库 Wal_keep_segments 参数

总结

发表回复

PostgreSQL流复制：备库主机宕机一例

备库数据库日志

检查备库 WAL 接收进程

主库 Wal_keep_segments 参数

总结

相关推荐

发表回复