[pg95@db2 pg_root]$ grep ^[a-z] recovery.conf recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=192.168.2.37 port=1931 user=repuser' # e.g. 'host=localhost port=5432'
激活备节点: db2 上操作
1 2 3 4 5
[pg95@db2 pg_root]$ pg_ctl promote -D $PGDATA server promoting [pg95@db2 pg_root]$ pg_controldata | grep cluster Database cluster state: in production
备节点激活后,创建一张测试表并插入数据
1 2 3 4 5 6 7 8 9
[pg95@db2 pg_root]$ psql psql (9.5alpha1) Type "help"for help. postgres=# create table test_2(id int4); CREATE TABLE postgres=# insert into test_2(id) select n from generate_series(1,10000) n; INSERT 0 10000
停原来主节点: db1 上操作
1 2 3 4 5 6
[pg95@db1 ~]$ pg_controldata | grep cluster Database cluster state: in production [pg95@db1 ~]$ pg_ctl stop -m fast -D $PGDATA waiting for server to shut down....... done server stopped
备注:停完原主库后,千万不能立即以备节点形式拉起老库,否则在执行 pg_rewind 时会报,”target server must be shut down cleanly” 错误。
pg_rewind: db1 操作
1 2 3 4 5 6 7
[pg95@db1 pg_root]$ pg_ctl stop -m fast -D $PGDATA waiting for server to shut down......... done server stopped [pg95@db1 pg_root]$ pg_rewind --target-pgdata $PGDATA--source-server='host=192.168.2.38 port=1931 user=postgres dbname=postgres' -P connected to server target server needs to use either data checksums or"wal_log_hints = on"
备注:执行 pg_rewind 抛出以上错误,错误内容很明显。
pg_rewind 代码分析
1 2 3 4 5 6 7 8 9 10
364 /* 365 * Target cluster need to use checksums or hint bit wal-logging, this to 366 * prevent from data corruption that could occur because of hint bits. 367 */ 368 if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION && 369 !ControlFile_target.wal_log_hints) 370 { 371 pg_fatal("target server needs to use either data checksums or "wal_log_hints = on"n"); 372 } 373
The basic idea istocopy everything fromthe new cluster tothe old cluster, except forthe blocks that we know to be the same. 1)Scan the WAL logofthe old cluster, starting fromthelast checkpoint beforethe point wherethe new cluster's timeline history forked off fromthe old cluster. For each WAL record, make a note ofthe data blocks that were touched. This yields a listof all the data blocks that were changed inthe old cluster, afterthe new cluster forked off. 2)Copy all those changed blocks fromthe new cluster tothe old cluster. 3)Copy all other files like clog, conf files etc. fromthe new cluster to old cluster. Everything except the relation files. 4) Apply the WAL fromthe new cluster, starting fromthe checkpoint created at failover. (Strictly speaking, pg_rewind doesn't apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay fromthat checkpoint and apply all the required WAL.)