PostgreSQL+repmgr实现主备集群保姆级部署手册

一、工作原理

前言:
repmgr是一套开源工具,用于管理 PostgreSQL 服务器集群内的复制和故障转移。repmgr 支持并增强了 PostgreSQL 的内置流复制,它提供了一个单一的读/写主服务器和一个或多个只读备用服务器。

1.1、功能概述

repmgr 流复制管理工具对 PostgreSQL 集群节点的管理是基于分布式的管理方式。集群每个节点都具备一个 repmgr.conf 配置文件,用来记录本节点的 ID、节点名称、连接信息、数据库的 PGDATA 目录等配置参数。在完成参数配置后,就可以通过 repmgr 命令实现对集群节点的 “一键式” 部署。

Repmgr整体架构图:

PostgreSQL+repmgr实现主备集群保姆级部署手册

集群节点部署完成后,每个节点都可通过 repmgrd 守护进程来监控节点数据库状态;每个节点元数据表可独立维护,这些元数据表将记录所有集群节点的信息。

1.2、选举原理

在发生 Auto Failover 时,备节点在尝试多次连接主节点失败后(尝试次数及尝试间隔可以通过 repmgr.conf 配置文件修改),repmgrd 会在所有备节点中选举一个候选备节点(选举机制参考下文)提升为新主节点,其他备节点去 Follow 到该新主上,形成一个新的集群。
repmgr 选举候选备节点按照以下顺序选举:LSN > Priority > Node_ID
系统将优先选举一个 LSN 较大的节点,作为候选备节点;
若 LSN 一样,会根据 Priority 优先级进行比较(该优先级是在配置文件中进行参数配置,如果 Priority 为 0,则代表该节点被禁止提升为主节点);
若优先级也一样,会比较节点的 Node ID,小者会优先选举。

1.3、两个工具

repmgr 主要提供了 repmgr 和 repmgrd 两个工具。
repmgr 是一个执行管理任务的命令行工具,方便进行 PostgreSQL 服务器集群的管理。具备以下功能特点:

  • 设置备用服务器
    -> promote 备
    -> 主从切换
    -> 显示复制集群中服务器的状态
    -> epmgrd 是一个守护进程,它主动监视复制集群中的服务器并支持以下任务:
    -> 监控和记录复制集群信息
    -> 故障检测、故障转移
    -> 集群中事件的通知(需要自定义脚本接受通知)

二、环境初始化

2.1、环境规划

主机名称 主机IP 用途 操作系统 PostgreSQL版本 Repmgr版本
Node1 192.168.1.61 Primary CentOS Linux 7.9 16.4 5.4.1
Node2 192.168.1.62 Standby CentOS Linux 7.9 16.4 5.4.1

2.2、统一主机名称

vi /etc/hostname
或 hostnamectl set-hostname Node1

2.3、关闭SELINUX

sed -i  "s/SELINUX=enforcing/SELINUX=disabled/"  /etc/selinux/config

2.4、关闭防火墙

systemctl stop firewalld.service
systemctl disable firewalld.service

或设置开放端口:
查看开放的端口:
firewall-cmd --list-ports
开启防火墙端口:
firewall-cmd --zone=public --add-port=5432/tcp --permanent
命令含义:
–zone #作用域
–add-port=9200/tcp #添加端口,格式为:端口/通讯协议
–permanent #永久生效,没有此参数重启后失效
重新加载一遍才会生效:
firewall-cmd --reload

2.5、安装插件

yum -y install readline readline-devel zlib zlib-devel gettext gettext-devel openssl openssl-devel pam pam-devel libxml2 libxml2-devel libxslt libxslt-devel perl perl-devel tcl-devel uuid-devel gcc gcc-c++ make flex bison perl-ExtUtils*  libicu  libicu-devel  ntp libcurl-devel

2.6、编辑hosts

cat>>/etc/hosts<<EOF
192.168.1.61  Node1
192.168.1.62  Node2
EOF

2.7、用户limits设置

cat>>/etc/security/limits.conf<<EOF
postgres soft nofile 1048576
postgres hard nofile 1048576
postgres soft nproc 131072
postgres hard nproc 131072
postgres soft stack 10240
postgres hard stack 32768
postgres soft core 6291456
postgres hard core 6291456
EOF

2.8、调整内核

cat>>/etc/sysctl.conf<<EOF
fs.file-max = 76724200
kernel.sem = 10000 10240000 10000 1024
kernel.shmmni = 4096
kernel.shmall = 253702
kernel.shmmax = 1039163392
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 40960000
vm.dirty_ratio=20
vm.dirty_background_ratio=3
vm.dirty_writeback_centisecs=100
vm.dirty_expire_centisecs=500
vm.swappiness=10
vm.min_free_kbytes=524288
vm.swappiness=0
vm.overcommit_memory=2
vm.overcommit_ratio=75
net.ipv4.ip_local_port_range = 10000 65535
EOF

sysctl -p 

2.9、目录规划

目录编号 目录名称 备注
1 /postgres postgres根目录
2 /postgres/server 软件安装目录
3 /postgres/data 数据目录
4 /postgres/WAL WAL 目录
5 /postgres/archive 归档目录
6 /postgres/backup 备份目录
7 /install 安装包上传目录
8 /postgres/server/etc repmgr.conf目录
9 /postgres/server/etc/log repmgr日志目录
  • 执行脚本:
mkdir /install
 mkdir -p /postgres/server /postgres/data  /postgres/wal  /postgres/archive  /postgres/backup /postgres/server/etc/log
 sudo chown -R postgres:postgres /postgres /install
 chmod 0775 /postgres  /install
 chmod 0700 /postgres/data

2.10、配置各节点互信

su - postgres
rm -rf /home/postgres/.ssh
mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t rsa
ssh-copy-id postgres@192.168.1.61
ssh-copy-id postgres@192.168.1.62

三、PostgreSQL部署

3.1、部署PostgreSQL(2台主机Node1、Node2都要部署)

su  postgres
cd /install	
wget  https://ftp.postgresql.org/pub/source/v16.4/postgresql-16.4.tar.gz
tar -zxvf postgresql-16.4.tar.gz

3.2、在Node1分发文件到Node2、Node1

cd /install
scp -r ./* 192.168.1.62:/install

3.3、在Node1分发文件到Node2、Node1

cd /install/postgresql-16.4


./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl

make && make install

使用  make 或者  make world
$ make world
使用 make install 或者 make install-world 进行安装
$ make install-world     #包含扩展包和文档

  • Node1:
[postgres@Node1 postgresql-16.4]$ ./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking which template to use... linux
checking whether NLS is wanted... no
checking for default port number... 5432
checking for block size... 8kB
checking for segment size... 1GB
checking for WAL block size... 8kB
checking for gcc... gcc
checking whether the C compiler works... yes
.....................................................
configure: creating ./config.status
config.status: creating GNUmakefile
config.status: creating src/Makefile.global
config.status: creating src/include/pg_config.h
config.status: creating src/include/pg_config_ext.h
config.status: creating src/interfaces/ecpg/include/ecpg_config.h
config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
config.status: linking src/backend/port/posix_sema.c to src/backend/port/pg_sema.c
config.status: linking src/backend/port/sysv_shmem.c to src/backend/port/pg_shmem.c
config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
[postgres@Node1 postgresql-16.4]$


[postgres@Node1 postgresql-16.4]$ make && make install
make -C ./src/backend generated-headers
make[1]: Entering directory `/install/postgresql-16.4/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory `/install/postgresql-16.4/src/backend/catalog'
make[2]: Nothing to be done for `distprep'.
prereqdir=`cd './' >/dev/null && pwd` && 
cd '../../../src/include/catalog/' && for file in pg_proc_d.h pg_type_d.h pg_attribute_d.h pg_class_d.h pg_attrdef_d.h pg_constraint_d.h pg_inherits_d.h pg_index_d.h pg_operator_d.h pg_opfamily_d.h pg_opclass_d.h pg_am_d.h pg_amop_d.h pg_amproc_d.h pg_language_d.h pg_largeobject_metadata_d.h pg_largeobject_d.h pg_aggregate_d.h pg_statistic_d.h pg_statistic_ext_d.h pg_statistic_ext_data_d.h pg_rewrite_d.h pg_trigger_d.h pg_event_trigger_d.h pg_description_d.h pg_cast_d.h pg_enum_d.h pg_namespace_d.h pg_conversion_d.h pg_depend_d.h pg_database_d.h pg_db_role_setting_d.h pg_tablespace_d.h pg_authid_d.h pg_auth_members_d.h pg_shdepend_d.h pg_shdescription_d.h pg_ts_config_d.h pg_ts_config_map_d.h pg_ts_dict_d.h pg_ts_parser_d.h pg_ts_template_d.h pg_extension_d.h pg_foreign_data_wrapper_d.h pg_foreign_server_d.h pg_user_mapping_d.h pg_foreign_table_d.h pg_policy_d.h pg_replication_origin_d.h pg_default_acl_d.h pg_init_privs_d.h pg_seclabel_d.h pg_shseclabel_d.h pg_collation_d.h pg_parameter_acl_d.h pg_partitioned_table_d.h pg_range_d.h pg_transform_d.h pg_sequence_d.h pg_publication_d.h pg_publication_namespace_d.h pg_publication_rel_d.h pg_subscription_d.h pg_subscription_rel_d.h schemapg.h system_fk_info.h; do 
  rm -f $file && ln -s "$prereqdir/$file" . ; 
done
.....................................................
make[1]: Leaving directory `/install/postgresql-16.4/src'
make -C config install
make[1]: Entering directory `/install/postgresql-16.4/config'
/bin/mkdir -p '/postgres/server/lib/pgxs/config'
/bin/install -c -m 755 ./install-sh '/postgres/server/lib/pgxs/config/install-sh'
/bin/install -c -m 755 ./missing '/postgres/server/lib/pgxs/config/missing'
make[1]: Leaving directory `/install/postgresql-16.4/config'
[postgres@Node1 postgresql-16.4]$
  • Node2:
[postgres@Node2 postgresql-16.4]$
[postgres@Node2 postgresql-16.4]$ ./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking which template to use... linux
checking whether NLS is wanted... no
checking for default port number... 5432
checking for block size... 8kB
checking for segment size... 1GB
checking for WAL block size... 8kB
checking for gcc... gcc
checking whether the C compiler works... yes
...................................................................
configure: creating ./config.status
config.status: creating GNUmakefile
config.status: creating src/Makefile.global
config.status: creating src/include/pg_config.h
config.status: creating src/include/pg_config_ext.h
config.status: creating src/interfaces/ecpg/include/ecpg_config.h
config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
config.status: linking src/backend/port/posix_sema.c to src/backend/port/pg_sema.c
config.status: linking src/backend/port/sysv_shmem.c to src/backend/port/pg_shmem.c
config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
[postgres@Node2 postgresql-16.4]$


[postgres@Node2.. postgresql-16.4]$ make && make install
make -C ./src/backend generated-headers
make[1]: Entering directory `/install/postgresql-16.4/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory `/install/postgresql-16.4/src/backend/catalog'
make[2]: Nothing to be done for `distprep'.
prereqdir=`cd './' >/dev/null && pwd` && 
cd '../../../src/include/catalog/' && for file in pg_proc_d.h pg_type_d.h pg_attribute_d.h pg_class_d.h pg_attrdef_d.h pg_constraint_d.h pg_inherits_d.h pg_index_d.h pg_operator_d.h pg_opfamily_d.h pg_opclass_d.h pg_am_d.h pg_amop_d.h pg_amproc_d.h pg_language_d.h pg_largeobject_metadata_d.h pg_largeobject_d.h pg_aggregate_d.h pg_statistic_d.h pg_statistic_ext_d.h pg_statistic_ext_data_d.h pg_rewrite_d.h pg_trigger_d.h pg_event_trigger_d.h pg_description_d.h pg_cast_d.h pg_enum_d.h pg_namespace_d.h pg_conversion_d.h pg_depend_d.h pg_database_d.h pg_db_role_setting_d.h pg_tablespace_d.h pg_authid_d.h pg_auth_members_d.h pg_shdepend_d.h pg_shdescription_d.h pg_ts_config_d.h pg_ts_config_map_d.h pg_ts_dict_d.h pg_ts_parser_d.h pg_ts_template_d.h pg_extension_d.h pg_foreign_data_wrapper_d.h pg_foreign_server_d.h pg_user_mapping_d.h pg_foreign_table_d.h pg_policy_d.h pg_replication_origin_d.h pg_default_acl_d.h pg_init_privs_d.h pg_seclabel_d.h pg_shseclabel_d.h pg_collation_d.h pg_parameter_acl_d.h pg_partitioned_table_d.h pg_range_d.h pg_transform_d.h pg_sequence_d.h pg_publication_d.h pg_publication_namespace_d.h pg_publication_rel_d.h pg_subscription_d.h pg_subscription_rel_d.h schemapg.h system_fk_info.h; do 
  rm -f $file && ln -s "$prereqdir/$file" . ; 
done
.....................................................
make[1]: Leaving directory `/install/postgresql-16.4/src'
make -C config install
make[1]: Entering directory `/install/postgresql-16.4/config'
/bin/mkdir -p '/postgres/server/lib/pgxs/config'
/bin/install -c -m 755 ./install-sh '/postgres/server/lib/pgxs/config/install-sh'
/bin/install -c -m 755 ./missing '/postgres/server/lib/pgxs/config/missing'
make[1]: Leaving directory `/install/postgresql-16.4/config'
[postgres@Node2 postgresql-16.4]$

3.4、设置软链接

su root
mkdir /postgresql
chmod -R 777 /postgresql
chown -R postgres:postgres /postgresql
cd /postgres
ln -s /postgres/server /postgresql

3.5、设置环境变量

vi /home/postgres/.bashrc
export PGPORT=5432
export PGUSER=postgres
export PGHOME=/postgres/server
export PGDATA=/postgres/data
export PATH=$PGHOME/bin:$PATH

source  /home/postgres/.bashrc

3.6、初始化数据库

  • 说明:只需初始化Node1
  • 脚本:
 /postgresql/server/bin/initdb -D/postgres/data -X/postgres/wal -EUTF8 -Upostgres -W --data-checksums --pwprompt
  • 执行过程:
[postgres@Node1 ~]$  /postgresql/server/bin/initdb -D/postgres/data -X/postgres/wal -EUTF8 -Upostgres -W --data-checksums --pwprompt
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

Enter new superuser password:
Enter it again:

fixing permissions on existing directory /postgres/data ... ok
fixing permissions on existing directory /postgres/wal ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /postgresql/server/bin/pg_ctl -D /postgres/data -l logfile start

[postgres@Node1 ~]$

3.7、设置开机自启

PostgreSQL的开机自启动脚本位于PostgreSQL源码目录的contrib/start-scripts路径下
这个路径是安装包解压完的路径。

linux文件即为linux系统上的启动脚本
1)修改linux文件属性,添加X属性
exit
切换到root用户
cd /install/postgresql-16.2/contrib/start-scripts 
切换到解压目录下
chmod a+x linux
2) 复制linux文件到/etc/init.d目录下,更名为PostgreSQL
cp linux /etc/init.d/PostgreSQL
 
3)修改/etc/init.d/PostgreSQL文件的两个变量
vi /etc/init.d/PostgreSQL

prefix设置为PostgreSQL的安装路径:prefix=/postgres/server
PGDATA设置为PostgreSQL的数据目录路径:PGDATA="/postgres/data"
 
4) 执行service PostgreSQL start,就可以启动PostgreSQL服务
service PostgreSQL start
 
5)设置PostgreSQL服务开机自启动
chkconfig --add PostgreSQL

执行上面的命令,就可以实现PostgreSQL服务的开机自启动。
Reboot

3.8、修改postgresql.conf

cat >> /postgres/data/postgresql.conf <<EOF
listen_addresses = '*'
max_replication_slots = 12
max_wal_senders = 10
wal_keep_size = 1024
wal_level = replica
hot_standby = on
wal_log_hints = on
full_page_writes=on
shared_preload_libraries = 'repmgr'
archive_command = 'test ! -f /postgres/archive/%f && cp %p /postgres/archive/%f'
EOF

3.9、修改pg_hba.conf


vi $PGDATA/pg_hba.conf

# IPv4 local connections:
host    all              all              127.0.0.1/32            trust
host    repmgr          repmgr          192.168.1.61/24         trust
host    repmgr          repmgr          192.168.1.62/24         trust
host    replication     repmgr          192.168.1.61/24         trust
host    replication     repmgr          192.168.1.62/24         trust
host    all             all             0.001.0/0               scram-sha-256

四、部署repmgr

  • 说明:部署Repmgr(2台主机Node1、Node2都要部署)

4.1、安装插件

sudo yum -y install libcurl-devel
sudo yum -y install json-c-devel

4.2、在线下载repmgr安装包

cd /install

wget -c https://repmgr.org/download/repmgr-5.4.1.tar.gz
tar -zxvf repmgr-5.4.1.tar.gz 
cd /install/repmgr-5.4.1

./configure   && make install
  • Node1:
[postgres@Node1 ~]$ cd /install/repmgr-5.4.1
[postgres@Node1 repmgr-5.4.1]$
[postgres@Node1 repmgr-5.4.1]$ ./configure && make install
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /postgres/server/bin/pg_config
configure: building against PostgreSQL 16.4
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h
Building against PostgreSQL 16
.....................................................
/bin/mkdir -p '/postgres/server/lib'
/bin/mkdir -p '/postgres/server/share/extension'
/bin/mkdir -p '/postgres/server/share/extension'
/bin/mkdir -p '/postgres/server/bin'
/bin/install -c -m 755  repmgr.so '/postgres/server/lib/repmgr.so'
/bin/install -c -m 644 .//repmgr.control '/postgres/server/share/extension/'
/bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql .//repmgr--5.3--5.4.sql .//repmgr--5.4.sql  '/postgres/server/share/extension/'
/bin/install -c -m 755 repmgr repmgrd '/postgres/server/bin/'
[postgres@Node1 repmgr-5.4.1]$
  • Node2:
[postgres@Node2 ~]$ cd /install/repmgr-5.4.1
[postgres@Node2 repmgr-5.4.1]$ ./configure && make install
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /postgres/server/bin/pg_config
configure: building against PostgreSQL 16.4
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h
Building against PostgreSQL 16
.....................................................
/bin/mkdir -p '/postgres/server/lib'
/bin/mkdir -p '/postgres/server/share/extension'
/bin/mkdir -p '/postgres/server/share/extension'
/bin/mkdir -p '/postgres/server/bin'
/bin/install -c -m 755  repmgr.so '/postgres/server/lib/repmgr.so'
/bin/install -c -m 644 .//repmgr.control '/postgres/server/share/extension/'
/bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql .//repmgr--5.3--5.4.sql .//repmgr--5.4.sql  '/postgres/server/share/extension/'
/bin/install -c -m 755 repmgr repmgrd '/postgres/server/bin/'
[postgres@Node2 repmgr-5.4.1]$


4.3、检查repmgr安装位置

Node1:

[postgres@Node1 ~]$ pg_config --sysconfdir
/postgres/server/etc
[postgres@Node1 ~]$

Node2:

[postgres@Node2 ~]$ pg_config --sysconfdir
/postgres/server/etc
[postgres@Node2 ~]$

4.4、配置repmgr.conf(2个节点Node1、Node2都要配置)

vi /postgres/server/etc/repmgr.conf
  • Node1:
# 节点标识设置
node_id = 1
node_name = Node1
# 数据库连接信息
conninfo = 'host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2'
# 数据目录路径
data_directory = '/postgres/data'
# PostgreSQL二进制目录路径
pg_bindir = '/postgres/server/bin'
config_directory='/postgres/server'
# 日志级别与日志文件位置
log_level = INFO
log_facility=STDERR
log_file = '/postgres/server/etc/repmgr.log'

# 故障切换策略设定
failover = automatic
# 提升从库为新主库的命令
promote_command = '/postgres/server/bin/repmgr standby promote -f /postgres/server/etc/repmgr.conf'
# 从库跟随主库命令
follow_command = '/postgres/server/bin/repmgr standby follow -f /postgres/server/etc/repmgr.conf --upstream-node-id=%n'
# 监控历史记录存储选项
monitoring_history = true
# 监控间隔时间(秒)
monitor_interval_secs = 2
connection_check_type='ping'
# 重连尝试次数
reconnect_attempts = 3
# 两次重连之间的间隔时间(秒)
reconnect_interval = 5
# repmgrd文件
repmgrd_pid_file = '/postgres/server/etc/repmgrd.pid'
  • Node2:
# 节点标识设置
node_id = 2
node_name = Node2
# 数据库连接信息
conninfo = 'host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2'
# 数据目录路径
data_directory = '/postgres/data'
# PostgreSQL二进制目录路径
pg_bindir = '/postgres/server/bin'
config_directory='/postgres/server'
# 日志级别与日志文件位置
log_level = INFO
log_facility=STDERR
log_file = '/postgres/server/etc/repmgr.log'

# 故障切换策略设定
failover = automatic
# 提升从库为新主库的命令
promote_command = '/postgres/server/bin/repmgr standby promote -f /postgres/server/etc/repmgr.conf'
# 从库跟随主库命令
follow_command = '/postgres/server/bin/repmgr standby follow -f /postgres/server/etc/repmgr.conf --upstream-node-id=%n'
# 监控历史记录存储选项
monitoring_history = true
# 监控间隔时间(秒)
monitor_interval_secs = 2
connection_check_type='ping'
# 重连尝试次数
reconnect_attempts = 3
# 两次重连之间的间隔时间(秒)
reconnect_interval = 5
# repmgrd文件
repmgrd_pid_file = '/postgres/server/etc/repmgrd.pid'

4.5、启动Node1的PostgreSQL服务

  • 脚本:
    pg_ctl start
  • 执行内容:
[postgres@Node1 ~]$ pg_ctl start
waiting for server to start....2024-09-14 21:18:41.333 CST [18549] LOG:  starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2024-09-14 21:18:41.337 CST [18549] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-09-14 21:18:41.337 CST [18549] LOG:  listening on IPv6 address "::", port 5432
2024-09-14 21:18:41.341 CST [18549] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-09-14 21:18:41.349 CST [18552] LOG:  database system was shut down at 2024-09-14 19:58:52 CST
2024-09-14 21:18:41.356 CST [18549] LOG:  database system is ready to accept connections
 done
server started
[postgres@Node1 ~]$

4.6、在Node1登录psql,创建集群管理用户repmgr,数据库repmgr

  • 脚本:
 create user repmgr with superuser replication password 'repmgr' ;
create database repmgr owner  repmgr;
  • 执行内容:
[postgres@Node1 ~]$ psql
psql (16.4)
Type "help" for help.

postgres=# create user repmgr with superuser replication password 'repmgr' ;
CREATE ROLE
postgres=# create database repmgr owner  repmgr;
CREATE DATABASE
postgres=#

4.7、注册Primary主库(在Node1节点执行)

  • 脚本:
repmgr primary register
  • 执行内容:
[postgres@Node1 ~]$ repmgr primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@Node1 ~]$

4.8、启动守护进程(在Node1节点执行)

  • 脚本:
    repmgrd -d
  • 执行内容:
[postgres@Node1 ~]$ repmgrd -d
[2024-09-14 21:20:48] [NOTICE] redirecting logging output to "/postgres/server/etc/repmgr.log"

[postgres@Node1 ~]$

4.9、注册Standby主库(在Node2节点执行)

  • 脚本:
  • 1、-检查是否可以注册备节点
    repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone –dry-run
  • 2、注意备库repmgr.conf文件中的目录要为空
    repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone –force
  • 3、启动服务
    pg_ctl start
  • 执行内容:
[postgres@Node2 ~]$ repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --dry-run
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.1.61 port=5432 user=repmgr dbname=repmgr
DETAIL: current installation size is 30 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  /postgres/server/bin/pg_basebackup -l "repmgr base backup"  -D /postgres/data -h 192.168.1.61 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
[postgres@Node2 ~]$


[postgres@Node2 ~]$ repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --force
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.1.61 port=5432 user=repmgr dbname=repmgr
DETAIL: current installation size is 30 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/postgres/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /postgres/server/bin/pg_basebackup -l "repmgr base backup"  -D /postgres/data -h 192.168.1.61 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /postgres/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[postgres@Node2 ~]$


[postgres@Node2 ~]$  pg_ctl start
waiting for server to start....2024-09-14 21:25:23.805 CST [19557] LOG:  starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2024-09-14 21:25:23.806 CST [19557] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-09-14 21:25:23.806 CST [19557] LOG:  listening on IPv6 address "::", port 5432
2024-09-14 21:25:23.812 CST [19557] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-09-14 21:25:23.818 CST [19560] LOG:  database system was interrupted; last known up at 2024-09-14 21:24:35 CST
2024-09-14 21:25:23.833 CST [19560] LOG:  entering standby mode
2024-09-14 21:25:23.834 CST [19560] LOG:  starting backup recovery with redo LSN 0/2000028, checkpoint LSN 0/2000098, on timeline ID 1
2024-09-14 21:25:23.836 CST [19560] LOG:  redo starts at 0/2000028
2024-09-14 21:25:23.837 CST [19560] LOG:  completed backup recovery with redo LSN 0/2000028 and end LSN 0/2000170
2024-09-14 21:25:23.837 CST [19560] LOG:  consistent recovery state reached at 0/2000170
2024-09-14 21:25:23.837 CST [19557] LOG:  database system is ready to accept read-only connections
 done
server started
[postgres@Node2 ~]$ 2024-09-14 21:25:23.865 CST [19561] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

[postgres@Node2 ~]$

4.10、注册standby节点(在Node2节点执行)

[postgres@Node2 ~]$
[postgres@Node2 ~]$ repmgr standby register
INFO: connecting to local node "Node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "Node2" (ID: 2) successfully registered
[postgres@Node2 ~]$

4.11、启动守护进程(在Node2节点执行)

  • 脚本:
    repmgrd -d
  • 执行内容:
[postgres@Node2 ~]$ repmgrd -d
[2024-09-14 21:25:23] [NOTICE] redirecting logging output to "/postgres/server/etc/repmgr.log"

五、repmgr集群管理

5.1、检查集群运行情况

1、检查集群状态1

[postgres@Node1 ~]$ psql -Urepmgr -drepmgr
psql (16.4)
Type "help" for help.

repmgr=#  select * from nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                conninfo
                      | repluser | slot_name |           config_file
---------+------------------+--------+-----------+---------+----------+----------+---------------------------------------------------
----------------------+----------+-----------+----------------------------------
       1 |                  | t      | Node1     | primary | default  |      100 | host=192.168.1.61 user=repmgr dbname=repmgr port=5
432 connect_timeout=2 | repmgr   |           | /postgres/server/etc/repmgr.conf
       2 |                1 | t      | Node2     | standby | default  |      100 | host=192.168.1.62 user=repmgr dbname=repmgr port=5
432 connect_timeout=2 | repmgr   |           | /postgres/server/etc/repmgr.conf
(2 rows)

repmgr=#

2、检查集群状态2

[postgres@Node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | Node1 | primary | * running |          | default  | 100      | 1        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
 2  | Node2 | standby |   running | Node1    | default  | 100      | 1        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
[postgres@Node2 ~]$

5.2、模拟故障,实现自动主备切换

1、在Node1停止服务

[postgres@Node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | Node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
 2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "Node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@Node2 ~]$

2、在Node2检查集群状态

[postgres@Node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | Node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
 2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "Node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@Node2 ~]$

3、手工修复Node1,将主库故障节点改为备库

[postgres@Node1 ~]$ repmgr node rejoin -h 192.168.1.62 -p5432 -Urepmgr -drepmgr --force-rewind
NOTICE: rejoin target is node "Node2" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/301C758
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/postgres/server/bin/pg_rewind -D '/postgres/data' --source-server='host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2'"
NOTICE: 0 files copied to /postgres/data
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/postgres/server/bin/pg_ctl  -w -D '/postgres/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[postgres@Node1 ~]$

4、再次查询状态

[postgres@Node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | Node1 | standby |   running | Node2    | default  | 100      | 1        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
 2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
[postgres@Node2 ~]$

5.3、手动切换主备

1、模拟切换,查看模拟状态

. 说明:在备库执行

[postgres@Node1 ~]$ repmgr standby switchover --siblings-follow --force-rewind --dry-run
NOTICE: checking switchover on node "Node1" (ID: 1) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "192.168.1.62" succeeded
INFO: able to execute "repmgr" on remote host "192.168.1.62"
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: 1 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 2 nodes
INFO: would pause repmgrd on node "Node1" (ID: 1)
INFO: would pause repmgrd on node "Node2" (ID: 2)
NOTICE: local node "Node1" (ID: 1) would be promoted to primary; current primary "Node2" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "Node2":
  "/postgres/server/bin/pg_ctl  -D '/postgres/data' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[postgres@Node1 ~]$
2、主备切换
[postgres@Node1 ~]$ repmgr standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "Node1" (ID: 1)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "Node1" (ID: 1) will be promoted to primary; current primary "Node2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "Node2" (ID: 2)
NOTICE: issuing CHECKPOINT on node "Node2" (ID: 2)
DETAIL: executing server command "/postgres/server/bin/pg_ctl  -D '/postgres/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/30474B0
NOTICE: promoting standby to primary
DETAIL: promoting server "Node1" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "Node1" (ID: 1) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "Node1" (ID: 1)
ERROR: unable to execute CHECKPOINT
WARNING: node "Node2" attached in state "startup"
INFO: waiting for node "Node2" (ID: 2) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: node "Node1" (ID: 2) is currently attached to its upstream node in state "startup"
NOTICE: node "Node1" (ID: 1) promoted to primary, node "Node2" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "Node1" is now primary and node "Node2" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[postgres@Node1 ~]$

3、检查状态

[postgres@Node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | Node1 | primary | * running |          | default  | 100      | 3        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
 2  | Node2 | standby |   running | Node1    | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
[postgres@Node1 ~]$

六、总结

repmgr‌是一个用于管理PostgreSQL服务器集群复制和故障转移的开源工具套件。它提供了一系列命令行工具,用于设置和管理PostgreSQL流复制集群。以下是一些常用的repmgr命令:

  • repmgr primary register – 初始化一个repmgr安装并注册主节点
  • repmgr primary unregister — 取消注册非活动的主节点
  • repmgr standby clone – 从另一个PostgreSQL节点克隆一个PostgreSQL备用节点
  • repmgr standby unregister — 从 repmgr 元数据中移除备库的信息
  • repmgr standby promote – 将备用实例提升为主实例
  • repmgr standby follow — 将运行中的备用连接到新的上游节点
  • repmgr standby switchover – 将备用提升为主用,并将现有主用降级为备用
  • repmgr witness register – 将见证节点的信息添加到 repmgr 元数据中
  • repmgr witness unregister – 从 repmgr 元数据中移除见证节点的信息
  • repmgr node status – 显示节点的基本信息和复制状态概览
  • repmgr node check – 从复制角度对节点执行一些健康检查
  • repmgr node rejoin —将休眠(停止)的节点重新加入复制集群
  • repmgr node service – 显示或执行系统服务命令以停止/启动/重启/重新加载/提升节点
  • repmgr cluster show – 显示复制集群中每个已注册节点的信息
  • repmgr cluster matrix – 在每个节点上运行repmgr集群显示,并汇总输出
  • repmgr cluster crosscheck – 检查每个节点组合之间的连接情况
  • repmgr cluster event – 输出格式化后的集群事件列表
  • repmgr cluster cleanup – 清除监控历史
  • repmgr service status – 显示集群中每个节点上repmgrd的状态信息
  • repmgr service pause – 指示复制集群中的所有repmgrd实例暂停故障切换操作
  • repmgr service unpause – 指示复制集群中的所有repmgrd实例恢复故障切换操作
  • repmgr daemon start — Start the repmgrd daemon on the local node
  • repmgr daemon start — 在本地节点上启动 repmgrd 守护进程
  • repmgr daemon stop — 在本地节点上停止 repmgrd 守护进程

原创文章,作者:kirin,如若转载,请注明出处:https://blog.ytso.com/tech/bigdata/317974.html

(0)
上一篇 1天前
下一篇 1天前

相关推荐

发表回复

登录后才能评论