PostgreSQL+repmgr实现主备集群保姆级部署手册

文章目录[隐藏]

  • 五、repmgr集群管理
  • 六、总结
  • 一、工作原理

    前言:
    repmgr是一套开源工具,用于管理 PostgreSQL 服务器集群内的复制和故障转移。repmgr 支持并增强了 PostgreSQL 的内置流复制,它提供了一个单一的读/写主服务器和一个或多个只读备用服务器。

    1.1、功能概述

    repmgr 流复制管理工具对 PostgreSQL 集群节点的管理是基于分布式的管理方式。集群每个节点都具备一个 repmgr.conf 配置文件,用来记录本节点的 ID、节点名称、连接信息、数据库的 PGDATA 目录等配置参数。在完成参数配置后,就可以通过 repmgr 命令实现对集群节点的 “一键式” 部署。

    Repmgr整体架构图:

    PostgreSQL+repmgr实现主备集群保姆级部署手册

    集群节点部署完成后,每个节点都可通过 repmgrd 守护进程来监控节点数据库状态;每个节点元数据表可独立维护,这些元数据表将记录所有集群节点的信息。

    1.2、选举原理

    在发生 Auto Failover 时,备节点在尝试多次连接主节点失败后(尝试次数及尝试间隔可以通过 repmgr.conf 配置文件修改),repmgrd 会在所有备节点中选举一个候选备节点(选举机制参考下文)提升为新主节点,其他备节点去 Follow 到该新主上,形成一个新的集群。
    repmgr 选举候选备节点按照以下顺序选举:LSN > Priority > Node_ID
    系统将优先选举一个 LSN 较大的节点,作为候选备节点;
    若 LSN 一样,会根据 Priority 优先级进行比较(该优先级是在配置文件中进行参数配置,如果 Priority 为 0,则代表该节点被禁止提升为主节点);
    若优先级也一样,会比较节点的 Node ID,小者会优先选举。

    1.3、两个工具

    repmgr 主要提供了 repmgr 和 repmgrd 两个工具。
    repmgr 是一个执行管理任务的命令行工具,方便进行 PostgreSQL 服务器集群的管理。具备以下功能特点:

    • 设置备用服务器
      -> promote 备
      -> 主从切换
      -> 显示复制集群中服务器的状态
      -> epmgrd 是一个守护进程,它主动监视复制集群中的服务器并支持以下任务:
      -> 监控和记录复制集群信息
      -> 故障检测、故障转移
      -> 集群中事件的通知(需要自定义脚本接受通知)

    二、环境初始化

    2.1、环境规划

    主机名称 主机IP 用途 操作系统 PostgreSQL版本 Repmgr版本
    Node1 192.168.1.61 Primary CentOS Linux 7.9 16.4 5.4.1
    Node2 192.168.1.62 Standby CentOS Linux 7.9 16.4 5.4.1

    2.2、统一主机名称

    vi /etc/hostname
    或 hostnamectl set-hostname Node1
    

    2.3、关闭SELINUX

    sed -i  "s/SELINUX=enforcing/SELINUX=disabled/"  /etc/selinux/config
    

    2.4、关闭防火墙

    systemctl stop firewalld.service
    systemctl disable firewalld.service
    
    或设置开放端口:
    查看开放的端口:
    firewall-cmd --list-ports
    开启防火墙端口:
    firewall-cmd --zone=public --add-port=5432/tcp --permanent
    命令含义:
    –zone #作用域
    –add-port=9200/tcp #添加端口,格式为:端口/通讯协议
    –permanent #永久生效,没有此参数重启后失效
    重新加载一遍才会生效:
    firewall-cmd --reload
    

    2.5、安装插件

    yum -y install readline readline-devel zlib zlib-devel gettext gettext-devel openssl openssl-devel pam pam-devel libxml2 libxml2-devel libxslt libxslt-devel perl perl-devel tcl-devel uuid-devel gcc gcc-c++ make flex bison perl-ExtUtils*  libicu  libicu-devel  ntp libcurl-devel
    

    2.6、编辑hosts

    cat>>/etc/hosts<<EOF
    192.168.1.61  Node1
    192.168.1.62  Node2
    EOF
    

    2.7、用户limits设置

    cat>>/etc/security/limits.conf<<EOF
    postgres soft nofile 1048576
    postgres hard nofile 1048576
    postgres soft nproc 131072
    postgres hard nproc 131072
    postgres soft stack 10240
    postgres hard stack 32768
    postgres soft core 6291456
    postgres hard core 6291456
    EOF
    

    2.8、调整内核

    cat>>/etc/sysctl.conf<<EOF
    fs.file-max = 76724200
    kernel.sem = 10000 10240000 10000 1024
    kernel.shmmni = 4096
    kernel.shmall = 253702
    kernel.shmmax = 1039163392
    net.ipv4.ip_local_port_range = 9000 65500
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144
    net.core.rmem_max = 4194304
    net.core.wmem_max = 1048576
    fs.aio-max-nr = 40960000
    vm.dirty_ratio=20
    vm.dirty_background_ratio=3
    vm.dirty_writeback_centisecs=100
    vm.dirty_expire_centisecs=500
    vm.swappiness=10
    vm.min_free_kbytes=524288
    vm.swappiness=0
    vm.overcommit_memory=2
    vm.overcommit_ratio=75
    net.ipv4.ip_local_port_range = 10000 65535
    EOF
    
    sysctl -p 
    

    2.9、目录规划

    目录编号 目录名称 备注
    1 /postgres postgres根目录
    2 /postgres/server 软件安装目录
    3 /postgres/data 数据目录
    4 /postgres/WAL WAL 目录
    5 /postgres/archive 归档目录
    6 /postgres/backup 备份目录
    7 /install 安装包上传目录
    8 /postgres/server/etc repmgr.conf目录
    9 /postgres/server/etc/log repmgr日志目录
    • 执行脚本:
    mkdir /install
     mkdir -p /postgres/server /postgres/data  /postgres/wal  /postgres/archive  /postgres/backup /postgres/server/etc/log
     sudo chown -R postgres:postgres /postgres /install
     chmod 0775 /postgres  /install
     chmod 0700 /postgres/data
    

    2.10、配置各节点互信

    su - postgres
    rm -rf /home/postgres/.ssh
    mkdir ~/.ssh
    chmod 700 ~/.ssh
    ssh-keygen -t rsa
    ssh-copy-id postgres@192.168.1.61
    ssh-copy-id postgres@192.168.1.62
    

    三、PostgreSQL部署

    3.1、部署PostgreSQL(2台主机Node1、Node2都要部署)

    su  postgres
    cd /install	
    wget  https://ftp.postgresql.org/pub/source/v16.4/postgresql-16.4.tar.gz
    tar -zxvf postgresql-16.4.tar.gz
    

    3.2、在Node1分发文件到Node2、Node1

    cd /install
    scp -r ./* 192.168.1.62:/install
    

    3.3、在Node1分发文件到Node2、Node1

    cd /install/postgresql-16.4
    
    
    ./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl
    
    make && make install
    
    使用  make 或者  make world
    $ make world
    使用 make install 或者 make install-world 进行安装
    $ make install-world     #包含扩展包和文档
    
    
    • Node1:
    [postgres@Node1 postgresql-16.4]$ ./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl
    checking build system type... x86_64-pc-linux-gnu
    checking host system type... x86_64-pc-linux-gnu
    checking which template to use... linux
    checking whether NLS is wanted... no
    checking for default port number... 5432
    checking for block size... 8kB
    checking for segment size... 1GB
    checking for WAL block size... 8kB
    checking for gcc... gcc
    checking whether the C compiler works... yes
    .....................................................
    configure: creating ./config.status
    config.status: creating GNUmakefile
    config.status: creating src/Makefile.global
    config.status: creating src/include/pg_config.h
    config.status: creating src/include/pg_config_ext.h
    config.status: creating src/interfaces/ecpg/include/ecpg_config.h
    config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
    config.status: linking src/backend/port/posix_sema.c to src/backend/port/pg_sema.c
    config.status: linking src/backend/port/sysv_shmem.c to src/backend/port/pg_shmem.c
    config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
    config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
    [postgres@Node1 postgresql-16.4]$
    
    
    [postgres@Node1 postgresql-16.4]$ make && make install
    make -C ./src/backend generated-headers
    make[1]: Entering directory `/install/postgresql-16.4/src/backend'
    make -C catalog distprep generated-header-symlinks
    make[2]: Entering directory `/install/postgresql-16.4/src/backend/catalog'
    make[2]: Nothing to be done for `distprep'.
    prereqdir=`cd './' >/dev/null && pwd` && 
    cd '../../../src/include/catalog/' && for file in pg_proc_d.h pg_type_d.h pg_attribute_d.h pg_class_d.h pg_attrdef_d.h pg_constraint_d.h pg_inherits_d.h pg_index_d.h pg_operator_d.h pg_opfamily_d.h pg_opclass_d.h pg_am_d.h pg_amop_d.h pg_amproc_d.h pg_language_d.h pg_largeobject_metadata_d.h pg_largeobject_d.h pg_aggregate_d.h pg_statistic_d.h pg_statistic_ext_d.h pg_statistic_ext_data_d.h pg_rewrite_d.h pg_trigger_d.h pg_event_trigger_d.h pg_description_d.h pg_cast_d.h pg_enum_d.h pg_namespace_d.h pg_conversion_d.h pg_depend_d.h pg_database_d.h pg_db_role_setting_d.h pg_tablespace_d.h pg_authid_d.h pg_auth_members_d.h pg_shdepend_d.h pg_shdescription_d.h pg_ts_config_d.h pg_ts_config_map_d.h pg_ts_dict_d.h pg_ts_parser_d.h pg_ts_template_d.h pg_extension_d.h pg_foreign_data_wrapper_d.h pg_foreign_server_d.h pg_user_mapping_d.h pg_foreign_table_d.h pg_policy_d.h pg_replication_origin_d.h pg_default_acl_d.h pg_init_privs_d.h pg_seclabel_d.h pg_shseclabel_d.h pg_collation_d.h pg_parameter_acl_d.h pg_partitioned_table_d.h pg_range_d.h pg_transform_d.h pg_sequence_d.h pg_publication_d.h pg_publication_namespace_d.h pg_publication_rel_d.h pg_subscription_d.h pg_subscription_rel_d.h schemapg.h system_fk_info.h; do 
      rm -f $file && ln -s "$prereqdir/$file" . ; 
    done
    .....................................................
    make[1]: Leaving directory `/install/postgresql-16.4/src'
    make -C config install
    make[1]: Entering directory `/install/postgresql-16.4/config'
    /bin/mkdir -p '/postgres/server/lib/pgxs/config'
    /bin/install -c -m 755 ./install-sh '/postgres/server/lib/pgxs/config/install-sh'
    /bin/install -c -m 755 ./missing '/postgres/server/lib/pgxs/config/missing'
    make[1]: Leaving directory `/install/postgresql-16.4/config'
    [postgres@Node1 postgresql-16.4]$
    
    • Node2:
    [postgres@Node2 postgresql-16.4]$
    [postgres@Node2 postgresql-16.4]$ ./configure --prefix=/postgres/server --with-pgport=5432 --with-openssl
    checking build system type... x86_64-pc-linux-gnu
    checking host system type... x86_64-pc-linux-gnu
    checking which template to use... linux
    checking whether NLS is wanted... no
    checking for default port number... 5432
    checking for block size... 8kB
    checking for segment size... 1GB
    checking for WAL block size... 8kB
    checking for gcc... gcc
    checking whether the C compiler works... yes
    ...................................................................
    configure: creating ./config.status
    config.status: creating GNUmakefile
    config.status: creating src/Makefile.global
    config.status: creating src/include/pg_config.h
    config.status: creating src/include/pg_config_ext.h
    config.status: creating src/interfaces/ecpg/include/ecpg_config.h
    config.status: linking src/backend/port/tas/dummy.s to src/backend/port/tas.s
    config.status: linking src/backend/port/posix_sema.c to src/backend/port/pg_sema.c
    config.status: linking src/backend/port/sysv_shmem.c to src/backend/port/pg_shmem.c
    config.status: linking src/include/port/linux.h to src/include/pg_config_os.h
    config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
    [postgres@Node2 postgresql-16.4]$
    
    
    [postgres@Node2.. postgresql-16.4]$ make && make install
    make -C ./src/backend generated-headers
    make[1]: Entering directory `/install/postgresql-16.4/src/backend'
    make -C catalog distprep generated-header-symlinks
    make[2]: Entering directory `/install/postgresql-16.4/src/backend/catalog'
    make[2]: Nothing to be done for `distprep'.
    prereqdir=`cd './' >/dev/null && pwd` && 
    cd '../../../src/include/catalog/' && for file in pg_proc_d.h pg_type_d.h pg_attribute_d.h pg_class_d.h pg_attrdef_d.h pg_constraint_d.h pg_inherits_d.h pg_index_d.h pg_operator_d.h pg_opfamily_d.h pg_opclass_d.h pg_am_d.h pg_amop_d.h pg_amproc_d.h pg_language_d.h pg_largeobject_metadata_d.h pg_largeobject_d.h pg_aggregate_d.h pg_statistic_d.h pg_statistic_ext_d.h pg_statistic_ext_data_d.h pg_rewrite_d.h pg_trigger_d.h pg_event_trigger_d.h pg_description_d.h pg_cast_d.h pg_enum_d.h pg_namespace_d.h pg_conversion_d.h pg_depend_d.h pg_database_d.h pg_db_role_setting_d.h pg_tablespace_d.h pg_authid_d.h pg_auth_members_d.h pg_shdepend_d.h pg_shdescription_d.h pg_ts_config_d.h pg_ts_config_map_d.h pg_ts_dict_d.h pg_ts_parser_d.h pg_ts_template_d.h pg_extension_d.h pg_foreign_data_wrapper_d.h pg_foreign_server_d.h pg_user_mapping_d.h pg_foreign_table_d.h pg_policy_d.h pg_replication_origin_d.h pg_default_acl_d.h pg_init_privs_d.h pg_seclabel_d.h pg_shseclabel_d.h pg_collation_d.h pg_parameter_acl_d.h pg_partitioned_table_d.h pg_range_d.h pg_transform_d.h pg_sequence_d.h pg_publication_d.h pg_publication_namespace_d.h pg_publication_rel_d.h pg_subscription_d.h pg_subscription_rel_d.h schemapg.h system_fk_info.h; do 
      rm -f $file && ln -s "$prereqdir/$file" . ; 
    done
    .....................................................
    make[1]: Leaving directory `/install/postgresql-16.4/src'
    make -C config install
    make[1]: Entering directory `/install/postgresql-16.4/config'
    /bin/mkdir -p '/postgres/server/lib/pgxs/config'
    /bin/install -c -m 755 ./install-sh '/postgres/server/lib/pgxs/config/install-sh'
    /bin/install -c -m 755 ./missing '/postgres/server/lib/pgxs/config/missing'
    make[1]: Leaving directory `/install/postgresql-16.4/config'
    [postgres@Node2 postgresql-16.4]$
    

    3.4、设置软链接

    su root
    mkdir /postgresql
    chmod -R 777 /postgresql
    chown -R postgres:postgres /postgresql
    cd /postgres
    ln -s /postgres/server /postgresql
    

    3.5、设置环境变量

    vi /home/postgres/.bashrc
    export PGPORT=5432
    export PGUSER=postgres
    export PGHOME=/postgres/server
    export PGDATA=/postgres/data
    export PATH=$PGHOME/bin:$PATH
    
    source  /home/postgres/.bashrc
    
    

    3.6、初始化数据库

    • 说明:只需初始化Node1
    • 脚本:
     /postgresql/server/bin/initdb -D/postgres/data -X/postgres/wal -EUTF8 -Upostgres -W --data-checksums --pwprompt
    
    • 执行过程:
    [postgres@Node1 ~]$  /postgresql/server/bin/initdb -D/postgres/data -X/postgres/wal -EUTF8 -Upostgres -W --data-checksums --pwprompt
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.
    
    The database cluster will be initialized with locale "en_US.UTF-8".
    The default text search configuration will be set to "english".
    
    Data page checksums are enabled.
    
    Enter new superuser password:
    Enter it again:
    
    fixing permissions on existing directory /postgres/data ... ok
    fixing permissions on existing directory /postgres/wal ... ok
    creating subdirectories ... ok
    selecting dynamic shared memory implementation ... posix
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting default time zone ... Asia/Shanghai
    creating configuration files ... ok
    running bootstrap script ... ok
    performing post-bootstrap initialization ... ok
    syncing data to disk ... ok
    
    initdb: warning: enabling "trust" authentication for local connections
    initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
    
    Success. You can now start the database server using:
    
        /postgresql/server/bin/pg_ctl -D /postgres/data -l logfile start
    
    [postgres@Node1 ~]$
    
    

    3.7、设置开机自启

    PostgreSQL的开机自启动脚本位于PostgreSQL源码目录的contrib/start-scripts路径下
    这个路径是安装包解压完的路径。
    
    linux文件即为linux系统上的启动脚本
    1)修改linux文件属性,添加X属性
    exit
    切换到root用户
    cd /install/postgresql-16.2/contrib/start-scripts 
    切换到解压目录下
    chmod a+x linux
    2) 复制linux文件到/etc/init.d目录下,更名为PostgreSQL
    cp linux /etc/init.d/PostgreSQL
     
    3)修改/etc/init.d/PostgreSQL文件的两个变量
    vi /etc/init.d/PostgreSQL
    
    prefix设置为PostgreSQL的安装路径:prefix=/postgres/server
    PGDATA设置为PostgreSQL的数据目录路径:PGDATA="/postgres/data"
     
    4) 执行service PostgreSQL start,就可以启动PostgreSQL服务
    service PostgreSQL start
     
    5)设置PostgreSQL服务开机自启动
    chkconfig --add PostgreSQL
    
    执行上面的命令,就可以实现PostgreSQL服务的开机自启动。
    Reboot
    
    

    3.8、修改postgresql.conf

    cat >> /postgres/data/postgresql.conf <<EOF
    listen_addresses = '*'
    max_replication_slots = 12
    max_wal_senders = 10
    wal_keep_size = 1024
    wal_level = replica
    hot_standby = on
    wal_log_hints = on
    full_page_writes=on
    shared_preload_libraries = 'repmgr'
    archive_command = 'test ! -f /postgres/archive/%f && cp %p /postgres/archive/%f'
    EOF
    

    3.9、修改pg_hba.conf

    
    vi $PGDATA/pg_hba.conf
    
    # IPv4 local connections:
    host    all              all              127.0.0.1/32            trust
    host    repmgr          repmgr          192.168.1.61/24         trust
    host    repmgr          repmgr          192.168.1.62/24         trust
    host    replication     repmgr          192.168.1.61/24         trust
    host    replication     repmgr          192.168.1.62/24         trust
    host    all             all             0.001.0/0               scram-sha-256
    

    四、部署repmgr

    • 说明:部署Repmgr(2台主机Node1、Node2都要部署)

    4.1、安装插件

    sudo yum -y install libcurl-devel
    sudo yum -y install json-c-devel
    

    4.2、在线下载repmgr安装包

    cd /install
    
    wget -c https://repmgr.org/download/repmgr-5.4.1.tar.gz
    tar -zxvf repmgr-5.4.1.tar.gz 
    cd /install/repmgr-5.4.1
    
    ./configure   && make install
    
    • Node1:
    [postgres@Node1 ~]$ cd /install/repmgr-5.4.1
    [postgres@Node1 repmgr-5.4.1]$
    [postgres@Node1 repmgr-5.4.1]$ ./configure && make install
    checking for a sed that does not truncate output... /bin/sed
    checking for pg_config... /postgres/server/bin/pg_config
    configure: building against PostgreSQL 16.4
    checking for gnused... no
    checking for gsed... no
    checking for sed... yes
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating Makefile.global
    config.status: creating config.h
    Building against PostgreSQL 16
    .....................................................
    /bin/mkdir -p '/postgres/server/lib'
    /bin/mkdir -p '/postgres/server/share/extension'
    /bin/mkdir -p '/postgres/server/share/extension'
    /bin/mkdir -p '/postgres/server/bin'
    /bin/install -c -m 755  repmgr.so '/postgres/server/lib/repmgr.so'
    /bin/install -c -m 644 .//repmgr.control '/postgres/server/share/extension/'
    /bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql .//repmgr--5.3--5.4.sql .//repmgr--5.4.sql  '/postgres/server/share/extension/'
    /bin/install -c -m 755 repmgr repmgrd '/postgres/server/bin/'
    [postgres@Node1 repmgr-5.4.1]$
    
    • Node2:
    [postgres@Node2 ~]$ cd /install/repmgr-5.4.1
    [postgres@Node2 repmgr-5.4.1]$ ./configure && make install
    checking for a sed that does not truncate output... /bin/sed
    checking for pg_config... /postgres/server/bin/pg_config
    configure: building against PostgreSQL 16.4
    checking for gnused... no
    checking for gsed... no
    checking for sed... yes
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating Makefile.global
    config.status: creating config.h
    Building against PostgreSQL 16
    .....................................................
    /bin/mkdir -p '/postgres/server/lib'
    /bin/mkdir -p '/postgres/server/share/extension'
    /bin/mkdir -p '/postgres/server/share/extension'
    /bin/mkdir -p '/postgres/server/bin'
    /bin/install -c -m 755  repmgr.so '/postgres/server/lib/repmgr.so'
    /bin/install -c -m 644 .//repmgr.control '/postgres/server/share/extension/'
    /bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql .//repmgr--5.3--5.4.sql .//repmgr--5.4.sql  '/postgres/server/share/extension/'
    /bin/install -c -m 755 repmgr repmgrd '/postgres/server/bin/'
    [postgres@Node2 repmgr-5.4.1]$
    
    
    

    4.3、检查repmgr安装位置

    Node1:
    
    [postgres@Node1 ~]$ pg_config --sysconfdir
    /postgres/server/etc
    [postgres@Node1 ~]$
    
    Node2:
    
    [postgres@Node2 ~]$ pg_config --sysconfdir
    /postgres/server/etc
    [postgres@Node2 ~]$
    

    4.4、配置repmgr.conf(2个节点Node1、Node2都要配置)

    vi /postgres/server/etc/repmgr.conf
    
    • Node1:
    # 节点标识设置
    node_id = 1
    node_name = Node1
    # 数据库连接信息
    conninfo = 'host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2'
    # 数据目录路径
    data_directory = '/postgres/data'
    # PostgreSQL二进制目录路径
    pg_bindir = '/postgres/server/bin'
    config_directory='/postgres/server'
    # 日志级别与日志文件位置
    log_level = INFO
    log_facility=STDERR
    log_file = '/postgres/server/etc/repmgr.log'
    
    # 故障切换策略设定
    failover = automatic
    # 提升从库为新主库的命令
    promote_command = '/postgres/server/bin/repmgr standby promote -f /postgres/server/etc/repmgr.conf'
    # 从库跟随主库命令
    follow_command = '/postgres/server/bin/repmgr standby follow -f /postgres/server/etc/repmgr.conf --upstream-node-id=%n'
    # 监控历史记录存储选项
    monitoring_history = true
    # 监控间隔时间(秒)
    monitor_interval_secs = 2
    connection_check_type='ping'
    # 重连尝试次数
    reconnect_attempts = 3
    # 两次重连之间的间隔时间(秒)
    reconnect_interval = 5
    # repmgrd文件
    repmgrd_pid_file = '/postgres/server/etc/repmgrd.pid'
    
    • Node2:
    # 节点标识设置
    node_id = 2
    node_name = Node2
    # 数据库连接信息
    conninfo = 'host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2'
    # 数据目录路径
    data_directory = '/postgres/data'
    # PostgreSQL二进制目录路径
    pg_bindir = '/postgres/server/bin'
    config_directory='/postgres/server'
    # 日志级别与日志文件位置
    log_level = INFO
    log_facility=STDERR
    log_file = '/postgres/server/etc/repmgr.log'
    
    # 故障切换策略设定
    failover = automatic
    # 提升从库为新主库的命令
    promote_command = '/postgres/server/bin/repmgr standby promote -f /postgres/server/etc/repmgr.conf'
    # 从库跟随主库命令
    follow_command = '/postgres/server/bin/repmgr standby follow -f /postgres/server/etc/repmgr.conf --upstream-node-id=%n'
    # 监控历史记录存储选项
    monitoring_history = true
    # 监控间隔时间(秒)
    monitor_interval_secs = 2
    connection_check_type='ping'
    # 重连尝试次数
    reconnect_attempts = 3
    # 两次重连之间的间隔时间(秒)
    reconnect_interval = 5
    # repmgrd文件
    repmgrd_pid_file = '/postgres/server/etc/repmgrd.pid'
    

    4.5、启动Node1的PostgreSQL服务

    • 脚本:
      pg_ctl start
    • 执行内容:
    [postgres@Node1 ~]$ pg_ctl start
    waiting for server to start....2024-09-14 21:18:41.333 CST [18549] LOG:  starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
    2024-09-14 21:18:41.337 CST [18549] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    2024-09-14 21:18:41.337 CST [18549] LOG:  listening on IPv6 address "::", port 5432
    2024-09-14 21:18:41.341 CST [18549] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
    2024-09-14 21:18:41.349 CST [18552] LOG:  database system was shut down at 2024-09-14 19:58:52 CST
    2024-09-14 21:18:41.356 CST [18549] LOG:  database system is ready to accept connections
     done
    server started
    [postgres@Node1 ~]$
    

    4.6、在Node1登录psql,创建集群管理用户repmgr,数据库repmgr

    • 脚本:
     create user repmgr with superuser replication password 'repmgr' ;
    create database repmgr owner  repmgr;
    
    • 执行内容:
    [postgres@Node1 ~]$ psql
    psql (16.4)
    Type "help" for help.
    
    postgres=# create user repmgr with superuser replication password 'repmgr' ;
    CREATE ROLE
    postgres=# create database repmgr owner  repmgr;
    CREATE DATABASE
    postgres=#
    
    

    4.7、注册Primary主库(在Node1节点执行)

    • 脚本:
    repmgr primary register
    
    • 执行内容:
    [postgres@Node1 ~]$ repmgr primary register
    INFO: connecting to primary database...
    NOTICE: attempting to install extension "repmgr"
    NOTICE: "repmgr" extension successfully installed
    NOTICE: primary node record (ID: 1) registered
    [postgres@Node1 ~]$
    

    4.8、启动守护进程(在Node1节点执行)

    • 脚本:
      repmgrd -d
    • 执行内容:
    [postgres@Node1 ~]$ repmgrd -d
    [2024-09-14 21:20:48] [NOTICE] redirecting logging output to "/postgres/server/etc/repmgr.log"
    
    [postgres@Node1 ~]$
    

    4.9、注册Standby主库(在Node2节点执行)

    • 脚本:
    • 1、-检查是否可以注册备节点
      repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --dry-run
    • 2、注意备库repmgr.conf文件中的目录要为空
      repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --force
    • 3、启动服务
      pg_ctl start
    • 执行内容:
    [postgres@Node2 ~]$ repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --dry-run
    WARNING: following problems with command line parameters detected:
      "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
    NOTICE: destination directory "/postgres/data" provided
    INFO: connecting to source node
    DETAIL: connection string is: host=192.168.1.61 port=5432 user=repmgr dbname=repmgr
    DETAIL: current installation size is 30 MB
    INFO: "repmgr" extension is installed in database "repmgr"
    INFO: replication slot usage not requested;  no replication slot will be set up for this standby
    INFO: parameter "max_wal_senders" set to 10
    NOTICE: checking for available walsenders on the source node (2 required)
    INFO: sufficient walsenders available on the source node
    DETAIL: 2 required, 10 available
    NOTICE: checking replication connections can be made to the source server (2 required)
    INFO: required number of replication connections could be made to the source server
    DETAIL: 2 replication connections required
    NOTICE: standby will attach to upstream node 1
    HINT: consider using the -c/--fast-checkpoint option
    INFO: would execute:
      /postgres/server/bin/pg_basebackup -l "repmgr base backup"  -D /postgres/data -h 192.168.1.61 -p 5432 -U repmgr -X stream
    INFO: all prerequisites for "standby clone" are met
    [postgres@Node2 ~]$
    
    
    [postgres@Node2 ~]$ repmgr -h 192.168.1.61 -p5432 -U repmgr -d repmgr standby clone --force
    WARNING: following problems with command line parameters detected:
      "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
    NOTICE: destination directory "/postgres/data" provided
    INFO: connecting to source node
    DETAIL: connection string is: host=192.168.1.61 port=5432 user=repmgr dbname=repmgr
    DETAIL: current installation size is 30 MB
    INFO: replication slot usage not requested;  no replication slot will be set up for this standby
    NOTICE: checking for available walsenders on the source node (2 required)
    NOTICE: checking replication connections can be made to the source server (2 required)
    INFO: checking and correcting permissions on existing directory "/postgres/data"
    NOTICE: starting backup (using pg_basebackup)...
    HINT: this may take some time; consider using the -c/--fast-checkpoint option
    INFO: executing:
      /postgres/server/bin/pg_basebackup -l "repmgr base backup"  -D /postgres/data -h 192.168.1.61 -p 5432 -U repmgr -X stream
    NOTICE: standby clone (using pg_basebackup) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: pg_ctl -D /postgres/data start
    HINT: after starting the server, you need to register this standby with "repmgr standby register"
    [postgres@Node2 ~]$
    
    
    [postgres@Node2 ~]$  pg_ctl start
    waiting for server to start....2024-09-14 21:25:23.805 CST [19557] LOG:  starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
    2024-09-14 21:25:23.806 CST [19557] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    2024-09-14 21:25:23.806 CST [19557] LOG:  listening on IPv6 address "::", port 5432
    2024-09-14 21:25:23.812 CST [19557] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
    2024-09-14 21:25:23.818 CST [19560] LOG:  database system was interrupted; last known up at 2024-09-14 21:24:35 CST
    2024-09-14 21:25:23.833 CST [19560] LOG:  entering standby mode
    2024-09-14 21:25:23.834 CST [19560] LOG:  starting backup recovery with redo LSN 0/2000028, checkpoint LSN 0/2000098, on timeline ID 1
    2024-09-14 21:25:23.836 CST [19560] LOG:  redo starts at 0/2000028
    2024-09-14 21:25:23.837 CST [19560] LOG:  completed backup recovery with redo LSN 0/2000028 and end LSN 0/2000170
    2024-09-14 21:25:23.837 CST [19560] LOG:  consistent recovery state reached at 0/2000170
    2024-09-14 21:25:23.837 CST [19557] LOG:  database system is ready to accept read-only connections
     done
    server started
    [postgres@Node2 ~]$ 2024-09-14 21:25:23.865 CST [19561] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
    
    [postgres@Node2 ~]$
    

    4.10、注册standby节点(在Node2节点执行)

    [postgres@Node2 ~]$
    [postgres@Node2 ~]$ repmgr standby register
    INFO: connecting to local node "Node2" (ID: 2)
    INFO: connecting to primary database
    WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
    INFO: standby registration complete
    NOTICE: standby node "Node2" (ID: 2) successfully registered
    [postgres@Node2 ~]$
    

    4.11、启动守护进程(在Node2节点执行)

    • 脚本:
      repmgrd -d
    • 执行内容:
    [postgres@Node2 ~]$ repmgrd -d
    [2024-09-14 21:25:23] [NOTICE] redirecting logging output to "/postgres/server/etc/repmgr.log"
    

    五、repmgr集群管理

    5.1、检查集群运行情况

    1、检查集群状态1

    [postgres@Node1 ~]$ psql -Urepmgr -drepmgr
    psql (16.4)
    Type "help" for help.
    
    repmgr=#  select * from nodes;
     node_id | upstream_node_id | active | node_name |  type   | location | priority |                                conninfo
                          | repluser | slot_name |           config_file
    ---------+------------------+--------+-----------+---------+----------+----------+---------------------------------------------------
    ----------------------+----------+-----------+----------------------------------
           1 |                  | t      | Node1     | primary | default  |      100 | host=192.168.1.61 user=repmgr dbname=repmgr port=5
    432 connect_timeout=2 | repmgr   |           | /postgres/server/etc/repmgr.conf
           2 |                1 | t      | Node2     | standby | default  |      100 | host=192.168.1.62 user=repmgr dbname=repmgr port=5
    432 connect_timeout=2 | repmgr   |           | /postgres/server/etc/repmgr.conf
    (2 rows)
    
    repmgr=#
    

    2、检查集群状态2

    [postgres@Node2 ~]$ repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
     1  | Node1 | primary | * running |          | default  | 100      | 1        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
     2  | Node2 | standby |   running | Node1    | default  | 100      | 1        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
    [postgres@Node2 ~]$
    

    5.2、模拟故障,实现自动主备切换

    1、在Node1停止服务

    [postgres@Node2 ~]$ repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
     1  | Node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
     2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
    
    WARNING: following issues were detected
      - unable to connect to node "Node1" (ID: 1)
    
    HINT: execute with --verbose option to see connection error messages
    [postgres@Node2 ~]$
    

    2、在Node2检查集群状态

    [postgres@Node2 ~]$ repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
     1  | Node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
     2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
    
    WARNING: following issues were detected
      - unable to connect to node "Node1" (ID: 1)
    
    HINT: execute with --verbose option to see connection error messages
    [postgres@Node2 ~]$
    

    3、手工修复Node1,将主库故障节点改为备库

    [postgres@Node1 ~]$ repmgr node rejoin -h 192.168.1.62 -p5432 -Urepmgr -drepmgr --force-rewind
    NOTICE: rejoin target is node "Node2" (ID: 2)
    NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/301C758
    NOTICE: executing pg_rewind
    DETAIL: pg_rewind command is "/postgres/server/bin/pg_rewind -D '/postgres/data' --source-server='host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2'"
    NOTICE: 0 files copied to /postgres/data
    NOTICE: setting node 1's upstream to node 2
    WARNING: unable to ping "host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2"
    DETAIL: PQping() returned "PQPING_NO_RESPONSE"
    NOTICE: starting server using "/postgres/server/bin/pg_ctl  -w -D '/postgres/data' start"
    NOTICE: NODE REJOIN successful
    DETAIL: node 1 is now attached to node 2
    [postgres@Node1 ~]$
    

    4、再次查询状态

    [postgres@Node2 ~]$ repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
     1  | Node1 | standby |   running | Node2    | default  | 100      | 1        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
     2  | Node2 | primary | * running |          | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
    [postgres@Node2 ~]$
    

    5.3、手动切换主备

    1、模拟切换,查看模拟状态

    . 说明:在备库执行

    [postgres@Node1 ~]$ repmgr standby switchover --siblings-follow --force-rewind --dry-run
    NOTICE: checking switchover on node "Node1" (ID: 1) in --dry-run mode
    INFO: prerequisites for using pg_rewind are met
    INFO: SSH connection to host "192.168.1.62" succeeded
    INFO: able to execute "repmgr" on remote host "192.168.1.62"
    WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
    INFO: 1 walsenders required, 10 available
    INFO: demotion candidate is able to make replication connection to promotion candidate
    INFO: archive mode is "off"
    INFO: replication lag on this standby is 0 seconds
    NOTICE: attempting to pause repmgrd on 2 nodes
    INFO: would pause repmgrd on node "Node1" (ID: 1)
    INFO: would pause repmgrd on node "Node2" (ID: 2)
    NOTICE: local node "Node1" (ID: 1) would be promoted to primary; current primary "Node2" (ID: 2) would be demoted to standby
    INFO: following shutdown command would be run on node "Node2":
      "/postgres/server/bin/pg_ctl  -D '/postgres/data' -W -m fast stop"
    INFO: parameter "shutdown_check_timeout" is set to 60 seconds
    INFO: prerequisites for executing STANDBY SWITCHOVER are met
    [postgres@Node1 ~]$
    
    2、主备切换
    [postgres@Node1 ~]$ repmgr standby switchover --siblings-follow --force-rewind
    NOTICE: executing switchover on node "Node1" (ID: 1)
    WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
    NOTICE: attempting to pause repmgrd on 2 nodes
    NOTICE: local node "Node1" (ID: 1) will be promoted to primary; current primary "Node2" (ID: 2) will be demoted to standby
    NOTICE: stopping current primary node "Node2" (ID: 2)
    NOTICE: issuing CHECKPOINT on node "Node2" (ID: 2)
    DETAIL: executing server command "/postgres/server/bin/pg_ctl  -D '/postgres/data' -W -m fast stop"
    INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
    INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
    NOTICE: current primary has been cleanly shut down at location 0/30474B0
    NOTICE: promoting standby to primary
    DETAIL: promoting server "Node1" (ID: 1) using pg_promote()
    NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
    NOTICE: STANDBY PROMOTE successful
    DETAIL: server "Node1" (ID: 1) was successfully promoted to primary
    NOTICE: issuing CHECKPOINT on node "Node1" (ID: 1)
    ERROR: unable to execute CHECKPOINT
    WARNING: node "Node2" attached in state "startup"
    INFO: waiting for node "Node2" (ID: 2) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
    DETAIL: node "Node1" (ID: 2) is currently attached to its upstream node in state "startup"
    NOTICE: node "Node1" (ID: 1) promoted to primary, node "Node2" (ID: 2) demoted to standby
    NOTICE: switchover was successful
    DETAIL: node "Node1" is now primary and node "Node2" is attached as standby
    NOTICE: STANDBY SWITCHOVER has completed successfully
    [postgres@Node1 ~]$
    

    3、检查状态

    [postgres@Node1 ~]$ repmgr cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
    
    ----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
     1  | Node1 | primary | * running |          | default  | 100      | 3        | host=192.168.1.61 user=repmgr dbname=repmgr port=5432 connect_timeout=2
     2  | Node2 | standby |   running | Node1    | default  | 100      | 2        | host=192.168.1.62 user=repmgr dbname=repmgr port=5432 connect_timeout=2
    [postgres@Node1 ~]$
    

    六、总结

    repmgr‌是一个用于管理PostgreSQL服务器集群复制和故障转移的开源工具套件。它提供了一系列命令行工具,用于设置和管理PostgreSQL流复制集群。以下是一些常用的repmgr命令:

    • repmgr primary register - 初始化一个repmgr安装并注册主节点
    • repmgr primary unregister — 取消注册非活动的主节点
    • repmgr standby clone - 从另一个PostgreSQL节点克隆一个PostgreSQL备用节点
    • repmgr standby unregister — 从 repmgr 元数据中移除备库的信息
    • repmgr standby promote - 将备用实例提升为主实例
    • repmgr standby follow — 将运行中的备用连接到新的上游节点
    • repmgr standby switchover - 将备用提升为主用,并将现有主用降级为备用
    • repmgr witness register - 将见证节点的信息添加到 repmgr 元数据中
    • repmgr witness unregister - 从 repmgr 元数据中移除见证节点的信息
    • repmgr node status - 显示节点的基本信息和复制状态概览
    • repmgr node check - 从复制角度对节点执行一些健康检查
    • repmgr node rejoin —将休眠(停止)的节点重新加入复制集群
    • repmgr node service - 显示或执行系统服务命令以停止/启动/重启/重新加载/提升节点
    • repmgr cluster show - 显示复制集群中每个已注册节点的信息
    • repmgr cluster matrix - 在每个节点上运行repmgr集群显示,并汇总输出
    • repmgr cluster crosscheck - 检查每个节点组合之间的连接情况
    • repmgr cluster event - 输出格式化后的集群事件列表
    • repmgr cluster cleanup - 清除监控历史
    • repmgr service status - 显示集群中每个节点上repmgrd的状态信息
    • repmgr service pause - 指示复制集群中的所有repmgrd实例暂停故障切换操作
    • repmgr service unpause - 指示复制集群中的所有repmgrd实例恢复故障切换操作
    • repmgr daemon start — Start the repmgrd daemon on the local node
    • repmgr daemon start — 在本地节点上启动 repmgrd 守护进程
    • repmgr daemon stop — 在本地节点上停止 repmgrd 守护进程

    原创文章,作者:kirin,如若转载,请注明出处:https://blog.ytso.com/tech/bigdata/317974.html

    (0)
    上一篇 2025年11月22日 22:46
    下一篇 2025年11月22日 22:49

    相关推荐

    发表回复

    登录后才能评论