Redis, Zabbix


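I. Redis features and use cases

Redis features

  • Fast: roughly 100,000 QPS, in-memory, implemented in C
  • Persistence
  • Multiple data structures: string, hash, list, set, and zset (sorted set)
  • Client libraries for many programming languages
  • Rich feature set: Lua scripting, publish/subscribe, transactions, pipelining, and more
  • Simple: a compact code base (the standalone core is only about 23,000 lines), a single-threaded model that is easy to develop against, no dependency on external libraries, and easy to use
  • Master/slave replication
  • High availability and distributed deployment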

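Typical Redis use cases

  • Session sharing: common in web clusters, where several Tomcat or PHP web servers share sessions
  • Caching: query results, product information for e-commerce sites, news content
  • Counters: view rankings, product view counts, and other count-based statistics
  • Social features (Weibo/WeChat style): mutual friends, follower counts, follows, likes, comments
  • Message queues: log buffering for ELK, publish/subscribe for some business systems
  • Geolocation: GEO (geographic positioning) for features such as shake-to-find, people nearby, and food delivery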

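II. Comparing the pros and cons of Redis RDB and AOF modes

1. RDB (Redis DataBase) mode

How RDB works

[Figure: how RDB works]

RDB takes time-based snapshots and by default keeps only the most recent one. It runs fairly quickly; the drawback is that data written between the last snapshot and the current moment may be lost.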

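How RDB bgsave (asynchronous) takes a snapshot

[Figure: RDB bgsave snapshot process]

Pros and cons of RDB mode

Pros

  • An RDB snapshot stores the data as of a point in time. Backups can be taken at chosen times by running the redis command bgsave (non-blocking, runs in the background) or save (blocks writes, not recommended) from a script, and multiple backups can be kept, so when something goes wrong the data can be restored to versions from different points in time. This makes RDB well suited to backups, and quite a few third-party tools support the file format for later data analysis.

    For example: back up the RDB file once an hour over the last 24 hours, and also keep one RDB backup for each day of the month. That way, even when a problem occurs, the dataset can be restored to different versions at any time.

  • RDB maximizes Redis performance: the only thing the parent process has to do to save an RDB file is fork a child process, which then handles all the remaining save work. The parent process performs no disk I/O.

  • With large datasets, for example several GB, restoring from RDB is faster than restoring from AOF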
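The hourly and daily backup scheme mentioned above could be sketched roughly as follows. This is only an illustration: the function name `backup_rdb` and the `hourly`/`daily` directory layout are assumptions, not part of Redis. The snapshot file is assumed to have been written already, for example by a preceding bgsave.

```shell
#!/bin/bash
# Hypothetical helper: copy an existing RDB file into rotating backup slots.
# $1 = path of the RDB file, $2 = backup directory (both chosen by the caller).
backup_rdb() {
    local rdb_file=$1 backup_dir=$2
    local hour_tag day_tag
    hour_tag=$(date +%H)   # 00..23, so at most 24 hourly copies exist
    day_tag=$(date +%d)    # 01..31, so at most one copy per day of the month
    mkdir -p "$backup_dir/hourly" "$backup_dir/daily" || return 1
    # Each copy overwrites the file that used the same slot a day or a month ago
    cp -f "$rdb_file" "$backup_dir/hourly/dump-$hour_tag.rdb" || return 1
    cp -f "$rdb_file" "$backup_dir/daily/dump-$day_tag.rdb"
}
```

Run hourly from cron, a call such as `backup_rdb /apps/redis/data/dump.rdb /backup/redis` would keep the set of files bounded without any separate cleanup step. The paths here are placeholders.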

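Cons

  • Data cannot be saved in real time; in-memory data written since the last RDB backup may be lost

    If data loss on server failure must be kept to a minimum, RDB is not a good fit. Redis allows different save points to be set to control how often the RDB file is written, but because the RDB file has to hold the state of the entire dataset, writing it is not a light or quick operation. As a result, an RDB file is typically saved only once every five minutes or more. In that case, a crash can lose several minutes of data.

  • When the dataset is very large, forking the child process that writes the RDB file takes some time, from milliseconds up to seconds, depending on the dataset size and system performance

    With a large dataset, fork() can be very time-consuming and make the server stop serving clients for a while. If the dataset is huge and CPU time is very tight, the pause may last a full second or longer. AOF rewrite also needs fork(), but however long the interval between AOF rewrites is, data durability is not reduced.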
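To make the save point idea concrete, here is a small sketch of how a list of `save <seconds> <changes>` pairs decides whether a snapshot is due. It is a simplified model written for illustration, not Redis source code, and the function name `should_snapshot` is made up. The pairs `900 1`, `300 10`, and `60 10000` used in the comment are the defaults shipped in the Redis 5 configuration file.

```shell
#!/bin/bash
# Hypothetical model of RDB save points.
# $1 = seconds since the last successful save
# $2 = number of changes since the last save
# remaining arguments = save points as "<seconds> <changes>" pairs
# Returns 0 when at least one save point is satisfied.
should_snapshot() {
    local elapsed=$1 changes=$2
    shift 2
    while [ $# -ge 2 ]; do
        if [ "$elapsed" -ge "$1" ] && [ "$changes" -ge "$2" ]; then
            return 0
        fi
        shift 2
    done
    return 1
}

# Example call with the default save points:
#   should_snapshot 400 15 900 1 300 10 60 10000 && echo "snapshot due"
```

With these defaults, a lightly written instance can go up to 15 minutes between snapshots, which is the window of data that may be lost.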

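2. AOF (AppendOnlyFile) mode

How AOF works

[Figure: how AOF works]

AOF appends each operation, in execution order, to the end of the specified log file.

Notes:

When both RDB and AOF are enabled, the AOF file takes precedence over the RDB file by default during recovery, i.e. the AOF file is used for recovery.

AOF mode is off by default. When AOF is enabled for the first time in the configuration file and the service is restarted to apply it, Redis prefers the AOF over the RDB file, but no AOF file exists yet, so the instance starts empty and all data is lost.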
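A commonly used way to avoid this data loss is to switch AOF on at runtime first, so that Redis builds the initial AOF from the data already in memory, and only then persist the setting. The sketch below uses the real `CONFIG SET` and `CONFIG REWRITE` commands; the wrapper function `enable_aof_online` and the `REDIS_CLI` variable are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Command used to talk to Redis; overridable, defaults to redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Hypothetical helper: enable AOF on a running instance.
# $1 = password of the instance
enable_aof_online() {
    local password=$1
    # Takes effect immediately; Redis writes a first AOF from the in-memory data
    "$REDIS_CLI" -a "$password" config set appendonly yes || return 1
    # Writes the running configuration back to redis.conf so it survives a restart
    "$REDIS_CLI" -a "$password" config rewrite
}
```

After this, a restart finds a populated AOF file instead of a missing one.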

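AOF rewrite

A rewrite writes a new AOF file in which repeated, mergeable, and expired data has been compacted. This saves the disk space taken by AOF backups and also speeds up recovery. A rewrite can be triggered manually with bgrewriteaof, or an automatic rewrite policy can be defined.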
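As an illustration of the automatic policy, the sketch below models the check driven by the `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` settings (100 and 64mb in the default configuration). It is a simplified model for understanding the rule, not Redis source code, and the function name `should_rewrite_aof` is made up.

```shell
#!/bin/bash
# Hypothetical model of the automatic AOF rewrite rule.
# $1 = current AOF size in bytes
# $2 = AOF size in bytes right after the previous rewrite (the base size)
# $3 = auto-aof-rewrite-percentage (default 100)
# $4 = auto-aof-rewrite-min-size in bytes (default 64 MB)
# Returns 0 when a rewrite would be triggered.
should_rewrite_aof() {
    local current=$1 base=$2 percentage=${3:-100} min_size=${4:-67108864}
    local growth
    [ "$percentage" -gt 0 ] || return 1         # 0 disables automatic rewrite
    [ "$current" -gt "$min_size" ] || return 1  # small files are never rewritten
    [ "$base" -gt 0 ] || base=1                 # avoid division by zero
    growth=$(( current * 100 / base - 100 ))    # growth over the base size, in percent
    [ "$growth" -ge "$percentage" ]
}
```

For example, with the defaults a 140 MB file whose base size was 70 MB has grown by 100 percent and qualifies, while a 100 MB file with the same base does not.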

AOF rewrite process

[Figure: AOF rewrite process]

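Pros and cons of AOF mode

Pros

  • Relatively high data safety. It depends on the fsync policy in use (fsync flushes the data Redis has modified from memory buffers to the storage device). The default is appendfsync everysec, i.e. one fsync per second. With this setting Redis still performs well, and even after a crash at most one second of data is lost (fsync runs in a background thread, so the main thread can keep handling command requests)

  • Because the log file is written in append mode, no seek is needed during writes, and even a crash does not damage the content already in the log file. If the system crashes when only half of an operation has been written, there is no need to worry: before the next Redis start, the redis-check-aof tool can be used to fix the inconsistency

  • When the AOF file grows too large, Redis can rewrite it automatically in the background. The new AOF file contains the minimal set of commands needed to restore the current dataset. The whole rewrite is safe: while Redis builds the new AOF file, changes keep being appended to the existing AOF file, so even if the server stops during the rewrite, the existing AOF file is not lost. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.

  • AOF is a clearly formatted, easy-to-understand log of all modifying operations. The data can in fact be rebuilt from this file alone

    The AOF file stores every write executed against the database, in order and in the Redis protocol format, so its content is easy to read and easy to parse. Exporting an AOF file is also simple. For example, if FLUSHALL is executed by accident, then as long as the AOF file has not been rewritten, stopping the server, removing the FLUSHALL command from the end of the AOF file, and restarting Redis restores the dataset to its state before FLUSHALL ran.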

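Cons

  • Every operation is recorded even when operations repeat, so the AOF file is larger than the RDB file
  • Restoring a large dataset from AOF is slower than restoring it from RDB
  • Depending on the fsync policy, AOF can be slower than RDB
  • Bugs are more likely than with RDB

When to use RDB and AOF

  • If Redis mainly serves as a cache, or losing a few minutes of data is acceptable, production environments usually only need RDB, which is also the default
  • If data must be kept durably and nothing may be lost, enable both RDB and AOF
  • Enabling only AOF is generally not recommended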
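III. Implementing Redis Sentinel and simulating a master failure

How it works

[Figures: how Sentinel works]

Implementing sentinel mode

graph LR
M[Sentinel<br/>10.0.0.7<br/>master]
S1[Sentinel<br/>10.0.0.17<br/>slave1]
S2[Sentinel<br/>10.0.0.27<br/>slave2]
M-->S1
M-->S2

Configure one master and two slaves

One-step Redis build-and-install script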

#!/bin/bash
# Build and install Redis from source

source /etc/init.d/functions
# Redis version
Redis_version=redis-5.0.9
suffix=tar.gz
Redis=${Redis_version}.${suffix}
Password=123456

# Redis source download URL
redis_url=http://download.redis.io/releases/${Redis}
# Redis install path
redis_install_DIR=/apps/redis

# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS type
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`

# Print message $1 with a green OK tag when status $2 is 0, otherwise a red FAILED tag
color () {
if [[ $2 -eq 0 ]];then
    echo -e "\e[1;32m$1\t\t\t\t\t\t[  OK  ]\e[0m"
else
    echo $2
    echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0m"
fi
}


download_redis (){
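# Install build dependencies
yum -y install gcc jemalloc-devel || { color "Failed to install dependencies, check the network" 1 ;exit 1;}

cd /opt
if [ -e ${Redis} ];then
	color "Redis source tarball already exists" 0
else
	color "Downloading the Redis source tarball" 0
	wget ${redis_url}
	if [ $? -ne 0 ];then
		color "Failed to download the Redis source tarball, exiting!" 1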
		exit 1
	fi
fi
}


install_redis (){
# Unpack the source tarball
tar xvf /opt/${Redis} -C /usr/local/src
ln -s /usr/local/src/${Redis_version} /usr/local/src/redis

# Build and install
cd /usr/local/src/redis
make -j ${CPUS} install PREFIX=${redis_install_DIR}
if [ $? -ne 0 ];then
	color "Redis build and install failed!" 1
	exit 1
else
	color "Redis build and install succeeded" 0
fi

ln -s ${redis_install_DIR}/bin/redis-* /usr/sbin/

# Add the redis user
if id redis &> /dev/null;then
	color "redis user already exists" 1
else
	useradd -r -s /sbin/nologin redis
	color "redis user created" 0
fi
mkdir -p ${redis_install_DIR}/{etc,log,data,run}

# Prepare the Redis configuration file
cp redis.conf ${redis_install_DIR}/etc/
sed -i "s/bind 127.0.0.1/bind 0.0.0.0/" ${redis_install_DIR}/etc/redis.conf
sed -i "/# requirepass/a requirepass ${Password}" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^dir .*\$@dir ${redis_install_DIR}/data@" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^logfile .*\$@logfile ${redis_install_DIR}/log/redis-6379.log@" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^pidfile .*\$@pidfile ${redis_install_DIR}/run/redis-6379.pid@" ${redis_install_DIR}/etc/redis.conf

chown -R redis:redis ${redis_install_DIR}

cat >> /etc/sysctl.conf <<EOF
net.core.somaxconn = 1024
vm.overcommit_memory = 1
EOF
sysctl -p

echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local
source /etc/rc.d/rc.local


# Prepare the systemd service unit
cat > /usr/lib/systemd/system/redis.service <<EOF
[Unit]
Description=redis persistent key-value database
After=network.target

[Service]
ExecStart=${redis_install_DIR}/bin/redis-server ${redis_install_DIR}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT \$MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
EOF

chown -R redis:redis ${redis_install_DIR}
systemctl daemon-reload
systemctl enable --now redis
systemctl is-active redis

if [ $? -ne 0 ];then
	color "Failed to start the redis service!" 1
	exit 1
else
	color "redis service started" 0
	color "Redis installation complete" 0
fi
}


download_redis

install_redis

exit 0

  1. master node configuration

    # Edit redis.conf
    vim /apps/redis/etc/redis.conf
    bind 0.0.0.0
    masterauth "123456"
    requirepass "123456"
    
    # Restart redis
    systemctl restart redis
    
  2. slave node configuration

    # Edit redis.conf
    vim /apps/redis/etc/redis.conf
    bind 0.0.0.0
    masterauth "123456"
    requirepass "123456"
    replicaof 10.0.0.7 6379
    
    # Restart redis
    systemctl restart redis
    
  3. Check status

    master

    [root@master ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:2
    slave0:ip=10.0.0.27,port=6379,state=online,offset=28,lag=1
    slave1:ip=10.0.0.17,port=6379,state=online,offset=28,lag=1
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:28
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:28
    127.0.0.1:6379> 
    
    

    slave1

    [root@slave1 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.7
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:9
    master_sync_in_progress:0
    slave_repl_offset:154
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:154
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:154
    127.0.0.1:6379> 
    

    slave2

    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.7
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:5
    master_sync_in_progress:0
    slave_repl_offset:210
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:210
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:210
    127.0.0.1:6379> 
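When checking several nodes, reading the full `info replication` output each time gets tedious. The following sketch condenses it to one line per node. The function name `replication_summary` is an assumption introduced here; it only reads the field names that appear in the output above.

```shell
#!/bin/bash
# Hypothetical helper: read "info replication" output on stdin and print
# "<role> <connected_slaves> <master_link_status>" ("-" when a field is absent).
replication_summary() {
    awk -F: '
        { sub(/\r$/, "") }                     # redis-cli output may end lines with CR
        $1 == "role"               { role = $2 }
        $1 == "connected_slaves"   { slaves = $2 }
        $1 == "master_link_status" { link = $2 }
        END { print role, slaves, (link == "" ? "-" : link) }
    '
}

# Example call against a live node:
#   redis-cli -a 123456 info replication | replication_summary
```

On a healthy slave the last column should read `up`; anything else suggests the replication link to the master is not established.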
    

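Edit the sentinel configuration file

Sentinel is in fact a special redis server: it supports some redis commands but not many others. By default it listens on port 26379/tcp.

Sentinels do not have to be deployed on the same hosts as the Redis servers, but they usually are.

  • Configure the sentinel file
cp /usr/local/src/redis/sentinel.conf /apps/redis/etc/redis-sentinel.conf
cd /apps/redis/etc/
# Configure sentinel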
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes

# Start sentinel
[root@master etc]# redis-sentinel /apps/redis/etc/redis-sentinel.conf 
# View the sentinel configuration
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data

sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 3000
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 0
# The lines below are generated automatically
sentinel myid c663d4b9db845d721cd6dccf608c7904d896b745      # myid must be unique
protected-mode no
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.0.27 6379
sentinel known-replica mymaster 10.0.0.17 6379
sentinel known-sentinel mymaster 10.0.0.27 26379 66f276f274802c6f0243007a2be4b04001b9867e
sentinel known-sentinel mymaster 10.0.0.17 26379 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
sentinel current-epoch 0

Configure the sentinel service

[root@shichu ~]# cat /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/redis-sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

Start the sentinel service

chown -R redis:redis /apps/redis
systemctl daemon-reload
systemctl enable --now redis-sentinel

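sentinel configuration parameters

sentinel monitor mymaster 10.0.0.7 6379 2 # Address and port of the master server in the mymaster group

2 is the quorum, i.e. how many sentinels must consider the master down before a failover is carried out. This value is usually the smallest integer greater than half of all sentinel nodes (the total is usually an odd number >= 3, such as 3, 5, or 7). For example, with 3 sentinels, 3/2 = 1.5, rounded up to 2. It is the basis for marking the master as ODOWN (objectively down)

sentinel auth-pass mymaster 123456 # Password of the master in the mymaster group; this line must come after the line above

sentinel down-after-milliseconds mymaster 30000 # (SDOWN) Time after which any node in the mymaster group is considered subjectively down, in milliseconds; 3000 is recommended

sentinel parallel-syncs mymaster 1 # Number of slaves that sync from the new master at the same time after a failover; a smaller number makes the total sync time longer but reduces load on the new master

sentinel failover-timeout mymaster 180000 # Timeout for all slaves to be pointed at the new master, in milliseconds

sentinel deny-scripts-reconfig yes # Forbid changing the scripts at runtime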
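The quorum rule described above (more than half of the sentinels) comes down to one line of integer arithmetic. A minimal sketch, with the hypothetical function name `majority_quorum`:

```shell
#!/bin/bash
# Hypothetical helper: smallest integer greater than half of the sentinel count.
# $1 = total number of sentinels
majority_quorum() {
    local sentinels=$1
    echo $(( sentinels / 2 + 1 ))   # integer division rounds down, +1 gives the majority
}

# Example call: majority_quorum 3
```

For the three sentinels used in this setup this gives 2, which matches the last argument of the `sentinel monitor` line.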

  • Check the listening ports
[root@master etc]# ss -ntl
State       Recv-Q Send-Q                                  Local Address:Port                                                 Peer Address:Port            
LISTEN      0      100                                         127.0.0.1:25                                                              *:*                
LISTEN      0      511                                                 *:26379                                                           *:*                
LISTEN      0      511                                                 *:6379                                                            *:*                
LISTEN      0      128                                                 *:111                                                             *:*                
LISTEN      0      128                                                 *:22                                                              *:*                
LISTEN      0      100                                             [::1]:25                                                           [::]:*                
LISTEN      0      128                                              [::]:111                                                          [::]:*                
LISTEN      0      128                                              [::]:22  
  • View the sentinel logs

    master log

    [root@master redis]# tail /apps/redis/log/sentinel_26379.log
    1491:X 11 Jul 2022 16:38:43.636 * supervised by systemd, will signal readiness
    1491:X 11 Jul 2022 16:38:43.637 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    1491:X 11 Jul 2022 16:38:43.637 * Running mode=sentinel, port=26379.
    1491:X 11 Jul 2022 16:38:43.638 # Sentinel ID is c663d4b9db845d721cd6dccf608c7904d896b745
    1491:X 11 Jul 2022 16:38:43.638 # +monitor master mymaster 10.0.0.7 6379 quorum 2
    1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
    1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
    1491:X 11 Jul 2022 16:39:20.763 # -sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
    1491:X 11 Jul 2022 16:39:48.855 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
    

    slave1 log

    [root@slave1 ~]# tail /apps/redis/log/sentinel_26379.log
    1293:X 11 Jul 2022 16:39:19.722 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1293, just started
    1293:X 11 Jul 2022 16:39:19.722 # Configuration loaded
    1293:X 11 Jul 2022 16:39:19.722 * supervised by systemd, will signal readiness
    1293:X 11 Jul 2022 16:39:19.723 * Increased maximum number of open files to 4096 (it was originally set to 1024).
    1293:X 11 Jul 2022 16:39:19.724 * Running mode=sentinel, port=26379.
    1293:X 11 Jul 2022 16:39:19.724 # Sentinel ID is 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
    1293:X 11 Jul 2022 16:39:19.724 # +monitor master mymaster 10.0.0.7 6379 quorum 2
    1293:X 11 Jul 2022 16:39:22.777 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
    1293:X 11 Jul 2022 16:39:48.988 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
    

    slave2 log

    [root@slave2 ~]# tail /apps/redis/log/sentinel_26379.log
    900:X 11 Jul 2022 16:32:23.322 # +sdown sentinel 605f713c7e6554ae0bfed0b98304e29d6a69e678 10.0.0.37 26379 @ mymaster 10.0.0.7 6379
    1256:X 11 Jul 2022 16:39:48.523 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    1256:X 11 Jul 2022 16:39:48.523 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1256, just started
    1256:X 11 Jul 2022 16:39:48.523 # Configuration loaded
    1256:X 11 Jul 2022 16:39:48.523 * supervised by systemd, will signal readiness
    1256:X 11 Jul 2022 16:39:48.524 * Increased maximum number of open files to 4096 (it was originally set to 1024).
    1256:X 11 Jul 2022 16:39:48.525 * Running mode=sentinel, port=26379.
    1256:X 11 Jul 2022 16:39:48.525 # Sentinel ID is 66f276f274802c6f0243007a2be4b04001b9867e
    1256:X 11 Jul 2022 16:39:48.525 # +monitor master mymaster 10.0.0.7 6379 quorum 2
    
  • Check the sentinel status

    [root@master redis]# redis-cli -a 123456 -p 26379
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:26379> info sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=mymaster,status=ok,address=10.0.0.7:6379,slaves=2,sentinels=3
    # Two slaves and three sentinel servers; if the sentinels value does not match, check for a possible myid conflict
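The check on the `master0` line can be automated. The sketch below reads `info sentinel` output and succeeds only when the named master is `ok` and reports the expected number of slaves and sentinels. The function name `check_sentinel_master` is hypothetical, and the pattern assumes the line layout shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: verify the masterN line of "info sentinel" read from stdin.
# $1 = master name, $2 = expected slaves, $3 = expected sentinels
# Returns 0 only if status is ok and both counts match.
check_sentinel_master() {
    local name=$1 want_slaves=$2 want_sentinels=$3
    grep -E "^master[0-9]+:name=$name," \
        | tr -d '\r' \
        | grep -q -E "status=ok,.*slaves=$want_slaves,sentinels=$want_sentinels\$"
}

# Example call against a live sentinel:
#   redis-cli -p 26379 info sentinel | check_sentinel_master mymaster 2 3 || echo "sentinel view is incomplete"
```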
    

Simulating a failover

  • Stop redis on the master
[root@master etc]# systemctl stop redis
[root@master etc]# ss -ntl
State       Recv-Q Send-Q                                  Local Address:Port                                                 Peer Address:Port      
LISTEN      0      100                                         127.0.0.1:25                                                              *:*          
LISTEN      0      511                                                 *:26379                                                           *:*          
LISTEN      0      128                                                 *:111                                                             *:*          
LISTEN      0      128                                                 *:22                                                              *:*          
LISTEN      0      100                                             [::1]:25                                                           [::]:*          
LISTEN      0      128                                              [::]:111                                                          [::]:*          
LISTEN      0      128                                              [::]:22 
  • Sentinel messages during the failover
[root@master redis]# tail -f /apps/redis/log/sentinel_26379.log 
1491:X 11 Jul 2022 17:07:16.959 # +sdown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.044 # +odown master mymaster 10.0.0.7 6379 #quorum 2/2
1491:X 11 Jul 2022 17:07:17.044 # +new-epoch 4
1491:X 11 Jul 2022 17:07:17.044 # +try-failover master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.045 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.048 # 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.050 # 66f276f274802c6f0243007a2be4b04001b9867e voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.102 # +elected-leader master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.102 # +failover-state-select-slave master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 # +selected-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 * +failover-state-send-slaveof-noone slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.269 * +failover-state-wait-promotion slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +promoted-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +failover-state-reconf-slaves master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.145 * +slave-reconf-sent slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-inprog slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-done slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # -odown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +failover-end master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379        # the master role has moved to 10.0.0.27
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:22.276 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379

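Log event reference

+reset-master : The master server has been reset.
+slave : A new slave has been detected and attached by Sentinel.
+failover-state-reconf-slaves : The failover state has changed to reconf-slaves.
+failover-detected : Another Sentinel has started a failover, or a slave has turned into a master.
+slave-reconf-sent : The leader Sentinel sent the SLAVEOF command to the instance to set its new master.
+slave-reconf-inprog : The instance is reconfiguring itself as a slave of the specified master, but the synchronization has not finished yet.
+slave-reconf-done : The slave has finished synchronizing with the new master.
-dup-sentinel : One or more Sentinels monitoring the given master were removed as duplicates; this happens when a Sentinel instance is restarted.
+sentinel : A new Sentinel monitoring the given master has been detected and added.
+sdown : The given instance is now subjectively down.
-sdown : The given instance is no longer subjectively down.
+odown : The given instance is now objectively down.
-odown : The given instance is no longer objectively down.
+new-epoch : The current epoch has been updated.
+try-failover : A new failover is in progress, waiting to be elected by the majority of Sentinels.
+elected-leader : Won the election for the specified epoch and can carry out the failover.
+failover-state-select-slave : The failover is now in the select-slave state: Sentinel is looking for a slave that can be promoted to master.
no-good-slave : Sentinel could not find a suitable slave to promote. It will try again after some time, or give up the failover altogether.
+selected-slave : Sentinel found a suitable slave to promote.
+failover-state-send-slaveof-noone : Sentinel is promoting the specified slave to master and waiting for the promotion to complete.
failover-end-for-timeout : The failover was aborted because of a timeout, but the slaves will eventually be configured to replicate with the new master anyway.
+failover-end : The failover completed successfully. All slaves now replicate from the new master.
+switch-master : Configuration change: the IP address and port of the master have changed. This is the message most external users are interested in.
+tilt : Entered tilt mode.
-tilt : Exited tilt mode.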
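Since `+switch-master` is the event most consumers care about, a small filter over the Sentinel log is often enough to see when and where the master moved. The sketch below assumes the log line layout shown in the failover log above; the function name `list_switch_master` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read a Sentinel log on stdin and print
# "<old_ip>:<old_port> -> <new_ip>:<new_port>" for every +switch-master event.
list_switch_master() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "+switch-master") {
                # Fields after the event name: master name, old ip, old port, new ip, new port
                print $(i + 2) ":" $(i + 3) " -> " $(i + 4) ":" $(i + 5)
                break
            }
        }
    }'
}

# Example call:
#   list_switch_master < /apps/redis/log/sentinel_26379.log
```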

  • After the failover

    The master IP of replicaof in the redis configuration file is updated automatically

    [root@slave1 ~]# grep "^replicaof" /apps/redis/etc/redis.conf 
    replicaof 10.0.0.27 6379
    

    The sentinel monitor IP in the sentinel configuration file is updated automatically

    [root@slave1 ~]# grep "^sentinel monitor" /apps/redis/etc/redis-sentinel.conf 
    sentinel monitor mymaster 10.0.0.27 6379 2
    
  • redis status

    Status of the new master

    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:1
    slave0:ip=10.0.0.17,port=6379,state=online,offset=4290787,lag=1
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4290787
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3242212
    repl_backlog_histlen:1048576
    127.0.0.1:6379> 
    

    The other slave now points at the new master

    [root@slave1 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.27
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_repl_offset:4296387
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4296387
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3247812
    repl_backlog_histlen:1048576
    127.0.0.1:6379> 
    
    
  • Bring the failed former master back into the redis master/slave group

    [root@master redis]# systemctl start redis

    Status of the former master

    # The redis configuration now points at the new master node
    [root@master redis]# grep "^replicaof" /apps/redis/etc/redis.conf
    replicaof 10.0.0.27 6379
    
    # Check the redis status
    [root@master redis]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.27
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_repl_offset:4366815
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:4366815
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:4343555
    repl_backlog_histlen:23261
    
    # Check the sentinel status
    [root@master redis]# redis-cli -a 123456 -p 26379
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:26379> info sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=mymaster,status=ok,address=10.0.0.27:6379,slaves=2,sentinels=3
    

    Status of the new master

    # redis status
    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:2
    slave0:ip=10.0.0.17,port=6379,state=online,offset=4407027,lag=0
    slave1:ip=10.0.0.7,port=6379,state=online,offset=4407160,lag=0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4407293
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3358718
    repl_backlog_histlen:1048576
    
    
    # sentinel log
    [root@slave2 ~]# tail -f /apps/redis/log/sentinel_26379.log
    1256:X 11 Jul 2022 17:07:17.049 # +new-epoch 4
    1256:X 11 Jul 2022 17:07:17.052 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
    1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
    1256:X 11 Jul 2022 17:07:17.068 # Next failover delay: I will not start a failover before Mon Jul 11 17:13:17 2022
    1256:X 11 Jul 2022 17:07:18.149 # +config-update-from sentinel c663d4b9db845d721cd6dccf608c7904d896b745 10.0.0.7 26379 @ mymaster 10.0.0.7 6379
    1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:21.189 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:43:54.361 # -sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
    
    
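  • sentinel operations

    Manually take the master offline

    sentinel failover <masterName>
    

    Example

    # A priority can be set; the smaller the value, the more sentinel prefers the node as the new master. The default is 100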
    [root@slave1 ~]# grep 'replica-priority' /apps/redis/etc/redis.conf 
    replica-priority 30
    
    [root@slave1 ~]# redis-cli -a 123456 -p 26379
    127.0.0.1:26379> sentinel failover mymaster
    OK
    127.0.0.1:26379> info sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3
    

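IV. How Redis Cluster works

Redis Cluster characteristics

  • All Redis nodes are interconnected using a PING mechanism
  • Whether a node in the cluster has failed is decided collectively: the node counts as truly failed only when more than half of the nodes in the cluster detect it as failed
  • Clients connect to redis directly without a proxy; the application needs to be configured with the IPs of all redis servers
  • redis cluster spreads slots 0-16383 evenly across all redis nodes, and reads and writes have to go to the node that owns the slot, so the number of redis nodes is roughly the factor by which redis concurrency scales; each redis node handles 16384/N slots
  • Redis cluster pre-allocates 16384 slots. When a key-value pair is written to the cluster, CRC16(key) mod 16384 determines which slot the key goes to and therefore which Redis node it is written to, which removes the single-node bottleneck.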
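The slot formula above can be tried out without a cluster. The sketch below implements the CRC16 variant used by Redis Cluster (XMODEM: polynomial 0x1021, initial value 0) bit by bit in shell arithmetic. Two simplifications are assumed: keys are plain ASCII, and hash tags (`{...}`) are not handled. The function names `crc16_xmodem` and `key_slot` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: CRC16/XMODEM of an ASCII string, printed as a decimal number.
crc16_xmodem() {
    local rest=$1 crc=0 char byte bit
    while [ -n "$rest" ]; do
        char=${rest%"${rest#?}"}                 # first character of the remaining string
        rest=${rest#?}                           # drop that character
        byte=$(printf '%d' "'$char")             # numeric code of the character
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo "$crc"
}

# Hypothetical helper: slot number (0-16383) for a key.
key_slot() {
    echo $(( $(crc16_xmodem "$1") % 16384 ))
}

# Example call: key_slot foo
```

On a running cluster the result can be cross-checked with the `CLUSTER KEYSLOT <key>` command.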

Redis cluster architecture

[Figure: Redis cluster architecture]

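V. Deploying redis cluster on redis5

Official documentation: https://redis.io/topics/cluster-tutorial

Prerequisites for creating a Redis Cluster

  • Every redis node uses the same hardware configuration, the same password, and the same redis version

  • None of the redis servers may contain any data

  • Prepare 6 machines for a three-master, three-slave architecture

    [Figure: three-master, three-slave cluster layout]

    # Cluster nodes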
    Redis-node1:10.0.0.7
    Redis-node2:10.0.0.17
    Redis-node3:10.0.0.27
    Redis-node4: 10.0.0.37
    Redis-node5: 10.0.0.47
    Redis-node6: 10.0.0.57
    # Spare nodes
    10.0.0.67
    10.0.0.77
    

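Deploying the redis cluster

1. Install redis

Edit the redis configuration

[root@node1 etc]# cat redis.conf 
...
bind 0.0.0.0
masterauth 123456   # Recommended; without it, master/slave replication fails later and has to be configured afterwards
requirepass 123456
cluster-enabled yes # Uncomment this line; cluster mode must be enabled, after which the redis process shows [cluster]
cluster-config-file nodes-6379.conf # Uncomment this line; this is the cluster state file, recording master/slave relations and slot ranges, created and maintained automatically by redis cluster
cluster-require-full-coverage no   # Default is yes; setting it to no prevents one unavailable node from making the whole cluster unavailable
...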

[root@node1 etc]#systemctl enable --now redis
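With six nodes to prepare, editing each redis.conf by hand is error-prone. The three cluster settings could be applied with a sed-based sketch like the one below. It assumes the stock redis.conf ships these settings as commented-out lines (`# cluster-enabled yes` and so on); the function name `enable_cluster_mode` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: switch on the cluster settings in a redis.conf file.
# $1 = path of the configuration file
enable_cluster_mode() {
    local conf=$1
    # Uncomment the two cluster lines and turn full coverage off,
    # writing to a temporary file first so a failed sed leaves the original intact
    sed \
        -e 's/^# *cluster-enabled yes/cluster-enabled yes/' \
        -e 's/^# *cluster-config-file /cluster-config-file /' \
        -e 's/^# *cluster-require-full-coverage yes/cluster-require-full-coverage no/' \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example call: enable_cluster_mode /apps/redis/etc/redis.conf
```

The service still has to be restarted afterwards for the settings to take effect.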

2. Check the current redis status

# Check the listening ports
[root@node1 ~]# ss -ntl
State      Recv-Q Send-Q                Local Address:Port                               Peer Address:Port        
LISTEN     0      511                               *:6379                                          *:*            
LISTEN     0      128                               *:111                                           *:*            
LISTEN     0      128                               *:22                                            *:*            
LISTEN     0      100                       127.0.0.1:25                                            *:*            
LISTEN     0      511                               *:16379                                         *:*            
LISTEN     0      128                            [::]:111                                        [::]:*            
LISTEN     0      128                            [::]:22                                         [::]:*            
LISTEN     0      100                           [::1]:25                                         [::]:*

# The process now shows the [cluster] state
[root@node1 ~]# ps aux|grep redis
redis     24754  0.2  0.3 153996  3172 ?        Ssl  21:28   0:02 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]
root      24822  0.0  0.0 112812   980 pts/0    R+   21:44   0:00 grep --color=auto redis

3. Create the cluster

[root@node1 ~]# redis-cli -a 123456 --cluster create 10.0.0.7:6379 10.0.0.17:6379 10.0.0.27:6379 10.0.0.37:6379 \
10.0.0.47:6379 10.0.0.57:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.47:6379 to 10.0.0.7:6379
Adding replica 10.0.0.57:6379 to 10.0.0.17:6379
Adding replica 10.0.0.37:6379 to 10.0.0.27:6379
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379	# M marks a master
   slots:[0-5460] (5461 slots) master				# first and last slot of this master
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379	# S marks a slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
Can I set the above configuration? (type 'yes' to accept): yes	# type yes to create the cluster automatically
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.7:6379)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master				#已经分配的槽位
   1 additional replica(s)					#分配了一个slave
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave					#slave没有分配槽位
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7		#对应的master的10.0.0.27的ID
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b		# ID of its master, 10.0.0.17
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec		# ID of its master, 10.0.0.7
[OK] All nodes agree about slots configuration.		# all nodes agree on the slot assignment
>>> Check for open slots...				# look for slots left in a migrating/importing state
>>> Check slots coverage...				# check that every slot is served
[OK] All 16384 slots covered.				# all 16384 slots are assigned
[root@node1 ~]# 


The output above shows three master/slave pairs:

master:10.0.0.7-->slave:10.0.0.47
master:10.0.0.17-->slave:10.0.0.57
master:10.0.0.27-->slave:10.0.0.37

4. Check the master/slave status

node1(10.0.0.7)

[root@node1 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.47,port=6379,state=online,offset=1008,lag=1
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node2(10.0.0.17)

[root@node2 etc]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.57,port=6379,state=online,offset=1008,lag=0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node3(10.0.0.27)

[root@node3 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.37,port=6379,state=online,offset=1008,lag=0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node4(10.0.0.37)

[root@node4 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node5(10.0.0.47)

[root@node5 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:4
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node6(10.0.0.57)

[root@node6 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:10
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

List the slaves of a given master

# List all nodes
[root@node1 ~]# redis-cli -a 123456 cluster nodes 2>/dev/null
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected

# List the slaves of a master by node ID; 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 is the ID of master 10.0.0.27
[root@node1 ~]# redis-cli -a 123456 cluster slaves 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 2>/dev/null
1) "59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554778157 4 connected"
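The pairing can also be derived for all masters at once from the `cluster nodes` output shown above. The following is a minimal sketch, not part of the original procedure; the function name `master_replica_pairs` is made up for illustration, and it assumes the field layout seen above (node ID, address, flags, master ID).

```shell
# Hypothetical helper: reads `cluster nodes` output on stdin and prints
# one "master -> slave" line per slave, sorted.
master_replica_pairs() {
    awk '
        # Remember every node address without the @cluster-bus-port suffix.
        { addr = $2; sub(/@.*/, "", addr); addr_of[$1] = addr }
        # For slaves, field 4 is the node ID of the master they replicate.
        $3 ~ /slave/ { master_of[$1] = $4 }
        END {
            for (id in master_of)
                print addr_of[master_of[id]] " -> " addr_of[id]
        }
    ' | LC_ALL=C sort
}

# Usage (assumes the same password as above):
#   redis-cli -a 123456 cluster nodes 2>/dev/null | master_replica_pairs
```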

5. Verify the cluster state

[root@node1 ~]# redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6		# 6 nodes
cluster_size:3			# 3 masters serving slots
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3639
cluster_stats_messages_pong_sent:3625
cluster_stats_messages_sent:7264
cluster_stats_messages_ping_received:3620
cluster_stats_messages_pong_received:3639
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7264

# Query the cluster state through any node
[root@node1 ~]# redis-cli -a 123456 --cluster info 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
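For scripted checks, the `cluster info` fields shown above can be reduced to a pass/fail result. This is a rough sketch under the assumption that "healthy" means `cluster_state:ok`, all 16384 slots assigned and no failed slots; the function name `cluster_healthy` is hypothetical.

```shell
# Hypothetical helper: reads `cluster info` output on stdin and returns 0
# only when the cluster reports ok, 16384 assigned slots and 0 failed slots.
cluster_healthy() {
    # Redis terminates info lines with CRLF, so drop the carriage returns.
    tr -d '\r' | awk -F: '
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { assigned = $2 }
        $1 == "cluster_slots_fail"     { failed = $2 }
        END { exit !(state == "ok" && assigned == 16384 && failed == 0) }
    '
}

# Usage:
#   redis-cli -a 123456 cluster info 2>/dev/null | cluster_healthy && echo healthy
```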

View the node relationships in the cluster

# List all nodes in the cluster
[root@node1 ~]# redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657556036000 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657556038079 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657556036000 5 connected


[root@node1 ~]# redis-cli -a 123456 --cluster check 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.27:6379)
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
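Redis Cluster maps each key to one of these 16384 slots as CRC16(key) mod 16384, using the XMODEM variant of CRC16. The sketch below reproduces that calculation in plain shell so a key's slot can be predicted before writing; the function name `redis_key_slot` is made up here, and hash tags (`{...}`) are deliberately not handled.

```shell
# Hypothetical helper: prints the cluster slot of the key given as $1.
# CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
redis_key_slot() {
    key=$1
    crc=0
    # od prints each byte of the key as an unsigned decimal number.
    for byte in $(printf '%s' "$key" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo $(( crc % 16384 ))
}

# Usage:
#   redis_key_slot key1
```

For `key1` this yields 9189, the slot that appears in the MOVED reply of the write test, which falls in the 5461-10922 range served by 10.0.0.17.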

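Verify writes to the cluster

image.png

# When connected to a single node, a write may be refused because the key's slot is not served by that node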
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7
10.0.0.7:6379> set key1 v1
(error) MOVED 9189 10.0.0.17:6379
# The write succeeds only on the node that owns the slot
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.17
10.0.0.17:6379> set key1 values1
OK
10.0.0.17:6379> get key1
"values1"


# With -c (cluster mode) redis-cli follows redirects, so any node in the cluster can be used
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7 -c
10.0.0.7:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.17:6379
OK
10.0.0.17:6379> get key1
"v1"

VI. Deploy Zabbix monitoring

Download page: https://www.zabbix.com/cn/download

Documentation: https://www.zabbix.com/manuals

Source tarball: https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz

Compile and install Zabbix 5 on LNMP

L: Linux (CentOS 7) https://mirrors.aliyun.com/centos/7/isos/x86_64/
N: Nginx (1.18.0) https://nginx.org/en/download.html
M: MySQL (8.0.19) https://dev.mysql.com/downloads/mysql/
P: PHP (7.4.11)   http://php.net/downloads.php
Zabbix (5.0.25)  https://cdn.zabbix.com/zabbix/sources/

graph LR
A[Client]
B[Linux<br/>Nginx<br/>PHP<br/>Zabbix<br/>10.0.0.100]
C[Linux<br/>MySQL<br/>10.0.0.200]
A-->B-->C

1. Install MySQL

Reference: binary installation of MySQL 8.0 on CentOS 7

After installation, create the zabbix database and user

mysql -uroot -p123456 -e "create database zabbix character set utf8 collate utf8_bin;"
mysql -uroot -p123456 -e "create user zabbix@'10.0.0.%' identified by '123456'"
mysql -uroot -p123456 -e "grant all privileges on zabbix.* to zabbix@'10.0.0.%'"
mysql -uroot -p123456 -e "use mysql;\
alter user zabbix@'10.0.0.%' identified with mysql_native_password by '123456';\
flush privileges;"

2. Install Nginx

Reference: compiling and installing Nginx 1.18 on CentOS 7[^1]

3. Install PHP

Reference: compiling and installing PHP 7.4 on CentOS 7[^2]

4. Install Zabbix

Install zabbix_server

#!/bin/bash
# Compile and install Zabbix from source

source /etc/init.d/functions
# Zabbix version
Zabbix_Version=zabbix-5.0.25
Suffix=tar.gz
Zabbix=${Zabbix_Version}.${Suffix}

Password=123456

# Zabbix source download URL
Zabbix_url=https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz

# Zabbix installation path
Zabbix_install_DIR=/apps/zabbix

# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS name
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`

# Print message $1 with a green [ OK ] when $2 is 0, otherwise with a red [ FAILED ]
color () {
if [[ $2 -eq 0 ]];then
    echo -e "\e[1;32m$1\t\t\t\t\t\t[  OK  ]\e[0m"
else
    echo $2
    echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0m"
fi
}


install_Zabbix (){
#---------------------------- Download the source tarball -----------------------------
cd /opt
if [ -e ${Zabbix} ];then
	color "Zabbix source tarball already exists" 0
else
	color "Downloading the Zabbix source tarball" 0
	wget ${Zabbix_url}
	if [ $? -ne 0 ];then
		color "Failed to download the Zabbix source tarball, exiting!" 1
		exit 1
	fi
fi


#---------------------------- Extract the source tarball -----------------------------
color "Extracting the source tarball" 0
tar -zxvf /opt/${Zabbix} -C /usr/local/src
ln -s /usr/local/src/${Zabbix_Version} /usr/local/src/zabbix


#---------------------------- Install dependencies --------------------------------
color "Installing dependencies" 0

#wget https://dev.mysql.com/get/mysql80-community-release-el7-6.noarch.rpm

yum install -y gcc libxml2-devel net-snmp net-snmp-devel curl curl-devel php-gd php-bcmath php-xml \
php-mbstring mariadb mariadb-devel OpenIPMI-devel libevent-devel java-1.8.0-openjdk-devel \
|| { color "Failed to install dependencies, check the network" 1 ;exit 1;}


#--------------------------- Create the zabbix user ---------------------------
if id zabbix &> /dev/null ;then
	color "The zabbix user already exists" 0
else
	groupadd --system zabbix
	useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
	color "The zabbix user has been created" 0
fi

#--------------------------- Compile ---------------------------
color "Compiling Zabbix" 0
cd /usr/local/src/zabbix
./configure --prefix=${Zabbix_install_DIR} \
--enable-server \
--enable-agent \
--with-mysql \
--with-net-snmp \
--with-libcurl \
--with-libxml2 \
--with-openipmi \
--enable-proxy \
--enable-java

make -j ${CPUS} install
if [ $? -ne 0 ];then
	color "Zabbix build and install failed!" 1
	exit 1
else
	color "Zabbix built and installed successfully" 0
fi

# Copy the web front-end files
mkdir -pv /home/nginx/zabbix
cp -rf /usr/local/src/zabbix/ui/* /home/nginx/zabbix/
chown nginx:nginx -R /home/nginx/zabbix

# Start zabbix_server once as a smoke test, then stop it again
/apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
if [ $? -eq 0 ];then
	color "zabbix_server starts correctly in a test run" 0
	pkill zabbix
fi

color "Zabbix installation finished" 0
}

install_Zabbix

exit 0

Modify the configuration files

  1. Edit the /apps/nginx/conf/nginx.conf configuration file

    worker_processes   1;
    pid 		logs/nginx.pid;
    events {
    		worker_connections	1024;
    }
    http {
    	include			mime.types;
    	default_type	application/octet-stream;
    	sendfile		on;
    	keepalive_timeout	65;
    	server {
    		listen		80;
    		server_name	10.0.0.100;				# host name
    		server_tokens off;						# hide the nginx version
    
    		location / {
    			root	/home/nginx/zabbix;				# document root
    			index	index.php index.html index.htm;			# default index pages
    		}
    
    		error_page	500 502 503 504 /50x.html;
    
    		location = /50x.html {
    			root	html;
    		}
    
    		location ~ \.php$ {						# pass PHP requests to php-fpm
    			root		/home/nginx/zabbix;
    			fastcgi_pass	127.0.0.1:9000;
    			fastcgi_index	index.php;
    			fastcgi_param	SCRIPT_FILENAME	$document_root$fastcgi_script_name;
    			include		fastcgi_params;
    			fastcgi_hide_header X-Powered-By;			# hide the PHP version
    		}
    
    		location ~ ^/(ping|pm_status)$ {				# php-fpm status pages
    			include		fastcgi_params;
    			fastcgi_pass	127.0.0.1:9000;
    			fastcgi_param	PATH_TRANSLATED	$document_root$fastcgi_script_name;
    		}
    	}
    }
    
  2. Edit the PHP configuration files

    # Edit /etc/php.ini
    sed -i -e "/memory_limit/c memory_limit = 256M" \
    -e "/post_max_size/c post_max_size = 30M" \
    -e "/upload_max_filesize/c upload_max_filesize = 20M" \
    -e "/max_execution_time/c max_execution_time = 300" \
    -e "/max_input_time/c max_input_time = 300" \
    -e "/;date.timezone/c date.timezone = Asia/Shanghai" \
    /etc/php.ini
    
    # Edit /apps/php/etc/php-fpm.d/www.conf
    sed -i -e "/user = www/c user = nginx" \
    -e "/group = www/c group = nginx" /apps/php/etc/php-fpm.d/www.conf
    

    Restart the services

    systemctl restart nginx php-fpm
    
  3. Import the MySQL data

    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/schema.sql
    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/images.sql
    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/data.sql
    
  4. Edit the Zabbix configuration file

    sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/# DBPassword=/aDBPassword=123456" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/# DBPort=/aDBPort=3306" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /apps/zabbix/etc/zabbix_server.conf
    
  5. Create the systemd unit for zabbix_server

    cat /lib/systemd/system/zabbix-server.service

    [Unit]
    Description=Zabbix Server
    After=syslog.target
    After=network.target
    
    [Service]
    Environment="CONFFILE=/apps/zabbix/etc/zabbix_server.conf"
    EnvironmentFile=-/etc/default/zabbix-server
    Type=forking
    Restart=on-failure
    PIDFile=/tmp/zabbix_server.pid
    KillMode=control-group
    ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE
    ExecStop=/bin/kill -SIGTERM $MAINPID
    RestartSec=10s
    TimeoutStopSec=5
    
    [Install]
    WantedBy=multi-user.target
    

    Start the service

    systemctl daemon-reload
    systemctl enable --now zabbix-server
    
  6. Create the systemd unit for zabbix_agent

    cat /lib/systemd/system/zabbix-agent.service

    [Unit]
    Description=Zabbix Agent
    After=syslog.target
    After=network.target
    
    [Service]
    Environment="CONFFILE=/apps/zabbix/etc/zabbix_agentd.conf"
    EnvironmentFile=-/etc/default/zabbix-agent
    Type=forking
    Restart=on-failure
    PIDFile=/tmp/zabbix_agentd.pid
    KillMode=control-group
    ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE
    ExecStop=/bin/kill -SIGTERM $MAINPID
    RestartSec=10s
    User=zabbix
    Group=zabbix
    
    [Install]
    WantedBy=multi-user.target
    

    Start the service

    systemctl daemon-reload
    systemctl enable --now zabbix-agent
    
  7. Check the status

    • Ports 10050 and 10051 are listening
    # Ports 10050 (agent) and 10051 (server) should appear in the output
    [root@shichu apps]# ss -ntl
    State      Recv-Q Send-Q               Local Address:Port                              Peer Address:Port          
    LISTEN     0      128                              *:22                                           *:*              
    LISTEN     0      100                      127.0.0.1:25                                           *:*              
    LISTEN     0      128                              *:10050                                        *:*              
    LISTEN     0      128                              *:10051                                        *:*              
    LISTEN     0      128                      127.0.0.1:9000                                         *:*              
    LISTEN     0      128                              *:111                                          *:*              
    LISTEN     0      128                              *:80                                           *:*              
    LISTEN     0      128                           [::]:22                                        [::]:*              
    LISTEN     0      100                          [::1]:25                                        [::]:*              
    LISTEN     0      128                           [::]:111                                       [::]:*
    
    • zabbix-server service status
    [root@shichu apps]# systemctl status zabbix-server
    ● zabbix-server.service - Zabbix Server
       Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2022-07-14 00:47:09 CST; 52s ago
      Process: 8346 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
      Process: 8352 ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
     Main PID: 8360 (zabbix_server)
       CGroup: /system.slice/zabbix-server.service
               ├─8360 /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
               ├─8362 /apps/zabbix/sbin/zabbix_server: configuration syncer [synced configuration in 0.059399 sec, idle 6...
               ├─8363 /apps/zabbix/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.027609 sec durin...
               ├─8364 /apps/zabbix/sbin/zabbix_server: alerter #1 started
               ├─8365 /apps/zabbix/sbin/zabbix_server: alerter #2 started
               ├─8366 /apps/zabbix/sbin/zabbix_server: alerter #3 started
               ├─8367 /apps/zabbix/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 11 values, idle 5.00...
               ├─8368 /apps/zabbix/sbin/zabbix_server: preprocessing worker #1 started
               ├─8369 /apps/zabbix/sbin/zabbix_server: preprocessing worker #2 started
               ├─8370 /apps/zabbix/sbin/zabbix_server: preprocessing worker #3 started
               ├─8371 /apps/zabbix/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.008702sec during 5.0...
               ├─8372 /apps/zabbix/sbin/zabbix_server: lld worker #1 started
               ├─8373 /apps/zabbix/sbin/zabbix_server: lld worker #2 started
               ├─8374 /apps/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
               ├─8375 /apps/zabbix/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001868 sec, id...
               ├─8376 /apps/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.001502 sec, idle 5 sec]
               ├─8377 /apps/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.004759 sec, idle 60 sec]
               ├─8378 /apps/zabbix/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000050 sec,...
               ├─8379 /apps/zabbix/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000175 sec,...
               ├─8380 /apps/zabbix/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000029 sec,...
               ├─8381 /apps/zabbix/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000019 sec,...
               ├─8382 /apps/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.004440 sec, idle 3 sec]...
               ├─8383 /apps/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000028 sec, id...
               ├─8384 /apps/zabbix/sbin/zabbix_server: self-monitoring [processed data in 0.000016 sec, idle 1 sec]
               ├─8385 /apps/zabbix/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000836 sec, idle 5 sec]
               ├─8386 /apps/zabbix/sbin/zabbix_server: poller #1 [got 0 values in 0.000050 sec, idle 1 sec]
               ├─8387 /apps/zabbix/sbin/zabbix_server: poller #2 [got 0 values in 0.000048 sec, idle 1 sec]
               ├─8388 /apps/zabbix/sbin/zabbix_server: poller #3 [got 1 values in 0.001602 sec, idle 1 sec]
               ├─8389 /apps/zabbix/sbin/zabbix_server: poller #4 [got 0 values in 0.000019 sec, idle 1 sec]
               ├─8390 /apps/zabbix/sbin/zabbix_server: poller #5 [got 0 values in 0.001402 sec, idle 1 sec]
               ├─8391 /apps/zabbix/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000039 sec, idle 5 sec]
               ├─8392 /apps/zabbix/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection...
               ├─8393 /apps/zabbix/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection...
               ├─8394 /apps/zabbix/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection...
               ├─8395 /apps/zabbix/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection...
               ├─8396 /apps/zabbix/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection...
               ├─8397 /apps/zabbix/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000020 sec, idle 5 sec]
               └─8398 /apps/zabbix/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001557 ...
    
    Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Server...
    Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Server.
    
    • zabbix-agent service status

      [root@shichu apps]# systemctl status zabbix-agent
      ● zabbix-agent.service - Zabbix Agent
         Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
         Active: active (running) since Thu 2022-07-14 00:47:09 CST; 58s ago
        Process: 8349 ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE (code=exited, status=0/SUCCESS)
       Main PID: 8353 (zabbix_agentd)
         CGroup: /system.slice/zabbix-agent.service
                 ├─8353 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
                 ├─8354 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
                 ├─8355 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
                 ├─8356 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
                 ├─8357 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
                 └─8358 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
      
      Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Agent...
      Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Agent.
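
The port check from step 7 can be automated by filtering the `ss -ntl` output. What follows is only a sketch; the function name `ports_listening` is hypothetical and it assumes the column layout shown above, where the fourth column is `Local Address:Port`.

```shell
# Hypothetical helper: reads `ss -ntl` output on stdin and returns 0 only
# if every port passed as an argument has a LISTEN entry.
ports_listening() {
    listing=$(cat)
    for port in "$@"; do
        printf '%s\n' "$listing" |
            awk -v p="$port" '
                # The port is whatever follows the last colon of column 4.
                $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
                END { exit !found }
            ' || { echo "port $port is not listening" >&2; return 1; }
    done
}

# Usage:
#   ss -ntl | ports_listening 10050 10051 && echo "agent and server ports are up"
```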
      


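5. Configure the web interface

Initial setup

Open the server's IP address (10.0.0.100) in a browser

  • Check of pre-requisites

image.png

  • Configure the database connection

image.png

  • Configure the Zabbix server details

image.png

  • Confirm the settings

image.png

  • Create the configuration file

Download the configuration file manually and upload it to the /home/nginx/zabbix/conf/ directory on the Zabbix server

image.png

  • Finish the installation

image.png

  • Log in
Default user name: Admin	# note the capital A
Password: zabbix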

image.png

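  • Open the dashboard

image.png

Tuning

Switch the menus to Chinese

image.png

Display in Chinese

image.png

Fix garbled characters in monitoring items

  • Monitoring items show garbled characters

image.png

  • Pick a font from Windows, for example KaiTi (simkai.ttf)

image.png

  • Upload the font to the Zabbix web directory

The path is /home/nginx/zabbix/assets/fonts

image.png

  • Point Zabbix at the new font
vim /home/nginx/zabbix/include/defines.inc.php
# Change the following two definitions
//define('ZBX_GRAPH_FONT_NAME',     'DejaVuSans'); // font file name
define('ZBX_GRAPH_FONT_NAME',       'simkai'); // font file name 


#define('ZBX_FONT_NAME', 'DejaVuSans');
define('ZBX_FONT_NAME', 'simkai');
  • Verify that the font is applied

The font takes effect immediately; there is no need to restart Zabbix or nginx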

image.png

VII. Monitor Nginx and MySQL

flowchart TB
zabbix[Zabbix Server<br/>10.0.0.100]
mysql-m[Master<br/>10.0.0.17]
mysql-s[Slave<br/>10.0.0.27]
nginx[Nginx<br/>10.0.0.7]

subgraph Mysql

mysql-m<-->mysql-s
end
zabbix-->nginx
zabbix-->Mysql

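1. Install the Zabbix agent

  • Install the agent with yum: yum install zabbix50-agent

  • Edit the agent configuration file

    [root@nginx ~]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf 
    PidFile=/run/zabbix/zabbix_agentd.pid
    LogFile=/var/log/zabbix/zabbix_agentd.log
    LogFileSize=0
    Server=10.0.0.100		# IP of the Zabbix server or proxy
    ListenPort=10050		# listening port (default value)
    StartAgents=3			# number of listener processes started for passive checks; 0 disables listening on any port
    ServerActive=10.0.0.100		# IP of the Zabbix server or proxy used for active checks
    Hostname=10.0.0.7		# case-sensitive and unique on the Zabbix server; the host's own IP is used here
    Include=/etc/zabbix_agentd.conf.d/*.conf	# sub-configuration path, appended at the end of the file
    
    

    Start the service

    mkdir -p /etc/zabbix_agentd.conf.d

    systemctl start zabbix-agent

    Check the status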

    [root@nginx ~]# systemctl status zabbix-agent
    ● zabbix-agent.service - Zabbix Monitor Agent
       Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2022-07-14 16:07:35 CST; 1s ago
     Main PID: 1511 (zabbix_agentd)
       CGroup: /system.slice/zabbix-agent.service
               ├─1511 /usr/sbin/zabbix_agentd -f
               ├─1512 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
               ├─1513 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
               ├─1514 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
               └─1515 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
    
    Jul 14 16:07:35 nginx systemd[1]: Stopped Zabbix Monitor Agent.
    Jul 14 16:07:35 nginx systemd[1]: Started Zabbix Monitor Agent.
    Jul 14 16:07:35 nginx zabbix_agentd[1511]: Starting Zabbix Agent [10.0.0.7]. Zabbix 5.0.21 (revision 47104dd574).
    Jul 14 16:07:35 nginx zabbix_agentd[1511]: Press Ctrl+C to exit.
    
  • Add the monitored host in the web interface

    Configuration > Hosts > Create host

    image.png
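
Before registering the host it can help to confirm that the agent configuration really sets the keys the server relies on. The sketch below assumes that `Server`, `ServerActive` and `Hostname` are the keys of interest, as in the listing above; the function name `agent_conf_missing_keys` is made up for illustration.

```shell
# Hypothetical helper: prints every required key that has no uncommented,
# non-empty value in the agent configuration file given as $1.
agent_conf_missing_keys() {
    for key in Server ServerActive Hostname; do
        # "^Server=" cannot match "ServerActive=" because "=" must follow the key.
        grep -Eq "^${key}=.+" "$1" || echo "$key"
    done
}

# Usage:
#   agent_conf_missing_keys /etc/zabbix_agentd.conf
```

An empty result means all three keys are set.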

2. Monitor Nginx

  1. Prepare the nginx status page
# Add the nginx status configuration
[root@nginx ~]# cat /etc/nginx/nginx.conf
# Add the status page inside the server block
...
        location /nginx_status {
            stub_status;
            allow 10.0.0.0/24;
            allow 127.0.0.1;
        }
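  2. Prepare the nginx monitoring script
[root@nginx etc]# cat /etc/zabbix_agentd.d/nginx_status.sh
#!/bin/bash 

nginx_status_fun(){			# return one counter from the status page
	NGINX_PORT=$1			# port: the function's first argument, i.e. the script's second argument
	NGINX_COMMAND=$2 		# command: the function's second argument, i.e. the script's third argument
	nginx_active(){			# number of active connections; the status page is queried from the local host only
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Active' | awk '{print $NF}'
		}
	nginx_reading(){		# connections in the Reading state; the functions below follow the same pattern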
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Reading' | awk '{print $2}'
		}
	nginx_writing(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Writing' | awk '{print $4}'
		}
	nginx_waiting(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Waiting' | awk '{print $6}'
		}
	nginx_accepts(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $1}'
		}
	nginx_handled(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $2}'
		}
	nginx_requests(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $3}'
		}
  	case $NGINX_COMMAND in
		active)
			nginx_active;
			;;
		reading)
			nginx_reading;
			;;
		writing)
			nginx_writing;
			;;
		waiting)
			nginx_waiting;
			;;
		accepts)
			nginx_accepts;
			;;
		handled)
			nginx_handled;
			;;
		requests)
			nginx_requests;
		esac 
}

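main(){							# entry point
	case $1 in
		nginx_status)				# dispatch on the first argument
			nginx_status_fun $2 $3;		# nginx_status: call nginx_status_fun with the second and third arguments
			;;
		status)					# HTTP status code of the status page
			curl -I -s http://127.0.0.1/nginx_status 2>/dev/null | awk 'NR==1{print $2}';
	            	;;				# -I fetches only the response headers; -s suppresses progress output
		*)					# anything else prints the usage message
			echo "Usage: $0 {nginx_status <port> <metric>|status}"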
	esac
}

main $1 $2 $3
  3. Add the custom Zabbix agent item (through a sub-configuration file)

    • Create the sub-configuration file
    [root@nginx etc]# cat /etc/zabbix_agentd.conf.d/nginx_monitor.conf 
    UserParameter=nginx_status[*],/etc/zabbix_agentd.d/nginx_status.sh "$1" "$2" "$3"
    
  4. Verify

# Restart the services
systemctl restart nginx zabbix-agent

# Fetch the full nginx status locally
[root@nginx zabbix_agentd.d]# curl 127.0.0.1/nginx_status
Active connections: 1 
server accepts handled requests
 21 21 21 
Reading: 0 Writing: 1 Waiting: 0 

# Read the active connection count on the nginx host
[root@nginx zabbix_agentd.d]# /etc/zabbix_agentd.d/nginx_status.sh nginx_status 80 active
1

# Read the active connection count from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.7 -p 10050 -k 'nginx_status[nginx_status,80,active]'
1
  5. Import the monitoring template

    Template reference: nginx-template.xml

    image.png

    Link the template

    image.png

    View the items of the imported nginx template

    image.png

  6. Verify monitoring

    image.png
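
The monitoring script above calls curl and several grep/awk stages for each counter. As an alternative sketch (not the author's script), all seven counters can be read from one copy of the status page in a single awk pass. The function name `nginx_metric` is hypothetical, and the parsing assumes the four-line stub_status layout shown in the verification step.

```shell
# Hypothetical helper: reads stub_status output on stdin and prints the
# counter named by $1 (active|accepts|handled|requests|reading|writing|waiting).
nginx_metric() {
    awk -v want="$1" '
        /^Active connections:/ { v["active"] = $3 }
        # Line 3 holds the accepts, handled and requests totals.
        NR == 3                { v["accepts"] = $1; v["handled"] = $2; v["requests"] = $3 }
        /^Reading:/            { v["reading"] = $2; v["writing"] = $4; v["waiting"] = $6 }
        END { if (want in v) print v[want]; else exit 1 }
    '
}

# Usage:
#   curl -s http://127.0.0.1/nginx_status | nginx_metric active
```

An unknown counter name makes the function return a non-zero status, which keeps a typo in the item key from silently reporting an empty value.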

3. Monitor MySQL

1) Set up MySQL master/slave replication

master(10.0.0.17)

# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=17
log-bin

# Restart the database
systemctl restart mariadb


# Create the replication user
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant the replication privilege
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)

# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql

# Copy the backup to the slave
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.27:/opt/

# Show the binary log files and positions
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |     29733 |
| mariadb-bin.000002 |       245 |
+--------------------+-----------+

2 rows in set (0.00 sec)
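
Because the dump was taken with `--master-data=2`, mysqldump also writes the binary log coordinates into the dump itself as a commented `CHANGE MASTER TO` line. The sketch below pulls them out so they do not have to be copied by hand; the function name `dump_coordinates` is made up, and the pattern assumes the usual `MASTER_LOG_FILE='...', MASTER_LOG_POS=...;` form of that comment.

```shell
# Hypothetical helper: prints "<binlog file> <position>" recorded in a dump
# that was created with --master-data=2; $1 is the path to the dump.
dump_coordinates() {
    sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" "$1" |
        head -n 1
}

# Usage:
#   dump_coordinates /opt/backup.sql
```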

slave(10.0.0.27)

# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=27
read-only

# Restart the database
systemctl restart mariadb

# Import the backup taken on the master
[root@slave ~]# mysql < /opt/backup.sql

# Configure replication from the master's coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (see show master logs)
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
  MASTER_HOST='10.0.0.17',
  MASTER_USER='repluser',
  MASTER_PASSWORD='',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mariadb-bin.000001',
  MASTER_LOG_POS=29733,
  MASTER_CONNECT_RETRY=10;

# Start the slave threads
MariaDB [(none)]> start slave;

# Show the replication status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.17
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 245
               Relay_Log_File: mariadb-relay-bin.000003
                Relay_Log_Pos: 531
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
......
             Master_Server_Id: 17
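
In the status output above, the two fields that show whether replication is healthy are Slave_IO_Running and Slave_SQL_Running; both should read Yes. As a minimal sketch (the function name check_slave_threads is hypothetical and not part of the original setup), the check can be scripted by parsing the text that `show slave status\G` prints:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the original article.
# Reads the text printed by `show slave status\G` on stdin and prints
# 1 if both replication threads report "Yes", otherwise 0.
check_slave_threads() {
    awk -F': *' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+/, "", key)      # drop the left padding of the field name
            gsub(/[ \t\r]+$/, "", val)    # drop trailing whitespace from the value
        }
        key == "Slave_IO_Running"  { io  = val }
        key == "Slave_SQL_Running" { sql = val }
        END { print ((io == "Yes" && sql == "Yes") ? 1 : 0) }
    '
}
```

On the slave it could be used as `mysql -e 'show slave status\G' | check_slave_threads`. Returning a plain 1 or 0 keeps the result easy to consume from other tooling, for example a custom monitoring item, should one be added later.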

2) Monitoring with the percona tools

Official download page: https://www.percona.com/downloads/

Package: https://www.percona.com/downloads/percona-monitoring-plugins/LATEST/

  1. Install the percona plugin
# Download
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/redhat/7/x86_64/percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install
yum install -y percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install php
yum install -y php php-mysql

# Copy the user parameter configuration shipped with the plugin
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix_agentd.conf.d/

# Create the php configuration file used to connect to the MySQL database
vim /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf
<?php
$mysql_user = 'root';
$mysql_pass = ''; 

# Restart the agent
systemctl restart zabbix-agent
  2. Test from zabbix-server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.17 -p 10050 -k MySQL.Key-reads
19
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.27 -p 10050 -k MySQL.Key-reads
0
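  3. Link the template to the host

    Note: the default template /var/lib/zabbix/percona/templates/zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.8.xml cannot be used as shipped and must be modified first.

    Template reference: siyuan://blocks/20220715151809-f0mrj0m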

image.png

  4. Check the monitoring status

image.png

  5. Change the monitoring type to active

image.png

  6. Verify monitoring

    image.png

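4. Issues

1. In active mode the monitoring data is collected correctly, but the ZBX icon stays gray instead of turning green

Fix: in the template Template OS Linux by Zabbix agent active, unlink and clear the linked template Template Module Zabbix agent active, then add the Template Module Zabbix agent template.

image.png

The ZBX icon turns green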

image.png

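VIII. Email notifications for problems and recovery with zabbix

1. Implement automatic fault recovery (self-healing)

1) Enable remote command execution on the agent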

[root@nginx tmp]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1		# enable remote command execution
Server=10.0.0.100
ListenPort=10050
StartAgents=3
ServerActive=10.0.0.100
Hostname=10.0.0.7
User=zabbix
UnsafeUserParameters=1		# allow special characters in the arguments passed to user parameters
Include=/etc/zabbix_agentd.conf.d/*.conf

2) Grant sudo privileges to the zabbix user on the agent

[root@nginx ~]# vim /etc/sudoers
......
root    ALL=(ALL)   ALL
zabbix ALL=NOPASSWD:ALL		# let the zabbix user run privileged commands through sudo without a password

Restart the service

systemctl restart zabbix-agent

3) Create an action

  • Set the action name and the conditions that trigger it

image.png

  • Add the operation to run

    Commands executed remotely must be written with absolute paths

image.png
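
The exact command configured in the action is only visible in the screenshot, so the following is a sketch under assumptions rather than the author's command. It shows one way to restart nginx only when nothing is listening on port 80. The function name port_is_listening and the paths /usr/sbin/ss, /usr/bin/sudo and /usr/bin/systemctl are assumptions; check them with `which` on the agent host.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the restart command shown below are
# assumptions, not taken from the action configured in the screenshots.

# Reads `ss -tln` output on stdin; returns 0 if a socket is listening on port $1.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Intended use on the agent (not executed here), with absolute paths throughout:
#   /usr/sbin/ss -tln | port_is_listening 80 || /usr/bin/sudo /usr/bin/systemctl restart nginx
```

Matching on the end of the Local Address:Port column keeps port 8080 from being mistaken for port 80.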

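2. Implement email notifications

1) Enable SMTP on the mailbox

Log in to your personal mailbox and enable the SMTP service

image.png

Send the SMS to obtain an authorization code

image.png

2) Create the media type

Mail settings reference: https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=371

The password is the authorization code obtained above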

image.png

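3) Add the media to a user

Select the Admin user

image.png

Open the media tab and click Add

image.png

For the type, select the media type created earlier; for the recipient, enter the address that should receive the notifications

image.png

Update the user to save the media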

image.png

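4) Create the action

  • Add a send-email operation to the self-healing action

image.png

  • Add the operations for when the problem occurs and for after it is resolved

image.png

Content of the email sent when the problem occurs

image.png

Content of the email sent after recovery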

image.png

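3. Verify the problem and recovery email notifications

1) Stop the nginx service

Check port 80

image.png

nginx recovers automatically

image.png

2) zabbix runs the recovery command and sends the notification emails automatically

image.png

3) Log in to the personal mailbox and check the alert emails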

image.png

Original article by 745907710. If you republish it, please credit the source: https://blog.ytso.com/274679.html
