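I. Redis features and typical use cases
Redis features
- Fast: roughly 100,000 QPS; data lives in memory and the server is written in C
- Persistence
- Multiple data structures: string, hash, list, set, and zset (sorted set)
- Client libraries for many programming languages
- Rich feature set: Lua scripting, publish/subscribe, transactions, pipelining, and more
- Simple: a compact code base (about 23,000 lines of core code for the standalone server), a single-threaded model that is easy to develop against, no dependency on external libraries, and easy to use
- Master-slave replication
- High availability and distributed deployment
Typical Redis use cases
- Session sharing: commonly used to share sessions among several web servers, such as Tomcat or PHP, in a web cluster
- Caching: query results, product information on e-commerce sites, news content
- Counters: page-view rankings, product view counts, and other count-based statistics
- Social features as in Weibo/WeChat: mutual friends, follower counts, follows, likes, comments
- Message queues: log buffering in ELK, publish/subscribe for some business systems
- Geolocation: built on GEO to implement "shake", people nearby, food delivery, and similar features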
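II. Pros and cons of the Redis RDB and AOF modes
1. RDB (Redis DataBase) mode
How RDB works
RDB takes time-based snapshots and by default keeps only the most recent one. Snapshots are fast to take, but any data written between the last snapshot and a failure may be lost.
How bgsave (asynchronous) produces an RDB snapshot
Pros and cons of RDB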
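Pros
- An RDB snapshot captures the data set at one point in time. A script can run the Redis command bgsave (non-blocking, runs in the background) or save (blocks the server until it finishes, not recommended) to take backups at chosen times. Keeping several backups makes it possible to roll back to different points in time, which makes RDB well suited for backups. The file format is also supported by a number of third-party tools for later data analysis.
For example, back up the RDB file every hour for the last 24 hours, and additionally keep one RDB file for every day of the month. If something goes wrong, the data set can then be restored to any of those versions.
- RDB maximizes Redis performance: to save an RDB file, the parent process only has to fork a child process, which then does all the remaining work. The parent process performs no disk I/O.
- For large data sets, for example several GB, restoring from RDB is faster than restoring from AOF.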
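Cons
- Data is not saved in real time, so anything written since the last RDB snapshot may be lost.
RDB is a poor fit if you need to minimize data loss on a server failure. Redis lets you define several save points to control how often the RDB file is written, but because each RDB file holds the state of the whole data set, writing it is not a cheap operation. In practice a snapshot is usually taken only every five minutes or more, so a crash can cost several minutes of data.
- With a very large data set, forking the child process and writing the RDB file takes time, from milliseconds to seconds. The fork itself scales with memory size, and the write depends on disk I/O performance.
With a large data set, fork() can be slow enough that the server stops serving clients for a while. With a very large data set and a busy CPU, the pause can last a full second or longer. AOF rewrite also calls fork(), but no matter how long the interval between rewrites is, durability is not affected.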
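The save points mentioned above are set in redis.conf. A minimal sketch follows. The three thresholds are the defaults shipped with Redis 5, and the directory is the data directory used by the install script later in this article; adjust both to your own setup.

```conf
# save <seconds> <changes>: take a snapshot if at least <changes> writes
# happened within <seconds>
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb
dir /apps/redis/data
```

With these values the longest interval without a snapshot is 15 minutes, which is the window of data that can be lost when only RDB is enabled.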
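2. AOF (AppendOnlyFile) mode
How AOF works
AOF appends every write operation, in execution order, to the end of a log file.
Notes:
When both RDB and AOF are enabled, the AOF file takes precedence over the RDB file during recovery, that is, Redis restores from the AOF file.
AOF is disabled by default. If you enable it in the configuration file for the first time and restart the service, AOF takes precedence over RDB, but no AOF file exists yet, so Redis starts with an empty data set and all data is lost.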
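One way to avoid that data loss is to switch AOF on while the instance is running, so that Redis builds the first AOF file from the data in memory, and only then persist the setting. The sketch below assumes a reachable local instance started from a writable redis.conf; `REDIS_CLI`, `REDIS_PASS`, and the function name `enable_aof_online` are illustrative names, not part of Redis.

```shell
#!/bin/bash
# Sketch only. REDIS_CLI and REDIS_PASS are illustrative variables; the
# password is the demo value used elsewhere in this article.
REDIS_CLI=${REDIS_CLI:-redis-cli}
REDIS_PASS=${REDIS_PASS:-123456}

enable_aof_online() {
    # Enabling AOF at runtime makes Redis write the first AOF file from the
    # data currently held in memory.
    $REDIS_CLI -a "$REDIS_PASS" config set appendonly yes || return 1
    # Wait while the initial background rewrite is scheduled or running.
    while $REDIS_CLI -a "$REDIS_PASS" info persistence \
            | grep -Eq '^aof_rewrite_(in_progress|scheduled):1'; do
        sleep 1
    done
    # Write the running configuration back to redis.conf so that the
    # setting survives a restart.
    $REDIS_CLI -a "$REDIS_PASS" config rewrite
}

# Usage on a live instance: enable_aof_online
```

CONFIG REWRITE only works when the server was started with a configuration file that the Redis user can write. Editing `appendonly yes` into redis.conf by hand after the rewrite has finished achieves the same result.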
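AOF rewrite
A rewrite writes a new AOF file in which repeated, mergeable, and expired entries are compacted. This saves the disk space used by AOF and speeds up recovery. Run bgrewriteaof to trigger a rewrite manually, or define an automatic rewrite policy.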
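A redis.conf sketch of the AOF settings involved. Apart from `appendonly yes`, the values shown are the Redis 5 defaults and are meant as a starting point rather than a recommendation.

```conf
appendonly yes
appendfilename "appendonly.aof"
# fsync policy: always | everysec | no
appendfsync everysec
# Rewrite automatically once the file has grown by 100% since the last
# rewrite and is at least 64mb in size
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```

A manual rewrite can be triggered at any time with `redis-cli bgrewriteaof`.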
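The AOF rewrite process
Pros and cons of AOF
Pros
- Data safety is comparatively high and depends on the fsync policy in use (fsync flushes data that has been written to the AOF file but is still buffered in memory to the storage device). The default is appendfsync everysec, which calls fsync once per second. With this setting Redis still performs well, and a crash loses at most about one second of data (fsync runs in a background thread, so the main thread can keep serving commands).
- The log is written in append mode, so no seek is needed while writing, and a crash does not damage content that is already in the file. If a crash leaves the last operation only half written, the redis-check-aof tool can repair the inconsistency before the next Redis start.
- When the AOF file grows too large, Redis can rewrite it automatically in the background. The new file holds the minimal set of commands needed to rebuild the current data set. The rewrite is safe: while the new file is being created, Redis keeps appending changes to the existing AOF file, so the existing file is not lost even if the server stops during the rewrite. Once the new file is complete, Redis switches from the old file to the new one and appends to it from then on.
- AOF is a clearly formatted, easy-to-understand log of every modification, and the data set can be rebuilt from it.
The AOF file records all writes to the database in order, in Redis protocol format, so it is easy to read and easy to parse. Exporting an AOF file is just as simple. For example, if FLUSHALL was run by mistake and the AOF file has not been rewritten since, stop the server, remove the FLUSHALL command from the end of the AOF file, and restart Redis to restore the data set to its state before FLUSHALL.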
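Cons
- Every operation is recorded, even repeated ones, so an AOF file is larger than the RDB file for the same data set
- Restoring a large data set from AOF is slower than restoring from RDB
- Depending on the fsync policy, AOF can be slower than RDB
- Bugs are more likely to occur
When to use RDB and AOF
- If Redis mainly serves as a cache, or losing a few minutes of data is acceptable, enabling RDB alone is usually enough in production; this is also the default
- If data must be persisted and data loss is not acceptable, enable both RDB and AOF
- Enabling AOF alone is generally not recommended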
III. Set up Redis Sentinel and simulate a master failure
How it works
Implementing Sentinel mode
graph LR
M[Sentinel<br/>10.0.0.7<br/>master]
S1[Sentinel<br/>10.0.0.17<br/>slave1]
S2[Sentinel<br/>10.0.0.27<br/>slave2]
M-->S1
M-->S2
Configure one master and two slaves
One-step Redis build-and-install script
#!/bin/bash
# Build and install Redis from source
source /etc/init.d/functions
# Redis version
Redis_version=redis-5.0.9
suffix=tar.gz
Redis=${Redis_version}.${suffix}
Password=123456
# Redis source download URL
redis_url=http://download.redis.io/releases/${Redis}
# Redis install path
redis_install_DIR=/apps/redis
# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS type
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`
# Print a message followed by a green [ OK ] ($2 is 0) or a red [ FAILED ]
color () {
    if [[ $2 -eq 0 ]];then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo $2
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}
download_redis (){
    # Install build dependencies
    yum -y install gcc jemalloc-devel || { color "Failed to install dependencies, check the network" 1 ;exit 1;}
    cd /opt
    if [ -e ${Redis} ];then
        color "Redis source archive already exists" 0
    else
        color "Downloading the Redis source archive" 0
        wget ${redis_url}
        if [ $? -ne 0 ];then
            color "Failed to download the Redis source archive, exiting!" 1
            exit 1
        fi
    fi
}
install_redis (){
    # Unpack the source archive
    tar xvf /opt/${Redis} -C /usr/local/src
    ln -s /usr/local/src/${Redis_version} /usr/local/src/redis
    # Build and install
    cd /usr/local/src/redis
    make -j ${CPUS} install PREFIX=${redis_install_DIR}
    if [ $? -ne 0 ];then
        color "Redis build and install failed!" 1
        exit 1
    else
        color "Redis build and install succeeded" 0
    fi
    ln -s ${redis_install_DIR}/bin/redis-* /usr/sbin/
    # Create the redis user (an existing user is not an error)
    if id redis &> /dev/null;then
        color "redis user already exists" 0
    else
        useradd -r -s /sbin/nologin redis
        color "redis user created" 0
    fi
    mkdir -p ${redis_install_DIR}/{etc,log,data,run}
    # Prepare the Redis configuration file
    cp redis.conf ${redis_install_DIR}/etc/
    sed -i "s/bind 127.0.0.1/bind 0.0.0.0/" ${redis_install_DIR}/etc/redis.conf
    sed -i "/# requirepass/a requirepass ${Password}" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^dir .*\$@dir ${redis_install_DIR}/data@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^logfile .*\$@logfile ${redis_install_DIR}/log/redis-6379.log@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^pidfile .*\$@pidfile ${redis_install_DIR}/run/redis-6379.pid@" ${redis_install_DIR}/etc/redis.conf
    chown -R redis:redis ${redis_install_DIR}
    # Kernel parameters
    cat >> /etc/sysctl.conf <<EOF
net.core.somaxconn = 1024
vm.overcommit_memory = 1
EOF
    sysctl -p
    # Disable transparent huge pages now and on every boot
    echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
    chmod +x /etc/rc.d/rc.local
    source /etc/rc.d/rc.local
    # Prepare the systemd service; \$MAINPID is escaped so that systemd, not this script, expands it
    cat > /usr/lib/systemd/system/redis.service <<EOF
[Unit]
Description=redis persistent key-value database
After=network.target
[Service]
ExecStart=${redis_install_DIR}/bin/redis-server ${redis_install_DIR}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT \$MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
    chown -R redis:redis ${redis_install_DIR}
    systemctl daemon-reload
    systemctl enable --now redis
    systemctl is-active redis
    if [ $? -ne 0 ];then
        color "Redis service failed to start!" 1
        exit 1
    else
        color "Redis service started" 0
        color "Redis installation complete" 0
    fi
}
download_redis
install_redis
exit 0
- Master node configuration
# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"
# Restart redis
systemctl restart redis
- Slave node configuration
# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"
replicaof 10.0.0.7 6379
# Restart redis
systemctl restart redis
- Check the status
master
[root@master ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.27,port=6379,state=online,offset=28,lag=1
slave1:ip=10.0.0.17,port=6379,state=online,offset=28,lag=1
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:28
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:28
127.0.0.1:6379>
slave1
[root@slave1 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_repl_offset:154
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:154
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:154
127.0.0.1:6379>
slave2
[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:210
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:210
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:210
127.0.0.1:6379>
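Edit the Sentinel configuration file
Sentinel is in effect a special Redis server: it supports some Redis commands, but many are not available. By default it listens on port 26379/tcp.
Sentinel does not have to run on the same hosts as the Redis servers, but it usually does.
- Configure the sentinel file
cp /usr/local/src/redis/sentinel.conf /apps/redis/etc/redis-sentinel.conf
cd /apps/redis/etc/
# Configure sentinel
[root@master etc]# grep "^[a-zA-Z]" redis-sentinel.conf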
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
# Start sentinel
[root@master etc]# redis-sentinel /apps/redis/etc/redis-sentinel.conf
# View the sentinel configuration
[root@master etc]# grep "^[a-zA-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 3000
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 0
# The following lines are generated automatically
sentinel myid c663d4b9db845d721cd6dccf608c7904d896b745 # myid must be unique
protected-mode no
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.0.27 6379
sentinel known-replica mymaster 10.0.0.17 6379
sentinel known-sentinel mymaster 10.0.0.27 26379 66f276f274802c6f0243007a2be4b04001b9867e
sentinel known-sentinel mymaster 10.0.0.17 26379 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
sentinel current-epoch 0
Configure the sentinel service
[root@shichu ~]# cat /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/redis-sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
Start the sentinel service
chown -R redis:redis /apps/redis
systemctl daemon-reload
systemctl enable --now redis-sentinel
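Sentinel configuration parameters
sentinel monitor mymaster 10.0.0.7 6379 2 # address and port of the master server in the mymaster cluster
The trailing 2 is the quorum: the number of sentinels that must consider the master down before it is marked objectively down (ODOWN) and a failover is attempted. It is usually set to more than half of all sentinel nodes (the total is normally an odd number >= 3, such as 3, 5, or 7). For example, with 3 sentinels, 3/2 = 1.5, which rounds up to 2.
sentinel auth-pass mymaster 123456 # password of the master in the mymaster cluster; this line must come after the sentinel monitor line
sentinel down-after-milliseconds mymaster 30000 # (SDOWN) time after which a node in the mymaster cluster is judged subjectively down, in milliseconds; 3000 is suggested
sentinel parallel-syncs mymaster 1 # number of slaves that sync from the new master at the same time after a failover; a smaller value lengthens the total sync time but reduces the load on the new master
sentinel failover-timeout mymaster 180000 # timeout for pointing all slaves at the new master, in milliseconds
sentinel deny-scripts-reconfig yes # do not allow the scripts to be changed at runtime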
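The "more than half" rule of thumb for the quorum can be written as a small shell helper. This is only an illustrative sketch: the function name `sentinel_quorum` is made up for this example, and Sentinel itself accepts whatever quorum value is configured.

```shell
#!/bin/bash
# Hypothetical helper: smallest integer greater than half of the number of
# sentinels, following the rule of thumb described above.
sentinel_quorum() {
    total=$1
    echo $(( total / 2 + 1 ))
}

for n in 3 5 7; do
    echo "sentinels=$n quorum=$(sentinel_quorum "$n")"
done
```

For the three sentinels used in this article the helper yields 2, which matches the value in the `sentinel monitor` line.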
- Check the ports
[root@master etc]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:26379 *:*
LISTEN 0 511 *:6379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 [::1]:25 [::]:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22
- Check the sentinel logs
master log
[root@master redis]# tail /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 16:38:43.636 * supervised by systemd, will signal readiness
1491:X 11 Jul 2022 16:38:43.637 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1491:X 11 Jul 2022 16:38:43.637 * Running mode=sentinel, port=26379.
1491:X 11 Jul 2022 16:38:43.638 # Sentinel ID is c663d4b9db845d721cd6dccf608c7904d896b745
1491:X 11 Jul 2022 16:38:43.638 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:20.763 # -sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:48.855 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
slave1 log
[root@slave1 ~]# tail /apps/redis/log/sentinel_26379.log
1293:X 11 Jul 2022 16:39:19.722 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1293, just started
1293:X 11 Jul 2022 16:39:19.722 # Configuration loaded
1293:X 11 Jul 2022 16:39:19.722 * supervised by systemd, will signal readiness
1293:X 11 Jul 2022 16:39:19.723 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1293:X 11 Jul 2022 16:39:19.724 * Running mode=sentinel, port=26379.
1293:X 11 Jul 2022 16:39:19.724 # Sentinel ID is 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
1293:X 11 Jul 2022 16:39:19.724 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1293:X 11 Jul 2022 16:39:22.777 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1293:X 11 Jul 2022 16:39:48.988 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
slave2 log
[root@slave2 ~]# tail /apps/redis/log/sentinel_26379.log
900:X 11 Jul 2022 16:32:23.322 # +sdown sentinel 605f713c7e6554ae0bfed0b98304e29d6a69e678 10.0.0.37 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 16:39:48.523 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1256:X 11 Jul 2022 16:39:48.523 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1256, just started
1256:X 11 Jul 2022 16:39:48.523 # Configuration loaded
1256:X 11 Jul 2022 16:39:48.523 * supervised by systemd, will signal readiness
1256:X 11 Jul 2022 16:39:48.524 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1256:X 11 Jul 2022 16:39:48.525 * Running mode=sentinel, port=26379.
1256:X 11 Jul 2022 16:39:48.525 # Sentinel ID is 66f276f274802c6f0243007a2be4b04001b9867e
1256:X 11 Jul 2022 16:39:48.525 # +monitor master mymaster 10.0.0.7 6379 quorum 2
- Check the sentinel status
[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.7:6379,slaves=2,sentinels=3 # two slaves and three sentinels; if the sentinels value is wrong, check for a myid conflict
Simulate a failover
- Stop Redis on the master
[root@master etc]# systemctl stop redis
[root@master etc]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:26379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 [::1]:25 [::]:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22
- Sentinel log during the failover
[root@master redis]# tail -f /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 17:07:16.959 # +sdown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.044 # +odown master mymaster 10.0.0.7 6379 #quorum 2/2
1491:X 11 Jul 2022 17:07:17.044 # +new-epoch 4
1491:X 11 Jul 2022 17:07:17.044 # +try-failover master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.045 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.048 # 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.050 # 66f276f274802c6f0243007a2be4b04001b9867e voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.102 # +elected-leader master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.102 # +failover-state-select-slave master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 # +selected-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 * +failover-state-send-slaveof-noone slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.269 * +failover-state-wait-promotion slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +promoted-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +failover-state-reconf-slaves master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.145 * +slave-reconf-sent slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-inprog slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-done slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # -odown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +failover-end master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379 # the master role has moved to 10.0.0.27
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:22.276 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
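Log event reference
+reset-master: the master has been reset.
+slave: a new slave has been detected and attached by Sentinel.
+failover-state-reconf-slaves: the failover state changed to reconf-slaves.
+failover-detected: a failover started by another Sentinel was detected, or a slave was turned into a master.
+slave-reconf-sent: the leader Sentinel sent the SLAVEOF command to the instance to point it at the new master.
+slave-reconf-inprog: the instance is becoming a slave of the specified master, but the synchronization is not yet complete.
+slave-reconf-done: the slave has finished synchronizing with the new master.
-dup-sentinel: one or more Sentinels monitoring the given master were removed as duplicates. This happens, for example, when a Sentinel instance is restarted.
+sentinel: a new Sentinel monitoring the given master was detected and attached.
+sdown: the given instance is now subjectively down.
-sdown: the given instance is no longer subjectively down.
+odown: the given instance is now objectively down.
-odown: the given instance is no longer objectively down.
+new-epoch: the current epoch was updated.
+try-failover: a new failover is in progress, waiting to be elected by the majority.
+elected-leader: the election for the given epoch was won, and the failover can proceed.
+failover-state-select-slave: the failover is in the select-slave state; Sentinel is looking for a slave suitable for promotion.
no-good-slave: Sentinel found no slave suitable for promotion. It will try again after some time, or abandon the failover.
+selected-slave: Sentinel found a slave suitable for promotion.
+failover-state-send-slaveof-noone: Sentinel is promoting the selected slave to master and waiting for the promotion to complete.
failover-end-for-timeout: the failover was terminated because of a timeout, but slaves will eventually be configured to replicate with the new master anyway.
+failover-end: the failover completed successfully. All slaves now replicate from the new master.
+switch-master: the configuration changed and the master now has a new address. This is the message most external users care about.
+tilt: tilt mode entered.
-tilt: tilt mode exited.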
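Because +switch-master is the event most external users care about, a script may want to pull the new master address out of the Sentinel log. The sketch below is one way to do that, assuming the Redis 5 log layout shown in this article (event name in field 7, new address in fields 11 and 12). The function name `latest_master_from_log` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read Sentinel log lines on stdin and print the address
# of the most recently promoted master, taken from +switch-master events.
latest_master_from_log() {
    awk '$7 == "+switch-master" { addr = $11 ":" $12 }
         END { if (addr != "") print addr }'
}

# Example with the event from the failover log in this article
echo '1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379' \
    | latest_master_from_log
```

On a live system, asking a Sentinel directly with `sentinel get-master-addr-by-name mymaster` is usually the simpler option. Parsing the log is mainly useful for after-the-fact analysis.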
- After the failover
The master IP in replicaof in the Redis configuration file is updated automatically
[root@slave1 ~]# grep "^replicaof" /apps/redis/etc/redis.conf
replicaof 10.0.0.27 6379
The IP in sentinel monitor in the Sentinel configuration file is updated automatically
[root@slave1 ~]# grep "^sentinel monitor" /apps/redis/etc/redis-sentinel.conf
sentinel monitor mymaster 10.0.0.27 6379 2
- Redis status
New master status
[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.17,port=6379,state=online,offset=4290787,lag=1
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4290787
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3242212
repl_backlog_histlen:1048576
127.0.0.1:6379>
The other slave now points to the new master
[root@slave1 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:4296387
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4296387
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3247812
repl_backlog_histlen:1048576
127.0.0.1:6379>
- Bring the failed former master back into the Redis cluster
[root@master redis]# systemctl start redis
Former master status
# The Redis configuration now points to the new master node
[root@master redis]# grep "^replicaof" /apps/redis/etc/redis.conf
replicaof 10.0.0.27 6379
# Check the Redis status
[root@master redis]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:4366815
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:4366815
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4343555
repl_backlog_histlen:23261
# Check the sentinel status
[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.27:6379,slaves=2,sentinels=3
New master status
# Redis status
[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=4407027,lag=0
slave1:ip=10.0.0.7,port=6379,state=online,offset=4407160,lag=0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4407293
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3358718
repl_backlog_histlen:1048576
# Sentinel log
[root@slave2 ~]# tail -f /apps/redis/log/sentinel_26379.log
1256:X 11 Jul 2022 17:07:17.049 # +new-epoch 4
1256:X 11 Jul 2022 17:07:17.052 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
1256:X 11 Jul 2022 17:07:17.068 # Next failover delay: I will not start a failover before Mon Jul 11 17:13:17 2022
1256:X 11 Jul 2022 17:07:18.149 # +config-update-from sentinel c663d4b9db845d721cd6dccf608c7904d896b745 10.0.0.7 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:21.189 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:43:54.361 # -sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
- Sentinel operations
Manually take the master offline
sentinel failover <masterName>
Example
# A priority can be set; the lower the value, the more likely Sentinel is to pick that node as the new master. The default is 100
[root@slave1 ~]# grep 'replica-priority' /apps/redis/etc/redis.conf
replica-priority 30
[root@slave1 ~]# redis-cli -a 123456 -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3
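IV. How Redis Cluster works
Redis Cluster characteristics
- All Redis nodes are interconnected (PING mechanism)
- A node is considered failed only when more than half of the nodes in the cluster detect it as failed
- Clients connect to Redis directly without a proxy; the application should be configured with the IPs of all Redis servers
- Redis Cluster maps all Redis nodes evenly onto slots 0-16383. Reads and writes must go to the node that owns the slot, so N nodes scale Redis concurrency roughly N times, with each node serving 16384/N slots
- Redis Cluster pre-allocates 16384 slots. When a key-value pair is written, CRC16(key) mod 16384 decides which slot the key belongs to, and therefore which Redis node it is written to. This removes the single-machine bottleneck.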
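The slot calculation can be sketched in plain shell. This is a minimal illustration, not the Redis implementation: it uses the CRC16 variant with polynomial 0x1021 and initial value 0 (often called XMODEM), assumes ASCII keys, and ignores hash tags (`{...}`). The function names `crc16` and `key_slot` are made up for this example.

```shell
#!/bin/bash
# CRC16 with polynomial 0x1021 and initial value 0, computed bit by bit.
crc16() {
    rest=$1
    crc=0
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}       # first character of the remaining key
        rest=${rest#?}               # drop it from the remainder
        byte=$(printf '%d' "'$ch")   # numeric code of that character
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo "$crc"
}

# Slot of a key: CRC16(key) mod 16384
key_slot() {
    echo $(( $(crc16 "$1") % 16384 ))
}

key_slot foo   # prints 12182
```

On a running cluster the same value can be checked with `redis-cli cluster keyslot foo`. The node whose slot range contains the result is the one that stores the key.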
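Redis Cluster architecture
V. Deploying Redis Cluster on Redis 5
Official documentation: https://redis.io/topics/cluster-tutorial
Prerequisites for creating a Redis Cluster
- Every Redis node uses the same hardware configuration, the same password, and the same Redis version
- All Redis servers must be empty
- Prepare six machines for a three-master, three-slave layout
# Cluster nodes
Redis-node1: 10.0.0.7
Redis-node2: 10.0.0.17
Redis-node3: 10.0.0.27
Redis-node4: 10.0.0.37
Redis-node5: 10.0.0.47
Redis-node6: 10.0.0.57
# Reserved nodes
10.0.0.67
10.0.0.77
Deploying Redis Cluster
1. Install Redis
Edit the Redis configuration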
[root@node1 etc]# cat redis.conf
...
bind 0.0.0.0
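masterauth 123456 # recommended; without it master-slave replication fails later and has to be configured afterwards
requirepass 123456
cluster-enabled yes # uncomment this line; cluster mode must be enabled, after which the redis process shows [cluster]
cluster-config-file nodes-6379.conf # uncomment this line; the cluster state file, which records master/slave relationships and slot ranges and is created and maintained by Redis Cluster itself
cluster-require-full-coverage no # default is yes; setting it to no prevents one unavailable node from making the whole cluster unavailable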
...
[root@node1 etc]#systemctl enable --now redis
2. Check the current Redis status
# Check the ports
[root@node1 ~]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 511 *:6379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:16379 *:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 100 [::1]:25 [::]:*
# The process shows the [cluster] flag
[root@node1 ~]# ps aux|grep redis
redis 24754 0.2 0.3 153996 3172 ? Ssl 21:28 0:02 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]
root 24822 0.0 0.0 112812 980 pts/0 R+ 21:44 0:00 grep --color=auto redis
3. Create the cluster
[root@node1 ~]# redis-cli -a 123456 --cluster create 10.0.0.7:6379 10.0.0.17:6379 10.0.0.27:6379 10.0.0.37:6379 \
10.0.0.47:6379 10.0.0.57:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.47:6379 to 10.0.0.7:6379
Adding replica 10.0.0.57:6379 to 10.0.0.17:6379
Adding replica 10.0.0.37:6379 to 10.0.0.27:6379
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379 # M marks a master
slots:[0-5460] (5461 slots) master # first and last slot of this master
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379 # S marks a slave
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
replicates 12fdc235442ed40a838e77b246025799b4b3357b
Can I set the above configuration? (type 'yes' to accept): yes # type yes to create the cluster
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.7:6379)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
slots:[0-5460] (5461 slots) master # slots already assigned
1 additional replica(s) # one slave assigned
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
slots: (0 slots) slave # slaves are assigned no slots
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 # ID of the corresponding master, 10.0.0.27
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
slots: (0 slots) slave
replicates 12fdc235442ed40a838e77b246025799b4b3357b # ID of the corresponding master, 10.0.0.17
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
slots: (0 slots) slave
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec # ID of the corresponding master, 10.0.0.7
[OK] All nodes agree about slots configuration. # all nodes agree on the slot assignment
>>> Check for open slots... # check for open slots
>>> Check slots coverage... # check slot coverage
[OK] All 16384 slots covered. # all 16384 slots are assigned
[root@node1 ~]#
The result above shows three master/slave groups
master:10.0.0.7-->slave:10.0.0.47
master:10.0.0.17-->slave:10.0.0.57
master:10.0.0.27-->slave:10.0.0.37
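The slot ranges in the creation output (0-5460, 5461-10922, 10923-16383) are what an even split of 16384 slots over three masters looks like when the boundaries are rounded to the nearest slot. The sketch below reproduces them. The function name `slot_ranges` is made up for this example, and the code is an illustration of the arithmetic rather than the redis-cli source.

```shell
#!/bin/bash
# Hypothetical helper: print one "first-last" slot range per master,
# splitting the 16384 slots as evenly as possible.
slot_ranges() {
    masters=$1
    total=16384
    first=0
    i=1
    while [ "$i" -le "$masters" ]; do
        if [ "$i" -eq "$masters" ]; then
            # The last master takes whatever is left
            last=$(( total - 1 ))
        else
            # round(i * total / masters) - 1, using integer arithmetic only
            last=$(( (2 * i * total + masters) / (2 * masters) - 1 ))
        fi
        echo "$first-$last"
        first=$(( last + 1 ))
        i=$(( i + 1 ))
    done
}

slot_ranges 3
```

Running it with 3 prints the three ranges shown in the creation output, one per line.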
4. Check the master/slave status
node1(10.0.0.7)
[root@node1 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.47,port=6379,state=online,offset=1008,lag=1
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
node2(10.0.0.17)
[root@node2 etc]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.57,port=6379,state=online,offset=1008,lag=0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
node3(10.0.0.27)
[root@node3 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.37,port=6379,state=online,offset=1008,lag=0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
node4(10.0.0.37)
[root@node4 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
node5(10.0.0.47)
[root@node5 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:4
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
node6(10.0.0.57)
[root@node6 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:10
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
View the slave nodes of a given master node
# List all nodes
[root@node1 ~]# redis-cli -a 123456 cluster nodes 2>/dev/null
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected
# View the slave nodes for a master node ID; 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 is the ID of master node 10.0.0.27
[root@node1 ~]# redis-cli -a 123456 cluster slaves 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 2>/dev/null
1) "59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554778157 4 connected"
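The master/slave pairing can also be read off the `cluster nodes` output in one pass. The following is a minimal sketch, not part of the original setup: `replica_map` is a hypothetical helper name, and it reads text in the `cluster nodes` format from standard input (here, sample lines copied from the output above) instead of querying a live cluster.

```shell
#!/bin/bash
# replica_map (hypothetical helper): reads `cluster nodes` output on stdin and
# prints "<slave ip:port> -> <master ip:port>" for every slave node.
replica_map() {
    awk '
        { addr = $2; sub(/@.*/, "", addr); addr_of[$1] = addr }   # node ID -> ip:port
        $3 ~ /slave/ { master_of[addr] = $4 }                      # field 4 is the master ID
        END { for (s in master_of) print s " -> " addr_of[master_of[s]] }
    ' | sort
}

# Sample lines copied from the `cluster nodes` output above.
nodes_sample='59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected'

printf '%s\n' "$nodes_sample" | replica_map
```

On a live cluster the same function could be fed with `redis-cli -a 123456 cluster nodes 2>/dev/null | replica_map`.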
5. Verify the cluster status
[root@node1 ~]# redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6 # 6 nodes
cluster_size:3 # 3 master groups (shards)
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3639
cluster_stats_messages_pong_sent:3625
cluster_stats_messages_sent:7264
cluster_stats_messages_ping_received:3620
cluster_stats_messages_pong_received:3639
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7264
# View the cluster status through any node
[root@node1 ~]# redis-cli -a 123456 --cluster info 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
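The fields that matter in `cluster info` can be checked by a script instead of by eye. The sketch below is an illustration under stated assumptions: `cluster_ok` is a hypothetical helper, it reads `cluster info` text on standard input, and it treats the cluster as healthy only when `cluster_state` is `ok`, all 16384 slots are assigned, and no slot is in the fail state.

```shell
#!/bin/bash
# cluster_ok (hypothetical helper): exit status 0 if the `cluster info` text on
# stdin describes a healthy cluster, 1 otherwise.
cluster_ok() {
    awk -F: '
        { sub(/\r$/, "") }                                 # redis-cli may emit CRLF line endings
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { assigned = $2 }
        $1 == "cluster_slots_fail"     { fail = $2 }
        END { exit !(state == "ok" && assigned + 0 == 16384 && fail + 0 == 0) }
    '
}

# Sample values copied from the `cluster info` output above.
cluster_ok <<'EOF' && echo "cluster healthy" || echo "cluster NOT healthy"
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
EOF
```

Against a live node the input would come from `redis-cli -a 123456 cluster info 2>/dev/null`.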
View the node relationships in the cluster
# List all nodes in the cluster
[root@node1 ~]# redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657556036000 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657556038079 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657556036000 5 connected
[root@node1 ~]# redis-cli -a 123456 --cluster check 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.27:6379)
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
slots: (0 slots) slave
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
slots: (0 slots) slave
replicates 12fdc235442ed40a838e77b246025799b4b3357b
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
slots: (0 slots) slave
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
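The "All 16384 slots covered" result can be reproduced by adding up the slot ranges owned by the masters. This is a rough sketch only: `covered_slots` is a hypothetical helper that reads `cluster nodes` lines on standard input, and the sample lines are the three master entries from the output above.

```shell
#!/bin/bash
# covered_slots (hypothetical helper): sums the slot ranges of all master lines
# in `cluster nodes` output read from stdin and prints the total.
covered_slots() {
    awk '
        $3 ~ /master/ {
            for (i = 9; i <= NF; i++) {            # slot ranges start at field 9
                if ($i ~ /^\[/) continue           # skip migrating/importing slot markers
                n = split($i, r, "-")
                total += (n == 2) ? r[2] - r[1] + 1 : 1
            }
        }
        END { print total + 0 }
    '
}

masters_sample='4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383'

printf '%s\n' "$masters_sample" | covered_slots    # 5461 + 5462 + 5461 = 16384
```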
Verify cluster writes
# When connected to a single node, a write may fail because the key's slot is not served by that node
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7
10.0.0.7:6379> set key1 v1
(error) MOVED 9189 10.0.0.17:6379
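The slot number in the MOVED reply is not arbitrary: Redis Cluster maps a key to a slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16. The sketch below recomputes it in plain bash. It is a simplified illustration, with `crc16` and `keyslot` as hypothetical helper names; it handles ASCII keys only and ignores hash tags (`{...}`), which a real client must honour.

```shell
#!/bin/bash
# crc16: CRC-16/XMODEM (polynomial 0x1021, initial value 0) of an ASCII string.
crc16() {
    local s=$1 crc=0 i j byte
    for (( i = 0; i < ${#s}; i++ )); do
        printf -v byte '%d' "'${s:i:1}"        # character -> byte value (ASCII only)
        (( crc ^= byte << 8 ))
        for (( j = 0; j < 8; j++ )); do
            if (( crc & 0x8000 )); then
                (( crc = ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                (( crc = (crc << 1) & 0xFFFF ))
            fi
        done
    done
    echo "$crc"
}

# keyslot: slot of a key without hash tags, CRC16(key) mod 16384.
keyslot() { echo $(( $(crc16 "$1") % 16384 )); }

keyslot key1    # 9189, the slot named in the MOVED reply above
```

Slot 9189 falls in the range 5461-10922, which is why the reply points to 10.0.0.17.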
# Without -c, you must connect to the node that owns the slot before writing
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.17
10.0.0.17:6379> set key1 values1
OK
10.0.0.17:6379> get key1
"values1"
# With the -c option redis-cli connects in cluster mode and follows redirections, so any node in the cluster can be used
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7 -c
10.0.0.7:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.17:6379
OK
10.0.0.17:6379> get key1
"v1"
VI. Deploy Zabbix monitoring
Official download page: https://www.zabbix.com/cn/download
Official documentation: https://www.zabbix.com/manuals
Source tarball: https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz
Compile and install Zabbix 5 on an LNMP stack
L: Linux (CentOS 7) https://mirrors.aliyun.com/centos/7/isos/x86_64/
N: Nginx (1.18.0) https://nginx.org/en/download.html
M: MySQL (8.0.19) https://dev.mysql.com/downloads/mysql/
P: PHP (7.4.11) http://php.net/downloads.php
Zabbix (5.0.25) https://cdn.zabbix.com/zabbix/sources/
graph LR
A[Client]
B[Linux<br/>Nginx<br/>PHP<br/>Zabbix<br/>10.0.0.100]
C[Linux<br/>MySQL<br/>10.0.0.200]
A-->B-->C
1. Install MySQL
After installation, create the zabbix database and user
mysql -uroot -p123456 -e "create database zabbix character set utf8 collate utf8_bin;"
mysql -uroot -p123456 -e "create user zabbix@'10.0.0.%' identified by '123456'"
mysql -uroot -p123456 -e "grant all privileges on zabbix.* to zabbix@'10.0.0.%'"
mysql -uroot -p123456 -e "use mysql;\
alter user zabbix@'10.0.0.%' identified with mysql_native_password by '123456';\
flush privileges;"
2. Install Nginx
Reference: Compiling and installing Nginx 1.18 on CentOS 7[^1]
3. Install PHP
Reference: Compiling and installing PHP 7.4 on CentOS 7[^2]
4. Install Zabbix
Install zabbix_server
#!/bin/bash
# Compile and install Zabbix
source /etc/init.d/functions
# Zabbix version
Zabbix_Version=zabbix-5.0.25
Suffix=tar.gz
Zabbix=${Zabbix_Version}.${Suffix}
Password=123456
# Zabbix source download URL
Zabbix_url=https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz
# Zabbix installation path
Zabbix_install_DIR=/apps/zabbix
# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS type
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`
# Print a message in green with [ OK ] when $2 is 0, otherwise in red with [ FAILED ]
color () {
    if [[ $2 -eq 0 ]];then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo $2
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}
install_Zabbix (){
    #----------------------------Download the source package-----------------------------
    cd /opt
    if [ -e ${Zabbix} ];then
        color "Zabbix source package already exists" 0
    else
        color "Downloading the Zabbix source package" 0
        wget ${Zabbix_url}
        if [ $? -ne 0 ];then
            color "Failed to download the Zabbix source package, exiting!" 1
            exit 1
        fi
    fi
    #----------------------------Extract the source package-----------------------------
    color "Extracting the source package" 0
    tar -zxvf /opt/${Zabbix} -C /usr/local/src
    # -f and -n make the link safe to recreate when the script is run again
    ln -sfn /usr/local/src/${Zabbix_Version} /usr/local/src/zabbix
    #----------------------------Install dependencies--------------------------------
    color "Installing dependencies" 0
    #wget https://dev.mysql.com/get/mysql80-community-release-el7-6.noarch.rpm
    yum install -y gcc libxml2-devel net-snmp net-snmp-devel curl curl-devel php-gd php-bcmath php-xml \
    php-mbstring mariadb mariadb-devel OpenIPMI-devel libevent-devel java-1.8.0-openjdk-devel \
    || { color "Failed to install dependencies, please check the network" 1 ;exit 1;}
    #---------------------------Create the zabbix user---------------------------
    if id zabbix &> /dev/null ;then
        color "Zabbix user already exists" 1
    else
        groupadd --system zabbix
        useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
        color "Zabbix user created" 0
    fi
    #---------------------------Compile---------------------------
    color "Compiling Zabbix" 0
    cd /usr/local/src/zabbix
    ./configure --prefix=${Zabbix_install_DIR} \
    --enable-server \
    --enable-agent \
    --with-mysql \
    --with-net-snmp \
    --with-libcurl \
    --with-libxml2 \
    --with-openipmi \
    --enable-proxy \
    --enable-java
    make -j ${CPUS} install
    if [ $? -ne 0 ];then
        color "Zabbix compilation and installation failed!" 1
        exit 1
    else
        color "Zabbix compiled and installed successfully" 0
    fi
    # Copy the web front-end files
    mkdir -pv /home/nginx/zabbix
    cp -rf /usr/local/src/zabbix/ui/* /home/nginx/zabbix/
    chown nginx:nginx -R /home/nginx/zabbix
    # Start zabbix_server once as a smoke test, then stop it again
    /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
    if [ $? -eq 0 ];then
        color "zabbix_server starts correctly" 0
        pkill zabbix
    fi
    color "Zabbix installation finished" 0
}
install_Zabbix
exit 0
Modify the configuration files
- Modify the /apps/nginx/conf/nginx.conf configuration file
worker_processes 1;
pid logs/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    server {
        listen 80;
        server_name 10.0.0.100; # host name
        server_tokens off; # hide the nginx version
        location / {
            root /home/nginx/zabbix; # document root
            index index.php index.html index.htm; # default index pages
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
        location ~ \.php$ { # pass PHP requests to php-fpm
            root /home/nginx/zabbix;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
            fastcgi_hide_header X-Powered-By; # hide the PHP version
        }
        location ~ ^/(ping|pm_status)$ { # php-fpm status pages
            include fastcgi_params;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_param PATH_TRANSLATED $document_root$fastcgi_script_name;
        }
    }
}
- Modify the PHP configuration files
# Modify /etc/php.ini
sed -i -e "/memory_limit/c memory_limit = 256M" \
-e "/post_max_size/c post_max_size = 30M" \
-e "/upload_max_filesize/c upload_max_filesize = 20M" \
-e "/max_execution_time/c max_execution_time = 300" \
-e "/max_input_time/c max_input_time = 300" \
-e "/;date.timezone/c date.timezone = Asia/Shanghai" \
/etc/php.ini
# Modify /apps/php/etc/php-fpm.d/www.conf
sed -i -e "/user = www/c user = nginx" \
-e "/group = www/c group = nginx" /apps/php/etc/php-fpm.d/www.conf
Restart the services
systemctl restart nginx php-fpm
- Import the MySQL data
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/schema.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/images.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/data.sql
- Modify the Zabbix configuration file
sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPassword=/aDBPassword=123456" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPort=/aDBPort=3306" /apps/zabbix/etc/zabbix_server.conf
sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /apps/zabbix/etc/zabbix_server.conf
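The `sed ... /a` commands for zabbix_server.conf append a line every time they run, so running them twice leaves duplicate settings. The sketch below shows one way to make the edit repeatable. It is an illustration, not the author's method: `set_conf` is a hypothetical helper, it relies on GNU sed (the one-line `a text` form and `-i`), and it is demonstrated on a temporary file rather than the real zabbix_server.conf.

```shell
#!/bin/bash
# set_conf (hypothetical helper): add KEY=VALUE right after the commented
# default "# KEY=..." unless an active KEY= line already exists.
set_conf() {
    local file=$1 key=$2 value=$3
    grep -q "^${key}=" "$file" && return 0            # already set: do nothing
    sed -i "/^# ${key}=/a ${key}=${value}" "$file"    # GNU sed one-line append
}

# Demonstration on a throwaway file instead of the real configuration.
conf=$(mktemp)
printf '%s\n' '# DBHost=localhost' '# DBPassword=' > "$conf"
set_conf "$conf" DBHost 10.0.0.200
set_conf "$conf" DBHost 10.0.0.200    # second call is a no-op
cat "$conf"
```

The helper only adds a missing key; changing an existing value would need a separate substitution step.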
- Create the systemd unit file for zabbix_server
cat /lib/systemd/system/zabbix-server.service
[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_server.conf"
EnvironmentFile=-/etc/default/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_server.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl enable --now zabbix-server
- Create the systemd unit file for zabbix_agent
cat /lib/systemd/system/zabbix-agent.service
[Unit]
Description=Zabbix Agent
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_agentd.conf"
EnvironmentFile=-/etc/default/zabbix-agent
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_agentd.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix
Group=zabbix
[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl enable --now zabbix-agent
- Check the status
- Ports 10050 and 10051 are listening
# Ports 10050 (agent) and 10051 (server) should be listed
[root@shichu apps]# ss -ntl
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     *:22                *:*
LISTEN  0       100     127.0.0.1:25        *:*
LISTEN  0       128     *:10050             *:*
LISTEN  0       128     *:10051             *:*
LISTEN  0       128     127.0.0.1:9000      *:*
LISTEN  0       128     *:111               *:*
LISTEN  0       128     *:80                *:*
LISTEN  0       128     [::]:22             [::]:*
LISTEN  0       100     [::1]:25            [::]:*
LISTEN  0       128     [::]:111            [::]:*
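The port check can be scripted so that it gives a yes/no answer per port. The following is a sketch under assumptions: `port_listening` is a hypothetical helper that reads `ss -ntl` output on standard input, and the sample is a shortened copy of the listing above.

```shell
#!/bin/bash
# port_listening (hypothetical helper): exit status 0 if the `ss -ntl` output
# on stdin contains a listener on the given port, 1 otherwise.
port_listening() {
    awk -v p="$1" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }   # port is the last ":" field
        END { exit !found }
    '
}

# Shortened sample of the `ss -ntl` output above.
ss_sample='State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 128 *:10050 *:*
LISTEN 0 128 *:10051 *:*
LISTEN 0 128 [::]:22 [::]:*'

for port in 10050 10051; do
    if printf '%s\n' "$ss_sample" | port_listening "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: NOT listening"
    fi
done
```

On the server itself the input would be `ss -ntl | port_listening 10051`.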
- zabbix-server service status
[root@shichu apps]# systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-14 00:47:09 CST; 52s ago
Process: 8346 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 8352 ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
Main PID: 8360 (zabbix_server)
CGroup: /system.slice/zabbix-server.service
├─8360 /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
├─8362 /apps/zabbix/sbin/zabbix_server: configuration syncer [synced configuration in 0.059399 sec, idle 6...
├─8363 /apps/zabbix/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.027609 sec durin...
├─8364 /apps/zabbix/sbin/zabbix_server: alerter #1 started
├─8365 /apps/zabbix/sbin/zabbix_server: alerter #2 started
├─8366 /apps/zabbix/sbin/zabbix_server: alerter #3 started
├─8367 /apps/zabbix/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 11 values, idle 5.00...
├─8368 /apps/zabbix/sbin/zabbix_server: preprocessing worker #1 started
├─8369 /apps/zabbix/sbin/zabbix_server: preprocessing worker #2 started
├─8370 /apps/zabbix/sbin/zabbix_server: preprocessing worker #3 started
├─8371 /apps/zabbix/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.008702sec during 5.0...
├─8372 /apps/zabbix/sbin/zabbix_server: lld worker #1 started
├─8373 /apps/zabbix/sbin/zabbix_server: lld worker #2 started
├─8374 /apps/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
├─8375 /apps/zabbix/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001868 sec, id...
├─8376 /apps/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.001502 sec, idle 5 sec]
├─8377 /apps/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.004759 sec, idle 60 sec]
├─8378 /apps/zabbix/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000050 sec,...
├─8379 /apps/zabbix/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000175 sec,...
├─8380 /apps/zabbix/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000029 sec,...
├─8381 /apps/zabbix/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000019 sec,...
├─8382 /apps/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.004440 sec, idle 3 sec]...
├─8383 /apps/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000028 sec, id...
├─8384 /apps/zabbix/sbin/zabbix_server: self-monitoring [processed data in 0.000016 sec, idle 1 sec]
├─8385 /apps/zabbix/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000836 sec, idle 5 sec]
├─8386 /apps/zabbix/sbin/zabbix_server: poller #1 [got 0 values in 0.000050 sec, idle 1 sec]
├─8387 /apps/zabbix/sbin/zabbix_server: poller #2 [got 0 values in 0.000048 sec, idle 1 sec]
├─8388 /apps/zabbix/sbin/zabbix_server: poller #3 [got 1 values in 0.001602 sec, idle 1 sec]
├─8389 /apps/zabbix/sbin/zabbix_server: poller #4 [got 0 values in 0.000019 sec, idle 1 sec]
├─8390 /apps/zabbix/sbin/zabbix_server: poller #5 [got 0 values in 0.001402 sec, idle 1 sec]
├─8391 /apps/zabbix/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000039 sec, idle 5 sec]
├─8392 /apps/zabbix/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection...
├─8393 /apps/zabbix/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection...
├─8394 /apps/zabbix/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection...
├─8395 /apps/zabbix/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection...
├─8396 /apps/zabbix/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection...
├─8397 /apps/zabbix/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000020 sec, idle 5 sec]
└─8398 /apps/zabbix/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001557 ...
Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Server...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Server.
- zabbix-agent service status
[root@shichu apps]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Agent
Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-14 00:47:09 CST; 58s ago
Process: 8349 ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE (code=exited, status=0/SUCCESS)
Main PID: 8353 (zabbix_agentd)
CGroup: /system.slice/zabbix-agent.service
├─8353 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
├─8354 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
├─8355 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
├─8356 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
├─8357 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
└─8358 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Agent...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Agent.
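5. Configure the web interface
Initial setup
Open the server IP (10.0.0.100) in a browser
- Check of prerequisites
- Configure the database connection
- Configure the Zabbix server details
- Confirm the settings
- Create the configuration file
The configuration file has to be downloaded manually and uploaded to the /home/nginx/zabbix/conf/ directory on the Zabbix server
- Finish the installation
- Log in
Default user name: Admin (note the capital A)
Password: zabbix
- Open the dashboard
Tuning
Switch the menus to Chinese
The interface is now displayed in Chinese
Fix garbled characters in monitoring graphs
- The item labels are garbled
- Pick a font from Windows, for example KaiTi (simkai.ttf)
- Upload the font to the Zabbix web directory
The path is: /home/nginx/zabbix/assets/fonts
- Point Zabbix at the new font
vim /home/nginx/zabbix/include/defines.inc.php
# Change the following two definitions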
//define('ZBX_GRAPH_FONT_NAME', 'DejaVuSans'); // font file name
define('ZBX_GRAPH_FONT_NAME', 'simkai'); // font file name
#define('ZBX_FONT_NAME', 'DejaVuSans');
define('ZBX_FONT_NAME', 'simkai');
- Verify that the font is in effect
The font takes effect immediately; there is no need to restart Zabbix or nginx
VII. Monitor Nginx and MySQL
flowchart TB
zabbix[Zabbix Server<br/>10.0.0.100]
mysql-m[Master<br/>10.0.0.17]
mysql-s[Slave<br/>10.0.0.27]
nginx[Nginx<br/>10.0.0.7]
subgraph Mysql
mysql-m<-->mysql-s
end
zabbix-->nginx
zabbix-->Mysql
1. Install the Zabbix agent
- Install the agent with yum
yum install zabbix50-agent
- Modify the agent configuration file
[root@nginx ~]# grep '^[a-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.100 # IP of the Zabbix server or proxy
ListenPort=10050 # listening port (default value)
StartAgents=3 # number of processes started for passive checks; with 0 the agent listens on no port
ServerActive=10.0.0.100 # IP of the Zabbix server or proxy for active checks
Hostname=10.0.0.7 # case-sensitive and must be unique on the Zabbix server; the host's own IP is used here
Include=/etc/zabbix_agentd.conf.d/*.conf # added at the end of the file: path of the drop-in configuration files
Start the service
mkdir -p /etc/zabbix_agentd.conf.d
systemctl start zabbix-agent
Check the status
[root@nginx ~]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Monitor Agent
Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-14 16:07:35 CST; 1s ago
Main PID: 1511 (zabbix_agentd)
CGroup: /system.slice/zabbix-agent.service
├─1511 /usr/sbin/zabbix_agentd -f
├─1512 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
├─1513 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
├─1514 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
└─1515 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
Jul 14 16:07:35 nginx systemd[1]: Stopped Zabbix Monitor Agent.
Jul 14 16:07:35 nginx systemd[1]: Started Zabbix Monitor Agent.
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Starting Zabbix Agent [10.0.0.7]. Zabbix 5.0.21 (revision 47104dd574).
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Press Ctrl+C to exit.
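- Add the monitored host in the web interface
Configuration → Hosts → Create host
2. Monitor Nginx
- Prepare the nginx status page
# Add the nginx status configuration
[root@nginx ~]# cat /etc/nginx/nginx.conf
# Add the status page inside the server block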
...
location /nginx_status {
    stub_status;
    allow 10.0.0.0/24;
    allow 127.0.0.1;
    deny all; # without this line the allow rules restrict nothing
}
- Prepare the nginx monitoring script
[root@nginx etc]# cat /etc/zabbix_agentd.d/nginx_status.sh
#!/bin/bash
nginx_status_fun(){ # returns one metric from the status page
    NGINX_PORT=$1 # port: the function's first argument, i.e. the script's second argument
    NGINX_COMMAND=$2 # metric name: the function's second argument, i.e. the script's third argument
    nginx_active(){ # number of active connections; the functions below follow the same pattern. The status page is queried from the local host only
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Active' | awk '{print $NF}'
    }
    nginx_reading(){ # number of connections in the Reading state
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Reading' | awk '{print $2}'
    }
    nginx_writing(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Writing' | awk '{print $4}'
    }
    nginx_waiting(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Waiting' | awk '{print $6}'
    }
    nginx_accepts(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $1}'
    }
    nginx_handled(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $2}'
    }
    nginx_requests(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $3}'
    }
    case $NGINX_COMMAND in
        active)
            nginx_active;
            ;;
        reading)
            nginx_reading;
            ;;
        writing)
            nginx_writing;
            ;;
        waiting)
            nginx_waiting;
            ;;
        accepts)
            nginx_accepts;
            ;;
        handled)
            nginx_handled;
            ;;
        requests)
            nginx_requests;
    esac
}
main(){ # entry point
    case $1 in
        nginx_status) # dispatch on the first argument
            nginx_status_fun $2 $3; # for nginx_status, call nginx_status_fun with the second and third arguments
            ;;
        status) # HTTP status code of the status page
            curl -I -s http://127.0.0.1/nginx_status 2>/dev/null | awk 'NR==1{print $2}';
            ;; # -I fetches only the response headers, -s suppresses progress output
        *) # any other input prints the usage message
            echo $"Usage: $0 {nginx_status key}"
    esac
}
main $1 $2 $3
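The script above runs one `curl | grep | awk` pipeline per metric. As an alternative illustration (not the author's script), all seven values can be extracted by a single awk program. In this sketch `parse_stub_status` is a hypothetical helper that reads a stub_status page on standard input, and the sample page uses the values from the curl output shown in the verification step.

```shell
#!/bin/bash
# parse_stub_status (hypothetical helper): reads an nginx stub_status page on
# stdin and prints the metric named in $1; exits 1 for an unknown metric.
parse_stub_status() {
    awk -v want="$1" '
        /^Active connections:/ { v["active"] = $3 }
        NR == 3                { v["accepts"] = $1; v["handled"] = $2; v["requests"] = $3 }
        /^Reading:/            { v["reading"] = $2; v["writing"] = $4; v["waiting"] = $6 }
        END { if (want in v) print v[want]; else exit 1 }
    '
}

# Sample stub_status page with the values from the curl output in this section.
status_sample='Active connections: 1
server accepts handled requests
 21 21 21
Reading: 0 Writing: 1 Waiting: 0'

for metric in active accepts handled requests reading writing waiting; do
    printf '%s=%s\n' "$metric" "$(printf '%s\n' "$status_sample" | parse_stub_status "$metric")"
done
```

On the monitored host the input would come from `curl -s http://127.0.0.1:80/nginx_status | parse_stub_status active`.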
- Add a custom Zabbix agent item (through a drop-in configuration file)
- Create the drop-in configuration file
[root@nginx etc]# cat /etc/zabbix_agentd.conf.d/nginx_monitor.conf
UserParameter=nginx_status[*],/etc/zabbix_agentd.d/nginx_status.sh "$1" "$2" "$3"
- Verify
# Restart the services
systemctl restart nginx zabbix-agent
# Fetch the full nginx status locally
[root@nginx zabbix_agentd.d]# curl 127.0.0.1/nginx_status
Active connections: 1
server accepts handled requests
21 21 21
Reading: 0 Writing: 1 Waiting: 0
# Get the number of active connections on the local host
[root@nginx zabbix_agentd.d]# /etc/zabbix_agentd.d/nginx_status.sh nginx_status 80 active
1
# Get the number of active connections from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.7 -p 10050 -k 'nginx_status["nginx_status",80,"active"]'
1
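- Import the monitoring template
Template reference: nginx-template.xml
Link the template to the host
View the items of the imported nginx template
- Verify monitoring
3. Monitor MySQL
1) Set up MySQL master-slave replication
master (10.0.0.17)
# Edit the configuration
vim /etc/my.cnf.d/server.cnf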
[mysqld]
bind=0.0.0.0
server-id=17
log-bin
# Restart the database
systemctl restart mariadb
# Create the replication user
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant replication privileges to the replication user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql
# Copy the backup to the slave node
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.27:/opt/
# View the binary log files and their sizes
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| mariadb-bin.000001 | 29733 |
| mariadb-bin.000002 | 245 |
+--------------------+-----------+
2 rows in set (0.00 sec)
slave (10.0.0.27)
# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=27
read-only
# Restart the database
systemctl restart mariadb
# Import the backup taken on the master node
[root@slave ~]# mysql < /opt/backup.sql
# Configure replication from the master's information
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (shown by show master logs)
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
MASTER_HOST='10.0.0.17',
MASTER_USER='repluser',
MASTER_PASSWORD='',
MASTER_PORT=3306,
MASTER_LOG_FILE='mariadb-bin.000001',
MASTER_LOG_POS=29733,
MASTER_CONNECT_RETRY=10;
# Start the slave threads
MariaDB [(none)]> start slave;
# Show the replication status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.17
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 245
Relay_Log_File: mariadb-relay-bin.000003
Relay_Log_Pos: 531
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Master_Server_Id: 17
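Replication is healthy only when both `Slave_IO_Running` and `Slave_SQL_Running` report `Yes`, and that condition is easy to turn into a single number a monitoring item could collect. The sketch below is an illustration, not part of the original setup: `replication_ok` is a hypothetical helper that reads `show slave status\G` text on standard input and prints 1 or 0.

```shell
#!/bin/bash
# replication_ok (hypothetical helper): prints 1 if both replication threads
# are running according to the `show slave status\G` text on stdin, else 0.
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { print ((io == "Yes" && sql == "Yes") ? 1 : 0) }
    '
}

# Sample lines copied from the status output above.
status_sample='Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.17
Slave_IO_Running: Yes
Slave_SQL_Running: Yes'

printf '%s\n' "$status_sample" | replication_ok    # prints 1
```

On the slave the input could come from `mysql -e 'show slave status\G' | replication_ok`.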
2) Monitor with the Percona plugins
Official download page: https://www.percona.com/downloads/
Package: https://www.percona.com/downloads/percona-monitoring-plugins/LATEST/
- Install the Percona plugin
# Download
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/redhat/7/x86_64/percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install
yum install -y percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install PHP
yum install -y php php-mysql
# Copy the user parameter file
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix_agentd.conf.d/
# Create the PHP configuration file used to connect to MySQL
vim /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf
<?php
$mysql_user = 'root';
$mysql_pass = '';
# Restart the agent
systemctl restart zabbix-agent
- Test from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.17 -p 10050 -k MySQL.Key-reads
19
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.27 -p 10050 -k MySQL.Key-reads
0
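- Link the template to the hosts
Note: the default template /var/lib/zabbix/percona/templates/zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.8.xml cannot be used as shipped and has to be modified first.
- Check the monitoring status
- Change the item type to active
- Verify monitoring
4. Troubleshooting
1. In active mode the data is collected normally, but the ZBX icon stays grey instead of turning green
Fix: in the template Template OS Linux by Zabbix agent active, unlink and clear the linked template Template Module Zabbix agent active, then add the template Template Module Zabbix agent.
The ZBX icon turns green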
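VIII. Zabbix email notifications for faults and recovery
1. Implement automatic fault recovery (self-healing)
1) Enable remote command execution on the agent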
[root@nginx tmp]# grep '^[a-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1 # enable remote command execution
Server=10.0.0.100
ListenPort=10050
StartAgents=3
ServerActive=10.0.0.100
Hostname=10.0.0.7
User=zabbix
UnsafeUserParameters=1 # allow special characters in the arguments passed to user parameters
Include=/etc/zabbix_agentd.conf.d/*.conf
2) Grant sudo rights to the zabbix user on the agent
[root@nginx ~]# vim /etc/sudoers
......
root ALL=(ALL) ALL
zabbix ALL=NOPASSWD:ALL # lets the zabbix user run commands through sudo without a password
Restart the service
systemctl restart zabbix-agent
3) Create the action
- Add the action name and the conditions
- Add the operation to run
Remote commands must be written with absolute paths
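The logic behind a self-healing remote command is a check followed by a conditional repair. The sketch below is a hypothetical illustration of that pattern, not the command configured in the original action: `heal_if_down` is an invented helper, and the absolute paths in the usage note are assumptions that should be confirmed with `which` on the agent host.

```shell
#!/bin/bash
# heal_if_down (hypothetical helper): runs the check command ($1); if it fails,
# runs the repair command ($2) and reports the outcome.
heal_if_down() {
    local check=$1 repair=$2
    if bash -c "$check" >/dev/null 2>&1; then
        echo "healthy"
    else
        bash -c "$repair" >/dev/null 2>&1 && echo "repaired" || echo "repair failed"
    fi
}

# Dry run with stand-in commands, so nothing is restarted.
heal_if_down 'true'  'true'     # check passes: nothing to do
heal_if_down 'false' 'true'     # check fails, repair succeeds
heal_if_down 'false' 'false'    # check fails, repair fails
```

For nginx the call might look like `heal_if_down '/usr/sbin/ss -ntl | /usr/bin/grep -q ":80 "' '/usr/bin/sudo /usr/bin/systemctl restart nginx'`, with the paths being assumptions.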
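2. Implement email notification
1) Enable SMTP on the mailbox
Log in to your mailbox and enable the SMTP service
Send the verification SMS to obtain an authorization code
2) Create the media type
Mailbox settings reference: https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=371
The password is the authorization code obtained above
3) Add the media to a user
Select the Admin user
Open Media and click Add
For the type, choose the media type created above; for the recipient, enter the address that should receive the messages
Update the media
4) Create the action
- Add a send-email operation to the self-healing action
- Add operations for when the fault occurs and for after it recovers
Content of the email sent when the fault occurs
Content of the email sent after recovery
3. Verify the fault alert and recovery emails
1) Stop the nginx service
Check port 80
nginx recovers automatically
2) Zabbix runs the recovery command and sends the notification emails automatically
3) Log in to the mailbox and check the alert emails
Original article by 745907710. If you repost it, please credit the source: https://blog.ytso.com/274679.html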