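I built a Pacemaker cluster on top of corosync, but found that starting the corosync service does not automatically start the pacemaker service.
It turns out that on CentOS 7 with corosync 2.3.3, the pacemaker service is disabled by default and has to be enabled and started separately (for example, systemctl enable pacemaker and systemctl start pacemaker).
After the corosync service was started, the two nodes still failed to form a cluster, and no nodes were listed: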
Last updated: Mon May 4 14:43:13 2015
Last change: Mon May 4 14:26:45 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
1. Troubleshooting
Analysis showed that both the corosync and pacemaker services were starting normally, but the log reported that quorum was not configured:
pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled)
Active: active (running) since Mon 2015-05-04 11:59:10 CST; 1s ago
Main PID: 8378 (pacemakerd)
CGroup: /system.slice/pacemaker.service
├─8378 /usr/sbin/pacemakerd -f
├─8379 /usr/libexec/pacemaker/cib
├─8380 /usr/libexec/pacemaker/stonithd
├─8381 /usr/libexec/pacemaker/lrmd
├─8382 /usr/libexec/pacemaker/attrd
├─8383 /usr/libexec/pacemaker/pengine
└─8384 /usr/libexec/pacemaker/crmd
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: error: cluster_connect_quorum: Corosync quorum is not configured
May 04 11:59:11 gz-controller-209100.vclound.com stonith-ng[8380]: notice: setup_cib: Watching for stonith topology changes
The cluster status at this point still showed no nodes:
Attempting connection to the cluster...
Last updated: Mon May 4 12:03:39 2015
Last change: Mon May 4 11:59:10 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
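The nodeid 1084805476 in these log lines is not arbitrary. When no nodeid is set explicitly, corosync derives it from the IPv4 address bound on ring 0, and clear_node_high_bit: yes (set in the totem section below) forces the top bit to zero. The sketch below reproduces that arithmetic. The address 192.168.209.100 is an assumption for node 1: the post does not state it, but it lies in the bindnetaddr network and yields exactly the logged ID.

```shell
#!/bin/sh
# Sketch: derive a corosync-style nodeid from an IPv4 address, assuming
# no explicit nodeid is configured and clear_node_high_bit: yes is set.
# The function body runs in a subshell so the IFS change does not leak.
nodeid_from_ipv4() (
    IFS=.
    # Split the dotted quad into $1..$4 (word splitting is intentional).
    set -- $1
    # Pack the octets into one 32-bit value, most significant octet first,
    # then clear bit 31 as clear_node_high_bit: yes does.
    echo $(( ( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ) & 0x7FFFFFFF ))
)

nodeid_from_ipv4 192.168.209.100   # prints 1084805476 (assumed node 1 address)
```

Without the high-bit clearing, the same address would give 3232289124, which does not fit in a positive signed 32-bit integer; that is the reason the option exists.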
2. Solution
Following the description in man votequorum, add a quorum section.
Change the configuration file on node 1 to:
totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    rrp_mode: none
    secauth: on
    threads: 2
    interface {
        ringnumber: 0
        # bindnetaddr is the network address to listen on, not a fixed host IP
        bindnetaddr: 192.168.209.0
        mcastaddr: 239.32.12.5
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: no
    logfile: /var/log/cluster/corosync.log
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
amf {
    mode: disabled
}
aisexec {
    user: root
    group: root
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
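The "Corosync quorum is not configured" error in the troubleshooting log comes from the missing quorum section. Before restarting, a quick check like the following can confirm that a corosync.conf-style file declares the votequorum provider. It is a minimal sketch assuming the simple layout used here (a quorum { ... } section with provider: on its own line); it is a text scan, not a real configuration parser. The path in the usage comment is the usual CentOS 7 location and is an assumption, since the post does not name the file.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the given corosync.conf-style file has a
# "quorum { ... }" section whose provider is corosync_votequorum.
# Assumption: one directive per line, as in the configuration above.
has_votequorum() {
    awk '
        # Entering the quorum section.
        /^[[:space:]]*quorum[[:space:]]*[{]/ { in_q = 1; next }
        # A closing brace ends it (no nested blocks inside quorum here).
        in_q && /[}]/ { in_q = 0 }
        # The directive we want; commented-out lines do not match.
        in_q && /^[[:space:]]*provider:[[:space:]]*corosync_votequorum/ { found = 1 }
        END { if (found) exit 0; exit 1 }
    ' "$1"
}

# Usage (path assumed):
# has_votequorum /etc/corosync/corosync.conf && echo "votequorum configured"
```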
Restart the corosync and pacemaker services:
[root@gz-controller-209100 ~]# systemctl restart pacemaker
Check the cluster status again:
Last updated: Mon May 4 14:44:47 2015
Last change: Mon May 4 14:43:33 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
1 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com ]
Node 1 has now joined the cluster.
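Node 1 already reports "partition with quorum" while it is the only node listed. man votequorum gives the arithmetic behind this: by default quorum is a simple majority of the expected votes, expected_votes / 2 + 1 with integer division, which is 2 for a two-node cluster, and two_node: 1 artificially lowers it to 1 (the man page also notes that two_node enables wait_for_all by default). The sketch below models only that threshold, assuming one vote per node; it is not how corosync itself is implemented.

```shell
#!/bin/sh
# Sketch of the votequorum quorum threshold, assuming one vote per node.
# Default rule: simple majority, expected_votes / 2 + 1 (integer division).
# two_node: 1 combined with expected_votes: 2 lowers the threshold to 1.
quorum_threshold() {
    expected_votes=$1
    two_node=${2:-0}
    if [ "$two_node" -eq 1 ] && [ "$expected_votes" -eq 2 ]; then
        echo 1
    else
        echo $(( expected_votes / 2 + 1 ))
    fi
}

quorum_threshold 2 0   # prints 2: both nodes must be present
quorum_threshold 2 1   # prints 1: the setting used in this post
quorum_threshold 3 0   # prints 2
```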
Copy the configuration file to the second node, then restart the services there:
[root@gz-controller-209101 ~]# systemctl restart pacemaker
Cluster status:
Last updated: Mon May 4 14:44:55 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]
Both nodes have joined the cluster, and the problem is solved.
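To check this in a script rather than by eye, one option is to count the names in the Online: line of the status output. This is a minimal sketch assuming the one-line format shown above (Online: [ node1 node2 ]); the expected count of 2 is specific to this cluster, and the live usage with crm_mon -1 (one-shot mode) is shown only as a comment.

```shell
#!/bin/sh
# Sketch: count the node names in an "Online: [ ... ]" line read from stdin.
# Assumes the crm_mon text format shown above: field 1 is "Online:",
# field 2 is "[", the last field is "]", and node names sit in between.
online_count() {
    awk '/^Online: \[/ { n = NF - 3 } END { print n + 0 }'
}

# Example against the status shown above (prints 2):
printf 'Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]\n' | online_count

# On a live node (assumption: crm_mon is in PATH):
# [ "$(crm_mon -1 | online_count)" -eq 2 ] && echo "both nodes online"
```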
3. Remaining issue
Running pcs status reports an error:
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon May 4 15:09:08 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]
Full list of resources:
PCSD Status:
Error: no nodes found in corosync.conf
Reference:
Why is the message “Error: no nodes found in corosync.conf” in the output of “pcs cluster status” command ?
https://access.redhat.com/solutions/663283
Resolution
The errors need to be ignored as no corosync.conf file is used.
Root Cause
The error messages seen are not harmful and are expected due to cman stack is being used.
So this error can be ignored.
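The Red Hat note attributes the message to the cman stack, whereas the status output here shows Stack: corosync. One possible additional factor, offered as an interpretation rather than a confirmed cause: the configuration in this post defines no nodelist section, so corosync.conf contains no node entries at all. The hypothetical helper below lists the ring0_addr values of a nodelist section; run against the configuration above, it would print nothing. It is a plain-text scan assuming one directive per line, not a full parser.

```shell
#!/bin/sh
# Sketch: print the ring0_addr of every node in a corosync.conf "nodelist" section.
# Assumptions: one directive per line, "key: value" separated by whitespace.
list_nodelist_addrs() {
    awk '
        /^[[:space:]]*nodelist[[:space:]]*[{]/ { in_nl = 1; depth = 1; next }
        !in_nl { next }
        # Track nested "node { ... }" blocks so we know when nodelist ends.
        /[{]/ { depth++ }
        /[}]/ { depth--; if (depth == 0) { in_nl = 0; next } }
        /^[[:space:]]*ring0_addr:/ { print $2 }
    ' "$1"
}

# Usage (path assumed); no output means no nodelist entries:
# list_nodelist_addrs /etc/corosync/corosync.conf
```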
Original article by ItWorker. If reposting, please credit the source: https://blog.ytso.com/tech/linux/98247.html