[Original] Solving the problem of corosync 2.3.3 on CentOS 7 failing to form a two-node cluster

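We used corosync to build a Pacemaker cluster, but found that starting the corosync service did not automatically start the pacemaker service.
It turned out that with corosync 2.3.3 on CentOS 7, the pacemaker service is disabled by default and has to be enabled manually.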
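
Since pacemaker has to be enabled separately, here is a minimal sketch of enabling both units at boot, assuming the stock systemd units shipped with CentOS 7. The helper name `enable_cluster_units` is hypothetical; `systemctl is-enabled` and `systemctl enable` are standard systemd subcommands.

```shell
# Hypothetical helper: make sure corosync and pacemaker start at boot.
# pacemaker runs as its own systemd unit here, so starting corosync
# alone does not bring it up (as observed above).
enable_cluster_units() {
  for unit in corosync pacemaker; do
    # "systemctl is-enabled" prints the unit state, e.g. "enabled" or "disabled"
    if [ "$(systemctl is-enabled "$unit" 2>/dev/null)" != "enabled" ]; then
      systemctl enable "$unit" || return 1
    fi
  done
}
```

Run it as root on each node. `systemctl enable` only affects boot; to start the services right away, `systemctl start corosync` and `systemctl start pacemaker` are still needed.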

After starting the corosync service, the two nodes still failed to form a cluster, and no nodes were listed:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May  4 14:43:13 2015
Last change: Mon May  4 14:26:45 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

1. Troubleshooting
Analysis showed that both the corosync and pacemaker services had started normally, but the log reported that quorum was not configured:

[root@gz-controller-209100 corosync]# systemctl status pacemaker
pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled)
   Active: active (running) since Mon 2015-05-04 11:59:10 CST; 1s ago
Main PID: 8378 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─8378 /usr/sbin/pacemakerd -f
           ├─8379 /usr/libexec/pacemaker/cib
           ├─8380 /usr/libexec/pacemaker/stonithd
           ├─8381 /usr/libexec/pacemaker/lrmd
           ├─8382 /usr/libexec/pacemaker/attrd
           ├─8383 /usr/libexec/pacemaker/pengine
           └─8384 /usr/libexec/pacemaker/crmd

May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: error: cluster_connect_quorum: Corosync quorum is not configured
May 04 11:59:11 gz-controller-209100.vclound.com stonith-ng[8380]: notice: setup_cib: Watching for stonith topology changes

Attempting connection to the cluster...
Last updated: Mon May  4 12:03:39 2015
Last change: Mon May  4 11:59:10 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
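
The nodeid 1084805476 in the log messages above is derived rather than configured. According to man corosync.conf, when no nodeid is set on IPv4, corosync takes it from the 32-bit address bound to ring 0, and `clear_node_high_bit: yes` (present in this cluster's corosync.conf) clears the top bit so the value stays a positive signed 32-bit integer. Below is a sketch of that arithmetic; the helper name is hypothetical, and node one's address 192.168.209.100 is inferred from its hostname.

```shell
# Hypothetical helper: reproduce corosync's automatic IPv4 nodeid.
# $1 = ring0 IPv4 address, $2 = clear_node_high_bit setting (default "yes").
# The body runs in a subshell so the IFS change and "set --" stay local.
corosync_auto_nodeid() (
  addr=$1
  clear_high_bit=${2:-yes}
  IFS=.
  set -- $addr                      # split the dotted quad into four octets
  id=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  if [ "$clear_high_bit" = "yes" ]; then
    id=$(( id & 0x7FFFFFFF ))       # force a positive signed 32-bit value
  fi
  echo "$id"
)
```

`corosync_auto_nodeid 192.168.209.100` prints 1084805476, matching the log; with `no` as the second argument it prints 3232289124 instead.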

2. Solution
Following the description in man votequorum, add a quorum section to the configuration.
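
man votequorum explains why this section matters for two nodes: with one vote per node, the simple-majority rule (50% of the votes plus 1) gives a quorum of 2, so both nodes would always have to be up; `two_node: 1` sets quorum artificially to 1. Below is a simplified sketch of that arithmetic. It ignores other votequorum options such as `last_man_standing`, `auto_tie_breaker` and quorum devices, and the function name is hypothetical.

```shell
# Hypothetical helper: simplified votequorum threshold.
# $1 = expected_votes, $2 = two_node setting (0 or 1, default 0).
# In this sketch two_node only takes effect when expected_votes is 2.
votequorum_threshold() {
  if [ "${2:-0}" = "1" ] && [ "$1" -eq 2 ]; then
    echo 1                          # two_node: quorum forced to 1
  else
    echo $(( $1 / 2 + 1 ))          # simple majority: half the votes plus one
  fi
}
```

`votequorum_threshold 2` prints 2, `votequorum_threshold 2 1` prints 1, and `votequorum_threshold 3` prints 2.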

Change the configuration file on node one to the following:

[root@gz-controller-209100 ~]# cat /etc/corosync/corosync.conf
compatibility: whitetank
totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    rrp_mode: none
    secauth: on
    threads: 2
    interface {
        ringnumber: 0
        # network address to listen on, not a fixed host IP
        bindnetaddr: 192.168.209.0
        mcastaddr: 239.32.12.5
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: no
    logfile: /var/log/cluster/corosync.log
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
amf {
    mode: disabled
}
aisexec {
    user: root
    group: root
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

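Before restarting, it can be worth confirming that the edited file really contains the quorum section. Here is a minimal awk-based sketch; it assumes the `quorum { ... }` block has no nested braces, and the function name is hypothetical.

```shell
# Hypothetical helper: succeed if the given corosync.conf has a quorum
# section whose provider is corosync_votequorum.
# Assumes the quorum section contains no nested { } blocks.
has_votequorum() {
  awk '
    /^[ \t]*quorum[ \t]*[{]/ { in_quorum = 1 }
    in_quorum && /provider:[ \t]*corosync_votequorum/ { found = 1 }
    in_quorum && /[}]/ { in_quorum = 0 }
    END { exit(found ? 0 : 1) }
  ' "${1:-/etc/corosync/corosync.conf}"
}
```

For example, `has_votequorum /etc/corosync/corosync.conf && echo ok`. Each node reads its own copy of the file, so the check applies on every node.
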
Restart the corosync and pacemaker services:

[root@gz-controller-209100 ~]# systemctl restart corosync
[root@gz-controller-209100 ~]# systemctl restart pacemaker

Check the cluster status again:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:47 2015
Last change: Mon May 4 14:43:33 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
1 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com ]

Node one has now joined the cluster.
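
Compared with the first `crm status` output, the `Current DC` line now names a DC and reports `partition with quorum`. Here is a small sketch that tests for this from a script. It only matches the text format shown in this article (crmsh `crm status` with Pacemaker 1.1.10), which other versions may print differently, and the function name is hypothetical.

```shell
# Hypothetical helper: read crm status output on stdin and succeed only
# if the "Current DC:" line reports "partition with quorum".
dc_has_quorum() {
  grep '^Current DC:' | grep -q 'partition with quorum'
}
```

Usage: `crm status | dc_has_quorum && echo quorate`.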

Copy the configuration file to the second node:

[root@gz-controller-209100 ~]# scp /etc/corosync/corosync.conf 192.168.209.101:/etc/corosync/
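
Copying the file unchanged works here because bindnetaddr holds the network address 192.168.209.0 rather than a host IP. man corosync.conf allows either an address configured on the host or its network address, and 192.168.209.101 (the scp target) and node one (presumably 192.168.209.100, going by its hostname) fall in the same network. Below is a sketch of the masking involved. It assumes a 255.255.255.0 netmask, which the article does not state, and the function name is hypothetical.

```shell
# Hypothetical helper: AND an IPv4 address with a netmask to get the
# network address, the form bindnetaddr takes in this article's config.
ipv4_network() (
  IFS=.
  set -- $1 $2   # eight fields: four address octets, then four mask octets
  echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)
```

With the assumed mask, both `ipv4_network 192.168.209.100 255.255.255.0` and `ipv4_network 192.168.209.101 255.255.255.0` print 192.168.209.0, so one bindnetaddr line fits both nodes.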

Restart the services on the second node:

[root@gz-controller-209101 ~]# systemctl restart corosync
[root@gz-controller-209101 ~]# systemctl restart pacemaker

Cluster status:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:55 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]

Both nodes have joined the cluster, and the problem is solved.
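
To confirm from a script that both nodes are listed as online, the `Online: [ ... ]` line can be parsed. Here is a minimal sketch that assumes the single-line format shown above; the function name is hypothetical.

```shell
# Hypothetical helper: print one node name per line from the
# "Online: [ node1 node2 ]" line of crm status output read on stdin.
online_nodes() {
  sed -n 's/^Online: \[ \(.*\) ]$/\1/p' | tr ' ' '\n' | sed '/^$/d'
}
```

For this cluster, `crm status | online_nodes` should list gz-controller-209100.vclound.com and gz-controller-209101.vclound.com, one per line.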

3. Remaining issue
Running pcs status reports an error:

[root@gz-controller-209100 ~]# pcs status
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon May 4 15:09:08 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]

Full list of resources:

PCSD Status:
Error: no nodes found in corosync.conf

Reference:
Why is the message “Error: no nodes found in corosync.conf” in the output of “pcs cluster status” command ?
https://access.redhat.com/solutions/663283
Resolution
The errors need to be ignored as no corosync.conf file is used.
Root Cause
The error messages seen are not harmful and are expected due to cman stack is being used.
Therefore, this error can be ignored.
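
One caveat, offered as an assumption rather than a confirmed diagnosis: the cited note attributes the message to the cman stack, whereas the output above reports `Stack: corosync`. The wording of the error suggests pcs looks for node entries in corosync.conf, and this article's configuration has no `nodelist` section (membership comes from multicast). The sketch below lists the `ring0_addr` entries of a nodelist section, which shows whether the file gives pcs any nodes to find. The function name is hypothetical, and the exact fields pcs reads are not confirmed here.

```shell
# Hypothetical helper: print every ring0_addr value found inside a
# nodelist { node { ... } } section of the given corosync.conf.
# Assumes "ring0_addr: value" with whitespace after the colon.
nodelist_ring0_addrs() {
  awk '
    /^[ \t]*nodelist[ \t]*[{]/ { in_list = 1; depth = 0 }
    in_list {
      depth += gsub(/[{]/, "{")       # count opening braces on this line
      if ($1 == "ring0_addr:") print $2
      depth -= gsub(/[}]/, "}")       # count closing braces on this line
      if (depth == 0) in_list = 0     # left the nodelist section
    }
  ' "${1:-/etc/corosync/corosync.conf}"
}
```

Against the corosync.conf used in this article, `nodelist_ring0_addrs /etc/corosync/corosync.conf` prints nothing.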

Original article by ItWorker. When reposting, please credit the source: https://blog.ytso.com/98247.html
