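Resolving instance-creation errors when the deployment has many nodes and instances (VMs):
The corresponding error from the Neutron Open vSwitch agent:
According to the material I found, releases before Juno have a problem where RPC traffic grows exponentially as nodes are added.
In addition, applying security_group settings through iptables takes a long time.
Each time a port is created during instance creation, the membership of some security group changes, so every node has to be notified with a message along these lines: the members of some security groups have changed; if you reference these security groups, update the corresponding iptables rules. For Linux bridge and OVS, the Neutron L2 agent handles this update request.
Taken together, these two issues mean that with many hosts and VMs each security_group call takes a long time to return, and the RPC for port creation times out:
Nova eventually times out waiting for Neutron to create the port and reports the error "Virtual Interface creation failed".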
1. Fault description
Instance creation fails:
2015-02-03 13:00:40.263 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 240.10.129.40:5672
2015-02-03 13:00:40.263 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds…
2015-02-03 13:00:41.273 170448 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 240.10.129.40:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 7 seconds.
2015-02-03 13:00:48.273 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 240.10.129.40:5672
2015-02-03 13:00:48.274 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds…
2015-02-03 13:00:49.292 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 13:16:24.465 183656 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 13:16:31.855 183890 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 14:22:42.338 183890 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 15:13:12.926 183890 ERROR nova.scheduler.filter_scheduler [req-3acf21fd-f802-49a7-8713-f131efa3445e 271a33b320b84b19aa6f44f97613c024 98e5fdd9e50f423881f49c845e1d26ad] [instance: 70af200c-bcc3-4307-afd6-049178d9174a] Error from last host: hh-yun-compute-130104.vclound.com (node hh-yun-compute-130104.vclound.com): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1328, in _build_instance\n set_access_ip=set_access_ip)\n', u' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 393, in decorated_function\n return function(self, context, *args, **kwargs)\n', u' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1740, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', u' File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\n six.reraise(self.type_, self.value, self.tb)\n', u' File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1737, in _spawn\n block_device_info)\n', u' File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 2291, in spawn\n write_to_disk=True)\n', u' File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3480, in to_xml\n disk_info, rescue, block_device_info)\n', u' File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3294, in get_guest_config\n flavor)\n', u' File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py", line 384, in get_config\n _("Unexpected vif_type=%s") % vif_type)\n', u'NovaException: Unexpected vif_type=binding_failed\n']
The corresponding Neutron log for the failed port creation:
2015-02-03 11:21:14.664 31957 INFO neutron.agent.securitygroups_rpc [req-1023501d-6e4a-4729-a524-64e5dc9085e0 None] Security group member updated [u'd24baeb8-6958-45f3-85fc-27c3caff4b46']
2015-02-03 11:21:16.867 31957 INFO neutron.agent.securitygroups_rpc [-] Refresh firewall rules
2015-02-03 11:22:16.872 31957 ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Error while processing VIF ports
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1335, in rpc_loop
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent ovs_restarted)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1139, in process_network_ports
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent port_info.get('updated', set()))
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 268, in setup_port_filters
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.refresh_firewall(updated_devices)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 224, in refresh_firewall
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent self.context, device_ids)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 86, in security_group_rules_for_devices
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent topic=self.topic)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/proxy.py", line 129, in call
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent exc.info, real_topic, msg.get('method'))
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Timeout: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "security_group_rules_for_devices" info: "
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-02-03 11:22:16.873 31957 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Agent out of sync with plugin!
The error returned is "Timeout while waiting on RPC response".
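To see how often this timeout occurs and which RPC methods are affected, the agent log can be scanned for the timeout line. The following is a minimal sketch, not part of the original troubleshooting; the function name `count_rpc_timeouts` is made up for illustration, and the pattern accepts both straight and typographic quotes because pasted logs can carry either.

```python
import re
from collections import Counter

# Pattern for the agent's timeout line. Quotes and the dash may be ASCII or
# typographic, depending on how the log text was copied.
TIMEOUT_RE = re.compile(
    r'Timeout while waiting on RPC response\s*[-\u2013]\s*'
    r'topic:\s*["\u201c](?P<topic>[^"\u201d]+)["\u201d],\s*'
    r'RPC method:\s*["\u201c](?P<method>[^"\u201d]+)["\u201d]'
)


def count_rpc_timeouts(lines):
    """Count RPC timeouts per (topic, method) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("topic"), match.group("method"))] += 1
    return counts
```

A typical use would be `count_rpc_timeouts(open(path))`, where `path` points at the Open vSwitch agent log on a compute node; the exact log location depends on the installation and is not given in this article.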
2. Related issues
◎ security_group in Neutron
http://blog.csdn.net/lynn_kong/article/details/13503847
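When a port is created, you can specify the security groups it belongs to (if none is specified, it joins the default security group). Because the membership of some security group has now changed, every node has to be notified with a message along these lines: the members of some security groups have changed; if you reference these security groups, update the corresponding iptables rules. For Linux bridge and OVS, the Neutron L2 agent handles this update request.
First, when the L2 agent initializes, loading the IptablesFirewallDriver sets up some initial iptables configuration.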
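The notification pattern described here can be illustrated with a toy count of messages. This is only a sketch under stated assumptions, not a measurement of Neutron: it assumes all ports share one security group, that each new port triggers one "member updated" notification to every host already holding a member port, and that each notified agent answers with one security_group_rules_for_devices call. The function name `rpc_messages_for_boot` is hypothetical.

```python
def rpc_messages_for_boot(num_hosts, ports_per_host):
    """Toy count of security-group RPC messages while ports are created one by one.

    Assumptions (illustrative only): one shared security group, ports spread
    round-robin over hosts, one notification plus one callback per host that
    already holds a member port.
    """
    total_ports = num_hosts * ports_per_host
    hosts_with_members = set()
    messages = 0
    for port_index in range(total_ports):
        hosts_with_members.add(port_index % num_hosts)
        # One notification to each affected host, and one RPC call back from it.
        messages += 2 * len(hosts_with_members)
    return messages


if __name__ == "__main__":
    for hosts in (1, 10, 100):
        print(hosts, rpc_messages_for_boot(hosts, ports_per_host=10))
```

Even in this simplified model the message count rises much faster than the number of ports, because every additional host multiplies the work caused by each new port. The real agent also receives a larger rule set per call as the group grows, which the sketch ignores.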
◎ Neutron security_group_rules_for_devices RPC rewrite
http://www.ajo.es/post/95269040924/neutron-security-group-rules-for-devices-rpc
This post points out that in the Icehouse release, RPC traffic grows exponentially.
So we filled a spec for juno-3, the effort leaded by shihanzhang and me can be tracked here:
https://review.openstack.org/#/c/111876/
https://review.openstack.org/#/c/115575/
These are the patches that improve the situation, but they apply only to the Juno release.
◎ What’s New in Neutron for OpenStack Juno
http://www.tuicool.com/articles/yi2iaa
There are some well known issues around security group scaling with previous versions of OpenStack Neutron. In Juno, we’ve addressed these issues with two very important blueprints: The addition of ipset in lieu of iptables to manage security group rules on compute nodes, and the refactoring of the security_group_rules_for_devices RPC call. Both of these additions are meant to scale and dramatically improve the performance of the security groups implementations of Neutron.
In other words, this is a problem in releases before Juno.
◎ HOW WE SCALED OPENSTACK TO LAUNCH 168,000 CLOUD INSTANCES
https://javacruft.wordpress.com/2014/06/18/168k-instances/
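This post describes two bottlenecks encountered during a scale test. The first is addressed by increasing the number of neutron-server API and nova API worker processes, which we have already done. The second is that, with many hosts and instances, creating each instance has to go through every host, apply the security_group settings through iptables, and wait for the result. Because of the RPC issue above, iptables takes a long time to return during this operation, so requests pile up in the queue.
3. Solution
In summary, on OpenStack Icehouse the only workaround for now is to disable Neutron's security_group driver. The Neutron security group feature is then unavailable for the time being (the deployment falls back to the original Nova security_group).
However, because the firewall in our production environment is implemented in hardware, this change has little effect on how instances are used. (It does leave a risk of IP address spoofing.)
Relevant configuration options: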
Nova:
#
# Options defined in nova.network.security_group.openstack_driver
#
# The full class name of the security API class (string value)
security_group_api=nova
#security_group_api=neutron
Neutron:
[securitygroup]
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
enable_security_group = True
firewall_driver=True
[securitygroup]
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
#enable_security_group = True
enable_security_group = False
#firewall_driver=True
firewall_driver=False
[root@hh-yun-compute-130149 ~]# vim /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[securitygroup]
# Firewall driver for realizing neutron security group function.
# firewall_driver = neutron.agent.firewall.NoopFirewallDriver
# Example: firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
enable_security_group = False
[SECURITYGROUP]
#firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
Nova-Compute:
#
# Options defined in nova.network.security_group.openstack_driver
#
# The full class name of the security API class (string value)
security_group_api=nova
#security_group_api=neutron
[root@hh-yun-compute-130149 ~]# vim /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[securitygroup]
# Firewall driver for realizing neutron security group function.
# firewall_driver = neutron.agent.firewall.NoopFirewallDriver
# Example: firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
enable_security_group = False
[SECURITYGROUP]
#firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
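After editing the plugin configuration on many compute nodes, it can help to confirm that the setting actually took effect. The sketch below parses the text of an ini file with Python's standard configparser and reports whether enable_security_group is on. The function name `neutron_security_group_enabled` is made up for illustration, and treating a missing option as True is an assumption based on the commented default shown in the configuration above.

```python
import configparser


def neutron_security_group_enabled(ini_text):
    """Return the effective enable_security_group value from ini file text.

    The section name is matched case-insensitively because the file above
    contains both [securitygroup] and [SECURITYGROUP].
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    for section in parser.sections():
        if section.lower() != "securitygroup":
            continue
        if parser.has_option(section, "enable_security_group"):
            return parser.getboolean(section, "enable_security_group")
    # Assumption: the option defaults to True when it is not set explicitly.
    return True
```

Reading `/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini` on each compute node and passing its contents to this function should return False once the workaround is in place.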
4. Test and verification
Create 100 instances concurrently:
real 0m60.583s
user 0m0.427s
sys 0m0.075s
Create 300 instances concurrently:
[hyphen@hh-yun-puppet-129021 ~(unlimit_hyphen)]$ nova boot --flavor b2c_web_1core --num-instances 300 --image 'Centos6.3_x86_64_1.2' --security_group default --nic net-id=e302ca3a-dc19-4387-90ef-f5eb188e98cd hyphen_demo_server --user-data change_pwd.sh && time while true;do if [ `nova list|grep -i active|wc -l` -eq 300 ];then break;else sleep 5;fi;done
real 2m23.233s
user 0m12.334s
sys 0m1.423s
All instances were created successfully. On the RabbitMQ management page, the Messages backlog of the q-plugin queues stayed close to 0, so the problem is resolved.
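The shell loop used in the 300-instance test polls every 5 seconds and never gives up, so it hangs if some instances end in an error state. A bounded version of the same idea might look like the following sketch; `wait_until` and its parameters are hypothetical names, and the predicate that counts ACTIVE instances is left to the caller.

```python
import time


def wait_until(predicate, interval=5.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() every `interval` seconds until it is true or `timeout` passes.

    Returns True if the predicate succeeded, False if the deadline was reached.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the test above, the predicate would be a function that returns True once the number of active instances reaches 300, for example by parsing the output of `nova list`.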
Original article by kepupublish. If you repost it, please credit the source: https://blog.ytso.com/98249.html