[Original] Instance creation fails with the error "Virtual Interface creation failed"

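This post explains how to fix the following error, which appears when creating instances in an environment with many nodes and VMs: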

Virtual Interface creation failed

The corresponding error in the Neutron OpenvSwitch agent:

Timeout while waiting on RPC response

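According to what we found, in releases before Juno the security group RPC traffic grows exponentially as nodes are added.
In addition, applying security group settings through iptables takes a long time.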

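When an instance is created, each new port changes the membership of some security group, so every node has to be notified with a message along the lines of: "the membership of some security groups has changed; if you reference any of these groups, update the corresponding iptables rules." With Linux bridge and OVS, these update requests are handled by the Neutron L2 agent.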
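To get a feel for why this hurts at scale, here is a rough, hypothetical cost model (our own simplification, not Neutron's code). It assumes that every VM sits in the default security group, whose rules reference the group itself; that every L2 agent is notified of each membership change and then re-requests rules for all of its local devices; and that each device's reply expands the remote-group rule into one entry per member IP.

```python
def rule_entries_after_one_new_port(hosts: int, vms_per_host: int) -> int:
    """Rough count of rule entries shipped to agents when ONE new port joins
    the default security group, under the simplified assumptions above."""
    members = hosts * vms_per_host                          # every VM is a group member
    entries_per_device = members                            # one expanded entry per member IP
    entries_per_agent = vms_per_host * entries_per_device   # agent refreshes all local devices
    return hosts * entries_per_agent                        # every agent gets notified


if __name__ == "__main__":
    for hosts in (10, 20, 40):
        print(hosts, rule_entries_after_one_new_port(hosts, vms_per_host=20))
```

In this model, doubling the number of hosts quadruples the data generated by a single new port, and booting many instances at once multiplies it again; the real Neutron payload may differ in detail.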

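Combined, these two problems mean that with many compute hosts and VMs, each security group request takes a long time to return, and the RPC issued during port creation times out: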

Timeout: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "security_group_rules_for_devices" info: "

Eventually Nova times out waiting for Neutron to finish creating the port and reports the Virtual Interface creation failed error.
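On the Nova side, this behavior can be sketched as a wait with a deadline. As we understand it, Nova's libvirt driver (since Icehouse) waits for Neutron to report the port as plugged, and the nova.conf options vif_plugging_timeout and vif_plugging_is_fatal control how long it waits and whether a timeout fails the boot. The function and exception names below are hypothetical stand-ins, and a threading.Event plays the role of Neutron's notification.

```python
import threading


class VirtualInterfaceCreateError(Exception):
    """Hypothetical stand-in for the exception Nova raises."""


def wait_for_vif_plugged(plugged: threading.Event, timeout_s: float,
                         is_fatal: bool = True) -> bool:
    """Block until the network side signals that the VIF is plugged.

    Returns True if the signal arrives in time. On timeout, raises when
    is_fatal is True (mirroring vif_plugging_is_fatal), else returns False.
    """
    if plugged.wait(timeout_s):
        return True
    if is_fatal:
        raise VirtualInterfaceCreateError("Virtual Interface creation failed")
    return False


if __name__ == "__main__":
    fast = threading.Event()
    threading.Timer(0.1, fast.set).start()   # the agent answers quickly
    print(wait_for_vif_plugged(fast, timeout_s=1.0))

    slow = threading.Event()                 # the agent never answers in time
    try:
        wait_for_vif_plugged(slow, timeout_s=0.2)
    except VirtualInterfaceCreateError as exc:
        print("boot failed:", exc)
```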

1. Problem description
Instance creation fails:

[root@hh-yun-nova-129162 nova]# tail scheduler.log
2015-02-03 13:00:40.263 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 240.10.129.40:5672
2015-02-03 13:00:40.263 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-02-03 13:00:41.273 170448 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 240.10.129.40:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 7 seconds.
2015-02-03 13:00:48.273 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 240.10.129.40:5672
2015-02-03 13:00:48.274 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-02-03 13:00:49.292 170448 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 13:16:24.465 183656 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 13:16:31.855 183890 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 14:22:42.338 183890 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 240.10.129.40:5672
2015-02-03 15:13:12.926 183890 ERROR nova.scheduler.filter_scheduler [req-3acf21fd-f802-49a7-8713-f131efa3445e 271a33b320b84b19aa6f44f97613c024 98e5fdd9e50f423881f49c845e1d26ad] [instance: 70af200c-bcc3-4307-afd6-049178d9174a] Error from last host: hh-yun-compute-130104.vclound.com (node hh-yun-compute-130104.vclound.com): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1328, in _build_instance\n    set_access_ip=set_access_ip)\n', u'  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 393, in decorated_function\n    return function(self, context, *args, **kwargs)\n', u'  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1740, in _spawn\n    LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', u'  File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 68, in __exit__\n    six.reraise(self.type_, self.value, self.tb)\n', u'  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1737, in _spawn\n    block_device_info)\n', u'  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 2291, in spawn\n    write_to_disk=True)\n', u'  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3480, in to_xml\n    disk_info, rescue, block_device_info)\n', u'  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3294, in get_guest_config\n    flavor)\n', u'  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py", line 384, in get_config\n    _("Unexpected vif_type=%s") % vif_type)\n', u'NovaException: Unexpected vif_type=binding_failed\n']

The corresponding Neutron log showing the port creation failure:

[root@hh-yun-compute-130070 ~]# vim /var/log/neutron/openvswitch-agent.log
2015-02-03 11:21:14.664 31957 INFO neutron.agent.securitygroups_rpc [req-1023501d-6e4a-4729-a524-64e5dc9085e0 None] Security group member updated [u'd24baeb8-6958-45f3-85fc-27c3caff4b46']
2015-02-03 11:21:16.867 31957 INFO neutron.agent.securitygroups_rpc [-] Refresh firewall rules
2015-02-03 11:22:16.872 31957 ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Error while processing VIF ports
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1335, in rpc_loop
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     ovs_restarted)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1139, in process_network_ports
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     port_info.get('updated', set()))
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 268, in setup_port_filters
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     self.refresh_firewall(updated_devices)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 224, in refresh_firewall
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     self.context, device_ids)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/agent/securitygroups_rpc.py", line 86, in security_group_rules_for_devices
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     topic=self.topic)
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/proxy.py", line 129, in call
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     exc.info, real_topic, msg.get('method'))
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Timeout: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "security_group_rules_for_devices" info: "
2015-02-03 11:22:16.872 31957 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent
2015-02-03 11:22:16.873 31957 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Agent out of sync with plugin!

The call fails with a "Timeout while waiting on RPC response" error, after which the agent reports that it is out of sync with the plugin.
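The end of the log above shows what happens after the timeout: the agent abandons this round ("Error while processing VIF ports") and marks itself out of sync. The loop below is a simplified, hypothetical model of that behavior as we read it from the log, not the agent's actual rpc_loop. A failed round makes the agent forget which ports it already handled, so the next round treats every local port as new and requests its rules again, adding even more load to an already slow q-plugin queue.

```python
import logging
from typing import Callable, Set

LOG = logging.getLogger("ovs_agent_sketch")


def rpc_loop(scan_ports: Callable[[], Set[str]],
             process_ports: Callable[[Set[str]], None],
             iterations: int) -> int:
    """Run a fixed number of polling rounds; return how many full resyncs happened."""
    known: Set[str] = set()
    sync = False
    resyncs = 0
    for _ in range(iterations):
        if sync:
            LOG.info("Agent out of sync with plugin!")
            known = set()            # forget state: every local port looks new again
            resyncs += 1
            sync = False
        current = scan_ports()
        try:
            # In the real agent, this step ends up calling security_group_rules_for_devices.
            process_ports(current - known)
            known = set(current)
        except Exception:
            LOG.exception("Error while processing VIF ports")
            sync = True              # start over from scratch on the next round
    return resyncs
```

For example, if processing one new port times out, the following round reprocesses all local ports, not just the new one.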

2. Related issues
◎ security_group in Neutron
http://blog.csdn.net/lynn_kong/article/details/13503847
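When creating a port, you can specify which security groups it belongs to (if none is specified, it joins the default security group). Because this changes the membership of a security group, every node has to be notified with a message along the lines of: "the membership of some security groups has changed; if you reference any of these groups, update the corresponding iptables rules." With Linux bridge and OVS, these update requests are handled by the Neutron L2 agent.
First, when the L2 agent initializes, it already sets up some iptables configuration while loading IptablesFirewallDriver.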

◎ Neutron security_group_rules_for_devices RPC rewrite
http://www.ajo.es/post/95269040924/neutron-security-group-rules-for-devices-rpc

We found during scalability tests, that the security_group_rules_for_devices RPC, which is transmitted from neutron-server to the neutron L2 agents during port changes, grew exponentially.

This points out that in Icehouse, this RPC traffic grows exponentially.
So we filled a spec for juno-3, the effort leaded by shihanzhang and me can be tracked here:
https://review.openstack.org/#/c/111876/
https://review.openstack.org/#/c/115575/
These are the patches that fix the problem, but they only apply to Juno.

◎ What’s New in Neutron for OpenStack Juno
http://www.tuicool.com/articles/yi2iaa

Security Group Enhancements
There are some well known issues around security group scaling with previous versions of OpenStack Neutron. In Juno, we’ve addressed these issues with two very important blueprints: The addition of ipset in lieu of iptables to manage security group rules on compute nodes, and the refactoring of the security_group_rules_for_devices RPC call. Both of these additions are meant to scale and dramatically improve the performance of the security groups implementations of Neutron.

So this is a problem in releases before Juno.
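The difference between the two approaches in the quote can be illustrated with the rules an agent would generate for "allow traffic from members of a remote security group". This is a simplified sketch: the chain and set names are hypothetical, and the rule text is abbreviated compared with what Neutron actually programs. With plain iptables there is one rule per member IP, so every membership change means rewriting rules on every node; with ipset, a single rule matches a named set and a membership change only updates the set's contents.

```python
from typing import List


def iptables_rules(chain: str, member_ips: List[str]) -> List[str]:
    """Pre-Juno style: one iptables rule per member of the remote group."""
    return [f"-A {chain} -s {ip}/32 -j RETURN" for ip in member_ips]


def ipset_rules(chain: str, set_name: str) -> List[str]:
    """ipset style: one rule matching the whole set; members live in the set."""
    return [f"-A {chain} -m set --match-set {set_name} src -j RETURN"]


if __name__ == "__main__":
    members = [f"10.0.{i // 250}.{i % 250 + 1}" for i in range(1000)]
    print(len(iptables_rules("sg-port-chain", members)))             # grows with group size
    print(len(ipset_rules("sg-port-chain", "sg-default-members")))   # constant
```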

◎ HOW WE SCALED OPENSTACK TO LAUNCH 168,000 CLOUD INSTANCES
https://javacruft.wordpress.com/2014/06/18/168k-instances/

At around 170 instances per compute server, we hit our next bottleneck; the Neutron agent status on compute nodes started to flap, with agents being marked down as instances were being created. After some investigation, it turned out that the time required to parse and then update the iptables firewall rules at this instance density took longer than the default agent timeout – hence why agents kept dropping out from Neutrons perspective. This resulted in virtual interface (VIF) creation timing out and we started to see instance activation failures when trying to create more that a few instances in parallel. Without an immediate fix for this issue (see bug 1314189), we took the decision to turn Neutron security groups off in the deployment and run without any VIF level iptables security. This was applied using the nova-compute charm we were using, but is obviously not something that will make it back into the official charm in the Juju charm store.

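This article describes two bottlenecks hit during testing. The first was resolved by increasing the number of neutron-server API and nova-api worker processes, which we have already done. The second is the one quoted above: when there are many hosts and instances, creating each instance requires every host to update its security group settings through iptables and report back. Because of the RPC problem described earlier, these iptables updates take a long time to return, so requests pile up in the queue.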
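The agent log earlier shows this in numbers: "Refresh firewall rules" is logged at 11:21:16.867 and the timeout at 11:22:16.872, almost exactly 60 seconds later, which matches the default RPC timeout (rpc_response_timeout, 60 seconds). A back-of-the-envelope FIFO queue model, our own simplification with illustrative numbers, shows why slow replies and a deep q-plugin queue push requests past that limit, and why adding neutron-server workers helps: a new request waits roughly the queue depth times the per-request service time, divided by the number of workers, before it is even served.

```python
def expected_wait_s(queue_depth: int, service_time_s: float, workers: int = 1) -> float:
    """Time a new request spends queued behind queue_depth earlier requests."""
    return queue_depth * service_time_s / workers


def rpc_times_out(queue_depth: int, service_time_s: float, workers: int = 1,
                  rpc_response_timeout_s: float = 60.0) -> bool:
    """True if queueing delay plus the request's own service time exceeds the timeout."""
    total = expected_wait_s(queue_depth, service_time_s, workers) + service_time_s
    return total > rpc_response_timeout_s


if __name__ == "__main__":
    # Illustrative numbers only: 200 queued requests, 0.5 s of server work each.
    for workers in (1, 2, 4):
        print(workers, expected_wait_s(200, 0.5, workers), rpc_times_out(200, 0.5, workers))
```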

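3. Solution
In summary, on OpenStack Icehouse the only workaround for now is to disable Neutron's security group driver. Neutron security groups are then unavailable for the time being, and Nova's original security_group implementation is used instead.
However, because the firewall in our production environment is implemented in hardware, this has little impact on how the instances are used (although it does leave a risk of IP spoofing).
Relevant configuration options: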
Nova:

[root@hh-yun-nova-129161 ~]# vim /etc/nova/nova.conf
#
# Options defined in nova.network.security_group.openstack_driver
#
# The full class name of the security API class (string value)
security_group_api=nova
#security_group_api=neutron

Neutron:

[root@hh-yun-neutron-129141 ~]# vim /etc/neutron/plugins/ml2/ml2_conf.ini
# Before:
[securitygroup]
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
enable_security_group = True
firewall_driver=True
# After:
[securitygroup]
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
#enable_security_group = True
enable_security_group = False
#firewall_driver=True
firewall_driver=False

[root@hh-yun-compute-130149 ~]# vim /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
[securitygroup]
# Firewall driver for realizing neutron security group function.
# firewall_driver = neutron.agent.firewall.NoopFirewallDriver
# Example: firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
# Controls if neutron security group is enabled or not.
# It should be false when you use nova security group.
# enable_security_group = True
enable_security_group = False
[SECURITYGROUP]
#firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver

Nova-Compute:

The compute nodes use the same settings shown above: security_group_api=nova in /etc/nova/nova.conf and enable_security_group = False in /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini.
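Note that firewall_driver takes a driver class path (the sample comments above show neutron.agent.firewall.NoopFirewallDriver and neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver), not a boolean, so values such as firewall_driver=True or firewall_driver=False in ml2_conf.ini are not meaningful driver names. The sketch below uses only Python's standard configparser to flag such values and to check that enable_security_group and firewall_driver agree. The consistency rules are our own reading of these settings, not Neutron's code, and Neutron's option parser may treat section names and repeated sections differently.

```python
import configparser
from typing import List

NOOP_DRIVER = "neutron.agent.firewall.NoopFirewallDriver"


def check_securitygroup(ini_text: str) -> List[str]:
    """Return human-readable warnings about the [securitygroup] settings."""
    # strict=False tolerates repeated sections/options; later values win.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(ini_text)
    # Match [securitygroup] and [SECURITYGROUP] alike (configparser is case-sensitive).
    section = next((s for s in parser.sections() if s.lower() == "securitygroup"), None)
    if section is None:
        return ["no [securitygroup] section found"]

    # Assumption: treat a missing enable_security_group as True.
    enabled = parser.getboolean(section, "enable_security_group", fallback=True)
    driver = parser.get(section, "firewall_driver", fallback=None)

    warnings = []
    if driver is not None and driver.strip().lower() in ("true", "false"):
        warnings.append(f"firewall_driver={driver!r} is a boolean, not a driver class path")
    if not enabled and driver not in (None, NOOP_DRIVER):
        warnings.append("security groups disabled but firewall_driver is not the Noop driver")
    return warnings
```

Run against the "after" ml2_conf.ini snippet above, this returns two warnings; run against the compute-node ovs_neutron_plugin.ini snippet, it returns none.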

4. Verification
Create 100 instances concurrently:

[hyphen@hh-yun-puppet-129021 ~(unlimit_hyphen)]$ time nova boot --flavor mysql_db --num-instances 100 --image 'Centos6.3_x86_64_1.2' --security_group default --nic net-id=e302ca3a-dc19-4387-90ef-f5eb188e98cd hyphen_demo_server --user-data change_pwd.sh --poll
real    0m60.583s
user    0m0.427s
sys     0m0.075s
Create 300 instances concurrently:
[hyphen@hh-yun-puppet-129021 ~(unlimit_hyphen)]$ nova boot --flavor b2c_web_1core --num-instances 300 --image 'Centos6.3_x86_64_1.2' --security_group default --nic net-id=e302ca3a-dc19-4387-90ef-f5eb188e98cd hyphen_demo_server --user-data change_pwd.sh && time while true;do if [ `nova list|grep -i active|wc -l` -eq 300 ];then break;else sleep 5;fi;done
real    2m23.233s
user    0m12.334s
sys     0m1.423s
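The shell loop above waits until exactly 300 instances are ACTIVE and never gives up, so it would spin forever if even one instance ended in ERROR. Below is a minimal Python sketch of the same wait with a deadline; count_active is a hypothetical callable that you would back with nova list or the Nova API, and the default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_until_active(count_active: Callable[[], int], expected: int,
                      timeout_s: float = 600.0, interval_s: float = 5.0) -> float:
    """Poll until count_active() >= expected; return the elapsed seconds.

    Raises TimeoutError if the target is not reached within timeout_s.
    """
    start = time.monotonic()
    while True:
        active = count_active()
        if active >= expected:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"only {active}/{expected} instances ACTIVE after {timeout_s}s")
        time.sleep(interval_s)
```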

All instances were created successfully. On the RabbitMQ management page, the Messages count of the q-plugin queue stays at essentially 0, so the problem is solved.
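The same check can be scripted against the RabbitMQ management plugin's HTTP API instead of the web page. This sketch assumes RabbitMQ 3.x with the management plugin on its default port 15672, the default vhost "/", and placeholder guest credentials; GET /api/queues/<vhost>/<name> returns the queue as JSON, where the messages field is the total of ready and unacknowledged messages.

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(host: str, queue: str, vhost: str = "/", port: int = 15672) -> str:
    """Build the management API URL; the vhost must be percent-encoded ('/' -> '%2F')."""
    return (f"http://{host}:{port}/api/queues/"
            f"{urllib.parse.quote(vhost, safe='')}/{urllib.parse.quote(queue, safe='')}")


def pending_messages(queue_info: dict) -> int:
    """Total messages in the queue (ready + unacknowledged)."""
    return int(queue_info.get("messages", 0))


def fetch_queue(host: str, queue: str, user: str = "guest", password: str = "guest") -> dict:
    """GET the queue's JSON description using HTTP basic auth."""
    request = urllib.request.Request(queue_url(host, queue))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    info = fetch_queue("240.10.129.40", "q-plugin")
    print("q-plugin pending messages:", pending_messages(info))
```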


Original article by kepupublish. If you repost it, please credit the source: https://blog.ytso.com/98249.html
