nova-compute service on a compute node is unstable

5flyingbird posted on 2015/07/23 15:39

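The nova-compute service on one of our compute nodes is unstable. When the "Compute Services" tab in the dashboard shows this node's compute service as "down", no virtual machines can be created on the node. However, when I log in to the node and check, the service is running normally and has a process ID. Once this happens, the only way to create VMs on the node again is to restart the nova-compute service. After a restart, it usually works for about 10 hours before VM creation starts failing again.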

The log contains the following errors:

2015-07-23 07:01:51.549 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID aefecca0b79147d4a99d4edc66c7e2e5.
2015-07-23 07:02:00.784 12683 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-07-23 07:02:51.778 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._run_pending_deletes: Timed out waiting for a reply to message ID 925284b7efb24c01817f5846a0a8e7c9.
2015-07-23 07:03:07.758 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 110] Connection timed out
2015-07-23 07:03:07.763 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 110] Connection timed out. Trying again in 1 seconds.
2015-07-23 07:03:11.510 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 113] EHOSTUNREACH
2015-07-23 07:03:11.513 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 2 seconds.
2015-07-23 07:03:14.510 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 113] EHOSTUNREACH
2015-07-23 07:03:14.512 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 2 seconds.
2015-07-23 07:03:17.438 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-07-23 07:03:17.442 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 15.12.52.37:5672 closed the connection. Check login credentials: Socket closed
2015-07-23 07:04:19.074 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._heal_instance_info_cache: Timed out waiting for a reply to message ID c359b12760394e23bf6472bc7f856543.
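
To see at a glance which component is failing and with which socket error, the ERROR lines above can be tallied with a short script. This is only a sketch: the regular expression is written against the line layout shown in this post, and the function name `summarize` is made up for illustration. On Linux, errno 110 is ETIMEDOUT and errno 113 is EHOSTUNREACH, both of which are raised by the network stack before RabbitMQ is ever reached.

```python
import re
from collections import Counter

# Layout assumed from the log lines in this post:
# <date> <time> <pid> <LEVEL> <logger> [<context>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[[^\]]*\] (?P<msg>.*)$"
)
ERRNO_RE = re.compile(r"\[Errno (\d+)\]")


def summarize(lines):
    """Count ERROR lines per logger and per socket errno."""
    by_logger = Counter()
    by_errno = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue  # skip non-matching lines and non-ERROR levels
        by_logger[match.group("logger")] += 1
        errno_match = ERRNO_RE.search(match.group("msg"))
        if errno_match:
            by_errno[int(errno_match.group(1))] += 1
    return by_logger, by_errno


# Usage (the log path is an assumption; adjust it to your installation):
#   with open("/var/log/nova/nova-compute.log") as fh:
#       by_logger, by_errno = summarize(fh)
#   print(by_logger.most_common(), by_errno.most_common())
```

If most of the errors come from `oslo_messaging._drivers.impl_rabbit` with errno 110 or 113, that would be consistent with a network problem between this node and the AMQP server rather than a fault inside nova-compute itself.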

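The node also drops packets, although the loss rate is low. Pinging any other node from this node loses packets, but none of the other nodes show this behavior, and they all work normally.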

The rabbitmq service is running normally.


YueZheng
Check whether this is a time synchronization problem. Every node needs to have NTP enabled.
YueZheng
Reply to @5flyingbird: Given the packet loss, could this be a hardware problem? Check the network cable.
5flyingbird
I have checked NTP, and I don't think the following values would cause the problem described above: offset: -0.035, jitter: 0.040