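Please help with a strange problem. We have a KVM host with 2 Intel(R) Xeon(R) CPU E5-2630 v2 (24 virtual cores in total). The host runs 3 typical Ubuntu guests, each with 8 cores and 20 GB of memory. In this configuration everything seems fine. When we try to deploy one more guest with the same configuration, strange things start to happen: even when the other 3 guests have no load, putting some reasonable load on the 4th drives %sy CPU usage on the KVM host up to 25-30%, and top usually looks like this: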
top - 14:29:39 up 104 days, 2:51, 6 users, load average: 6.46, 6.33, 4.81
Tasks: 227 total, 1 running, 226 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.0%us, 25.2%sy, 0.0%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 98975536k total, 48515312k used, 50460224k free, 154456k buffers
Swap: 100628476k total, 2176k used, 100626300k free, 1072440k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27523 libvirt- 20 0 21.1g 10g 6880 S 700 11.5 126:27.51 kvm
11745 libvirt- 20 0 21.1g 20g 6964 S 21 21.4 137891:19 kvm
32692 root 20 0 865m 8792 4532 S 1 0.0 28:49.51 libvirtd
23252 libvirt- 20 0 10.7g 1.0g 6840 S 1 1.1 6:54.43 kvm
117 root 25 5 0 0 0 S 0 0.0 1245:09 ksmd
1481 root 20 0 63784 12m 3880 S 0 0.0 54:34.15 gunicorn
22413 root 20 0 17464 1540 1092 S 0 0.0 4:21.68 top
22880 root 20 0 17452 1396 972 S 0 0.0 3:50.49 top
22885 root 20 0 73444 3564 2772 S 0 0.0 2:54.02 sshd
26008 root 20 0 17460 1528 1088 S 0 0.0 0:07.31 top
26530 root 20 0 17472 1412 972 S 0 0.0 0:05.43 top
1 root 20 0 24448 2324 1344 S 0 0.0 0:04.69 init
(27523 is the problematic guest; the other kvm processes are guests with no load.)
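As a rough cross-check of the host numbers above, the sketch below converts the `Cpu(s):` summary line into "cores' worth" of time and compares it with the 700 %CPU reported for PID 27523. It assumes top is in its default mode, where a process's %CPU is relative to a single core, and it uses the 24 logical CPUs mentioned earlier. The helper name `parse_cpu_line` is made up for this illustration.

```python
import re

LOGICAL_CPUS = 24  # from the post: 2 x E5-2630 v2, 24 virtual cores


def parse_cpu_line(line):
    """Turn a top 'Cpu(s):' summary line into {field: percent}."""
    return {name: float(val) for val, name in re.findall(r"([\d.]+)%(\w+)", line)}


# Host summary line copied from the top output above.
host = parse_cpu_line(
    "Cpu(s): 5.0%us, 25.2%sy, 0.0%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
)

# Summary percentages are averaged over all logical CPUs.
sy_cores = host["sy"] / 100 * LOGICAL_CPUS
busy_cores = (host["us"] + host["sy"]) / 100 * LOGICAL_CPUS

# Per-process %CPU is per core (assumption: default top mode).
kvm_cores = 700 / 100

print(f"host busy: {busy_cores:.2f} cores, of which system: {sy_cores:.2f}")
print(f"kvm PID 27523: {kvm_cores:.2f} cores")
```

If that assumption holds, nearly all of the busy host time belongs to the one kvm process, and most of it is accounted as system time rather than user time.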
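At this point the guest becomes inoperable, its load average starts to grow to 50-80 or even higher, and almost all CPU usage is split between %us and %sy in varying proportions: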
top - 14:38:21 up 37 min, 2 users, load average: 53.72, 59.50, 45.16
Tasks: 313 total, 9 running, 301 sleeping, 0 stopped, 3 zombie
Cpu(s): 67.5%us, 31.9%sy, 0.0%ni, 0.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 20590644k total, 11358672k used, 9231972k free, 59020k buffers
Swap: 10483708k total, 0k used, 10483708k free, 1821100k cached
At some point exceptions start to appear:
2014 Sep 17 14:35:09 dev2 [ 2037.438362] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.438370] Call Trace:
2014 Sep 17 14:35:09 dev2 [ 2037.438429] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:09 dev2 [ 2037.441963] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.443586] Call Trace:
2014 Sep 17 14:35:19 dev2 [ 2037.443586] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:45 dev2 [ 2073.284329] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285151] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285159] Call Trace:
2014 Sep 17 14:35:45 dev2 [ 2073.285221] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:56 dev2 [ 2073.285857] Stack:
2014 Sep 17 14:35:56 dev2 [ 2073.285864] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.285914] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
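The guest configuration is typical: we have dozens of them working fine, and we also have KVM hosts running 4 such guests without any problems. Where should I dig to find the root of the problem? I have no ideas at the moment...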
The host runs Ubuntu LTS 12.04 (Linux vhost12 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux); the guests are the same but with 3.2.0-56-generic.
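For reference, here is the allocation arithmetic for 3 versus 4 guests, using only the figures quoted in this post (24 logical CPUs, the host's "Mem: total" value from top, 8 vCPUs and 20 GB per guest). It is a sketch of the sizing, not a diagnosis; the function name `overcommit` is just for illustration.

```python
# Figures taken from the post; nothing here is measured.
HOST_LOGICAL_CPUS = 24       # 2 x E5-2630 v2, 24 virtual cores
HOST_MEM_KIB = 98975536      # "Mem: ... total" from the host's top output
GUEST_VCPUS = 8
GUEST_MEM_GIB = 20


def overcommit(guests):
    """Return allocated vCPUs/memory and their ratio to host capacity."""
    vcpus = guests * GUEST_VCPUS
    mem_gib = guests * GUEST_MEM_GIB
    host_mem_gib = HOST_MEM_KIB / (1024 * 1024)
    return {
        "vcpus": vcpus,
        "cpu_ratio": vcpus / HOST_LOGICAL_CPUS,
        "mem_gib": mem_gib,
        "mem_ratio": mem_gib / host_mem_gib,
    }


for n in (3, 4):
    r = overcommit(n)
    print(
        f"{n} guests: {r['vcpus']} vCPUs ({r['cpu_ratio']:.2f}x host CPUs), "
        f"{r['mem_gib']} GiB ({r['mem_ratio']:.2f}x host memory)"
    )
```

With 3 guests the vCPU count exactly matches the host's logical CPUs; the 4th guest pushes vCPUs past that while memory still fits. Since other hosts run 4 such guests without trouble, this alone does not explain the behaviour.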