Please help with a strange problem: a KVM host has 2 Intel(R) Xeon(R) CPU E5-2630 v2 processors (24 virtual cores in total). The host carries 3 typical Ubuntu guests, each with 8 cores and 20 GB of RAM. With this configuration everything seems fine. When another guest with the same configuration is deployed, strange things start to happen: even when the other 3 guests are idle and only a reasonable load is put on the fourth one, %sy CPU usage on the KVM host reaches 25-30%, and top typically looks like this:

top - 14:29:39 up 104 days,  2:51,  6 users,  load average: 6.46, 6.33, 4.81
Tasks: 227 total,   1 running, 226 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.0%us, 25.2%sy,  0.0%ni, 69.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  98975536k total, 48515312k used, 50460224k free,   154456k buffers
Swap: 100628476k total,     2176k used, 100626300k free,  1072440k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                       
27523 libvirt-  20   0 21.1g  10g 6880 S  700 11.5 126:27.51 kvm                                                                                                           
11745 libvirt-  20   0 21.1g  20g 6964 S   21 21.4 137891:19 kvm                                                                                                           
32692 root      20   0  865m 8792 4532 S    1  0.0  28:49.51 libvirtd                                                                                                      
23252 libvirt-  20   0 10.7g 1.0g 6840 S    1  1.1   6:54.43 kvm                                                                                                           
  117 root      25   5     0    0    0 S    0  0.0   1245:09 ksmd                                                                                                          
 1481 root      20   0 63784  12m 3880 S    0  0.0  54:34.15 gunicorn                                                                                                      
22413 root      20   0 17464 1540 1092 S    0  0.0   4:21.68 top                                                                                                           
22880 root      20   0 17452 1396  972 S    0  0.0   3:50.49 top                                                                                                           
22885 root      20   0 73444 3564 2772 S    0  0.0   2:54.02 sshd                                                                                                          
26008 root      20   0 17460 1528 1088 S    0  0.0   0:07.31 top                                                                                                           
26530 root      20   0 17472 1412  972 S    0  0.0   0:05.43 top                                                                                                           
    1 root      20   0 24448 2324 1344 S    0  0.0   0:04.69 init

(27523 is the problematic guest; the other kvm processes are guests with no load)
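
For what it's worth, three guests at 8 vCPUs each fit exactly into the 24 hardware threads, while the fourth one pushes the host to 32 vCPUs, i.e. into CPU overcommit. A quick way to put the host topology next to the guests' vCPU placement - standard Linux/libvirt tools, and <guest-name> is just a placeholder:

# host topology: total CPUs, sockets, cores per socket, threads per core
lscpu | egrep '^CPU\(s\)|Socket|Core|Thread'
# NUMA layout (numactl package)
numactl --hardware
# per-vCPU affinity and accumulated CPU time for one guest
virsh vcpuinfo <guest-name>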

At this point the guest becomes unresponsive, its load average climbs to 50-80 or even higher, and almost all CPU usage is split between %us and %sy in varying proportions:

top - 14:38:21 up 37 min,  2 users,  load average: 53.72, 59.50, 45.16
Tasks: 313 total,   9 running, 301 sleeping,   0 stopped,   3 zombie
Cpu(s): 67.5%us, 31.9%sy,  0.0%ni,  0.0%id,  0.4%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:  20590644k total, 11358672k used,  9231972k free,    59020k buffers
Swap: 10483708k total,        0k used, 10483708k free,  1821100k cached
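
One thing that may be worth sampling inside the affected guest while this is happening is steal time: a vCPU that cannot get onto a physical core shows up in %st rather than in %us/%sy. A minimal sketch, using plain procps/awk, nothing specific to this setup:

# inside the guest: the last column (st) is steal time, sampled once per second
vmstat 1 5
# raw steal counter, 9th field of the cpu line in /proc/stat (USER_HZ ticks)
awk '/^cpu /{print "steal ticks:", $9}' /proc/stat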

At some point exceptions start to appear in the guest's kernel log:

2014 Sep 17 14:35:09 dev2 [ 2037.438362] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.438370] Call Trace:
2014 Sep 17 14:35:09 dev2 [ 2037.438429] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f 
2014 Sep 17 14:35:09 dev2 [ 2037.441963] Stack:
2014 Sep 17 14:35:09 dev2 [ 2037.443586] Call Trace:
2014 Sep 17 14:35:19 dev2 [ 2037.443586] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f 
2014 Sep 17 14:35:45 dev2 [ 2073.284329] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285151] Stack:
2014 Sep 17 14:35:45 dev2 [ 2073.285159] Call Trace:
2014 Sep 17 14:35:45 dev2 [ 2073.285221] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f 
2014 Sep 17 14:35:56 dev2 [ 2073.285857] Stack:
2014 Sep 17 14:35:56 dev2 [ 2073.285864] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.285914] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f 
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Call Trace:
2014 Sep 17 14:35:56 dev2 [ 2073.290207] Code: 48 89 45 c0 48 8d 45 d0 4c 89 4d f8 c7 45 b8 10 00 00 00 48 89 45 c8 e8 e8 f6 ff ff c9 c3 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 b9 00 10 00 00 31 c0 f3 aa c3 66 0f
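
The repeated Code: line can be disassembled to see which instruction the trace is pointing at (the kernel source tree also ships scripts/decodecode for exactly this). A quick sketch with binutils, using the bytes copied from the trace above around the <f3> marker:

# bytes from the Code: line; <f3> marked the faulting instruction
echo 'b9 00 02 00 00 31 c0 f3 48 ab c3' | xxd -r -p > /tmp/code.bin
objdump -D -b binary -m i386:x86-64 /tmp/code.bin

That decodes to mov ecx,0x200 / xor eax,eax / rep stos, i.e. a loop clearing 512 quadwords (one 4 KB page), so the guest looks stuck in a page-zeroing/memset-style path rather than anything exotic.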

The guest configuration is typical - we have dozens of them working fine, and we also have KVM hosts with 4 such guests that show no problems either. Where should I dig to find the root of the problem? No ideas at the moment...
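
Some host-side checks that seem worth running while the problem reproduces - all standard tools and sysfs paths, nothing specific to this setup. KSM (ksmd already shows a lot of accumulated CPU time in the top output above) and transparent hugepage compaction are two common sources of host %sy under KVM that are cheap to rule out:

# which kernel functions the system time is actually being spent in
perf top
# KSM state and activity
grep . /sys/kernel/mm/ksm/run /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/pages_to_scan /sys/kernel/mm/ksm/sleep_millisecs
# transparent hugepage settings (defrag/compaction can also show up as %sy)
cat /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag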

The host runs Ubuntu LTS 12.04 (Linux vhost12 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux); the guests are the same but with 3.2.0-56-generic.