* RHEL6 host / CentOS5.6 guest periodically very sluggish
@ 2011-05-09 3:33 T Johnson
From: T Johnson @ 2011-05-09 3:33 UTC
To: kvm
Hello,
I have a perplexing problem I'm hoping I might be able to get some
help with. This is a RHEL6 KVM host with 4 idle CentOS 5.6 guests. 3
guests have 1 vCPU, 1 guest has 4 vCPUs. The host is an 8 core/16 thread
Nehalem-class machine and is idle other than the KVM guests.
Roughly every 5-10 minutes both the host and the guests become very
sluggish to respond to input, and likely to any running workload. This
lasts for maybe 4-5 minutes, then everything returns to normal until it
happens again 5-10 minutes later, and the cycle repeats indefinitely.
The host CPU is mostly idle during these sluggish periods. I've noticed
a big drop in interrupts on the host during these periods, along with
"missed X ticks" messages in dstat:
normal (responsive) dstat output on host:
----------------------------
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 2 97 0 0 0| 0 0 | 53k 809k| 0 0 | 37k 37k
0 1 99 0 0 0| 0 47k| 46B 346B| 0 0 | 37k 35k
0 0 100 0 0 0| 0 0 | 394B 1038B| 0 0 | 30k 34k
0 2 98 0 0 0| 0 0 | 46B 346B| 0 0 | 36k 34k
1 3 96 0 0 0| 0 0 | 394B 1038B| 0 0 | 39k 34k
1 2 98 0 0 0| 0 0 | 46B 346B| 0 0 | 37k 35k
1 3 96 0 0 0| 0 1024B| 514B 1038B| 0 0 | 36k 39k
1 1 98 0 0 0| 0 0 | 46B 346B| 0 0 | 35k 42k
0 2 98 0 0 0| 0 0 | 394B 1038B| 0 0 | 38k 42k
0 2 98 0 0 0| 0 0 | 46B 346B| 0 0 | 37k 42k
1 2 97 0 0 0| 0 0 | 394B 1038B| 0 0 | 35k 41k
1 1 98 0 0 0| 0 0 | 46B 346B| 0 0 | 31k 39k
example sluggish dstat output on host:
-------------------------------
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 0 100 0 0 0| 0 0 |5902B 71k| 0 0 | 681 2657
0 1 99 0 0 0| 0 1024B| 652B 692B| 0 0 |5387 41k missed 2 ticks
0 1 99 0 0 0| 0 0 | 546B 788B| 0 0 |5741 43k missed 2 ticks
0 1 99 0 0 0| 0 0 | 546B 756B| 0 0 |5770 43k missed 2 ticks
1 1 98 0 0 0| 0 1024B| 184B 378B| 0 0 |8890 66k missed 2 ticks
0 0 99 0 0 0| 0 0 |1062B 1166B| 0 0 |4631 34k missed 2 ticks
0 2 98 0 0 0| 0 0 | 100B 378B| 0 0 |2680 24k
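To log the interrupt drop without depending on dstat, the total interrupt count can be sampled directly. The following is a minimal sketch, assuming a Linux host where /proc/stat contains an "intr" line whose first number is the total number of interrupts serviced since boot. The function names total_interrupts and interrupt_rate are hypothetical and are not part of dstat.

```python
import time


def total_interrupts(stat_text):
    """Return the total interrupt count from /proc/stat-style text."""
    for line in stat_text.splitlines():
        if line.startswith("intr "):
            # First field after "intr" is the sum of all interrupts since boot.
            return int(line.split()[1])
    raise ValueError("no 'intr' line found")


def interrupt_rate(interval=1.0, path="/proc/stat"):
    """Return interrupts per second measured over roughly `interval` seconds."""
    with open(path) as f:
        first = total_interrupts(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        second = total_interrupts(f.read())
    # Divide by the measured elapsed time, not the requested interval,
    # because the sleep may overrun on a sluggish system.
    elapsed = time.monotonic() - start
    return (second - first) / elapsed
```

Calling interrupt_rate() in a loop and printing a timestamp with each value would give a record that can be lined up against the sluggish periods.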
On the guests (which are idle) there is also some interesting dstat
output. During the sluggish periods, user, system, and interrupt CPU
usage increase greatly and the number of interrupts doubles or triples.
I can also usually count on dstat crashing on every guest as soon as I
notice the problem starting on the host:
Traceback (most recent call last):
  File "/usr/bin/dstat", line 1974, in ?
    main()
  File "/usr/bin/dstat", line 1919, in main
    o.extract()
  File "/usr/bin/dstat", line 509, in extract
    self.val[name][i] = 100.0 * (self.cn2[name][i] - self.cn1[name][i]) / (sum(self.cn2[name]) - sum(self.cn1[name]))
ZeroDivisionError: float division
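The failing expression divides each counter delta by the total delta across all counters, so the error means the summed counters did not change between two samples. That fits a guest whose tick accounting stalled. A minimal sketch of a guarded version of that calculation follows. It is not dstat's code: cpu_percentages is a hypothetical name, and reporting zeros for an empty interval is an assumption about what a caller would want.

```python
def cpu_percentages(prev, curr):
    """Return per-counter CPU percentages between two tick samples.

    `prev` and `curr` are equal-length sequences of cumulative tick
    counters (for example user, system, idle) taken at two points in time.
    """
    total = float(sum(curr) - sum(prev))
    if total <= 0:
        # No ticks were accounted between the samples; avoid dividing by zero.
        return [0.0] * len(curr)
    return [100.0 * (c - p) / total for p, c in zip(prev, curr)]
```

With this guard, two identical samples produce a row of zeros instead of a crash, which makes it possible to keep collecting data through the sluggish period.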
example normal (responsive) dstat output on guest:
----------------------
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1004 11
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1005 12
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1002 11
1 0 99 0 0 0| 0 0 | 60B 314B| 0 0 |1003 9
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1004 11
0 0 100 0 0 0| 0 0 | 106B 368B| 0 0 |1004 11
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1004 15
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1003 9
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1004 11
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |1003 9
example sluggish dstat output on guest:
--------------------
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
20 20 60 0 0 0| 0 0 | 60B 404B| 0 0 |1840 8
0 0 100 0 0 0| 0 0 | 60B 314B| 0 0 |2341 8
17 0 50 0 0 33| 0 0 | 60B 404B| 0 0 |3374 8
0 0 33 0 0 67| 0 0 | 60B 314B| 0 0 |2002 11
0 0 0 0 0 100| 0 0 | 60B 314B| 0 0 |1943 5
0 50 50 0 0 0| 0 32k| 60B 420B| 0 0 | 922 18
33 0 67 0 0 0| 0 0 | 60B 404B| 0 0 |1563 9
I'd guess some sort of timer/clock issue, but I'm unsure where to
go from here. Any help would be appreciated.
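One piece of information that usually helps with a suspected timer problem is which clocksource the host and each guest are using. The sketch below assumes a Linux kernel that exposes the clocksource0 directory in sysfs. The function name read_clocksource is hypothetical, and the files may be absent on some kernels, in which case the value is reported as None.

```python
import os


def read_clocksource(base="/sys/devices/system/clocksource/clocksource0"):
    """Return the current and available clocksources as lists of names."""
    result = {}
    for name in ("current_clocksource", "available_clocksource"):
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                # Each file holds one or more space-separated clocksource names.
                result[name] = f.read().split()
        except OSError:
            # File missing or unreadable: report that rather than failing.
            result[name] = None
    return result


if __name__ == "__main__":
    print(read_clocksource())
```

Running this on the host and on each guest gives a quick side-by-side view of the timer configuration.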
Thanks,
TJ