* strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-07 15:47 UTC
To: kvm

I have a strange slowness problem that affects some guests after they have been running for some time. The "slowness" can appear a few hours after guest start, or a couple of days after guest start.

What do I mean by "slowness"?

This is how long it takes to log in via SSH to an unaffected guest - below a second:

$ time ssh backupuser@normal_guest exit
0.02user 0.01system 0:00.67elapsed 4%CPU (0avgtext+0avgdata 0maxresident)

Now, let's try to log in to the affected guest running on the same host - more than 12 seconds:

$ time ssh backupuser@slow_guest exit
0.02user 0.01system 0:12.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)

If I log in via SSH to the affected guest, every key press lags by a second or two.

This is actually weird - if I run something I/O-intensive on the guest, the login is much faster (running CPU-intensive tasks makes no difference):

guest# dd if=/dev/vda of=/dev/null

$ time ssh backupuser@slow_guest exit
0.02user 0.00system 0:00.70elapsed 2%CPU (0avgtext+0avgdata 0maxresident)

Also, running "ping -f <slow_guest>" helps a lot, and SSH logins become fast.

Look at the difference here - 7470 ms vs 139183 ms (and packet losses: only 9934 of 10000 replies received):

# ping -f -c 10000 normal_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 7470ms
rtt min/avg/max/mdev = 0.443/0.709/6.487/0.112 ms, ipg/ewma 0.747/0.716 ms

# ping -f -c 10000 slow_guest

10000 packets transmitted, 9934 received, 0% packet loss, time 139183ms
rtt min/avg/max/mdev = 0.470/14.337/50.455/5.409 ms, pipe 4, ipg/ewma 13.919/14.788 ms

CPU-intensive tasks are as fast as on unaffected guests.
Reading from /dev/vda is as fast as on unaffected guests.

So the only thing broken seems to be the network.

Rebooting the guest does not help - it is still slow.
The only thing that helps is stopping the guest and starting it again (i.e., stopping the kvm process and starting a new one).

Is there an explanation for this phenomenon? It looks like a problem with the virtio drivers somewhere, doesn't it?

The host is running kvm-83.
Affected guests are running 2.6.27.14 kernels and use virtio drivers.
The problem happens only _sometimes_. Out of the 9 guests I have running on this host, I saw this problem on only 3. I never saw it happen on more than one guest at a time.
All three have 512 MB of memory assigned; the other guests have less memory.

-- 
Tomasz Chmielewski
http://wpkg.org
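The flood-ping summaries above can be compared with some simple arithmetic. In the sketch below (`flood_ping_summary` is a hypothetical helper, not part of ping), the total `time` divided by the number of packets sent gives roughly the average gap between sends, close to the `ipg` figure ping prints. It also makes the packet loss visible: 66 of 10000 replies from the slow guest never arrived (0.66%), even though the summary line prints 0%, presumably because ping rounds the percentage down to an integer.

```python
def flood_ping_summary(transmitted, received, time_ms):
    """Derive exact loss and the approximate average send gap from ping's summary."""
    lost = transmitted - received
    return {
        "lost": lost,
        "loss_pct": 100.0 * lost / transmitted,  # unrounded loss percentage
        "gap_ms": time_ms / transmitted,         # ~ average gap between sends
    }

# Numbers copied from the two "ping -f -c 10000" runs above.
normal = flood_ping_summary(10000, 10000, 7470)
slow = flood_ping_summary(10000, 9934, 139183)

print(normal)
print(slow)
print("slow/normal gap ratio: %.1f" % (slow["gap_ms"] / normal["gap_ms"]))
```

For the two runs quoted, this gives about 0.75 ms per packet for the normal guest versus about 13.9 ms for the slow one, roughly 18.6 times longer.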
* Re: strange guest slowness after some time

From: Johannes Baumann @ 2009-03-07 16:41 UTC
To: Tomasz Chmielewski, kvm

Are your nameservers OK? ssh does a reverse lookup of your IP; if your nameserver is not available, login may take some time.

johannes

Tomasz Chmielewski wrote:
> I have a strange slowness which affects some guests after they are
> running for some time. "Slowness" can happen a few hours after guest
> start, or, a couple of days after guest start.
> [...]
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-07 16:54 UTC
To: Johannes Baumann; +Cc: kvm

Johannes Baumann wrote:
> are your nameservers ok?
> ssh is reverse checking your ip, if your nameserver is not
> available login may take some time.

The nameservers were fine. If they were broken, it would affect all the other guests as well, wouldn't it?

Also, to my knowledge, nameservers normally do not affect ping losses and/or ping round-trip times ;)

And "dd if=/dev/vda of=/dev/null" curing the problem also rules out the nameserver idea.
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 9:18 UTC
To: kvm

Tomasz Chmielewski wrote:
> The host is running kvm-83.
> Affected guests are running 2.6.27.14 kernels and use virtio drivers.
> The problem happens only _sometimes_. Out of 9 guests I have running on
> this host, I saw this problem only on 3 guests. I never saw this
> happening on more than one guest at a time.
> All three have 512 MB memory assigned, other guests have less memory.

About 2 days ago I upgraded to kvm-84, and the same thing just happened to a guest with 256 MB of memory.

Note how different the _time_ is (the other unaffected guests show timings similar to the first one):

# ping -f -c 10000 <unaffected_guest>
10000 packets transmitted, 10000 received, 0% packet loss, time 12313ms
rtt min/avg/max/mdev = 0.432/1.164/96.163/1.934 ms, pipe 7, ipg/ewma 1.231/1.111 ms

# ping -f -c 10000 <affected_guest>
10000 packets transmitted, 10000 received, 0% packet loss, time 135625ms
rtt min/avg/max/mdev = 0.807/14.228/55.569/5.779 ms, pipe 4, ipg/ewma 13.563/8.601 ms

Running "dd if=/dev/vda of=/dev/null" on the affected guest reduces that quite a bit:

# ping -f -c 10000 <affected_guest>
10000 packets transmitted, 10000 received, 0% packet loss, time 50469ms
rtt min/avg/max/mdev = 0.616/4.881/54.357/3.847 ms, pipe 5, ipg/ewma 5.047/7.783 ms

Anyone? Is this a known bug?
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 9:28 UTC
To: kvm

Tomasz Chmielewski wrote:
> I upgraded ~2 days ago to kvm-84 and the same just happened for a guest
> with 256 MB memory.
>
> Note how _time_ is different (similar timings are to other unaffected
> guests):

This is also pretty interesting:

# ping -c 10 <unaffected guest>
PING 192.168.4.4 (192.168.4.4) 56(84) bytes of data.
64 bytes from 192.168.4.4: icmp_seq=1 ttl=64 time=1.25 ms
64 bytes from 192.168.4.4: icmp_seq=2 ttl=64 time=1.58 ms
64 bytes from 192.168.4.4: icmp_seq=3 ttl=64 time=3.53 ms
64 bytes from 192.168.4.4: icmp_seq=4 ttl=64 time=1.43 ms
64 bytes from 192.168.4.4: icmp_seq=5 ttl=64 time=3.89 ms
64 bytes from 192.168.4.4: icmp_seq=6 ttl=64 time=3.43 ms
64 bytes from 192.168.4.4: icmp_seq=7 ttl=64 time=1.03 ms
64 bytes from 192.168.4.4: icmp_seq=8 ttl=64 time=1.36 ms
64 bytes from 192.168.4.4: icmp_seq=9 ttl=64 time=1.28 ms
64 bytes from 192.168.4.4: icmp_seq=10 ttl=64 time=1.78 ms

--- 192.168.4.4 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9091ms
rtt min/avg/max/mdev = 1.031/2.059/3.894/1.045 ms

How likely is it that so many pings come back after exactly 1000 ms?

# ping -c 10 <affected_guest>
PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=5 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=6 ttl=64 time=992 ms
64 bytes from 192.168.4.5: icmp_seq=7 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=8 ttl=64 time=1001 ms
64 bytes from 192.168.4.5: icmp_seq=9 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=10 ttl=64 time=998 ms

--- 192.168.4.5 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 10025ms
rtt min/avg/max/mdev = 9.610/901.198/1009.161/297.222 ms, pipe 2

This one is with "dd if=/dev/vda of=/dev/null" running on the affected guest:

# ping -c 10 <affected_guest>
PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=29.4 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=4.56 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=4.05 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=4.20 ms
64 bytes from 192.168.4.5: icmp_seq=5 ttl=64 time=3.82 ms
64 bytes from 192.168.4.5: icmp_seq=6 ttl=64 time=2.47 ms
64 bytes from 192.168.4.5: icmp_seq=7 ttl=64 time=2.16 ms
64 bytes from 192.168.4.5: icmp_seq=8 ttl=64 time=3.89 ms
64 bytes from 192.168.4.5: icmp_seq=9 ttl=64 time=5.98 ms
64 bytes from 192.168.4.5: icmp_seq=10 ttl=64 time=9.16 ms

--- 192.168.4.5 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9107ms
rtt min/avg/max/mdev = 2.169/6.978/29.439/7.714 ms
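The question of how many replies come back after almost exactly 1000 ms is easy to check mechanically. A small sketch, assuming the Linux iputils `ping` output format shown above (`time=... ms` on each reply line); the function names and the 10 ms window are arbitrary choices for illustration.

```python
import re

# Per-reply RTT field as printed in the ping output above, e.g. "time=1000 ms".
# The summary line ("time 10025ms") has no "=" and is therefore not matched.
RTT_RE = re.compile(r"time=([0-9.]+) ms")

def rtts(ping_output):
    """Extract every per-reply round-trip time (in ms) from ping's output."""
    return [float(x) for x in RTT_RE.findall(ping_output)]

def count_near(values, target_ms=1000.0, tol_ms=10.0):
    """Count RTTs lying within tol_ms of target_ms."""
    return sum(1 for v in values if abs(v - target_ms) <= tol_ms)

# Reply lines copied from the affected-guest run above (without dd running).
AFFECTED = """
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=5 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=6 ttl=64 time=992 ms
64 bytes from 192.168.4.5: icmp_seq=7 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=8 ttl=64 time=1001 ms
64 bytes from 192.168.4.5: icmp_seq=9 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=10 ttl=64 time=998 ms
"""

values = rtts(AFFECTED)
print(len(values), "replies,", count_near(values), "within 10 ms of one second")
```

For the affected-guest run without `dd`, 9 of the 10 replies fall within 10 ms of one second; running the same functions over the `dd` run finds none.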
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-19 13:03 UTC
To: kvm, Anthony Liguori, Rusty Russell, Avi Kivity

Tomasz Chmielewski wrote:
>> Note how _time_ is different (similar timings are to other unaffected
>> guests):
>
> This is also pretty interesting:
>
> # ping -c 10 <unaffected guest>
> PING 192.168.4.4 (192.168.4.4) 56(84) bytes of data.
> 64 bytes from 192.168.4.4: icmp_seq=1 ttl=64 time=1.25 ms
> 64 bytes from 192.168.4.4: icmp_seq=2 ttl=64 time=1.58 ms
(...)
> --- 192.168.4.4 ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9091ms
> rtt min/avg/max/mdev = 1.031/2.059/3.894/1.045 ms
>
> How probable it is so many pings returned with exactly 1000 ms?
>
> # ping -c 10 <affected_guest>
> PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
> 64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
> 64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
> 64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
> 64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms
(...)

Exactly the same thing as above happened to me again.

This time, I had equipped the guest with one virtio card and one e1000 card:

00:03.0 Ethernet controller: Qumranet, Inc. Device 1000
00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)

Pinging the e1000 card on the affected guest - replies are fast:

# ping 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=5.86 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=3.40 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=0.791 ms

Pinging the virtio card on the affected guest - slow:

# ping 192.168.113.83
PING 192.168.113.83 (192.168.113.83) 56(84) bytes of data.
64 bytes from 192.168.113.83: icmp_seq=1 ttl=64 time=21.6 ms
64 bytes from 192.168.113.83: icmp_seq=2 ttl=64 time=1000 ms
64 bytes from 192.168.113.83: icmp_seq=3 ttl=64 time=2.73 ms
64 bytes from 192.168.113.83: icmp_seq=4 ttl=64 time=243 ms

(This is the same network, with the guests on the same host, so the latencies are not caused by packets travelling around the globe.)
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-09 9:55 UTC
To: Tomasz Chmielewski; +Cc: kvm

Tomasz Chmielewski wrote:
> I have a strange slowness which affects some guests after they are
> running for some time. "Slowness" can happen a few hours after guest
> start, or, a couple of days after guest start.
> [...]
> The host is running kvm-83.
> Affected guests are running 2.6.27.14 kernels and use virtio drivers.
> The problem happens only _sometimes_. Out of 9 guests I have running
> on this host, I saw this problem only on 3 guests. I never saw this
> happening on more than one guest at a time.
> All three have 512 MB memory assigned, other guests have less memory.

I'm guessing there's a problem with timers or timer interrupts.

What is the host CPU?

Does the problem occur if you pin a guest to a CPU with taskset?

-- 
error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 10:22 UTC
To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
> I'm guessing there's a problem with timers or timer interrupts.
>
> What is the host cpu?

4 entries like this in /proc/cpuinfo:

processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2212
stepping : 2
cpu MHz : 2000.000
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 3993.03
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

> Does the problem occur if you pin a guest to a cpu with taskset?

Like this?

# taskset -p 01 22906

(doesn't help)

# taskset -p 02 22906

(doesn't help)

But if I do:

# taskset -p 03 22906

or

# taskset -p 04 22906

it only _rarely_ fixes it, and then just for the first few seconds; after that it's broken again until I switch CPUs again (look at pings 9 and 10; the other pings are also slow, since unaffected guests respond in around 1 ms):

# ping -c 10 192.168.113.85
PING 192.168.113.85 (192.168.113.85) 56(84) bytes of data.
64 bytes from 192.168.113.85: icmp_seq=1 ttl=64 time=22.0 ms
64 bytes from 192.168.113.85: icmp_seq=2 ttl=64 time=23.7 ms
64 bytes from 192.168.113.85: icmp_seq=3 ttl=64 time=2.96 ms
64 bytes from 192.168.113.85: icmp_seq=4 ttl=64 time=51.3 ms
64 bytes from 192.168.113.85: icmp_seq=5 ttl=64 time=22.2 ms
64 bytes from 192.168.113.85: icmp_seq=6 ttl=64 time=1.60 ms
64 bytes from 192.168.113.85: icmp_seq=7 ttl=64 time=49.8 ms
64 bytes from 192.168.113.85: icmp_seq=8 ttl=64 time=23.3 ms
64 bytes from 192.168.113.85: icmp_seq=9 ttl=64 time=999 ms
64 bytes from 192.168.113.85: icmp_seq=10 ttl=64 time=822 ms
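`taskset` interprets its mask argument as a hexadecimal CPU bitmask in which bit N selects CPU N (the taskset man page gives `0x00000003` as "processors #0 and #1"). Decoding the four masks tried above shows that `03` is not a single-CPU pin. A minimal sketch: `mask_to_cpus` is a hypothetical helper, while `os.sched_getaffinity` is the standard-library call on Linux for reading a PID's current affinity.

```python
import os

def mask_to_cpus(mask):
    """List the CPU numbers selected by a taskset-style affinity bitmask."""
    cpus, cpu = [], 0
    while mask:
        if mask & 1:        # bit N set => CPU N is allowed
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus

# The four masks used with "taskset -p" above.
for m in (0x01, 0x02, 0x03, 0x04):
    print("mask %02x -> CPUs %s" % (m, mask_to_cpus(m)))

# On Linux, the affinity actually in effect for a PID (0 = this process):
if hasattr(os, "sched_getaffinity"):
    print("this process may run on:", sorted(os.sched_getaffinity(0)))
```

So `01`, `02` and `04` each pin the process to one CPU (0, 1 and 2 respectively), while `03` lets it run on CPUs 0 and 1.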
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-09 10:25 UTC
To: Tomasz Chmielewski; +Cc: kvm

Tomasz Chmielewski wrote:
> Avi Kivity wrote:
>
>> I'm guessing there's a problem with timers or timer interrupts.
>>
>> What is the host cpu?
>
> 4 entries like this in /proc/cpuinfo:
>
> processor : 3
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 65
> model name : Dual-Core AMD Opteron(tm) Processor 2212
>

That's probably the kvmclock issue that hit older AMDs. It was fixed in kvm-84; please try that.

>> Does the problem occur if you pin a guest to a cpu with taskset?
>
> Like this?
>
> # taskset -p 01 22906
>

I meant 'taskset 01 qemu ...', but it wouldn't have helped if it's kvmclock.
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 10:31 UTC
To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
> Tomasz Chmielewski wrote:
>> Avi Kivity wrote:
>>> What is the host cpu?
>> [...]
>> model name : Dual-Core AMD Opteron(tm) Processor 2212
>>
>
> That's probably the kvmclock issue that hit older AMDs. It was fixed in
> kvm-84, please try that.

It is kvm-84; I have been running it since Saturday (but I had this issue with kvm-83 as well).

# dmesg | grep kvm
(...)
loaded kvm module (kvm-84)

# modinfo kvm
filename: /lib/modules/2.6.24-2-pve/kernel/arch/x86/kvm/kvm.ko
version: kvm-84

# kvm -h
QEMU PC emulator version 0.9.1 (kvm-84), Copyright (c) 2003-2008 Fabrice Bellard

>>> Does the problem occur if you pin a guest to a cpu with taskset?
>>
>> Like this?
>>
>> # taskset -p 01 22906
>>
>
> I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's kvmclock.

It can be done on a running process as well (22906 is the PID of the affected guest).

And the issue is hard to reproduce (it shows up after 1-7 days on a random guest).
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-09 10:37 UTC
To: Tomasz Chmielewski; +Cc: kvm

Tomasz Chmielewski wrote:
> Avi Kivity wrote:
>> That's probably the kvmclock issue that hit older AMDs. It was fixed
>> in kvm-84, please try that.
>
> It is kvm-84, I have it running since Saturday (but I had this issue
> with kvm-83 as well).
>

And the problem continues?

What's your current clocksource (in the guest)? Does changing it help?

See /sys/devices/system/clocksource/clocksource0/*.

>> I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's
>> kvmclock.
>
> It can be done on a running process as well (22906 is the PID of the
> affected guest).

Right, but if the guest is poisoned somehow, this won't help.
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 10:54 UTC
To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
>>> That's probably the kvmclock issue that hit older AMDs. It was fixed
>>> in kvm-84, please try that.
>>
>> It is kvm-84, I have it running since Saturday (but I had this issue
>> with kvm-83 as well).
>>
>
> And the problem continues?
>
> What's your current clocksource (in the guest)? Does changing it help?
>
> See /sys/devices/system/clocksource/clocksource0/*.

It was kvm-clock.
I tried changing it to acpi_pm, jiffies and tsc, but it made no difference.

>>> I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's
>>> kvmclock.
>>
>> It can be done on a running process as well (22906 is the PID of the
>> affected guest).
>
> Right, but if the guest is poisoned somehow, this won't help.

Yep, it seems poisoned. I'll start the guest again in the evening and add an e1000 card to it. If the problem reappears, it would be good to see whether it affects only the virtio card or not (I've never seen this issue on a guest which doesn't use virtio drivers - so far, at least).
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 11:37 UTC
To: Avi Kivity, kvm

Tomasz Chmielewski wrote:
> Avi Kivity wrote:
>> What's your current clocksource (in the guest)? Does changing it help?
>>
>> See /sys/devices/system/clocksource/clocksource0/*.
>
> It was kvm-clock.
> I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.

Actually, I don't think I checked tsc, because when I changed to jiffies, time stopped:

# echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource
# date
Mon Mar 9 12:29:00 CET 2009
# date
Mon Mar 9 12:29:00 CET 2009
# date
Mon Mar 9 12:29:00 CET 2009
# date
Mon Mar 9 12:29:00 CET 2009
# date
Mon Mar 9 12:29:00 CET 2009

And I couldn't change to anything else any more:

# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies
# echo kvm-clock > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies

So I had to kill the guest and start it again (the above was reproduced on another, "non-poisoned" guest).
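Before experimenting as above, it can help to record what the guest currently uses and what it offers. A read-only sketch, assuming the standard sysfs files `current_clocksource` and `available_clocksource` in the directory Avi points to; the function name and the `base` parameter (added so the sketch can be tried against copies of the files) are hypothetical. It deliberately writes nothing, since switching to `jiffies` is what left the clock stuck above.

```python
from pathlib import Path

# Standard sysfs directory on Linux guests and hosts.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"

def clocksource_info(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available

if __name__ == "__main__" and Path(CLOCKSOURCE_DIR).is_dir():
    cur, avail = clocksource_info()
    print("current:  ", cur)
    print("available:", " ".join(avail))
```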
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-09 12:14 UTC
To: Tomasz Chmielewski; +Cc: kvm

Tomasz Chmielewski wrote:
>>
>> It was kvm-clock.
>> I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.
>
> Actually, I don't think that I checked tsc, because when I changed to
> jiffies, the time has stopped:
>
> # echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource
> # date
> Mon Mar 9 12:29:00 CET 2009
> # date
> Mon Mar 9 12:29:00 CET 2009
> # date
> Mon Mar 9 12:29:00 CET 2009

Can you post some /proc/interrupts dumps from the guest? I guess the timer interrupt isn't working.

Does -no-kvm-irqchip help?
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-09 12:52 UTC
To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
> Tomasz Chmielewski wrote:
>> Actually, I don't think that I checked tsc, because when I changed to
>> jiffies, the time has stopped:
>> [...]
>
> Can you post some /proc/interrupts dumps from the guest? I guess the
> timer interrupt isn't working.

We're touching on an issue other than my original one ("guest slowness") here, I suppose.

But new interrupts do keep arriving after I set the clocksource to "jiffies" (setting it to "jiffies" also kills my serial console connection - no key presses get through to the guest any more):

# cat /proc/interrupts
           CPU0
  0:        104   IO-APIC-edge      timer
  1:          6   IO-APIC-edge      i8042
  4:        480   IO-APIC-edge      serial
  6:          2   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          2   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:       4400   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1550   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
 12:         89   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      ide0
 15:         30   IO-APIC-edge      ide1
NMI:          0   Non-maskable interrupts
LOC:      85231   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

# cat /proc/interrupts
           CPU0
  0:        104   IO-APIC-edge      timer
  1:          6   IO-APIC-edge      i8042
  4:        486   IO-APIC-edge      serial
  6:          2   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          2   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:       4461   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1590   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
 12:         89   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      ide0
 15:         30   IO-APIC-edge      ide1
NMI:          0   Non-maskable interrupts
LOC:     108361   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

> Does -no-kvm-irqchip help?

Nope, it doesn't - with jiffies, time always stops.
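Comparing the two dumps row by row shows which counters moved between them. In the excerpt below, IRQ 0 (the IO-APIC `timer` line) stays at 104 in both snapshots while `LOC` (local timer interrupts) grows by 23130, and the serial and virtio lines also advance; that seems consistent with Avi's later remark about the PIT no longer running in periodic mode. A minimal sketch: the function names are hypothetical, and it reads only the first count column, which is all a single-vCPU guest like this one has.

```python
def parse_interrupts(text):
    """Map each /proc/interrupts row label ('0', 'LOC', ...) to its first count.

    Assumes a single-CPU guest, so each row has exactly one count column.
    The "CPU0" header row has no colon and is skipped.
    """
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")   # split at the first colon only
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            counts[label.strip()] = int(fields[0])
    return counts

def changed(before, after):
    """Return {label: increase} for rows whose count differs between snapshots."""
    return {k: after[k] - before[k]
            for k in before if k in after and after[k] != before[k]}

# Excerpt (rows 0, 4, 10, 11 and LOC) of the two dumps above.
BEFORE = """
           CPU0
  0:        104   IO-APIC-edge      timer
  4:        480   IO-APIC-edge      serial
 10:       4400   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1550   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
LOC:      85231   Local timer interrupts
"""
AFTER = """
           CPU0
  0:        104   IO-APIC-edge      timer
  4:        486   IO-APIC-edge      serial
 10:       4461   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1590   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
LOC:     108361   Local timer interrupts
"""

print(changed(parse_interrupts(BEFORE), parse_interrupts(AFTER)))
```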
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-15 15:41 UTC
To: Tomasz Chmielewski; +Cc: kvm

Tomasz Chmielewski wrote:
>
>> Does -no-kvm-irqchip help?
>
> Nope, it doesn't - with jiffies, time always stops.
>

Here, too. This is strange. On bare metal it works as expected.
* Re: strange guest slowness after some time

From: Avi Kivity @ 2009-03-15 16:14 UTC
To: Tomasz Chmielewski; +Cc: kvm

Avi Kivity wrote:
> Tomasz Chmielewski wrote:
>>
>>> Does -no-kvm-irqchip help?
>>
>> Nope, it doesn't - with jiffies, time always stops.
>
> Here, too. This is strange. On bare metal it works as expected.

I think it's unrelated. The PIT is programmed in one-shot mode (likely for the scheduler) and doesn't return to periodic mode. Pity the kernel doesn't warn about this.
* Re: strange guest slowness after some time

From: Tomasz Chmielewski @ 2009-03-15 13:19 UTC
To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
>>> I'm guessing there's a problem with timers or timer interrupts.
>>>
>>> What is the host cpu?
>>
>> 4 entries like this in /proc/cpuinfo:
>> [...]
>> model name : Dual-Core AMD Opteron(tm) Processor 2212
>>
>
> That's probably the kvmclock issue that hit older AMDs. It was fixed in
> kvm-84, please try that.

I've been running kvm-84 for about a week now, and no guest has become slow.

Could it be related to using cpufreq and the ondemand governor?

1) with kvm-83 and cpufreq/ondemand, guests go totally crazy (see the "Houston, we have May 15, 1953" thread)
2) with kvm-83 without cpufreq, "slowness" affects guests sometimes
3) with kvm-84 and cpufreq/ondemand, "slowness" affects guests sometimes
4) with kvm-84 without cpufreq, everything runs correctly (at least it has for a week now)

Does any of this make sense?

I would really like to use cpufreq/ondemand on the host with KVM, as my tests show it would save me about 50 EUR per year on electricity for each server of this kind.
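The four cases above hinge on whether cpufreq with the ondemand governor is active on the host, so it may help to record the governor of every host CPU whenever a slowdown is noticed. A read-only sketch, assuming the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name and `root` parameter are hypothetical. CPUs without cpufreq support (or hosts where cpufreq is not loaded) simply do not appear in the result.

```python
from pathlib import Path

def cpu_governors(root="/sys/devices/system/cpu"):
    """Return {'cpu0': 'ondemand', ...} for each CPU that exposes cpufreq."""
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    for gov in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        result[gov.parent.parent.name] = gov.read_text().strip()
    return result

if __name__ == "__main__":
    for cpu, governor in cpu_governors().items():
        print(cpu, governor)
```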
* Re: strange guest slowness after some time 2009-03-15 13:19 ` Tomasz Chmielewski @ 2009-03-17 10:47 ` Tomasz Chmielewski 2009-03-17 11:16 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 10:47 UTC To: Avi Kivity; +Cc: kvm Tomasz Chmielewski wrote: > Avi Kivity wrote: > >>>> I'm guessing there's a problem with timers or timer interrupts. >>>> >>>> What is the host cpu? >>> >>> 4 entries like this in /proc/cpuinfo: >>> >>> processor : 3 >>> vendor_id : AuthenticAMD >>> cpu family : 15 >>> model : 65 >>> model name : Dual-Core AMD Opteron(tm) Processor 2212 >>> >> >> That's probably the kvmclock issue that hit older AMDs. It was fixed >> in kvm-84, please try that. > > I've been running it for about a week now with kvm-84 and no guest got > slow. > > Can it be related to using cpufreq and ondemand governor? Something fishy here :( After a week or so, the network in one guest got slow with kvm-84 and no cpufreq. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-17 10:47 ` Tomasz Chmielewski @ 2009-03-17 11:16 ` Avi Kivity 2009-03-17 11:25 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-17 11:16 UTC To: Tomasz Chmielewski; +Cc: kvm Tomasz Chmielewski wrote: > > After a week or so, network in one guest got slow with kvm-84 and no > cpufreq. > This is virtio, right? What about e1000? (I realize it takes a week to reproduce, but maybe you have some more experience) -- error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time 2009-03-17 11:16 ` Avi Kivity @ 2009-03-17 11:25 ` Tomasz Chmielewski 2009-03-17 15:32 ` Felix Leimbach 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 11:25 UTC To: Avi Kivity; +Cc: kvm Avi Kivity wrote: > Tomasz Chmielewski wrote: >> >> After a week or so, network in one guest got slow with kvm-84 and no >> cpufreq. >> > > This is virtio, right? What about e1000? > > (I realize it takes a week to reproduce, but maybe you have some more > experience) Yes, all affected guests used virtio, probably because I didn't have many guests with an e1000 interface. When a guest gets slow, I stop it and add a second interface, e1000. If it gets slow again, I'll check whether the e1000 interface is slow as well. Will keep you updated. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-17 11:25 ` Tomasz Chmielewski @ 2009-03-17 15:32 ` Felix Leimbach 2009-03-17 15:43 ` Tomasz Chmielewski ` (2 more replies) 0 siblings, 3 replies; 70+ messages in thread From: Felix Leimbach @ 2009-03-17 15:32 UTC To: Tomasz Chmielewski; +Cc: avi, kvm Tomasz Chmielewski wrote: > Avi Kivity wrote: >> Tomasz Chmielewski wrote: >>> After a week or so, network in one guest got slow with kvm-84 and no >>> cpufreq. >> This is virtio, right? What about e1000? >> >> (I realize it takes a week to reproduce, but maybe you have some more >> experience) > > Yes, all affected had virtio. Probably because I didn't have many > guests with e1000 interface. > > After a guest gets slow, I stop it and add another interface, e1000. > > > If it gets slow again, I'll check if e1000 interface is slow as well. > > Will keep you updated. I see similar behavior: after about a week, the network of one of my guests stops responding entirely. Only guests using virtio networking get hit. Both Windows and Linux guests are affected. My guests in production use e1000 and have never been hit. While that could be a coincidence, it seems very unlikely: out of 3 virtio guests, 2 have been hit, one of them repeatedly. Out of 3 e1000 guests, none has ever been hit. Observed with kvm-83 and kvm-84, with the host running the in-kernel KVM code (Linux 2.6.25.7)
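As a rough check of how lopsided Felix's split is, the sketch below computes a one-sided Fisher exact probability (a hypergeometric calculation). It rests on a big simplifying assumption: every guest is equally likely to be hit regardless of NIC type. It also counts distinct affected guests (2) and ignores the repeated hits on one of them. The function name is illustrative only.

```python
from math import comb


def prob_all_hits_in_group(total, group, hits):
    """Probability that all `hits` affected guests fall inside a group of
    `group` guests, if the affected guests were chosen uniformly at random
    from `total` guests. This equals the one-sided Fisher exact p-value for
    the most extreme split."""
    return comb(group, hits) / comb(total, hits)


# 6 guests in total, 3 of them virtio, 2 distinct guests affected
p = prob_all_hits_in_group(total=6, group=3, hits=2)
print(f"p = {p:.2f}")  # p = 0.20
```

Under these assumptions, a split like this would arise by chance about one time in five. On its own it is suggestive rather than conclusive, which is why the extra e1000 interface test proposed in the thread is useful.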
* Re: strange guest slowness after some time 2009-03-17 15:32 ` Felix Leimbach @ 2009-03-17 15:43 ` Tomasz Chmielewski 2009-03-17 17:01 ` Felix Leimbach 2009-03-17 15:52 ` Avi Kivity 2009-03-17 16:27 ` Tomasz Chmielewski 2 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 15:43 UTC To: Felix Leimbach; +Cc: avi, kvm Felix Leimbach wrote: >> If it gets slow again, I'll check if e1000 interface is slow as well. >> >> Will keep you updated. > I see similar behavior: After a week one of my guests' network totally > stops to respond. Only guests using virtio networking get hit. Both > windows and linux guests are affected. > My guests in production use e1000 and have never been hit. > While that can be a coincidence it seems very unlikely: Out of 3 virtio > guests 2 have been hit, one repeatedly. > Out of 3 e1000 guests none has ever been hit. > > Observed with kvm-83 and kvm-84 with the host running in-kernel KVM code > (linux 2.6.25.7) Could you add an (unused) e1000 interface to your virtio guests? As this issue happens rarely for me, maybe you could help reproduce it as well (i.e. if the network gets slow on the virtio interface, give the e1000 interface an IP address and check whether the network is also slow on e1000 on the very same guest). BTW, what CPU do you have? -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-17 15:43 ` Tomasz Chmielewski @ 2009-03-17 17:01 ` Felix Leimbach 2009-03-17 17:05 ` Avi Kivity ` (2 more replies) 0 siblings, 3 replies; 70+ messages in thread From: Felix Leimbach @ 2009-03-17 17:01 UTC To: Tomasz Chmielewski; +Cc: avi, kvm Tomasz Chmielewski wrote: > Felix Leimbach wrote: >> Out of 3 e1000 guests none has ever been hit. >> >> Observed with kvm-83 and kvm-84 with the host running in-kernel KVM >> code (linux 2.6.25.7) > Could you add a (unused) e1000 interface to your virtio guests? > As this issue happens rarely for me, maybe you could help to reproduce > it as well (i.e. if network gets slow on virtio interface, give e1000 > a IP address, and try if network is also slow on e1000 on the very > same guest). Will do and report back. > > BTW, what CPU do you have? One dual-core Opteron 2212 Note: I will upgrade to two Shanghai quad-cores in 2 weeks and test with those as well. processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 65 model name : Dual-Core AMD Opteron(tm) Processor 2212 stepping : 2 cpu MHz : 1994.996 cache size : 1024 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy bogomips : 3990.06 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc
* Re: strange guest slowness after some time 2009-03-17 17:01 ` Felix Leimbach @ 2009-03-17 17:05 ` Avi Kivity 2009-03-17 18:49 ` Felix Leimbach 2009-03-17 17:38 ` Tomasz Chmielewski 2009-03-31 8:50 ` Tomasz Chmielewski 2 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-17 17:05 UTC To: Felix Leimbach; +Cc: Tomasz Chmielewski, kvm Felix Leimbach wrote: >> >> BTW, what CPU do you have? > One dual core Opteron 2212 Does idle=poll help things? It can cause tsc breakage similar to cpufreq. -- error compiling committee.c: too many arguments to function
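Whether frequency scaling or idle states can disturb the TSC depends partly on whether the CPU advertises an invariant TSC. Linux reports this through the `constant_tsc` and `nonstop_tsc` flags in /proc/cpuinfo. The Opteron 2212 flags posted above lack `constant_tsc`, while the Xeon 3050 flags posted later in the thread include it. The small, illustrative helper below (not code from the thread) extracts those two flags from cpuinfo-style text.

```python
TSC_FLAGS = ("constant_tsc", "nonstop_tsc")


def tsc_flags(cpuinfo_text):
    """Return the TSC-related flags found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            return {f for f in TSC_FLAGS if f in flags}
    return set()  # no 'flags' line at all


# Shortened flag lists from the two CPUs reported in this thread
opteron = "flags\t\t: fpu vme de pse tsc msr pae mce cx8 rdtscp lm 3dnowext"
xeon = "flags\t\t: fpu vme de pse tsc msr syscall lm constant_tsc pni"
print(sorted(tsc_flags(opteron)))  # []
print(sorted(tsc_flags(xeon)))     # ['constant_tsc']
```

On a real host you would pass it the contents of /proc/cpuinfo instead. An empty result means the TSC rate may change with P-states or idle states. Whether that explains this particular bug is only the hypothesis being discussed here.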
* Re: strange guest slowness after some time 2009-03-17 17:05 ` Avi Kivity @ 2009-03-17 18:49 ` Felix Leimbach 2009-03-18 6:36 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Felix Leimbach @ 2009-03-17 18:49 UTC To: Avi Kivity; +Cc: Tomasz Chmielewski, kvm Avi Kivity wrote: > Does idle=poll help things? It can cause tsc breakage similar to > cpufreq. On the host, right? I can't test that, as I cannot reboot the server. Is TSC breakage still something to watch out for after I've upgraded to the Shanghai quad-cores?
* Re: strange guest slowness after some time 2009-03-17 18:49 ` Felix Leimbach @ 2009-03-18 6:36 ` Avi Kivity 2009-03-18 7:57 ` Felix Leimbach 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-18 6:36 UTC To: Felix Leimbach; +Cc: Tomasz Chmielewski, kvm Felix Leimbach wrote: > Avi Kivity wrote: >> Does idle=poll help things? It can cause tsc breakage similar to >> cpufreq. > On the host, right? Can't test that as I cannot reboot the server. > Is tsc breakage still s.th. to watch out after I've upgraded to the > Shanghai quadcores? No, should be gone. Will you have the old server around so we can test things? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
* Re: strange guest slowness after some time 2009-03-18 6:36 ` Avi Kivity @ 2009-03-18 7:57 ` Felix Leimbach 2009-03-18 8:48 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Felix Leimbach @ 2009-03-18 7:57 UTC To: kvm, Tomasz Chmielewski Avi Kivity wrote: > Felix Leimbach wrote: >> Is tsc breakage still s.th. to watch out after I've upgraded to the >> Shanghai quadcores? > > No, should be gone. > > Will you have the old server around so we can test things? No, I'll be upgrading the existing server. If you have specific tests in mind I can perform them in the next two weeks before the upgrade. But I cannot restart the server because a few VMs are in production use. If a developer is interested in the old CPU (Opteron 2212) then I can have it mailed to him/her.
* Re: strange guest slowness after some time 2009-03-18 7:57 ` Felix Leimbach @ 2009-03-18 8:48 ` Avi Kivity 2009-03-18 9:08 ` Felix Leimbach 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-18 8:48 UTC To: Felix Leimbach; +Cc: kvm, Tomasz Chmielewski Felix Leimbach wrote: > Avi Kivity wrote: >> Felix Leimbach wrote: >>> Is tsc breakage still s.th. to watch out after I've upgraded to the >>> Shanghai quadcores? >> >> No, should be gone. >> >> Will you have the old server around so we can test things? > > No, I'll be upgrading the existing server. If you have specific tests in > mind I can perform them in the next two weeks before the upgrade. But I > cannot restart the server because a few VMs are in production use. > > If a developer is interested in the old CPU (Opteron 2212) then I can > have it mailed to him/her. > Thanks for the offer; I can probably find a similar CPU. The main difficulty is replicating the problem. Since there are now at least two reports, maybe it won't be that difficult. If you can figure out a way to reliably reproduce this, that would be most helpful. -- error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time 2009-03-18 8:48 ` Avi Kivity @ 2009-03-18 9:08 ` Felix Leimbach 0 siblings, 0 replies; 70+ messages in thread From: Felix Leimbach @ 2009-03-18 9:08 UTC To: Avi Kivity; +Cc: kvm, Tomasz Chmielewski Avi Kivity wrote: > Since there are now at least two reports, maybe it won't be that > difficult. If you can figure out a way to reliably reproduce this, > that would be most helpful. I'll see what I can do, although I'm not too optimistic, because I have not experienced the problem since upgrading to kvm-84. But hey, that's a good thing.
* Re: strange guest slowness after some time 2009-03-17 17:01 ` Felix Leimbach 2009-03-17 17:05 ` Avi Kivity @ 2009-03-17 17:38 ` Tomasz Chmielewski 2009-06-08 11:02 ` Felix Leimbach 2009-03-31 8:50 ` Tomasz Chmielewski 2 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 17:38 UTC To: Felix Leimbach; +Cc: avi, kvm Felix Leimbach wrote: >> BTW, what CPU do you have? > One dual core Opteron 2212 > Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test with > those as well. > > processor : 1 > vendor_id : AuthenticAMD > cpu family : 15 > model : 65 > model name : Dual-Core AMD Opteron(tm) Processor 2212 > stepping : 2 > cpu MHz : 1994.996 > cache size : 1024 KB It's exactly the same CPU I have. Almost. Mine is 5.004 MHz faster ;) model name : Dual-Core AMD Opteron(tm) Processor 2212 stepping : 2 cpu MHz : 2000.000 -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-17 17:38 ` Tomasz Chmielewski @ 2009-06-08 11:02 ` Felix Leimbach 2009-06-16 14:26 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Felix Leimbach @ 2009-06-08 11:02 UTC To: Tomasz Chmielewski; +Cc: avi, kvm Tomasz Chmielewski wrote: > Felix Leimbach wrote: > >>> BTW, what CPU do you have? >> One dual core Opteron 2212 >> Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test >> with those as well. >> >> processor : 1 >> vendor_id : AuthenticAMD >> cpu family : 15 >> model : 65 >> model name : Dual-Core AMD Opteron(tm) Processor 2212 >> stepping : 2 >> cpu MHz : 1994.996 >> cache size : 1024 KB > > It's exactly the same CPU I have. Interesting: for two months now I've been running on two Shanghai quad-cores instead, and the problem is definitely gone. The rest of the hardware, as well as the whole software stack, remained unchanged. That should confirm what we already assumed.
* Re: strange guest slowness after some time 2009-06-08 11:02 ` Felix Leimbach @ 2009-06-16 14:26 ` Tomasz Chmielewski 0 siblings, 0 replies; 70+ messages in thread From: Tomasz Chmielewski @ 2009-06-16 14:26 UTC To: Felix Leimbach; +Cc: avi, kvm Felix Leimbach wrote: >> It's exactly the same CPU I have. > Interesting: Since two months I'm running on 2 Shanghai Quad-Cores > instead and the problem is definitely gone. > The rest of the hardware as well as the whole software-stack remained > unchanged. > > That should confirm what we assumed already. For me, it turned out that the KVM I was running (shipped with Proxmox VE) had a "fairsched" patch (OpenVZ-related) which caused this broken behaviour. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-17 17:01 ` Felix Leimbach 2009-03-17 17:05 ` Avi Kivity 2009-03-17 17:38 ` Tomasz Chmielewski @ 2009-03-31 8:50 ` Tomasz Chmielewski 2009-04-01 4:22 ` David S. Ahern 2 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-31 8:50 UTC To: Felix Leimbach; +Cc: avi, kvm, Rusty Russell, Anthony Liguori, David S. Ahern Felix Leimbach wrote: > Tomasz Chmielewski wrote: >> Felix Leimbach wrote: >>> Out of 3 e1000 guests none has ever been hit. >>> >>> Observed with kvm-83 and kvm-84 with the host running in-kernel KVM >>> code (linux 2.6.25.7) >> Could you add a (unused) e1000 interface to your virtio guests? >> As this issue happens rarely for me, maybe you could help to reproduce >> it as well (i.e. if network gets slow on virtio interface, give e1000 >> a IP address, and try if network is also slow on e1000 on the very >> same guest). > Will do and report >> >> BTW, what CPU do you have? > One dual core Opteron 2212 > Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test with > those as well. I have this "slowness" on an Intel CPU as well, after about 10 days of guest uptime (using virtio net): processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU 3050 @ 2.13GHz stepping : 6 cpu MHz : 2133.410 cache size : 2048 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips : 4266.87 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-03-31 8:50 ` Tomasz Chmielewski @ 2009-04-01 4:22 ` David S. Ahern 2009-04-01 6:21 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: David S. Ahern @ 2009-04-01 4:22 UTC To: Tomasz Chmielewski Cc: Felix Leimbach, avi, kvm, Rusty Russell, Anthony Liguori Tomasz Chmielewski wrote: > Felix Leimbach wrote: >> Tomasz Chmielewski wrote: >>> Felix Leimbach wrote: >>>> Out of 3 e1000 guests none has ever been hit. >>>> >>>> Observed with kvm-83 and kvm-84 with the host running in-kernel KVM >>>> code (linux 2.6.25.7) >>> Could you add a (unused) e1000 interface to your virtio guests? >>> As this issue happens rarely for me, maybe you could help to >>> reproduce it as well (i.e. if network gets slow on virtio interface, >>> give e1000 a IP address, and try if network is also slow on e1000 on >>> the very same guest). >> Will do and report >>> >>> BTW, what CPU do you have? >> One dual core Opteron 2212 >> Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test >> with those as well. > > I have this "slowness" on an Intel CPU as well, after about 10 days of > guest uptime (using virtio net): > > processor : 1 > vendor_id : GenuineIntel > cpu family : 6 > model : 15 > model name : Intel(R) Xeon(R) CPU 3050 @ 2.13GHz > stepping : 6 > cpu MHz : 2133.410 > cache size : 2048 KB > physical id : 0 > siblings : 2 > core id : 1 > cpu cores : 2 > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe > syscall lm constant_tsc arch_perfmon pebs bts rep_good pni monitor > ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm > bogomips : 4266.87 > clflush size : 64 > cache_alignment : 64 > address sizes : 36 bits physical, 48 bits virtual > power management: > > For the Intel server, is the guest using the e1000 NIC, virtio, or something else?
I have a few DL320G5s with this processor; I have not hit this problem running RHEL3 and RHEL4 guests using e1000/SCSI devices. david
* Re: strange guest slowness after some time 2009-04-01 4:22 ` David S. Ahern @ 2009-04-01 6:21 ` Tomasz Chmielewski 2009-04-06 15:19 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-04-01 6:21 UTC To: David S. Ahern; +Cc: Felix Leimbach, avi, kvm, Rusty Russell, Anthony Liguori David S. Ahern wrote: >>>> Could you add a (unused) e1000 interface to your virtio guests? >>>> As this issue happens rarely for me, maybe you could help to >>>> reproduce it as well (i.e. if network gets slow on virtio interface, >>>> give e1000 a IP address, and try if network is also slow on e1000 on >>>> the very same guest). >>> Will do and report >>>> BTW, what CPU do you have? >>> One dual core Opteron 2212 >>> Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test >>> with those as well. >> I have this "slowness" on an Intel CPU as well, after about 10 days of >> guest uptime (using virtio net): >> >> processor : 1 >> vendor_id : GenuineIntel >> cpu family : 6 >> model : 15 >> model name : Intel(R) Xeon(R) CPU 3050 @ 2.13GHz > For the Intel server, the guest is using the e1000 NIC or virtio or other? I have a few DL320G5s with this processor; I have not hit this > problem running rhel3 and rhel4 guests using e1000/scsi devices. As I mentioned, it was using virtio net. Guests running with e1000 (and virtio_blk) don't have this problem. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-04-01 6:21 ` Tomasz Chmielewski @ 2009-04-06 15:19 ` Tomasz Chmielewski 2009-04-08 0:49 ` Rusty Russell 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-04-06 15:19 UTC To: David S. Ahern; +Cc: Felix Leimbach, avi, kvm, Rusty Russell, Anthony Liguori Tomasz Chmielewski wrote: > As I mentioned, it was using virtio net. > > Guests running with e1000 (and virtio_blk) don't have this problem. Also, virtio_console seems to be affected by this "slowness" issue. Am I correct to think that if: * on the guest, "lsmod" outputs: virtio_console 6828 0 [permanent] * on the guest, /etc/inittab contains: 6:2345:respawn:/sbin/mingetty ttyS0 * on the host, I start the guest with the parameter: -serial unix:/var/run/qemu-server/103.serial,server,nowait then the guest's ttyS0 console is "virtio_console"? If my thinking is correct, then I have a "slow serial console" on some of the guests using the virtio_pci and virtio_console drivers. By "slow serial console" I mean that any character typed shows up after a second or so. It can also be "cured" as with virtio_net - just run: dd if=/dev/vda of=/dev/null And the console reacts normally. Stop dd, and the console is slow again. I have this issue on two guests with e1000 network, which use virtio_blk (and virtio_console...). I never saw this issue with guests which don't use virtio. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-04-06 15:19 ` Tomasz Chmielewski @ 2009-04-08 0:49 ` Rusty Russell 2009-04-08 5:45 ` Tomasz Chmielewski 2009-05-26 11:49 ` Tomasz Chmielewski 0 siblings, 2 replies; 70+ messages in thread From: Rusty Russell @ 2009-04-08 0:49 UTC To: Tomasz Chmielewski Cc: David S. Ahern, Felix Leimbach, avi, kvm, Anthony Liguori On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote: > Tomasz Chmielewski wrote: > > > As I mentioned, it was using virtio net. > > > > Guests running with e1000 (and virtio_blk) don't have this problem. > > Also, virtio_console seem to be affected by this "slowness" issue. I'm pretty sure this is different. Older virtio_console code ignored interrupts and polled, and used a heuristic to back off on polling (this was because we used the generic "hvc" infrastructure which hacked support). You'll find a delay on the first keystroke after idle, but none on the second. Hope that helps, Rusty.
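Rusty's description of a polling console with backoff can be made concrete with a small simulation. This is a hypothetical model, not the hvc driver code. The poll interval starts at an assumed minimum, doubles after every idle poll up to an assumed maximum, and resets to the minimum as soon as input is seen. Both interval values are illustrative assumptions. In this model, the first keystroke after a long idle period waits for the next (slow) poll, while a keystroke soon afterwards benefits from the reset interval.

```python
MIN_INTERVAL_MS = 10    # assumed minimum poll interval (illustrative)
MAX_INTERVAL_MS = 2000  # assumed maximum poll interval (illustrative)


def keystroke_delays(keystroke_times_ms, horizon_ms):
    """Simulate a console that polls for input with exponential backoff.

    Returns, for each keystroke (in time order), how long it waited
    until a poll noticed it.
    """
    pending = sorted(keystroke_times_ms)
    delays = []
    now, interval = 0, MIN_INTERVAL_MS
    while now <= horizon_ms and pending:
        # Every keystroke that arrived at or before this poll is seen now.
        seen = [t for t in pending if t <= now]
        if seen:
            delays.extend(now - t for t in seen)
            pending = [t for t in pending if t > now]
            interval = MIN_INTERVAL_MS  # activity: poll quickly again
        else:
            interval = min(interval * 2, MAX_INTERVAL_MS)  # idle: back off
        now += interval
    return delays


# One keystroke after 10 s of idle, a second one 600 ms later
print(keystroke_delays([10000, 10600], horizon_ms=20000))  # [540, 10]
```

Note that Tomasz's report that follows (every keystroke lagging, even while typing) does not fit this pattern. That mismatch is his argument that the console slowness is the same problem as the network slowness.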
* Re: strange guest slowness after some time 2009-04-08 0:49 ` Rusty Russell @ 2009-04-08 5:45 ` Tomasz Chmielewski 2009-05-26 11:49 ` Tomasz Chmielewski 1 sibling, 0 replies; 70+ messages in thread From: Tomasz Chmielewski @ 2009-04-08 5:45 UTC To: Rusty Russell; +Cc: David S. Ahern, Felix Leimbach, avi, kvm, Anthony Liguori Rusty Russell wrote: > On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote: >> Tomasz Chmielewski wrote: >> >>> As I mentioned, it was using virtio net. >>> >>> Guests running with e1000 (and virtio_blk) don't have this problem. >> Also, virtio_console seem to be affected by this "slowness" issue. > > I'm pretty sure this is different. Older virtio_console code ignored > interrupts and polled, and use a heuristic to back off on polling (this was > because we used the generic "hvc" infrastructure which hacked support). By "older", do you mean the guest drivers? I have 2.6.27.x on the guests and see this issue. If you meant the host, I use kvm-84. > You'll find a delay on the first keystroke after idle, but none on the > second. I'm not sure. Press "a" seven times quickly, and the 7 characters will be printed a second later. But wait one more second and it will be unresponsive again. You won't see the characters "as you type". Also, these symptoms are very similar to the virtio_net issue: - it happens only on some guests (even if they have the same kernel and userspace) after a random period of time - it used to happen for me _always_ when the network got slow with the virtio_net driver - it doesn't go away with a guest restart initiated from the guest's system - it goes away with a kvm process stop/start (i.e. a new kvm process), but can appear later with no apparent cause -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-04-08 0:49 ` Rusty Russell 2009-04-08 5:45 ` Tomasz Chmielewski @ 2009-05-26 11:49 ` Tomasz Chmielewski 2009-05-26 11:55 ` Avi Kivity 1 sibling, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-05-26 11:49 UTC To: Rusty Russell; +Cc: David S. Ahern, Felix Leimbach, avi, kvm, Anthony Liguori Rusty Russell wrote: > On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote: >> Tomasz Chmielewski wrote: >> >>> As I mentioned, it was using virtio net. >>> >>> Guests running with e1000 (and virtio_blk) don't have this problem. >> Also, virtio_console seem to be affected by this "slowness" issue. > > I'm pretty sure this is different. Older virtio_console code ignored > interrupts and polled, and use a heuristic to back off on polling (this was > because we used the generic "hvc" infrastructure which hacked support). > > You'll find a delay on the first keystroke after idle, but none on the > second. I still observe this "slowness" with kvm-86 after the guest has been running for some time (virtio_net and virtio_console seem to be affected; a guest restart doesn't fix it). -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-05-26 11:49 ` Tomasz Chmielewski @ 2009-05-26 11:55 ` Avi Kivity 2009-05-26 12:05 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-05-26 11:55 UTC To: Tomasz Chmielewski Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Tomasz Chmielewski wrote: > > I still observe this "slowness" with kvm-86 after the guest is running > for some time (virtio_net and virtio_console seem to be affected; > guest restart doesn't fix it). > Anything in guest dmesg? -- error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time 2009-05-26 11:55 ` Avi Kivity @ 2009-05-26 12:05 ` Tomasz Chmielewski 2009-05-26 12:10 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-05-26 12:05 UTC To: Avi Kivity Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Avi Kivity wrote: > Tomasz Chmielewski wrote: >> >> I still observe this "slowness" with kvm-86 after the guest is running >> for some time (virtio_net and virtio_console seem to be affected; >> guest restart doesn't fix it). >> > > Anything in guest dmesg? No. No hints in syslog, dmesg... Could it be that this is more likely to happen on "busy" hosts? It happens for me on a host where I have 16 guests running. Also, when I booted the host almost 2 days ago, 2 or 3 guests didn't start properly (16 guests were starting at the same time), with their kernels saying: Kernel panic - not syncing: IO-APIC + timer doesn't work! Could it be related? After I restarted these failed guests, they started properly. -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-05-26 12:05 ` Tomasz Chmielewski @ 2009-05-26 12:10 ` Avi Kivity 2009-05-26 14:07 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-05-26 12:10 UTC To: Tomasz Chmielewski Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Tomasz Chmielewski wrote: > Avi Kivity wrote: >> Tomasz Chmielewski wrote: >>> >>> I still observe this "slowness" with kvm-86 after the guest is >>> running for some time (virtio_net and virtio_console seem to be >>> affected; guest restart doesn't fix it). >>> >> >> Anything in guest dmesg? > > No. > No hints in syslog, dmesg... > > > Can it be that this is more likely to happens on "busy" hosts? > We'll only know once we fix it... > It happens for me on a host where I have 16 guests running. > > > Also, as I booted the host almost 2 days ago, 2 or 3 guests didn't > start properly (16 guests were starting at the same time), with their > kernel saying: > > Kernel panic - not syncing: IO-APIC + timer doesn't work! > > Can it be related? > > After I restarted these failed guests, they started properly. > This is timing related. On a busy host you can get timeouts and thus the panics. It's unrelated. Maybe virtio is racy and a loaded host exposes the race. -- error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time 2009-05-26 12:10 ` Avi Kivity @ 2009-05-26 14:07 ` Tomasz Chmielewski 2009-05-26 14:35 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-05-26 14:07 UTC To: Avi Kivity Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Avi Kivity wrote: > Tomasz Chmielewski wrote: >> Avi Kivity wrote: >>> Tomasz Chmielewski wrote: >>>> >>>> I still observe this "slowness" with kvm-86 after the guest is >>>> running for some time (virtio_net and virtio_console seem to be >>>> affected; guest restart doesn't fix it). >>>> >>> >>> Anything in guest dmesg? >> >> No. >> No hints in syslog, dmesg... >> >> >> Can it be that this is more likely to happens on "busy" hosts? >> > > We'll only know once we fix it... (...) > Maybe virtio is racy and a loaded host exposes the race. I see it happening with virtio on 2.6.29.x guests as well. So, what would you do if you saw it on your systems as well? ;) Add some debug routines into virtio_* modules? -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-05-26 14:07 ` Tomasz Chmielewski @ 2009-05-26 14:35 ` Avi Kivity 2009-05-28 14:58 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-05-26 14:35 UTC To: Tomasz Chmielewski Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Tomasz Chmielewski wrote: >> Maybe virtio is racy and a loaded host exposes the race. > > I see it happening with virtio on 2.6.29.x guests as well. > > So, what would you do if you saw it on your systems as well? ;) > > Add some debug routines into virtio_* modules? > I'm no virtio expert. Maybe I'd insert tracepoints to record interrupts and kicks. -- error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time 2009-05-26 14:35 ` Avi Kivity @ 2009-05-28 14:58 ` Tomasz Chmielewski 2009-05-31 8:43 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-05-28 14:58 UTC To: Avi Kivity Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori Avi Kivity wrote: > Tomasz Chmielewski wrote: >>> Maybe virtio is racy and a loaded host exposes the race. >> >> I see it happening with virtio on 2.6.29.x guests as well. >> >> So, what would you do if you saw it on your systems as well? ;) >> >> Add some debug routines into virtio_* modules? >> > > I'm no virtio expert. Maybe I'd insert tracepoints to record interrupts > and kicks. I accidentally made an "interesting" discovery. This ~2 MB video shows a kvm-86 guest being rebooted and GRUB starting: http://syneticon.net/kvm/kvm-slowness.ogg GRUB has its timeout set to 50 seconds and is supposed to show the countdown on screen, decreasing the number of seconds shown once every second. Here, GRUB decreases the counter very quickly by 2 seconds, then waits 2 seconds, then again quickly decreases it by 2 seconds, and so on. My wording may not describe it very well, so just download the video and open it, e.g. in mplayer. Comments? -- Tomasz Chmielewski http://wpkg.org
* Re: strange guest slowness after some time 2009-05-28 14:58 ` Tomasz Chmielewski @ 2009-05-31 8:43 ` Avi Kivity 0 siblings, 0 replies; 70+ messages in thread
From: Avi Kivity @ 2009-05-31 8:43 UTC (permalink / raw)
To: Tomasz Chmielewski
Cc: Rusty Russell, David S. Ahern, Felix Leimbach, kvm, Anthony Liguori, Marcelo Tosatti

Tomasz Chmielewski wrote:
> Accidentally, I made some "interesting" discovery.
>
> This ~2 MB video shows a kvm-86 guest being rebooted and GRUB started:
>
> http://syneticon.net/kvm/kvm-slowness.ogg
>
>
> GRUB has its timeout set to 50 seconds, and is supposed to show it on
> the screen by decreasing the number of seconds shown, every second.
>
> Here, GRUB decreases the second counter very fast by 2 seconds, then
> waits 2 seconds, then again decreases the number of sends by 2 seconds
> very fast, and so on.
>
> Perhaps my wording does not describe it very well though, so just try
> to download the video and open it i.e. in mplayer.

Weird, weird. Can you run kvmtrace on this guest while this is happening and post the results somewhere?

--
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 15:32 ` Felix Leimbach 2009-03-17 15:43 ` Tomasz Chmielewski @ 2009-03-17 15:52 ` Avi Kivity 2009-03-17 16:12 ` Tomasz Chmielewski 2009-03-17 16:27 ` Tomasz Chmielewski 2 siblings, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-17 15:52 UTC (permalink / raw) To: Felix Leimbach; +Cc: Tomasz Chmielewski, kvm Felix Leimbach wrote: > I see similar behavior: After a week one of my guests' network totally > stops to respond. Only guests using virtio networking get hit. Both > windows and linux guests are affected. > My guests in production use e1000 and have never been hit. > While that can be a coincidence it seems very unlikely: Out of 3 > virtio guests 2 have been hit, one repeatedly. > Out of 3 e1000 guests none has ever been hit. > > Observed with kvm-83 and kvm-84 with the host running in-kernel KVM > code (linux 2.6.25.7) Might it be that some counter overflowed? What are the packet counts on long running guests? (output of ifconfig, even on an unaffected e1000 guest, might help) -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 15:52 ` Avi Kivity @ 2009-03-17 16:12 ` Tomasz Chmielewski 2009-03-17 17:05 ` Felix Leimbach 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 16:12 UTC (permalink / raw) To: Avi Kivity; +Cc: Felix Leimbach, kvm Avi Kivity schrieb: > Felix Leimbach wrote: >> I see similar behavior: After a week one of my guests' network totally >> stops to respond. Only guests using virtio networking get hit. Both >> windows and linux guests are affected. >> My guests in production use e1000 and have never been hit. >> While that can be a coincidence it seems very unlikely: Out of 3 >> virtio guests 2 have been hit, one repeatedly. >> Out of 3 e1000 guests none has ever been hit. >> >> Observed with kvm-83 and kvm-84 with the host running in-kernel KVM >> code (linux 2.6.25.7) > > Might it be that some counter overflowed? What are the packet counts on > long running guests? I don't think so. I just made both counters (TX, RX) of ifconfig for virtio interfaces overflow several times and everything is still as fast as it should be. > (output of ifconfig, even on an unaffected e1000 guest, might help) -- Tomasz Chmielewski http://wpkg.org ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 16:12 ` Tomasz Chmielewski @ 2009-03-17 17:05 ` Felix Leimbach 2009-03-17 17:10 ` Avi Kivity 0 siblings, 1 reply; 70+ messages in thread
From: Felix Leimbach @ 2009-03-17 17:05 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: Avi Kivity, kvm

Tomasz Chmielewski wrote:
> Avi Kivity schrieb:
>> Might it be that some counter overflowed? What are the packet counts
>> on long running guests?
> I don't think so.
>
> I just made both counters (TX, RX) of ifconfig for virtio interfaces
> overflow several times and everything is still as fast as it should be.

I had overflows on the counters as well (32-bit guests) without a problem.
Here is the current ifconfig output of a machine which suffered the problem before:

eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
          inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
          TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)

>> (output of ifconfig, even on an unaffected e1000 guest, might help)

Currently I have e1000 only on Windows guests. Is there a way to gather the relevant statistics there too?

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 17:05 ` Felix Leimbach @ 2009-03-17 17:10 ` Avi Kivity 2009-03-17 17:43 ` Tomasz Chmielewski ` (2 more replies) 0 siblings, 3 replies; 70+ messages in thread
From: Avi Kivity @ 2009-03-17 17:10 UTC (permalink / raw)
To: Felix Leimbach; +Cc: Tomasz Chmielewski, kvm

Felix Leimbach wrote:
> Tomasz Chmielewski wrote:
>> Avi Kivity schrieb:
>>> Might it be that some counter overflowed? What are the packet
>>> counts on long running guests?
>> I don't think so.
>>
>> I just made both counters (TX, RX) of ifconfig for virtio interfaces
>> overflow several times and everything is still as fast as it should be.
> I had overflows on the counters as well (32 bit guests) without an
> problem.
> Here is the current ifconfig output of a machine which suffered the
> problem before:
>
> eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
>           inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)

Packet counters are well within 32-bit limits; byte counters are not so interesting.

> currently I have e1000 only on windows guests. Is there a way to
> gather relevant statistics there too?

Sure, right-click on the adapter icon; it's there somewhere.

Do you experience the slowdown on Windows guests?

--
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 17:10 ` Avi Kivity @ 2009-03-17 17:43 ` Tomasz Chmielewski 2009-03-17 18:55 ` Tomasz Chmielewski 2009-03-17 18:57 ` Felix Leimbach 2009-03-18 5:54 ` Felix Leimbach 2 siblings, 1 reply; 70+ messages in thread
From: Tomasz Chmielewski @ 2009-03-17 17:43 UTC (permalink / raw)
To: Avi Kivity; +Cc: Felix Leimbach, kvm

Avi Kivity schrieb:
> Felix Leimbach wrote:
>> Tomasz Chmielewski wrote:
>>> Avi Kivity schrieb:
>>>> Might it be that some counter overflowed? What are the packet
>>>> counts on long running guests?
>> Here is the current ifconfig output of a machine which suffered the
>> problem before:
>>
>> eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
>>           inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)
>
> packet counters are will within 32-bit limits. byte counters not so
> interesting.

Ah, OK. I only overflowed the byte counters.

Overflowing the packet counters will take much longer. It's one of those very rare cases where setting a very small MTU is useful...

--
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply [flat|nested] 70+ messages in thread
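Why a small MTU helps can be made concrete with a little arithmetic. The Python sketch below estimates how many bytes must be sent before a 32-bit packet counter wraps. It assumes every packet is exactly one MTU in size and ignores headers, ACKs and link-layer framing, so the figures are rough; the function name is illustrative only.

```python
# Rough estimate of how much data must be sent before a 32-bit packet
# counter wraps. Assumption: every packet is exactly `mtu` bytes
# (headers, ACKs and framing overhead are ignored).

COUNTER_WRAP = 2 ** 32  # a 32-bit counter wraps after this many packets


def bytes_until_packet_wrap(mtu: int) -> int:
    """Return the number of bytes transferred when the packet counter wraps."""
    return COUNTER_WRAP * mtu


for mtu in (1500, 100):
    total = bytes_until_packet_wrap(mtu)
    print(f"MTU {mtu:>4}: {total:,} bytes (~{total / 2**30:,.0f} GiB)")
```

Under these assumptions the counter wraps after roughly 6,000 GiB at MTU 1500 but only about 400 GiB at MTU 100, i.e. 15 times less data for the same number of packets.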
* Re: strange guest slowness after some time 2009-03-17 17:43 ` Tomasz Chmielewski @ 2009-03-17 18:55 ` Tomasz Chmielewski 2009-03-17 19:04 ` Felix Leimbach 0 siblings, 1 reply; 70+ messages in thread
From: Tomasz Chmielewski @ 2009-03-17 18:55 UTC (permalink / raw)
To: Avi Kivity; +Cc: Felix Leimbach, kvm

Tomasz Chmielewski schrieb:
> Avi Kivity schrieb:
>> Felix Leimbach wrote:
>>> Tomasz Chmielewski wrote:
>>>> Avi Kivity schrieb:
>>>>> Might it be that some counter overflowed? What are the packet
>>>>> counts on long running guests?
>
>>> Here is the current ifconfig output of a machine which suffered the
>>> problem before:
>>>
>>> eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
>>>           inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>           RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:1000
>>>           RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)
>>
>> packet counters are will within 32-bit limits. byte counters not so
>> interesting.
>
> Ah OK.
> I did only byte overflow.
>
> Packet overflow will take much longer. It's one of these very rare cases
> where setting very small MTU is useful...

OK, I found another bug.

Set your MTU to 100.

On two hosts, run:

HOST1_MTU1500# dd if=/dev/zero | ssh manager@HOST2 dd of=/dev/null
HOST2_MTU100# dd if=/dev/zero | ssh manager@HOST1 dd of=/dev/null

HOST2, with MTU 100, crashes after 10-15 minutes (with the packet counter still not overflowed).

--
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 18:55 ` Tomasz Chmielewski @ 2009-03-17 19:04 ` Felix Leimbach 2009-03-17 19:24 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread
From: Felix Leimbach @ 2009-03-17 19:04 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: Avi Kivity, kvm

Tomasz Chmielewski wrote:
> Tomasz Chmielewski schrieb:
>> Avi Kivity schrieb:
>>> packet counters are will within 32-bit limits. byte counters not so
>>> interesting.
>>
>> Ah OK.
>> I did only byte overflow.
>>
>> Packet overflow will take much longer. It's one of these very rare
>> cases where setting very small MTU is useful...
>
> OK, another bug found.
>
> Set your MTU to 100.
>
> On two hosts, do:
>
> HOST1_MTU1500# dd if=/dev/zero | ssh manager@HOST2 dd of=/dev/null
> HOST2_MTU100# dd if=/dev/zero | ssh manager@HOST1 dd of=/dev/null
>
> HOST2 with MTU 100 will crash after 10-15 minutes (with packet count
> still not overflown).
>

Interesting. What are the packet counters at crash time (roughly)?

My currently running test is:

Guest 1 (Linux):
MTU 150
# cat /dev/zero | nc <guest2ip> 7777

Guest 2 (Windows 2003 Server):
MTU: 1500
# nc -l -p 7777 > NUL

My packet counters are currently at 63 million, with no problem yet.

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 19:04 ` Felix Leimbach @ 2009-03-17 19:24 ` Tomasz Chmielewski 2009-03-17 20:14 ` Tomasz Chmielewski 2009-03-18 6:29 ` Avi Kivity 1 sibling, 2 replies; 70+ messages in thread
From: Tomasz Chmielewski @ 2009-03-17 19:24 UTC (permalink / raw)
To: Felix Leimbach; +Cc: Avi Kivity, kvm

Felix Leimbach schrieb:
>> OK, another bug found.
>>
>> Set your MTU to 100.
>>
>> On two hosts, do:
>>
>> HOST1_MTU1500# dd if=/dev/zero | ssh manager@HOST2 dd of=/dev/null
>> HOST2_MTU100# dd if=/dev/zero | ssh manager@HOST1 dd of=/dev/null
>>
>> HOST2 with MTU 100 will crash after 10-15 minutes (with packet count
>> still not overflown).
>>
> Intersting. What are the packet counter at crash time (roughly)?
>
> My - currently running - test is:
>
> Guest 1 (Linux):
> MTU 150
> # cat /dev/zero | nc <guest2ip> 7777
>
> Guest 2 (Windows 2003 Server):
> MTU: 1500
> # nc -l -p 7777 > NUL
>
> My packet are currently at 63 million without a problem - yet.

I have it running with MTU 1500, and one of the guests (the one that was crashing with MTU=100) froze.

On the VNC console I can see:

virtio_net virtio0: id 64 is not a head!
BUG: soft lockup - CPU#0 stuck for 61s! [ssh:2265]

"soft lockup" is printed periodically. The VNC and serial consoles do not react to any key press, and the guest does not react to ACPI events (shutdown). The kvm/qemu process is using 100% CPU.

See this screenshot:

http://www1.wpkg.org/lockup.png

The guest that locks up runs Debian Lenny with a 2.6.26 kernel.
The guest that does not lock up runs Mandriva 2009.0 with a 2.6.27.x kernel.
(Data is being transferred in both directions between these hosts.)

--
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 19:24 ` Tomasz Chmielewski @ 2009-03-17 20:14 ` Tomasz Chmielewski 2009-03-17 22:34 ` Tomasz Chmielewski 2009-03-18 6:29 ` Avi Kivity 1 sibling, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 20:14 UTC (permalink / raw) To: Felix Leimbach; +Cc: Avi Kivity, kvm Tomasz Chmielewski schrieb: > See this screenshot: > > http://www1.wpkg.org/lockup.png > > > Guest that locks up is running Debian Lenny with 2.6.26 kernel. > Guest that does not lock up runs Mandriva 2009.0 with 2.6.27.x kernel. > (data being transferred both side to/from each of these hosts). Sorry, both machines run Debian Lenny and 2.6.26 kernel. The only difference is that machine which crashes (with MTU=100) or locks up (with MTU=1500) runs a "2.6.26-1-686" kernel and the one which doesn't lock up runs "2.6.26-1-486" kernel (both are Debian's kernels). -- Tomasz Chmielewski http://wpkg.org ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 20:14 ` Tomasz Chmielewski @ 2009-03-17 22:34 ` Tomasz Chmielewski 2009-03-17 23:02 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread
From: Tomasz Chmielewski @ 2009-03-17 22:34 UTC (permalink / raw)
To: Felix Leimbach; +Cc: Avi Kivity, kvm

Tomasz Chmielewski schrieb:
> Sorry, both machines run Debian Lenny and 2.6.26 kernel.
> The only difference is that machine which crashes (with MTU=100) or
> locks up (with MTU=1500) runs a "2.6.26-1-686" kernel and the one which
> doesn't lock up runs "2.6.26-1-486" kernel (both are Debian's kernels).

After a few more tries I got this one. The serial console died, but SSH is still working.

Note the "S" taint flag. According to Documentation/oops-tracing.txt, it means:

  3: 'S' if the oops occurred on an SMP kernel running on hardware that
     hasn't been certified as safe to run multiprocessor.
     Currently this occurs only on various Athlons that are not
     SMP capable.

And this is a difference between the "2.6.26-1-686" and "2.6.26-1-486" kernels:

# grep -i smp /boot/config-2.6.26-1-686
CONFIG_X86_SMP=y
CONFIG_X86_32_SMP=y
CONFIG_SMP=y

# grep -i smp /boot/config-2.6.26-1-486
CONFIG_BROKEN_ON_SMP=y
# CONFIG_SMP is not set

[10942.216450] BUG: soft lockup - CPU#0 stuck for 760s! [postgres:1802]
[10942.216450] Modules linked in: ipv6 loop joydev virtio_balloon virtio_net parport_pc parport snd_pcsp serio_raw snd_pcm snd_timer psmouse snd soundcore snd_page_alloc i2c_piix4 i2c_core button usbhid hid ff_memless evdev ext3 jbd mbcache virtio_blk ide_cd_mod cdrom ide_pci_generic floppy virtio_pci uhci_hcd usbcore piix ide_core ata_generic libata scsi_mod dock thermal processor fan thermal_sys
[10942.216450]
[10942.216450] Pid: 1802, comm: postgres Tainted: G S (2.6.26-1-686 #1)
[10942.216450] EIP: 0060:[<c011d5a0>] EFLAGS: 00000206 CPU: 0
[10942.216450] EIP is at finish_task_switch+0x25/0x99
[10942.216450] EAX: c1208fa0 EBX: c03bafa0 ECX: c1208fa0 EDX: ce0be4a0
[10942.216450] ESI: 00000000 EDI: ce0be4a0 EBP: 00000001 ESP: ce7f9afc
[10942.216450]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[10942.216450] CR0: 8005003b CR2: 080f3a10 CR3: 0eaeb000 CR4: 000006d0
[10942.216450] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[10942.216450] DR6: ffff0ff0 DR7: 00000400
[10942.216450]  [<c02b82ee>] ? schedule+0x60c/0x66f
[10942.216450]  [<c0129ab0>] ? lock_timer_base+0x19/0x35
[10942.216450]  [<c0129bc3>] ? __mod_timer+0x99/0xa3
[10942.216450]  [<c02b8549>] ? schedule_timeout+0x6b/0x86
[10942.216450]  [<c01297ec>] ? process_timeout+0x0/0x5
[10942.216450]  [<c02b8544>] ? schedule_timeout+0x66/0x86
[10942.216450]  [<c017f2c6>] ? do_select+0x364/0x3bd
[10942.216450]  [<c017f7ca>] ? __pollwait+0x0/0xac
[10942.216450]  [<d08e74c4>] ? start_xmit+0x9f/0xa5 [virtio_net]
[10942.216450]  [<c025895c>] ? dev_hard_start_xmit+0x1eb/0x24f
[10942.216450]  [<c02669f2>] ? __qdisc_run+0xcc/0x17c
[10942.216450]  [<c025abbf>] ? dev_queue_xmit+0x287/0x2bc
[10942.216450]  [<c02762cd>] ? ip_finish_output+0x1c5/0x1fc
[10942.216450]  [<c0115403>] ? pvclock_clocksource_read+0x4b/0xd0
[10942.216450]  [<c0275e5b>] ? ip_local_out+0x15/0x17
[10942.216450]  [<c013604c>] ? getnstimeofday+0x37/0xbc
[10942.216450]  [<c01344c2>] ? ktime_get_ts+0x22/0x49
[10942.216450]  [<c01344f6>] ? ktime_get+0xd/0x21
[10942.216450]  [<c01190e6>] ? hrtick_start_fair+0xeb/0x12c
[10942.216450]  [<c011b39f>] ? task_rq_lock+0x3b/0x5e
[10942.216450]  [<c02531ab>] ? skb_checksum+0x52/0x272
[10942.216450]  [<c017f5a1>] ? core_sys_select+0x282/0x29f
[10942.216450]  [<c0129ccb>] ? mod_timer+0x19/0x36
[10942.216450]  [<c0252345>] ? sock_def_readable+0xf/0x58
[10942.216450]  [<c0283cf4>] ? tcp_rcv_established+0x51d/0x7b1
[10942.216450]  [<c0288d9f>] ? tcp_v4_do_rcv+0x262/0x3e8
[10942.216450]  [<c028ab5d>] ? tcp_v4_rcv+0x5b6/0x609
[10942.216450]  [<c0272ec3>] ? ip_local_deliver_finish+0xe8/0x183
[10942.216450]  [<c0272dbe>] ? ip_rcv_finish+0x286/0x2a3
[10942.216450]  [<c025837a>] ? netif_receive_skb+0x2d6/0x343
[10942.216450]  [<d08e7aa9>] ? virtnet_poll+0x21d/0x258 [virtio_net]
[10942.216450]  [<c017f915>] ? sys_select+0x9f/0x180
[10942.216450]  [<c0103853>] ? sysenter_past_esp+0x78/0xb1
[10942.216450]  =======================

--
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply [flat|nested] 70+ messages in thread
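The `grep` comparison of the two kernel configs can also be scripted. The Python sketch below parses kernel `.config`-style text and reports whether `CONFIG_SMP` is enabled, treating the `# CONFIG_SMP is not set` comment as "disabled". The function names are hypothetical helpers for illustration, not an existing tool; the sample strings are the config lines quoted above.

```python
import re


def parse_kconfig(text: str) -> dict[str, str]:
    """Map CONFIG_* option names to their values from .config-style text.

    Lines of the form '# CONFIG_FOO is not set' are recorded with value 'n'.
    """
    options: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def smp_enabled(text: str) -> bool:
    """True if the config text contains CONFIG_SMP=y."""
    return parse_kconfig(text).get("CONFIG_SMP") == "y"


if __name__ == "__main__":
    cfg_686 = "CONFIG_X86_SMP=y\nCONFIG_X86_32_SMP=y\nCONFIG_SMP=y\n"
    cfg_486 = "CONFIG_BROKEN_ON_SMP=y\n# CONFIG_SMP is not set\n"
    print(smp_enabled(cfg_686))  # True
    print(smp_enabled(cfg_486))  # False
```

On a real system one would pass the contents of a file such as `/boot/config-2.6.26-1-686` to `smp_enabled`.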
* Re: strange guest slowness after some time 2009-03-17 22:34 ` Tomasz Chmielewski @ 2009-03-17 23:02 ` Tomasz Chmielewski 0 siblings, 0 replies; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-17 23:02 UTC (permalink / raw) To: Felix Leimbach; +Cc: Avi Kivity, kvm Tomasz Chmielewski schrieb: > Tomasz Chmielewski schrieb: > >> Sorry, both machines run Debian Lenny and 2.6.26 kernel. >> The only difference is that machine which crashes (with MTU=100) or >> locks up (with MTU=1500) runs a "2.6.26-1-686" kernel and the one >> which doesn't lock up runs "2.6.26-1-486" kernel (both are Debian's >> kernels). > > Some more tries and I got this one. Serial console died, but SSH is > still working. > > Note the "S" tainted flag. > According to Documentation/oops-tracing.txt, it means: > > 3: 'S' if the oops occurred on an SMP kernel running on hardware that > hasn't been certified as safe to run multiprocessor. > Currently this occurs only on various Athlons that are not > SMP capable. > > > And this is a difference between "2.6.26-1-686" and "2.6.26-1-486" kernels. > > # grep -i smp /boot/config-2.6.26-1-686 > CONFIG_X86_SMP=y > CONFIG_X86_32_SMP=y > CONFIG_SMP=y > > > # grep -i smp /boot/config-2.6.26-1-486 > CONFIG_BROKEN_ON_SMP=y > # CONFIG_SMP is not set BTW, it was the machine with /boot/config-2.6.26-1-486 kernel (non-SMP) which got slow for me today. -- Tomasz Chmielewski http://wpkg.org ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 19:24 ` Tomasz Chmielewski 2009-03-17 20:14 ` Tomasz Chmielewski @ 2009-03-18 6:29 ` Avi Kivity 2009-03-19 4:59 ` Rusty Russell 1 sibling, 1 reply; 70+ messages in thread From: Avi Kivity @ 2009-03-18 6:29 UTC (permalink / raw) To: Tomasz Chmielewski; +Cc: Felix Leimbach, kvm, Rusty Russell, Anthony Liguori Tomasz Chmielewski wrote: > Felix Leimbach schrieb: > >>> OK, another bug found. >>> >>> Set your MTU to 100. >>> >>> On two hosts, do: >>> >>> HOST1_MTU1500# dd if=/dev/zero | ssh manager@HOST2 dd of=/dev/null >>> HOST2_MTU100# dd if=/dev/zero | ssh manager@HOST1 dd of=/dev/null >>> >>> HOST2 with MTU 100 will crash after 10-15 minutes (with packet count >>> still not overflown). >>> >> Intersting. What are the packet counter at crash time (roughly)? >> >> My - currently running - test is: >> >> Guest 1 (Linux): >> MTU 150 >> # cat /dev/zero | nc <guest2ip> 7777 >> >> Guest 2 (Windows 2003 Server): >> MTU: 1500 >> # nc -l -p 7777 > NUL >> >> My packet are currently at 63 million without a problem - yet. > > I have it running with MTU 1500. And one of the guests (the one which > was crashing with MTU=100) froze. > > On a VNC console I can see: > > virtio_net virtio0: id 64 is not a head! > BUG: soft lockup - CPU#0 stuck for 61s! [ssh:2265] > > And "soft lockup" is being printed periodically. VNC and serial > console do not react to any key press. Guest do not react on ACPI > events (shutdown). > kvm/qemu process is using 100% CPU. > > See this screenshot: > > http://www1.wpkg.org/lockup.png > > > Guest that locks up is running Debian Lenny with 2.6.26 kernel. > Guest that does not lock up runs Mandriva 2009.0 with 2.6.27.x kernel. > (data being transferred both side to/from each of these hosts). Copying the virtio folks... something is wrong. 
You can obtain a stack trace of the locked-up guest by doing:

  (qemu) gdbserver 1234
  $ gdb /path/to/guest/vmlinux
  (gdb) target remote localhost:1234
  (gdb) backtrace

I don't know how you obtain the guest vmlinux on Debian; on Fedora it is contained in kernel-debuginfo.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-18 6:29 ` Avi Kivity @ 2009-03-19 4:59 ` Rusty Russell 2009-03-19 5:22 ` David S. Ahern 0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-03-19 4:59 UTC (permalink / raw)
To: Avi Kivity; +Cc: Tomasz Chmielewski, Felix Leimbach, kvm, Anthony Liguori

On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote:
> Tomasz Chmielewski wrote:
> > virtio_net virtio0: id 64 is not a head!

This means that qemu said "I've finished with buffer 64" and the guest didn't know anything about buffer 64.

We should not lock up, though networking is toast. I think qemu got upset, and that caused both this message and qemu chewing 100% CPU.

I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests (32-bit guests on a 64-bit AMD host). What's your kvm/qemu command line?

Thanks,
Rusty.

^ permalink raw reply [flat|nested] 70+ messages in thread
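To make Rusty's explanation more concrete, here is a deliberately simplified Python model of the bookkeeping involved. The guest remembers which buffer IDs (descriptor heads) it has handed to the host; when the host reports a buffer as used, the guest looks that ID up. An ID the guest never handed out, or has already reclaimed, is the "is not a head" situation. The class and method names are hypothetical, and this is not the kernel's virtio_ring code, which operates on rings in shared memory rather than on a Python dict.

```python
class ToyVirtqueue:
    """Hypothetical toy model of a guest's view of outstanding virtio buffers.

    Not the kernel's virtio_ring implementation; it only mirrors the idea
    that a "used" ID reported by the host must match a buffer the guest
    actually handed out.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.outstanding: dict[int, object] = {}  # head id -> guest's token

    def add_buf(self, head: int, token: object) -> None:
        """Guest hands buffer `head` to the host."""
        if head in self.outstanding:
            raise ValueError(f"id {head} already in use")
        self.outstanding[head] = token

    def get_buf(self, used_id: int) -> object:
        """Host reports `used_id` as finished; guest reclaims its token."""
        if used_id >= self.size:
            raise RuntimeError(f"id {used_id} out of range")
        if used_id not in self.outstanding:
            # The guest never handed this id out (or already reclaimed it).
            raise RuntimeError(f"id {used_id} is not a head!")
        return self.outstanding.pop(used_id)


vq = ToyVirtqueue(size=256)
vq.add_buf(3, "skb-A")
print(vq.get_buf(3))  # skb-A
try:
    vq.get_buf(64)  # the host claims it finished buffer 64
except RuntimeError as err:
    print(err)  # id 64 is not a head!
```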
* Re: strange guest slowness after some time 2009-03-19 4:59 ` Rusty Russell @ 2009-03-19 5:22 ` David S. Ahern 2009-03-19 6:08 ` David S. Ahern 0 siblings, 1 reply; 70+ messages in thread
From: David S. Ahern @ 2009-03-19 5:22 UTC (permalink / raw)
To: Rusty Russell
Cc: Avi Kivity, Tomasz Chmielewski, Felix Leimbach, kvm, Anthony Liguori

Rusty Russell wrote:
> On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote:
>> Tomasz Chmielewski wrote:
>>> virtio_net virtio0: id 64 is not a head!
>
> This means that qemu said "I've finished with buffer 64" and the guest didn't
> know anything about buffer 64.
>
> We should not lock up, tho networking is toast: I think that qemu got upset
> and that caused this as well as it to chew 100% cpu.
>
> I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, 32-bit
> guests on a 64-bit AMD host. What's your kvm/qemu command line?
>

I've hit this as well.

Intel host, running RHEL5.3, x86_64 with KVM-81.

Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta.

Happens pretty darn quickly for me.

david

> Thanks,
> Rusty.

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-19 5:22 ` David S. Ahern @ 2009-03-19 6:08 ` David S. Ahern 2009-03-19 8:03 ` Tomasz Chmielewski 0 siblings, 1 reply; 70+ messages in thread
From: David S. Ahern @ 2009-03-19 6:08 UTC (permalink / raw)
To: Rusty Russell
Cc: Avi Kivity, Tomasz Chmielewski, Felix Leimbach, kvm, Anthony Liguori

David S. Ahern wrote:
>
> Rusty Russell wrote:
>> On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote:
>>> Tomasz Chmielewski wrote:
>>>> virtio_net virtio0: id 64 is not a head!
>> This means that qemu said "I've finished with buffer 64" and the guest didn't
>> know anything about buffer 64.
>>
>> We should not lock up, tho networking is toast: I think that qemu got upset
>> and that caused this as well as it to chew 100% cpu.
>>
>> I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, 32-bit
>> guests on a 64-bit AMD host. What's your kvm/qemu command line?
>>
>
> I've hit this as well.
>
> Intel host, running RHEL5.3, x86_64 with KVM-81.
>
> Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta.
>
> Happens pretty darn quickly for me.
>
> david
>

Like I said, pretty darn quickly. More information for you.

Command line (a few elements blurred) for this run (started about 15 minutes ago):

kvm -localtime -no-reboot -m 3584 -smp 4 \
  -drive file=/dev/cciss/c0d0,if=scsi,cache=off,boot=on \
  -drive file=/dev/cciss/c0d1,if=scsi,cache=off,boot=off \
  -net nic,vlan=0,macaddr=00:11:22:33:44:55,model=virtio \
  -net tap,vlan=0,ifname=tap0,script=no,downscript=no \
  -net nic,vlan=1,macaddr=00:12:34:56:78:1,model=virtio \
  -net tap,vlan=1,ifname=tap1,script=no,downscript=no \
  -usb -usbdevice tablet -mem-path /hugepages \
  -pidfile /tmp/1.pid \
  -monitor unix:/tmp/1,server,nowait \
  -vnc :1

It does not take much network traffic for the network to lock up.

In this case, the host shows 2 kvm threads spinning, for vcpus 2 and 3. I have vcpus pinned to pcpus (vcpu0:pcpu0, etc).
Backtrace for kvm, though nothing interesting:

Thread 5 (Thread 0x43344940 (LWP 3153)):
#0  0x00002b8af5088c77 in ioctl () from /lib64/libc.so.6
#1  0x0000000000530ece in kvm_run ()
#2  0x0000000000506529 in kvm_cpu_exec ()
#3  0x00000000005067c0 in ap_main_loop ()
#4  0x00002b8af46aa367 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b8af50900ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x43d45940 (LWP 3154)):
#0  0x00002b8af5088c77 in ioctl () from /lib64/libc.so.6
#1  0x0000000000530ece in kvm_run ()
#2  0x0000000000506529 in kvm_cpu_exec ()
#3  0x00000000005067c0 in ap_main_loop ()
#4  0x00002b8af46aa367 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b8af50900ad in clone () from /lib64/libc.so.6

In the guest, I see 2 threads of 2 separate processes spinning on cpus 2 and 3. They appear to be spinning on the kernel side.

Attempts to restart the network froze the guest console, and at this point the host shows 3 threads spinning away, though not at 100% CPU.

The qemu monitor was able to push a system_powerdown event to the guest, and the guest showed signs of receiving it, though it did not power down on its own.

david

>
>> Thanks,
>> Rusty.

^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-19 6:08 ` David S. Ahern @ 2009-03-19 8:03 ` Tomasz Chmielewski 2009-03-19 14:11 ` David S. Ahern 0 siblings, 1 reply; 70+ messages in thread From: Tomasz Chmielewski @ 2009-03-19 8:03 UTC (permalink / raw) To: David S. Ahern Cc: Rusty Russell, Avi Kivity, Felix Leimbach, kvm, Anthony Liguori David S. Ahern schrieb: > > David S. Ahern wrote: >> Rusty Russell wrote: >>> On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote: >>>> Tomasz Chmielewski wrote: >>>>> virtio_net virtio0: id 64 is not a head! >>> This means that qemu said "I've finished with buffer 64" and the guest didn't >>> know anything about buffer 64. >>> >>> We should not lock up, tho networking is toast: I think that qemu got upset >>> and that caused this as well as it to chew 100% cpu. >>> >>> I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, 32-bit >>> guests on a 64-bit AMD host. What's your kvm/qemu command line? >>> >> I've hit this as well. >> >> Intel host, running RHEL5.3, x86_64 with KVM-81. >> >> Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta. >> >> Happens pretty darn quickly for me. >> >> david >> > > Like I said, pretty darn quickly. Can you reproduce it also with e1000 instead of virtio? -- Tomasz Chmielewski http://wpkg.org ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-19 8:03 ` Tomasz Chmielewski @ 2009-03-19 14:11 ` David S. Ahern 0 siblings, 0 replies; 70+ messages in thread From: David S. Ahern @ 2009-03-19 14:11 UTC (permalink / raw) To: Tomasz Chmielewski Cc: Rusty Russell, Avi Kivity, Felix Leimbach, kvm, Anthony Liguori Tomasz Chmielewski wrote: > David S. Ahern schrieb: >> >> David S. Ahern wrote: >>> Rusty Russell wrote: >>>> On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote: >>>>> Tomasz Chmielewski wrote: >>>>>> virtio_net virtio0: id 64 is not a head! >>>> This means that qemu said "I've finished with buffer 64" and the >>>> guest didn't >>>> know anything about buffer 64. >>>> >>>> We should not lock up, tho networking is toast: I think that qemu >>>> got upset >>>> and that caused this as well as it to chew 100% cpu. >>>> >>>> I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, >>>> 32-bit >>>> guests on a 64-bit AMD host. What's your kvm/qemu command line? >>>> >>> I've hit this as well. >>> >>> Intel host, running RHEL5.3, x86_64 with KVM-81. >>> >>> Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta. >>> >>> Happens pretty darn quickly for me. >>> >>> david >>> >> >> Like I said, pretty darn quickly. > > Can you reproduce it also with e1000 instead of virtio? > > I have not had a problem with the e1000 nic. This seems to be strictly a virtio bug; I get the same messages. These are 2 separate runs, one from March 11: kernel: virtio_net virtio0: id 98 is not a head! and the other last night: kernel: virtio_net virtio0: id 6 is not a head! david ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: strange guest slowness after some time 2009-03-17 17:10 ` Avi Kivity 2009-03-17 17:43 ` Tomasz Chmielewski @ 2009-03-17 18:57 ` Felix Leimbach 2009-03-18 5:54 ` Felix Leimbach 2 siblings, 0 replies; 70+ messages in thread
From: Felix Leimbach @ 2009-03-17 18:57 UTC (permalink / raw)
To: Avi Kivity; +Cc: Tomasz Chmielewski, kvm

Avi Kivity wrote:
> Felix Leimbach wrote:
>> eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
>>           inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)
>
> packet counters are will within 32-bit limits. byte counters not so
> interesting.

Ah, right, I only checked the byte counters. Testing packet counter overflow now (it takes a while).

> Do you experience the slowdown on Windows guests?

Both Linux and Windows Server 2003, all 32-bit. But for me it is not a slowdown but a complete loss of networking in the guest: it can't be pinged anymore. There might be a slowdown period before that, though; I've heard hints in that direction from users.

^ permalink raw reply [flat|nested] 70+ messages in thread
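One way to watch the packet counters during a long test like this is to read them from sysfs. The Python sketch below assumes a Linux guest that exposes `/sys/class/net/<iface>/statistics/rx_packets` and `tx_packets`. The interface name `eth0` and the helper names are placeholders, and the `root` parameter exists only so the reader can be pointed at a test directory. It is not known how Felix produced his log; this is just one possible approach.

```python
import time
from pathlib import Path


def read_packet_counters(iface: str = "eth0", root: str = "/sys/class/net") -> dict[str, int]:
    """Read the RX/TX packet counters of `iface` from Linux sysfs."""
    stats = Path(root) / iface / "statistics"
    return {
        name: int((stats / name).read_text().strip())
        for name in ("rx_packets", "tx_packets")
    }


def watch(iface: str = "eth0", interval: float = 60.0) -> None:
    """Print the TX packet counter every `interval` seconds until interrupted."""
    while True:
        tx = read_packet_counters(iface)["tx_packets"]
        print(f"{time.ctime()}: TX packet counter = {tx}")
        time.sleep(interval)
```

Calling `watch("eth0")` inside the guest prints one timestamped line per minute until interrupted with Ctrl-C.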
* Re: strange guest slowness after some time 2009-03-17 17:10 ` Avi Kivity 2009-03-17 17:43 ` Tomasz Chmielewski 2009-03-17 18:57 ` Felix Leimbach @ 2009-03-18 5:54 ` Felix Leimbach 2 siblings, 0 replies; 70+ messages in thread
From: Felix Leimbach @ 2009-03-18 5:54 UTC (permalink / raw)
To: Avi Kivity; +Cc: Tomasz Chmielewski, kvm

Avi,

Avi Kivity wrote:
> packet counters are will within 32-bit limits. byte counters not so
> interesting.

Last night my test overflowed the *packet* counters twice without any slowness or loss of connectivity.

Snippet from the log file of the sending VM (Linux 2.6.27):

Wed Mar 18 05:14:18 CET 2009: TX packet counter = 4292944043
Wed Mar 18 05:15:18 CET 2009: TX packet counter = 6259211

ifconfig after the stress test:

# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:74:01:01
          inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:150  Metric:1
          RX packets:48950340 errors:0 dropped:0 overruns:0 frame:0
          TX packets:727911367 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3686511201 (3.4 GiB)  TX bytes:4207842269 (3.9 GiB)

I didn't create a log file on the receiving Windows VM, but its counters must have overflowed as well. Its packet counters are currently:

Sent: 47.112.780
Received: 1.515.275.693

No problem on the Windows guest either. So the problem must lie elsewhere.

^ permalink raw reply [flat|nested] 70+ messages in thread
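The two log lines around the wrap show why plain subtraction of consecutive samples goes wrong: 6259211 minus 4292944043 is negative. Assuming the counter is 32 bits wide, as on these 32-bit guests, the number of packets sent between samples is the difference modulo 2^32. A minimal Python sketch:

```python
COUNTER_BITS = 32  # assumption: 32-bit counter, as on the 32-bit guests here


def counter_delta(old: int, new: int, bits: int = COUNTER_BITS) -> int:
    """Packets counted between two samples, allowing for at most one wraparound."""
    return (new - old) % (1 << bits)


# Samples taken one minute apart, from the log above.
before = 4292944043
after = 6259211
delta = counter_delta(before, after)
print(delta)              # 8282464
print(round(delta / 60))  # 138041 (packets per second, on average)
```

The modular difference is only correct if fewer than 2^32 packets pass between two samples; with one-minute samples and about 8.3 million packets per minute, that margin is very large.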
* Re: strange guest slowness after some time
From: Tomasz Chmielewski @ 2009-03-17 16:27 UTC
To: Felix Leimbach; +Cc: avi, kvm

Felix Leimbach wrote:
>> Yes, all affected had virtio. Probably because I didn't have many
>> guests with e1000 interface.
>>
>> After a guest gets slow, I stop it and add another interface, e1000.
>>
>> If it gets slow again, I'll check if e1000 interface is slow as well.
>>
>> Will keep you updated.
> I see similar behavior: After a week one of my guests' network totally
> stops to respond. Only guests using virtio networking get hit. Both
> windows and linux guests are affected.

Also, does a guest reboot help for you (for me, it doesn't)?

Or do you have to halt the guest and start it again (i.e. stop the kvm/qemu process and start a new one) to get the network working properly again?

--
Tomasz Chmielewski
http://wpkg.org
* Re: strange guest slowness after some time
From: Felix Leimbach @ 2009-03-17 17:14 UTC
To: Tomasz Chmielewski; +Cc: avi, kvm

Tomasz Chmielewski wrote:
>
> Felix Leimbach wrote:
>> I see similar behavior: After a week one of my guests' network
>> totally stops to respond. Only guests using virtio networking get
>> hit. Both windows and linux guests are affected.
>
> Also, does guest reboot help for you (for me, it doesn't)?
>
> Or, you have to halt the guest and start it again (i.e. stop kvm/qemu
> process and start a new one) to make the network working properly again?

I have not tried rebooting; I always stopped and restarted the qemu instance. I will try on the next occasion.

Earlier I wrote that I had tested on kvm-83 and kvm-84, but it turns out the kvm-84 part was wrong: since the upgrade 4 days ago I have not yet had a hang.
I noticed that you, Tomasz, are also running kvm-83. Maybe kvm-84 fixed the issue already?
* Re: strange guest slowness after some time
From: Avi Kivity @ 2009-03-17 17:19 UTC
To: Felix Leimbach; +Cc: Tomasz Chmielewski, kvm

Felix Leimbach wrote:
> I noticed that you Tomasz are also running kvm-83. Maybe kvm-84 fixed
> the issue already?

kvm-84 fixes a serious problem with kvmclock on AMD hosts, but it does not fix the problem with C1E, so it may not have fixed the problem completely.

--
error compiling committee.c: too many arguments to function
* Re: strange guest slowness after some time
From: Tomasz Chmielewski @ 2009-03-17 17:34 UTC
To: Felix Leimbach; +Cc: avi, kvm

Felix Leimbach wrote:
> I have not tried rebooting; always stopped and restarted the qemu
> instance. Will try on the next occasion.
>
> Before I wrote that I tested on kvm-83 and 84 but it turns out the
> kvm-84 part was wrong: Since the upgrade 4 days ago I have not yet had a
> hang.
> I noticed that you Tomasz are also running kvm-83. Maybe kvm-84 fixed
> the issue already?

No, I run kvm-84. With kvm-83 I had this issue much more frequently. With kvm-84 it seems less frequent. Or maybe that's just what I'd like to believe ;)

--
Tomasz Chmielewski
http://wpkg.org