* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
       [not found] ` <20130221222246.GH10005@vm>
@ 2013-03-14  4:15   ` Matthew Anderson
  2013-03-19 17:53     ` Gleb Natapov
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Anderson @ 2013-03-14  4:15 UTC (permalink / raw)
  To: 'mdroth'; +Cc: 'qemu-devel@nongnu.org'

Thanks for the suggestion but so far it hasn't made any difference.

I thought it might be an issue with the mainline kernel I was using, so I changed over to Ubuntu 12.10 and QEMU 1.2.0. Stability is slightly better, but I still get 2-3 VMs a day out of about 120 experiencing this bug in exactly the same way as before. Time in the guest simply stops ticking over, and when you attempt to interact with the VM the lost ticks get replayed as fast as possible. With the time being out, practically nothing works, as it breaks crypto support (as well as practically everything else).

In addition to what I posted previously I've tried -
Setting the CPU type to host, qemu64 and kvm64
Setting the machine type to 0.14, 0.15, 1.0, 1.1 and 1.2 (plus 1.3 and 1.4 from previous testing)
With and without the HPET timer using combinations of the above

The settings I'm using work fine on the CentOS 6.3 (2.6.32) kernel and QEMU 1.4.0. I've only had this problem since upgrading the kernel, which is the only evidence I have of it being a kernel/KVM bug. I'm incredibly desperate for any solution or advice that may lead to getting this problem sorted; I'm getting a tirade of angry phone calls every morning that's making me want to go postal.

Thanks
-Matt

-----Original Message-----
From: fluxion [mailto:flukshun@gmail.com] On Behalf Of mdroth
Sent: Friday, 22 February 2013 6:23 AM
To: Matthew Anderson
Cc: 'qemu-devel@nongnu.org'
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle

On Thu, Feb 21, 2013 at 06:16:10PM +0000, Matthew Anderson wrote:
> If this isn't the correct list just let me know,
> 
> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and 2012) no longer receives RTC ticks when it has been idle for a random amount of time. HPET is disabled and the guest is running Hyper-V relaxed timers (same situation without hv_relaxed). The guest clock stands still and the qemu process uses very little CPU (<0.5%, normally it's >5% when the guest is idle). Eventually the guest stops responding to network requests, but if you open the guest console via VNC and move the mouse around it comes back to life, QEMU replays the lost RTC ticks and the guest recovers. I've also been able to make it recover by querying the clock over the network via the net time command; you can see the clock stand still for 30 seconds, then it replays the ticks and catches up.
> 
> I've tried to reproduce the issue but it seems fairly elusive; the only way I've been able to reproduce it is by letting the VMs idle and waiting. Sometimes it takes hours and sometimes minutes. Can anyone suggest a way to narrow the issue down?
> 
> Qemu command line is-
> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed 
> -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults 
> -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,s
> erver,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
> base=localtime -no-hpet -no-shutdown -device 
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0
> ,format=qed,cache=writeback -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id
> =virtio-disk0 -drive 
> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-
> disk2,format=qed,cache=writeback -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id
> =virtio-disk2 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw 
> -device 
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
> -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0
> ,addr=0x3 -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 
> -vnc 127.0.0.1:22 -vga cirrus -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
> 
> Environment is -
> Mainline 3.7.5 and 3.8.0
> Qemu 1.2.2, 1.3.1 and 1.4.0

Were all of these with -M pc-0.14? Only thing that stands out to me is kernel_irqchip being disabled in your case. -M 1.1 and higher will enable it by default. Worth a shot.

> Scientific Linux 6.3
> KSM enabled, transparent hugepages disabled.
> Dual Xeon 5650
> 192GB
> 
> Thanks all
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-03-14  4:15   ` [Qemu-devel] BUG: RTC issue when Windows guest is idle Matthew Anderson
@ 2013-03-19 17:53     ` Gleb Natapov
  0 siblings, 0 replies; 16+ messages in thread
From: Gleb Natapov @ 2013-03-19 17:53 UTC (permalink / raw)
  To: Matthew Anderson; +Cc: 'mdroth', 'qemu-devel@nongnu.org'

On Thu, Mar 14, 2013 at 04:15:10AM +0000, Matthew Anderson wrote:
> Thanks for the suggestion but so far it hasn't made any difference.
> 
> I thought it might be an issue with the mainline kernel I was using, so I changed over to Ubuntu 12.10 and QEMU 1.2.0. Stability is slightly better, but I still get 2-3 VMs a day out of about 120 experiencing this bug in exactly the same way as before. Time in the guest simply stops ticking over, and when you attempt to interact with the VM the lost ticks get replayed as fast as possible. With the time being out, practically nothing works, as it breaks crypto support (as well as practically everything else).
> 
> In addition to what I posted previously I've tried -
> Setting the CPU type to host, qemu64 and kvm64
> Setting the machine type to 0.14, 0.15, 1.0, 1.1 and 1.2 (plus 1.3 and 1.4 from previous testing)
> With and without the HPET timer using combinations of the above
> 
> The settings I'm using work fine on the CentOS 6.3 (2.6.32) kernel and QEMU 1.4.0. I've only had this problem since upgrading the kernel, which is the only evidence I have of it being a kernel/KVM bug. I'm incredibly desperate for any solution or advice that may lead to getting this problem sorted; I'm getting a tirade of angry phone calls every morning that's making me want to go postal.
> 
Is this happening to all VMs on the same machine simultaneously?

--
			Gleb.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-30  1:29           ` Xiexiangyou
@ 2013-10-30  7:26             ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-30  7:26 UTC (permalink / raw)
  To: Xiexiangyou
  Cc: Zhouxiangjiu, Stefan Hajnoczi, Luonengjun, qemu-devel,
	Huangpeng (Peter),
	Matthew Anderson, Alex Bligh


On 30 Oct 2013, at 01:29, Xiexiangyou wrote:

> The RTC timer may stop after live migration if you set rtc_clock = host_clock, because different hosts have different system times.
> After migrating, the RTC waits for next_periodic_time, and it may wait much longer than usual. During that time the VM will very likely lose a tick,

Surely at worst this should wait until the destination RTC catches up?

> and the RTC periodic timer interrupt is blocked by "ioapic->irr==1" in KVM module.

Don't understand the significance of that. Why would the ioapic register be different after migration than before?

> To solve the problem, you can try "rtc_clock = vm_clock" (HPET is OK because it uses vm_clock). Like this:
>  <clock offset='utc'>
>    <timer name='rtc' tickpolicy='catchup' track='guest'/>
>    <timer name='hpet' present='no'/>
>  </clock>
> 
> I hope it will help you.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-30  0:44             ` Xiexiangyou
@ 2013-10-30  7:23               ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-30  7:23 UTC (permalink / raw)
  To: Xiexiangyou
  Cc: Zhouxiangjiu, Stefan Hajnoczi, Luonengjun, qemu-devel,
	Huangpeng (Peter),
	Matthew Anderson, Alex Bligh


On 30 Oct 2013, at 00:44, Xiexiangyou wrote:

> But I want to know why the alarm timer would cause the problem. Is the reason that the alarm event gets lost?

I am not sure it was the use of alarm timers per se - doubtless it would be
possible to write code that worked 100% using alarm timers. The old code
had accumulated over years and had grown somewhat crufty - no one's fault
in particular - so there were various inconsistencies relating to edge
cases (infinite timeouts, no timers running, no FDs, >UINT32_MAX timeouts
etc). I just remember seeing several times bits of code that made me
think "I wonder if and how that can ever work in situation X" and not
investigating further as the replacement code was generic enough to
work in all situations. And if there were bugs in the alarm timer bit
they will simply have been deleted when I deleted several hundred lines
of qemu-timer.c. Sometimes a rewrite just fixes stuff without
any particular bug-hunting skill on the part of the author.
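
As an illustration of the edge cases listed above (infinite timeouts and
timeouts too large for the poll interface), here is a minimal sketch of a
conversion helper. It is not taken from qemu-timer.c; the name
timeout_ns_to_ms and the convention that a negative value means "wait
forever" are assumptions made for this example only.

```c
#include <limits.h>
#include <stdint.h>

#define NS_PER_MS 1000000LL

/*
 * Hypothetical helper: convert a nanosecond timeout into the int
 * millisecond argument that poll() expects.
 *   timeout_ns < 0  -> -1 (block until an event arrives)
 *   otherwise       -> rounded up to whole milliseconds, clamped to INT_MAX
 */
static int timeout_ns_to_ms(int64_t timeout_ns)
{
    int64_t ms;

    if (timeout_ns < 0) {
        return -1;
    }

    /* Round up without overflowing when timeout_ns is near INT64_MAX. */
    ms = timeout_ns / NS_PER_MS + (timeout_ns % NS_PER_MS != 0);

    /* A huge finite timeout must not wrap around into a negative int. */
    if (ms > INT_MAX) {
        ms = INT_MAX;
    }
    return (int)ms;
}
```

Rounding up rather than down means a short but non-zero timeout never turns
into a zero-length, busy-looping poll, and the clamp keeps a very large finite
timeout from being mistaken for the "infinite" value.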

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-28  6:58         ` Matthew Anderson
  2013-10-28  7:44           ` Alex Bligh
  2013-10-28 11:51           ` Paolo Bonzini
@ 2013-10-30  1:29           ` Xiexiangyou
  2013-10-30  7:26             ` Alex Bligh
  2 siblings, 1 reply; 16+ messages in thread
From: Xiexiangyou @ 2013-10-30  1:29 UTC (permalink / raw)
  To: Matthew Anderson, 'Alex Bligh'
  Cc: Zhouxiangjiu, Stefan Hajnoczi, Luonengjun, qemu-devel, Huangpeng (Peter)

Hi Anderson,

The RTC timer may stop after live migration if you set rtc_clock = host_clock, because different hosts have different system times.
After migrating, the RTC waits for next_periodic_time, and it may wait much longer than usual. During that time the VM will very likely lose a tick,
and the RTC periodic timer interrupt is then blocked by "ioapic->irr==1" in the KVM module.
To solve the problem, you can try "rtc_clock = vm_clock" (HPET is OK because it uses vm_clock). Like this:
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='guest'/>
    <timer name='hpet' present='no'/>
  </clock>

I hope it will help you.
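
The hypothesis above can be modelled with a few lines of arithmetic. This is
a toy sketch only, not QEMU's RTC code: the function names are made up for
illustration, and it simply assumes that a periodic deadline is armed as an
absolute value of one host's clock and later compared against another host's
clock.

```c
#include <stdint.h>

/*
 * Time remaining until an absolute deadline, as seen by a given clock
 * reading. A result <= 0 means the timer is already due.
 */
static int64_t ns_until_deadline(int64_t deadline_ns, int64_t now_ns)
{
    return deadline_ns - now_ns;
}

/*
 * Hypothetical model: the next periodic tick is armed against the source
 * host's clock, then evaluated against the destination host's clock.
 */
static int64_t wait_after_migration(int64_t period_ns,
                                    int64_t src_now_ns,
                                    int64_t dest_now_ns)
{
    int64_t deadline_ns = src_now_ns + period_ns;     /* armed on source */

    return ns_until_deadline(deadline_ns, dest_now_ns); /* seen on dest */
}
```

With identical clocks the wait equals one period. If the destination clock
is 30 seconds behind, the same deadline is 30 seconds plus one period away,
which is the "may wait much longer" case described above. A clock that
travels with the guest (vm_clock) would not show this skew in the model.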

Best wishes
---
Xiexiangyou

-----Original Message-----
From: Matthew Anderson [mailto:matthewa@base3.com.au] 
Sent: Monday, October 28, 2013 2:59 PM
To: 'Alex Bligh'; Xiexiangyou
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
Subject: RE: [Qemu-devel] BUG: RTC issue when Windows guest is idle

Hi Alex,

I've been doing some testing with the latest git version and so far I haven't seen a guest freeze in the same circumstances as before. 

A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.

Guest VM is Windows 2008R2. Host is Ubuntu 13.04 (kernel 3.8.0-25-generic)
Command line -
/usr/local/bin/qemu-system-x86_64 -enable-kvm -nodefconfig -nodefaults -daemonize -usb -chardev socket,id=charmonitor,path=/var/run/based1/monitor/525ce3d009c437d678000002.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -chardev socket,path=/var/run/based1/ga/525ce3d009c437d678000002.guestagent,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -pidfile /var/run/based1/pid/525ce3d009c437d678000002.pid -vga cirrus -vnc 0.0.0.0:1 -M pc-i440fx-1.6 -m 1024 -smp sockets=1,cores=1,threads=1 -cpu qemu64,+vme,+dts,+acpi,+dtes64,+vmx,+smx,+ssse3,+sse4_1,+sse4_2,+tpr_shadow,+vnmi,+flexpriority,+ept,+vpid,hv_relaxed,hv_spinlocks=0xffff,hv_vapic -rtc base=utc,driftfix=slew --no-hpet -drive aio=native,file=rbd:sata/525ce3d009c437d678000003,if=virtio,id=disk-525ce3d009c437d678000005,format=raw,cache=writeback,media=disk,index=0,addr=0xa -netdev tap,id=netdev-5264b6d46e53c81719000236,vhost=off,ifname=tap2,script=no,downscript=no -device virtio-net-pci,netdev=netdev-5264b6d46e53c81719000236,id=interface-5264b6d46e53c81719000236,mac=9a:a5:63:64:6f:76,bus=pci.0,addr=0xb -incoming tcp:0:3004

Thanks
-Matt


-----Original Message-----
From: Alex Bligh [mailto:alex@alex.org.uk] 
Sent: Tuesday, 22 October 2013 5:36 PM
To: Xiexiangyou; Matthew Anderson
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Alex Bligh
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle



--On 22 October 2013 08:28:08 +0000 Xiexiangyou <xiexiangyou@huawei.com>
wrote:

> Hi:
>
> I have run a windows2008r2 guest with qemu-1.5.1/1.6 (I have not tested the
> newer version) for a long time, and the issue (guest in hangup state) still
> occurs. When the guest is in the hangup state, the QEMU main thread is blocked
> in the g_poll loop.
>> 398c398,399
>> <     uint32_t timeout = UINT32_MAX;
>> ---
>>>     /* uint32_t timeout = UINT32_MAX; */
>>>     uint32_t timeout = 1000;
>>
> It seems to fix the problem: rtc/hpet interrupts can be injected into the
> guest again because of the "timeout", and the guest wakes up. But maybe the
> issue still exists, because during the time before the timeout the guest
> will still lose rtc/hpet ticks.

I do not think that is the correct fix for 1.5.1/1.6; what you are basically doing is limiting the wait in the mainloop to one second (1.5.1/1.6 are in milliseconds); however, I believe there may be other code that looks for infinite timeouts. Either there is some other bug that this is masking (in which case it may or may not be fixed in master / 1.7), or it's a bug in the timer stuff in 1.5.1/1.6 (which would not surprise me) which is likely to have been fixed in master / 1.7.

--
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-28  7:44           ` Alex Bligh
  2013-10-28  8:46             ` Alex Bligh
@ 2013-10-30  0:44             ` Xiexiangyou
  2013-10-30  7:23               ` Alex Bligh
  1 sibling, 1 reply; 16+ messages in thread
From: Xiexiangyou @ 2013-10-30  0:44 UTC (permalink / raw)
  To: Alex Bligh, Matthew Anderson
  Cc: Zhouxiangjiu, Stefan Hajnoczi, Luonengjun, qemu-devel, Huangpeng (Peter)

Hi Alex:
    I have been testing with the QEMU master version for several days, and the issue has not appeared there either.
I think you have fixed it by using the timeout instead of the alarm timer!
It's great!
But I want to know why the alarm timer would cause the problem. Is the reason that the alarm event gets lost?

Thanks
---
xiexiangyou  

-----Original Message-----
From: Alex Bligh [mailto:alex@alex.org.uk] 
Sent: Monday, October 28, 2013 3:45 PM
To: Matthew Anderson
Cc: Alex Bligh; Xiexiangyou; Stefan Hajnoczi; qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle


On 28 Oct 2013, at 06:58, Matthew Anderson wrote:

> I've been doing some testing with the latest git version and so far I haven't seen a guest freeze in the same circumstances as before. 

That's good news.

> A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.

Does that one happen on master/1.7 as well? Oddly we saw this one or something like it on Xen+qemu.

-- 
Alex Bligh





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-28  6:58         ` Matthew Anderson
  2013-10-28  7:44           ` Alex Bligh
@ 2013-10-28 11:51           ` Paolo Bonzini
  2013-10-30  1:29           ` Xiexiangyou
  2 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2013-10-28 11:51 UTC (permalink / raw)
  To: Matthew Anderson
  Cc: Stefan Hajnoczi, Xiexiangyou, 'Alex Bligh', qemu-devel

Il 28/10/2013 07:58, Matthew Anderson ha scritto:
> Hi Alex,
> 
> I've been doing some testing with the latest git version and so far I haven't seen a guest freeze in the same circumstances as before. 
> 
> A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.

Please try this:

(1) reproduce it with migration to file + restore from file

     to save, from QEMU monitor:
         migrate exec:cat>migr.ckp

     to restore, from command line:
         ... -incoming 'exec:cat migr.ckp'

(2) If the RTC stops after restoring, compress migr.ckp and place it
somewhere I can download it.

Thanks,

Paolo

> Guest VM is Windows 2008R2. Host is Ubuntu 13.04 (kernel 3.8.0-25-generic)
> Command line -
> /usr/local/bin/qemu-system-x86_64 -enable-kvm -nodefconfig -nodefaults -daemonize -usb -chardev socket,id=charmonitor,path=/var/run/based1/monitor/525ce3d009c437d678000002.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -chardev socket,path=/var/run/based1/ga/525ce3d009c437d678000002.guestagent,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -pidfile /var/run/based1/pid/525ce3d009c437d678000002.pid -vga cirrus -vnc 0.0.0.0:1 -M pc-i440fx-1.6 -m 1024 -smp sockets=1,cores=1,threads=1 -cpu qemu64,+vme,+dts,+acpi,+dtes64,+vmx,+smx,+ssse3,+sse4_1,+sse4_2,+tpr_shadow,+vnmi,+flexpriority,+ept,+vpid,hv_relaxed,hv_spinlocks=0xffff,hv_vapic -rtc base=utc,driftfix=slew --no-hpet -drive aio=native,file=rbd:sata/525ce3d009c437d678000003,if=virtio,id=disk-525ce3d009c437d678000005,format=raw,cache=writeback,media=disk,index=0,addr=0xa -netdev tap,id=netdev-5264b6d46e53c81719000236,vhost=off,ifname=tap2,script=no,downscript=no -device virtio-net-pci,netdev=netdev-5264b6d46e53c81719000236,id=interface-5264b6d46e53c81719000236,mac=9a:a5:63:64:6f:76,bus=pci.0,addr=0xb -incoming tcp:0:3004
> 
> Thanks
> -Matt
> 
> 
> -----Original Message-----
> From: Alex Bligh [mailto:alex@alex.org.uk] 
> Sent: Tuesday, 22 October 2013 5:36 PM
> To: Xiexiangyou; Matthew Anderson
> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Alex Bligh
> Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
> 
> 
> 
> --On 22 October 2013 08:28:08 +0000 Xiexiangyou <xiexiangyou@huawei.com>
> wrote:
> 
>> Hi:
>>
>> I have run a windows2008r2 guest with qemu-1.5.1/1.6 (I have not tested the
>> newer version) for a long time, and the issue (guest in hangup state) still
>> occurs. When the guest is in the hangup state, the QEMU main thread is blocked
>> in the g_poll loop.
>>> 398c398,399
>>> <     uint32_t timeout = UINT32_MAX;
>>> ---
>>>>     /* uint32_t timeout = UINT32_MAX; */
>>>>     uint32_t timeout = 1000;
>>>
>> It seems to fix the problem: rtc/hpet interrupts can be injected into the
>> guest again because of the "timeout", and the guest wakes up. But maybe the
>> issue still exists, because during the time before the timeout the guest
>> will still lose rtc/hpet ticks.
> 
> I do not think that is the correct fix for 1.5.1/1.6; what you are basically doing is limiting the wait in the mainloop to one second (1.5.1/1.6 are in milliseconds); however, I believe there may be other code that looks for infinite timeouts. Either there is some other bug that this is masking (in which case it may or may not be fixed in master / 1.7), or it's a bug in the timer stuff in 1.5.1/1.6 (which would not surprise me) which is likely to have been fixed in master / 1.7.
> 
> --
> Alex Bligh
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-28  7:44           ` Alex Bligh
@ 2013-10-28  8:46             ` Alex Bligh
  2013-10-30  0:44             ` Xiexiangyou
  1 sibling, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-28  8:46 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Matthew Anderson, qemu-devel, Xiexiangyou, Stefan Hajnoczi


On 28 Oct 2013, at 07:44, Alex Bligh wrote:

>> 
>> A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.
> 
> Does that one happen on master/1.7 as well? Oddly we saw this one or something like it on Xen+qemu.

... and master actually calls itself 1.6.50 which I'd never realised, so this is broken on master it seems.

If it works with HPET it looks like a clock source / migration bug.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-28  6:58         ` Matthew Anderson
@ 2013-10-28  7:44           ` Alex Bligh
  2013-10-28  8:46             ` Alex Bligh
  2013-10-30  0:44             ` Xiexiangyou
  2013-10-28 11:51           ` Paolo Bonzini
  2013-10-30  1:29           ` Xiexiangyou
  2 siblings, 2 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-28  7:44 UTC (permalink / raw)
  To: Matthew Anderson; +Cc: Stefan Hajnoczi, Xiexiangyou, Alex Bligh, qemu-devel


On 28 Oct 2013, at 06:58, Matthew Anderson wrote:

> I've been doing some testing with the latest git version and so far I haven't seen a guest freeze in the same circumstances as before. 

That's good news.

> A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.

Does that one happen on master/1.7 as well? Oddly we saw this one or something like it on Xen+qemu.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-22  9:35       ` Alex Bligh
@ 2013-10-28  6:58         ` Matthew Anderson
  2013-10-28  7:44           ` Alex Bligh
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Matthew Anderson @ 2013-10-28  6:58 UTC (permalink / raw)
  To: 'Alex Bligh', Xiexiangyou; +Cc: Stefan Hajnoczi, qemu-devel

Hi Alex,

I've been doing some testing with the latest git version and so far I haven't seen a guest freeze in the same circumstances as before. 

A weird thing that has been happening is the RTC timer stopping after a live migration. This happened in both 1.6.1 and the 1.6.50 git build. To replicate the issue I was migrating to/from the same machine and anywhere between 1 and 3 migrations the guest clock would stop. Connecting to the VNC console would not get it running again. I've tried to replicate the issue with the HPET enabled but the guest clock works flawlessly with it enabled.

Guest VM is Windows 2008R2. Host is Ubuntu 13.04 (kernel 3.8.0-25-generic)
Command line -
/usr/local/bin/qemu-system-x86_64 -enable-kvm -nodefconfig -nodefaults -daemonize -usb -chardev socket,id=charmonitor,path=/var/run/based1/monitor/525ce3d009c437d678000002.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -chardev socket,path=/var/run/based1/ga/525ce3d009c437d678000002.guestagent,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -pidfile /var/run/based1/pid/525ce3d009c437d678000002.pid -vga cirrus -vnc 0.0.0.0:1 -M pc-i440fx-1.6 -m 1024 -smp sockets=1,cores=1,threads=1 -cpu qemu64,+vme,+dts,+acpi,+dtes64,+vmx,+smx,+ssse3,+sse4_1,+sse4_2,+tpr_shadow,+vnmi,+flexpriority,+ept,+vpid,hv_relaxed,hv_spinlocks=0xffff,hv_vapic -rtc base=utc,driftfix=slew --no-hpet -drive aio=native,file=rbd:sata/525ce3d009c437d678000003,if=virtio,id=disk-525ce3d009c437d678000005,format=raw,cache=writeback,media=disk,index=0,addr=0xa -netdev tap,id=netdev-5264b6d46e53c81719000236,vhost=off,ifname=tap2,script=no,downscript=no -device virtio-net-pci,netdev=netdev-5264b6d46e53c81719000236,id=interface-5264b6d46e53c81719000236,mac=9a:a5:63:64:6f:76,bus=pci.0,addr=0xb -incoming tcp:0:3004

Thanks
-Matt


-----Original Message-----
From: Alex Bligh [mailto:alex@alex.org.uk] 
Sent: Tuesday, 22 October 2013 5:36 PM
To: Xiexiangyou; Matthew Anderson
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Alex Bligh
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle



--On 22 October 2013 08:28:08 +0000 Xiexiangyou <xiexiangyou@huawei.com>
wrote:

> Hi:
>
> I have run a windows2008r2 guest with qemu-1.5.1/1.6 (I have not tested the
> newer version) for a long time, and the issue (guest in hangup state) still
> occurs. When the guest is in the hangup state, the QEMU main thread is blocked
> in the g_poll loop.
>> 398c398,399
>> <     uint32_t timeout = UINT32_MAX;
>> ---
>>>     /* uint32_t timeout = UINT32_MAX; */
>>>     uint32_t timeout = 1000;
>>
> It seems to fix the problem: rtc/hpet interrupts can be injected into the
> guest again because of the "timeout", and the guest wakes up. But maybe the
> issue still exists, because during the time before the timeout the guest
> will still lose rtc/hpet ticks.

I do not think that is the correct fix for 1.5.1/1.6; what you are basically doing is limiting the wait in the mainloop to one second (1.5.1/1.6 are in milliseconds); however, I believe there may be other code that looks for infinite timeouts. Either there is some other bug that this is masking (in which case it may or may not be fixed in master / 1.7), or it's a bug in the timer stuff in 1.5.1/1.6 (which would not surprise me) which is likely to have been fixed in master / 1.7.

--
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-22  8:28     ` Xiexiangyou
@ 2013-10-22  9:35       ` Alex Bligh
  2013-10-28  6:58         ` Matthew Anderson
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bligh @ 2013-10-22  9:35 UTC (permalink / raw)
  To: Xiexiangyou, Matthew Anderson; +Cc: Stefan Hajnoczi, qemu-devel, Alex Bligh



--On 22 October 2013 08:28:08 +0000 Xiexiangyou <xiexiangyou@huawei.com> 
wrote:

> Hi:
>
> I have run a windows2008r2 guest with qemu-1.5.1/1.6 (I have not tested the
> newer version) for a long time, and the issue (guest in hangup state) still
> occurs. When the guest is in the hangup state, the QEMU main thread is blocked
> in the g_poll loop.
>> 398c398,399
>> <     uint32_t timeout = UINT32_MAX;
>> ---
>>>     /* uint32_t timeout = UINT32_MAX; */
>>>     uint32_t timeout = 1000;
>>
> It seems to fix the problem: rtc/hpet interrupts can be injected into the
> guest again because of the "timeout", and the guest wakes up. But maybe the
> issue still exists, because during the time before the timeout the guest
> will still lose rtc/hpet ticks.

I do not think that is the correct fix for 1.5.1/1.6; what you
are basically doing is limiting the wait in the mainloop to one
second (1.5.1/1.6 are in milliseconds); however, I believe there
may be other code that looks for infinite timeouts. Either there
is some other bug that this is masking (in which case it may
or may not be fixed in master / 1.7), or it's a bug in the timer
stuff in 1.5.1/1.6 (which would not surprise me) which is likely
to have been fixed in master / 1.7.
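
To make the effect of the bounded timeout concrete, here is a small sketch
using plain poll() on a pipe that never becomes readable. It is only an
illustration of the behaviour being discussed, not the main-loop.c code;
the helper name poll_idle_pipe is invented for this example.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Poll the read end of a pipe that nothing ever writes to.
 * Returns poll()'s result: 0 means the timeout expired with no event.
 * Passing -1 would block forever here, which is the situation the
 * one-second cap works around rather than fixes.
 */
static int poll_idle_pipe(int timeout_ms)
{
    int fds[2];
    struct pollfd pfd;
    int ret;

    if (pipe(fds) != 0) {
        return -1;
    }

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

With a finite timeout the call returns 0 once the time is up, so the loop
gets another chance to notice pending work. Nothing in this sketch explains
why the wakeup was missed in the first place, which is the point made above.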

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-21 14:56   ` Alex Bligh
  2013-10-21 15:13     ` Alex Bligh
@ 2013-10-22  8:28     ` Xiexiangyou
  2013-10-22  9:35       ` Alex Bligh
  1 sibling, 1 reply; 16+ messages in thread
From: Xiexiangyou @ 2013-10-22  8:28 UTC (permalink / raw)
  To: Alex Bligh, Matthew Anderson; +Cc: Stefan Hajnoczi, qemu-devel

Hi:

I have run a windows2008r2 guest with qemu-1.5.1/1.6 (I have not tested the newer
version) for a long time, and the issue (guest in hangup state) still occurs. When the guest
is in the hangup state, the QEMU main thread is blocked in the g_poll loop.
> 398c398,399
> <     uint32_t timeout = UINT32_MAX;
> ---
>>     /* uint32_t timeout = UINT32_MAX; */
>>     uint32_t timeout = 1000;
>
It seems to fix the problem: rtc/hpet interrupts can be injected into the guest again
because of the "timeout", and the guest wakes up. But maybe the issue still exists,
because during the time before the timeout the guest will still lose rtc/hpet ticks.

Regards
xiexiangyou

-----Original Message-----
From: Alex Bligh [mailto:alex@alex.org.uk] 
Sent: Monday, October 21, 2013 10:56 PM
To: Matthew Anderson; Xiexiangyou
Cc: qemu-devel@nongnu.org; Alex Bligh; Stefan Hajnoczi
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle



--On 16 October 2013 14:56:12 +0000 Matthew Anderson 
<matthewa@base3.com.au> wrote:

> Hi Xiangyouxie,
>
> I personally haven't tried to solve the problem as yet but I've been in
> contact with Anders Fudali who was able to find the issue with the help
> of one of his developers.
>
> See below for his comments. I'd love to hear from any of the devs that
> can explain the issue further because as far as I know it's still an
> issue, although I haven't tried any releases past 1.4.1.

It would be interesting to know whether this happens on a current qemu
(i.e. master/1.7/whatever, not 1.6).

The timeout code in main-loop.c has been changed. For context, in
1.4.2, this does:

    395 int main_loop_wait(int nonblocking)
    396 {
    397     int ret;
    398     uint32_t timeout = UINT32_MAX;
    399
    400     if (nonblocking) {
    401         timeout = 0;
    402     }
    403

and I can promise that code has been changed. This might have fixed
things or (in other cases) made an existing bug more obvious.

I'm not sure whether the RTC is meant to be generating some sort of
qemu_notify() here, or using a timer, or what, but it should be doing
something to break out of the select() loop. Stefan - any idea?
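
For readers unfamiliar with the idea of "breaking out of the select()
loop", here is a minimal self-pipe sketch. It is a generic illustration of
the wakeup technique, not QEMU's notify implementation; the struct and
function names (notifier, notifier_kick and so on) are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical wakeup channel: a pipe whose read end sits in the poll set. */
struct notifier {
    int rfd;
    int wfd;
};

static int notifier_init(struct notifier *n)
{
    int fds[2];

    if (pipe(fds) != 0) {
        return -1;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Called by whoever has new work (for example, a timer that became due). */
static int notifier_kick(struct notifier *n)
{
    char byte = 1;

    return write(n->wfd, &byte, 1) == 1 ? 0 : -1;
}

/* Returns 1 if woken by a kick, 0 on timeout, -1 on error. */
static int notifier_wait(struct notifier *n, int timeout_ms)
{
    struct pollfd pfd = { .fd = n->rfd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret > 0) {
        char byte;

        /* Drain the byte so the next wait blocks again. */
        if (read(n->rfd, &byte, 1) < 0) {
            return -1;
        }
    }
    return ret;
}

static void notifier_destroy(struct notifier *n)
{
    close(n->rfd);
    close(n->wfd);
}
```

Because the kick makes the descriptor readable, even a wait with an
infinite timeout (-1) returns promptly. If no kick is ever sent, the
infinite wait never returns, which matches the hang described in this thread.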

Alex

> ----------------------------------------
>
> We use QEMU-1.4.2 and Kernel-3.8.8 on our host machines. During the
> latest "RTC-freeze" we managed to strace the affected QEMU-process and
> one of our developers figured out that a file descriptor selector loops
> with a NULL value for timeout... we adjusted the source code for the
> file: main-loop.c and replaced UINT32_MAX with the value: 1000, see the
> diff below:
>
> 398c398,399
> <     uint32_t timeout = UINT32_MAX;
> ---
>>     /* uint32_t timeout = UINT32_MAX; */
>>     uint32_t timeout = 1000;
>
> I'm not sure IF this might have some other undesirable effects or
> consequence and why only Windows guests seems to be affected, but we've
> been running with this "fix" for > 3 days now and haven't seen the
> problem since.
>
> Thought to share this with you.
>
> Regards
>
> Anders Fudali
>
> ------------------------------
>
> From: Xiexiangyou [mailto:xiexiangyou@huawei.com]
> Sent: Tuesday, 8 October 2013 5:39 PM
> To: Matthew Anderson
> Cc: qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
>
> Hi:
> I have hit the same bug: a Windows 2008 guest occasionally stops
> receiving RTC ticks when it is idle. When a VNC client connects, the
> guest resumes receiving RTC ticks and its clock runs fast because of the
> coalesced timer ticks.
>
> HPET is disabled and the RTC tick policy is set to "catchup", as follows:
>   <clock offset='utc'>
>     <timer name='rtc' tickpolicy='catchup' track='guest'/>
>     <timer name='hpet' present='no'/>
>   </clock>
>
> My kvm module is version 3.6. Should I upgrade it to the latest version?
> Any suggestions are welcome!
>
> Thanks!
>
> --xiangyouxie                              
>               
>
> On Thu, Feb 21, 2013 at 06:16:10PM +0000, Matthew Anderson wrote:
>> If this isn't the correct list just let me know,
>>
>> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and
>> 2012) no longer receives RTC ticks when it has been idle for a random
>> amount of time. HPET is disabled and the guest is running Hyper-V
>> relaxed timers (same situation without hv_relaxed). The guest clock
>> stands still and the qemu process uses very little CPU (<0.5%, normally
>> it's >5% when the guest is idle). Eventually the guest stops
>> responding to network requests but if you open the guest console via
>> VNC and move the mouse around it comes back to life and QEMU replays
>> the lost RTC ticks and the guest recovers. I've also been able to make
>> it recover by querying the clock over the network via the net time
>> command; you can see the clock stand still for 30 seconds, then it
>> replays the ticks and catches up.
>>
>> I've tried to reproduce the issue but it seems fairly elusive; the only
>> way I've been able to reproduce it is by letting the VMs idle and
>> waiting. Sometimes it's hours and sometimes minutes. Can anyone suggest
>> a way to narrow the issue down?
>>
>> Qemu command line is-
>> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed
>> -enable-kvm -m  2048 -smp 2,sockets=2,cores=1,threads=1 -uuid
>> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults
>> -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,ser
>> ver,nowait   -mon chardev=charmonitor,id=monitor,mode=control -rtc
>> base=localtime  -no-hpet -no-shutdown -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2  -drive
>> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,f
>> ormat=qed,cache=writeback   -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v
>> irtio-disk0   -drive
>> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-di
>> sk2,format=qed,cache=writeback   -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=v
>> irtio-disk2   -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw
>> -device
>> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
>> -netdev  tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,a
>> ddr=0x3   -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
>> -vnc  127.0.0.1:22 -vga cirrus -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>>
>> Environment is -
>> Mainline 3.7.5 and 3.8.0
>> Qemu 1.2.2, 1.3.1 and 1.4.0
>
>
>
>



-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-21 14:56   ` Alex Bligh
@ 2013-10-21 15:13     ` Alex Bligh
  2013-10-22  8:28     ` Xiexiangyou
  1 sibling, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-21 15:13 UTC (permalink / raw)
  To: Alex Bligh, Matthew Anderson, 'Xiexiangyou'
  Cc: Stefan Hajnoczi, qemu-devel



--On 21 October 2013 15:56:27 +0100 Alex Bligh <alex@alex.org.uk> wrote:

> I'm not sure whether the RTC is meant to be generating some sort of
> qemu_notify() here, or using a timer, or what, but it should be doing
> something to break out of the select() loop. Stefan - any idea?

To answer my own question, I think it will be using hw/timer/mc146818rtc.c
which (on brief inspection) appears to use the timer code to set a timer
whenever the next event should go off.
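
To illustrate what "set a timer whenever the next event should go off"
implies for the main loop, here is a minimal sketch. It is not QEMU's
timer code; next_timeout_ms() and its arguments are hypothetical names.
The loop should sleep no longer than the time until the earliest armed
deadline, and only block indefinitely (here signalled by -1) when no
timer is armed at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU's API: given the absolute deadlines (in
 * milliseconds) of all armed timers, return how long the main loop may
 * sleep.  Returns -1 when no timer is armed ("block until an fd is
 * ready"), 0 when a timer has already expired. */
static int64_t next_timeout_ms(const int64_t *deadlines_ms, size_t n,
                               int64_t now_ms)
{
    int64_t best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        int64_t delta = deadlines_ms[i] - now_ms;

        if (delta < 0) {
            delta = 0;          /* already due: do not sleep at all */
        }
        if (best < 0 || delta < best) {
            best = delta;       /* keep the earliest deadline */
        }
    }
    return best;
}
```

If the RTC's next periodic tick were one of those deadlines, a loop that
ignored this value and always slept without a timeout would deliver the
tick only when something else happened to wake it.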

There was previously an 'unusual state of affairs' (I won't go as far
as to say 'bug') where if the select loop had no FDs to listen to, it
did not block for ever (UINT32_MAX as was) but exited immediately; I
believe that check may have miscounted existence of the qemu notify fd.
There was also (I think) some confusion between UINT32_MAX and UINT64_MAX.
I believe my timer fixes plus a change or two from Stefan H may have
fixed this - it certainly makes the code easier to read. Unfortunately
these would be intrusive to backport. Anyway, I'd be interested to
know whether it works on master.
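
For anyone following along, this is roughly how a "block for ever"
sentinel ends up as the NULL timeout seen in the strace. It is a sketch
under assumptions, not the 1.4.2 source: timeout_to_timeval() is a
hypothetical helper, and the only facts taken from this thread are that
the timeout is a uint32_t in milliseconds and that UINT32_MAX meant "no
timeout".

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper, not QEMU's code: convert a millisecond timeout
 * into the pointer select() expects.  UINT32_MAX is treated as the
 * "no timeout" sentinel, which select() spells as a NULL pointer. */
static struct timeval *timeout_to_timeval(uint32_t timeout_ms,
                                          struct timeval *tv)
{
    if (timeout_ms == UINT32_MAX) {
        return NULL;                        /* block until an fd is ready */
    }
    tv->tv_sec = timeout_ms / 1000;
    tv->tv_usec = (timeout_ms % 1000) * 1000;
    return tv;
}
```

With a NULL timeout the loop is entirely dependent on some fd becoming
readable, which is why a missed notification shows up as a stalled guest
clock rather than as an error.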

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-16 14:56 ` Matthew Anderson
@ 2013-10-21 14:56   ` Alex Bligh
  2013-10-21 15:13     ` Alex Bligh
  2013-10-22  8:28     ` Xiexiangyou
  0 siblings, 2 replies; 16+ messages in thread
From: Alex Bligh @ 2013-10-21 14:56 UTC (permalink / raw)
  To: Matthew Anderson, 'Xiexiangyou'
  Cc: Stefan Hajnoczi, qemu-devel, Alex Bligh



--On 16 October 2013 14:56:12 +0000 Matthew Anderson 
<matthewa@base3.com.au> wrote:

> Hi Xiangyouxie,
>
> I personally haven't tried to solve the problem as yet but I've been in
> contact with Anders Fudali who was able to find the issue with the help
> of one of his developers.
>
> See below for his comments. I'd love to hear from any of the devs that
> can explain the issue further because as far as I know it's still an
> issue, although I haven't tried any releases past 1.4.1.

It would be interesting to know whether this happens on a current qemu
(i.e. master/1.7/whatever, not 1.6).

The timeout code in main-loop.c has been changed. For context, in
1.4.2, this does:

    395 int main_loop_wait(int nonblocking)
    396 {
    397     int ret;
    398     uint32_t timeout = UINT32_MAX;
    399
    400     if (nonblocking) {
    401         timeout = 0;
    402     }
    403

and I can promise that code has been changed. This might have fixed
things or (in other cases) made an existing bug more obvious.

I'm not sure whether the RTC is meant to be generating some sort of
qemu_notify() here, or using a timer, or what, but it should be doing
something to break out of the select() loop. Stefan - any idea?
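
By "break out of the select() loop" I mean something along the lines of
the usual self-pipe pattern, sketched below. All names here
(notify_init, notify_wake, wait_for_wakeup) are made up for illustration
and are not QEMU functions: the loop watches the read end of a pipe, and
anything that needs the loop to run again writes a byte to the other end.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names throughout; this is the generic self-pipe pattern,
 * not QEMU's notification code. */
static int notify_fds[2];   /* [0] read end watched by the loop, [1] write end */

static int notify_init(void)
{
    return pipe(notify_fds);
}

/* Called by whoever needs the loop to run again, e.g. after re-arming
 * a timer. */
static void notify_wake(void)
{
    char byte = 0;
    ssize_t n = write(notify_fds[1], &byte, 1);

    (void)n;                /* a full pipe already guarantees a wakeup */
}

/* Block with no timeout until a notification arrives; returns the
 * number of ready fds as reported by select(). */
static int wait_for_wakeup(void)
{
    fd_set rfds;
    char buf[16];
    int ret;

    FD_ZERO(&rfds);
    FD_SET(notify_fds[0], &rfds);
    ret = select(notify_fds[0] + 1, &rfds, NULL, NULL, NULL);
    if (ret > 0 && FD_ISSET(notify_fds[0], &rfds)) {
        ssize_t n = read(notify_fds[0], buf, sizeof(buf));  /* drain */

        (void)n;
    }
    return ret;
}
```

If the wake-up is skipped, or the notify fd is not in the set being
watched, a select() with no timeout simply never returns, which would
match the symptoms described.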

Alex

> ----------------------------------------
>
> We use QEMU-1.4.2 and Kernel-3.8.8 on our host machines. During the
> latest "RTC-freeze" we managed to strace the affected QEMU-process and
> one of our developers figured out that a file descriptor selector loops
> with a NULL value for timeout... we adjusted the source code for the
> file: main-loop.c and replaced UINT32_MAX with the value: 1000, see the
> diff below:
>
> 398c398,399
> <     uint32_t timeout = UINT32_MAX;
> ---
>>     /* uint32_t timeout = UINT32_MAX; */
>>     uint32_t timeout = 1000;
>
> I'm not sure whether this might have other undesirable effects or
> consequences, or why only Windows guests seem to be affected, but we've
> been running with this "fix" for more than 3 days now and haven't seen the
> problem since.
>
> Thought to share this with you.
>
> Regards
>
> Anders Fudali
>
> ------------------------------
>
> From: Xiexiangyou [mailto:xiexiangyou@huawei.com]
> Sent: Tuesday, 8 October 2013 5:39 PM
> To: Matthew Anderson
> Cc: qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
>
> Hi:
> I have hit the same bug: a Windows 2008 guest occasionally stops
> receiving RTC ticks when it is idle. When a VNC client connects, the
> guest resumes receiving RTC ticks and its clock runs fast because of the
> coalesced timer ticks.
>
> HPET is disabled and the RTC tick policy is set to "catchup", as follows:
>   <clock offset='utc'>
>     <timer name='rtc' tickpolicy='catchup' track='guest'/>
>     <timer name='hpet' present='no'/>
>   </clock>
>
> My kvm module is version 3.6. Should I upgrade it to the latest version?
> Any suggestions are welcome!
>
> Thanks!
>
> --xiangyouxie                              
>               
>
> On Thu, Feb 21, 2013 at 06:16:10PM +0000, Matthew Anderson wrote:
>> If this isn't the correct list just let me know,
>>
>> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and
>> 2012) no longer receives RTC ticks when it has been idle for a random
>> amount of time. HPET is disabled and the guest is running Hyper-V
>> relaxed timers (same situation without hv_relaxed). The guest clock
>> stands still and the qemu process uses very little CPU (<0.5%, normally
>> it's >5% when the guest is idle). Eventually the guest stops
>> responding to network requests but if you open the guest console via
>> VNC and move the mouse around it comes back to life and QEMU replays
>> the lost RTC ticks and the guest recovers. I've also been able to make
>> it recover by querying the clock over the network via the net time
>> command; you can see the clock stand still for 30 seconds, then it
>> replays the ticks and catches up.
>>
>> I've tried to reproduce the issue but it seems fairly elusive; the only
>> way I've been able to reproduce it is by letting the VMs idle and
>> waiting. Sometimes it's hours and sometimes minutes. Can anyone suggest
>> a way to narrow the issue down?
>>
>> Qemu command line is-
>> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed
>> -enable-kvm -m  2048 -smp 2,sockets=2,cores=1,threads=1 -uuid
>> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults
>> -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,ser
>> ver,nowait   -mon chardev=charmonitor,id=monitor,mode=control -rtc
>> base=localtime  -no-hpet -no-shutdown -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2  -drive
>> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,f
>> ormat=qed,cache=writeback   -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v
>> irtio-disk0   -drive
>> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-di
>> sk2,format=qed,cache=writeback   -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=v
>> irtio-disk2   -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw
>> -device
>> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
>> -netdev  tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,a
>> ddr=0x3   -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
>> -vnc  127.0.0.1:22 -vga cirrus -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>>
>> Environment is -
>> Mainline 3.7.5 and 3.8.0
>> Qemu 1.2.2, 1.3.1 and 1.4.0
>
>
>
>



-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
  2013-10-08  9:38 Xiexiangyou
@ 2013-10-16 14:56 ` Matthew Anderson
  2013-10-21 14:56   ` Alex Bligh
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Anderson @ 2013-10-16 14:56 UTC (permalink / raw)
  To: 'Xiexiangyou'; +Cc: qemu-devel

Hi Xiangyouxie,

I personally haven't tried to solve the problem as yet but I've been in contact with Anders Fudali who was able to find the issue with the help of one of his developers. 

See below for his comments. I'd love to hear from any of the devs that can explain the issue further because as far as I know it's still an issue, although I haven't tried any releases past 1.4.1.

----------------------------------------

We use QEMU-1.4.2 and Kernel-3.8.8 on our host machines. During the latest "RTC-freeze" we managed to strace the affected QEMU-process and one of our developers figured out that a file descriptor selector loops with a NULL value for timeout... we adjusted the source code for the file: main-loop.c and replaced UINT32_MAX with the value: 1000, see the diff below:

398c398,399
<     uint32_t timeout = UINT32_MAX;
---
>     /* uint32_t timeout = UINT32_MAX; */
>     uint32_t timeout = 1000;

I'm not sure whether this might have other undesirable effects or consequences, or why only Windows guests seem to be affected, but we've been running with this "fix" for more than 3 days now and haven't seen the problem since.
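
As a rough illustration of what the 1000 ms value changes (a sketch only, with a hypothetical wait_capped() helper rather than the real main-loop.c code): with a finite timeout, select() returns 0 after at most that long even when no descriptor becomes ready, so the loop gets another chance to run pending timers.

```c
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from main-loop.c: wait for fd to become
 * readable for at most timeout_ms milliseconds.  Returns 0 if the
 * timeout expired, a positive count if the fd is ready, -1 on error. */
static int wait_capped(int fd, uint32_t timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

This would hide a missed wake-up rather than fix it: the loop still stalls, just never for more than a second at a time.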

Thought to share this with you.

Regards

Anders Fudali

------------------------------

From: Xiexiangyou [mailto:xiexiangyou@huawei.com] 
Sent: Tuesday, 8 October 2013 5:39 PM
To: Matthew Anderson
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle

Hi:
I have hit the same bug: a Windows 2008 guest occasionally stops receiving RTC ticks when it is idle.
When a VNC client connects, the guest resumes receiving RTC ticks and its clock runs fast because of the coalesced timer ticks.

HPET is disabled and the RTC tick policy is set to "catchup", as follows:
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='guest'/>
    <timer name='hpet' present='no'/>
  </clock>

My kvm module is version 3.6. Should I upgrade it to the latest version? Any suggestions are welcome!

Thanks! 

--xiangyouxie                                             

On Thu, Feb 21, 2013 at 06:16:10PM +0000, Matthew Anderson wrote:
> If this isn't the correct list just let me know,
> 
> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and 
> 2012) no longer receives RTC ticks when it has been idle for a random amount 
> of time. HPET is disabled and the guest is running Hyper-V relaxed timers 
> (same situation without hv_relaxed). The guest clock stands still and the 
> qemu process uses very little CPU (<0.5%, normally it's >5% when the guest is 
> idle) . Eventually the guest stops responding to network requests but if you 
> open the guest console via VNC and move the mouse around it comes back to 
> life and QEMU replays the lost RTC ticks and the guest recovers. I've also 
> been able to make it recover by querying the clock over the network via the 
> net time command, you can see the clock stand still for 30 seconds then it 
> replays the ticks and catches up.
> 
> I've tried to reproduce the issue but it seems fairly elusive; the only way
> I've been able to reproduce it is by letting the VMs idle and waiting.
> Sometimes it's hours and sometimes minutes. Can anyone suggest a way to 
> narrow the issue down?
> 
> Qemu command line is-
> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed -enable-kvm -m 
> 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
> -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
> -drive 
> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,format=qed,cache=writeback
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
>  -drive 
> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-disk2,format=qed,cache=writeback
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=virtio-disk2
>  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device 
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev 
> tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 
> 127.0.0.1:22 -vga cirrus -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
> 
> Environment is -
> Mainline 3.7.5 and 3.8.0
> Qemu 1.2.2, 1.3.1 and 1.4.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
@ 2013-10-08  9:38 Xiexiangyou
  2013-10-16 14:56 ` Matthew Anderson
  0 siblings, 1 reply; 16+ messages in thread
From: Xiexiangyou @ 2013-10-08  9:38 UTC (permalink / raw)
  To: matthewa; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3364 bytes --]

Hi:

I have hit the same bug: a Windows 2008 guest occasionally stops receiving RTC ticks when it is idle.
When a VNC client connects, the guest resumes receiving RTC ticks and its clock runs fast because of the coalesced timer ticks.

HPET is disabled and the RTC tick policy is set to "catchup", as follows:

  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup' track='guest'/>
    <timer name='hpet' present='no'/>
  </clock>

My kvm module is version 3.6. Should I upgrade it to the latest version? Any suggestions are welcome!

Thanks!

--xiangyouxie



On Thu, Feb 21, 2013 at 06:16:10PM +0000, Matthew Anderson wrote:
> If this isn't the correct list just let me know,
>
> I've run into a bug whereby a Windows guest (tested on Server 2008R2 and
> 2012) no longer receives RTC ticks when it has been idle for a random amount
> of time. HPET is disabled and the guest is running Hyper-V relaxed timers
> (same situation without hv_relaxed). The guest clock stands still and the
> qemu process uses very little CPU (<0.5%, normally it's >5% when the guest is
> idle). Eventually the guest stops responding to network requests but if you
> open the guest console via VNC and move the mouse around it comes back to
> life and QEMU replays the lost RTC ticks and the guest recovers. I've also
> been able to make it recover by querying the clock over the network via the
> net time command; you can see the clock stand still for 30 seconds, then it
> replays the ticks and catches up.
>
> I've tried to reproduce the issue but it seems fairly elusive; the only way
> I've been able to reproduce it is by letting the VMs idle and waiting.
> Sometimes it's hours and sometimes minutes. Can anyone suggest a way to
> narrow the issue down?
>
> Qemu command line is-
> /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed -enable-kvm -m
> 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid
> 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
> -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> -drive
> file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,format=qed,cache=writeback
>  -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
>  -drive
> file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-disk2,format=qed,cache=writeback
>  -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=virtio-disk2
>  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev
> tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc
> 127.0.0.1:22 -vga cirrus -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>
> Environment is -
> Mainline 3.7.5 and 3.8.0
> Qemu 1.2.2, 1.3.1 and 1.4.0




[-- Attachment #2: Type: text/html, Size: 10863 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-10-30  7:26 UTC | newest]

Thread overview: 16+ messages
     [not found] <5C390CF7F373CC4AB58A6684FF6C8712013DD0E2@Exchange2010-2.corit.local>
     [not found] ` <20130221222246.GH10005@vm>
2013-03-14  4:15   ` [Qemu-devel] BUG: RTC issue when Windows guest is idle Matthew Anderson
2013-03-19 17:53     ` Gleb Natapov
2013-10-08  9:38 Xiexiangyou
2013-10-16 14:56 ` Matthew Anderson
2013-10-21 14:56   ` Alex Bligh
2013-10-21 15:13     ` Alex Bligh
2013-10-22  8:28     ` Xiexiangyou
2013-10-22  9:35       ` Alex Bligh
2013-10-28  6:58         ` Matthew Anderson
2013-10-28  7:44           ` Alex Bligh
2013-10-28  8:46             ` Alex Bligh
2013-10-30  0:44             ` Xiexiangyou
2013-10-30  7:23               ` Alex Bligh
2013-10-28 11:51           ` Paolo Bonzini
2013-10-30  1:29           ` Xiexiangyou
2013-10-30  7:26             ` Alex Bligh
