* Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-09-08  5:54 UTC
  To: kvm

Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows
it down a /whole lot/ ...), live migration started killing my Ubuntu precise
(kernel 3.2.x) guests, sending all of their vcpus into a busy loop.  Once (and
only once) I observed the guest eventually become responsive again, with a
clock nearly 600 years in the future and a negative uptime.
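
A side note on that "600 years": 2^64 nanoseconds is roughly 585 years, so my
(entirely unconfirmed) guess is that a time reading went backward across the
migration and an unsigned nanosecond delta wrapped.  Back-of-the-envelope:

    /* My own arithmetic, not from the trace: a u64 "now - last" that
     * goes slightly negative wraps to just under 2^64 ns. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t wrapped_ns = UINT64_MAX;
        printf("~%.0f years\n", wrapped_ns / 1e9 / 86400 / 365.25);
        /* prints "~585 years" -- close to the guest's far-future clock */
        return 0;
    }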

I haven't been able to dig up any previous threads about this problem, so my
gut instinct is that I've configured something wonky.  Any pointers toward
/what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU features.
My longest-running VMs, created before I started passing the host CPU
capabilities through to the guests, seem to migrate without issue.

It also seems to happen reliably when the guest has been running for a while;
it's easily reproducible with guests that have been up ~1 day, and I've
reproduced it in VMs with an uptime of ~20 hours.  I haven't yet figured out a
lower-bound, which makes the testing cycle a little longer for me.

The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
the current 3.2 kernel that Canonical distributes.  Recent Fedora kernels
(3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
case exhaustively, and I haven't written down very good notes for the tests I
have done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04
and the associated 3.13 kernel.  I had previously reproduced this with 12.04
running a raring-backport 3.11 kernel as well, but I (seemingly erroneously)
assumed it may have been a qemu userspace discrepancy.

I have been poring over the guest with a debugger attached via qemu's
gdbserver after it goes into the busy-spin, and the stack trace is:

(gdb) bt
#0  second_overflow (secs=<optimized out>) at /build/buildd/linux-3.2.0/kernel/time/ntp.c:407
#1  0xffffffff81095c75 in logarithmic_accumulation (offset=3831765322649889943, shift=9) at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:987
#2  0xffffffff81096042 in update_wall_time () at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1056
#3  0xffffffff81096e8d in do_timer (ticks=549606) at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1246
#4  0xffffffff8109d825 in tick_do_update_jiffies64 (now=...) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:77
#5  0xffffffff8109dda6 in tick_nohz_update_jiffies (now=...) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:145
#6  0xffffffff8109e378 in tick_check_nohz (cpu=0) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:713
#7  tick_check_idle (cpu=0) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#8  0xffffffff8106ff91 in irq_enter () at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#9  0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>) at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#10 <signal handler called>
#11 0xffffffffffffff10 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  read_seqbegin (sl=<optimized out>) at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
89      /build/buildd/linux-3.2.0/include/linux/seqlock.h: No such file or directory.
(gdb) bt
#0  read_seqbegin (sl=<optimized out>) at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
#1  ktime_get () at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:268
#2  0xffffffff8109e355 in tick_check_nohz (cpu=1) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:709
#3  tick_check_idle (cpu=1) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#4  0xffffffff8106ff91 in irq_enter () at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#5  0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>) at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#6  <signal handler called>
#7  0xffffffffffffff10 in ?? ()

If I continue and then re-stop the guest, logarithmic_accumulation() is still
in the stack trace, with the same offset and shift; the line numbers indicate
it's stuck in the following loop:
    while (timekeeper.xtime_nsec >= nsecps) {
        int leap;
        timekeeper.xtime_nsec -= nsecps;
        xtime.tv_sec++;
        leap = second_overflow(xtime.tv_sec);
        xtime.tv_sec += leap;
        wall_to_monotonic.tv_sec -= leap;
        if (leap)
            clock_was_set_delayed();
    }
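
Since nsecps is NSEC_PER_SEC << shift, each pass through that loop retires
exactly one second of accumulated time.  If the wrap guess above is right,
there are billions of seconds queued up, and (as I read the 3.2 code) the
loop runs with the xtime seqlock write-held, which would also explain the
other vcpu spinning in read_seqbegin() via ktime_get().  Roughly:

    /* Again my own back-of-the-envelope, assuming a wrapped ~2^64 ns
     * backlog: one loop iteration per second of backlog. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t iterations = UINT64_MAX / 1000000000ULL;
        printf("~%llu iterations\n", (unsigned long long)iterations);
        /* ~18.4 billion -- effectively a permanent busy-spin */
        return 0;
    }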

Live migration is initiated through libvirt by virDomainMigrate with
flags=VIR_MIGRATE_LIVE, uri="tcp://$recv_hostname".
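
In C that call is roughly the following (a minimal sketch of what my
management tooling does; the connection URIs are placeholders and error
handling is elided):

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr src = virConnectOpen("qemu:///system");
        virConnectPtr dst = virConnectOpen("qemu+ssh://recv_hostname/system");
        virDomainPtr  dom = virDomainLookupByName(src, "dog");

        /* Live-migrate "dog" to the destination over tcp://recv_hostname */
        virDomainPtr migrated = virDomainMigrate(dom, dst, VIR_MIGRATE_LIVE,
                                                 NULL, "tcp://recv_hostname", 0);
        if (!migrated)
            fprintf(stderr, "migration failed\n");
        return 0;
    }

(builds with: gcc migrate.c -lvirt)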

The guest is spawned by libvirtd with:
qemu-system-x86_64 -enable-kvm -name dog -S 
-machine pc-i440fx-trusty,accel=kvm,usb=off 
-cpu Nehalem,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
-m 512 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 
-uuid 55fd4c19-2477-40a5-988f-aaccd60b20dc -no-user-config -nodefaults 
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dog.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot menu=on,strict=on
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-drive if=none,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-drive file=rbd:rbd/dog:id=libvirt:key=________________________________________:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-netdev tap,ifname=vm9_0,script=no,id=hostnet0,vhost=on,vhostfd=26
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:62:7a:9d,bus=pci.0,addr=0x3
-vnc 0.0.0.0:9,password
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2
-incoming tcp:[::]:49152
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 

The libvirt domain XML is:
<domain type='kvm' id='12'>
  <name>dog</name>
  <uuid>55fd4c19-2477-40a5-988f-aaccd60b20dc</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Nehalem</model>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <boot order='1'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='network' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='e04aa789-0bd7-07ac-cf10-78d8f52a4162'/>
      </auth>
      <source protocol='rbd' name='rbd/dog'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='ethernet'>
      <mac address='00:16:3e:62:7a:9d'/>
      <script path='no'/>
      <target dev='vm9_0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5909' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Paolo Bonzini @ 2014-09-08  8:18 UTC
  To: kvm

Il 08/09/2014 07:54, Matt Mullins ha scritto:
> It also seems to happen reliably when the guest has been running for a while;
> it's easily reproducible with guests that have been up ~1 day, and I've
> reproduced it in VMs with an uptime of ~20 hours.  I haven't yet figured out a
> lower-bound, which makes the testing cycle a little longer for me.
> 
> The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
> the current 3.2 kernel that Canonical distributes.  Recent Fedora kernels
> (3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
> case exhaustively, and I haven't written down very good notes for the tests I
> have done with Fedora.

What host are you running?

Paolo


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-09-08 15:56 UTC
  To: kvm

On Mon, Sep 08, 2014 at 10:18:06AM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 07:54, Matt Mullins ha scritto:
> > It also seems to happen reliably when the guest has been running for a while;
> > it's easily reproducible with guests that have been up ~1 day, and I've
> > reproduced it in VMs with an uptime of ~20 hours.  I haven't yet figured out a
> > lower-bound, which makes the testing cycle a little longer for me.
> > 
> > The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
> > the current 3.2 kernel that Canonical distributes.  Recent Fedora kernels
> > (3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
> > case exhaustively, and I haven't written down very good notes for the tests I
> > have done with Fedora.
> 
> What host are you running?

What information do you want that I missed in my first email?

> The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu
> 14.04 and the associated 3.13 kernel.  I had previously reproduced this with
> 12.04 running a raring-backport 3.11 kernel as well, but I (seemingly
> erroneously) assumed it may have been a qemu userspace discrepancy.

I implied, but didn't explicitly state: I don't remember this happening with
Ubuntu 12.04's 3.2 kernel running on the hosts.


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Paolo Bonzini @ 2014-09-08 16:18 UTC
  To: kvm

Il 08/09/2014 17:56, Matt Mullins ha scritto:
>> > What host are you running?
> What information do you want that I missed in my first email?

What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
top of the newer kernel?

Paolo

> > The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu
> > 14.04 and the associated 3.13 kernel.  I had previously reproduced this with
> > 12.04 running a raring-backport 3.11 kernel as well, but I (seemingly
> > erroneously) assumed it may have been a qemu userspace discrepancy.
> I implied, but didn't explicitly state: I don't remember this happening with
> Ubuntu 12.04's 3.2 kernel running on the hosts.

* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-09-08 16:39 UTC
  To: kvm

On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 17:56, Matt Mullins ha scritto:
> >> > What host are you running?
> 
> What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
> top of the newer kernel?

I'm currently running the version included with Ubuntu 14.04:
2.0.0+dfsg-2ubuntu1.2.

I had originally seen the hang with Ubuntu 12.04 (qemu 1.0+noroms-0ubuntu14.13)
running the Canonical-backported-to-12.04 3.11 kernel.

I'll try getting qemu 1.0 on two of my 14.04 / 3.13 machines this evening;
migration results would come in tomorrow evening.


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-09-10  6:53 UTC
  To: Paolo Bonzini; +Cc: kvm

On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 17:56, Matt Mullins ha scritto:
> >> > What host are you running?
> > What information do you want that I missed in my first email?
> 
> What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
> top of the newer kernel?

I did reproduce this on qemu 1.0.1.

What would you like me to test next?  I've got both VM hosts currently running
3.17-rc4, so I'll know tomorrow if that works.

I've been looking into this off-and-on for a while now, and this time around I
may have found other folks experiencing the same issue:
    https://issues.apache.org/jira/browse/CLOUDSTACK-6788
That one's a little empty on the details (do y'all know more about to whom that
bug was "known" than I do?), but
    https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218
sees similar symptoms due to running NTP on the host.  I may try disabling that
after the current round of testing-3.17-rc4 finishes up.


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-09-15 18:14 UTC
  To: Paolo Bonzini; +Cc: kvm

On Tue, Sep 09, 2014 at 11:53:49PM -0700, Matt Mullins wrote:
> On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
> > What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
> > top of the newer kernel?
> 
> I did reproduce this on qemu 1.0.1.
> 
> What would you like me to test next?  I've got both VM hosts currently running
> 3.17-rc4, so I'll know tomorrow if that works.
> 
> I've been looking into this off-and-on for a while now, and this time around I
> may have found other folks experiencing the same issue:
>     https://issues.apache.org/jira/browse/CLOUDSTACK-6788
> That one's a little empty on the details (do y'all know more about to whom that
> bug was "known" than I do?), but
>     https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218
> sees similar symptoms due to running NTP on the host.  I may try disabling that
> after the current round of testing-3.17-rc4 finishes up.

A summary of my testing from last week:
  * 3.17-rc4 (with qemu 2.0) seems to have had the same problem as well.
  * Disabling ntpd didn't help either.
  * <timer name="kvmclock" present="no"/> _does_ seem to make my VM migrate
    without issue (placement shown below).

I'm not really sure what to look at from here.  I suppose leaving kvmclock
disabled is a workaround for now, but is there a major disadvantage to doing
so?
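
For anyone wanting to try the same workaround: the timer element goes inside
the domain's existing <clock> element, i.e.

    <clock offset='utc'>
      <timer name='kvmclock' present='no'/>
    </clock>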


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Paolo Bonzini @ 2014-09-16  8:42 UTC
  To: kvm

Il 15/09/2014 20:14, Matt Mullins ha scritto:
> On Tue, Sep 09, 2014 at 11:53:49PM -0700, Matt Mullins wrote:
>> On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
>>> What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
>>> top of the newer kernel?
>>
>> I did reproduce this on qemu 1.0.1.
>>
>> What would you like me to test next?  I've got both VM hosts currently running
>> 3.17-rc4, so I'll know tomorrow if that works.
>>
>> I've been looking into this off-and-on for a while now, and this time around I
>> may have found other folks experiencing the same issue:
>>     https://issues.apache.org/jira/browse/CLOUDSTACK-6788
>> That one's a little empty on the details (do y'all know more about to whom that
>> bug was "known" than I do?), but
>>     https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218
>> sees similar symptoms due to running NTP on the host.  I may try disabling that
>> after the current round of testing-3.17-rc4 finishes up.
> 
> A summary of my testing from last week:
>   * 3.17-rc4 (with qemu 2.0) seems to have had the same problem as well.
>   * Disabling ntpd didn't help either.
>   * <timer name="kvmclock" present="no"/> _does_ seem to make my VM migrate
>     without issue.
> 
> I'm not really sure what to look at from here.  I suppose leaving kvmclock
> disabled is a workaround for now, but is there a major disadvantage to doing
> so?

Sorry for not following up.  I think we have QEMU patches to fix this
issue.  I'll reply as soon as they are available in a git tree.

Paolo


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-10-27 18:42 UTC
  To: Paolo Bonzini; +Cc: kvm

On Tue, Sep 16, 2014 at 10:42:41AM +0200, Paolo Bonzini wrote:
> Il 15/09/2014 20:14, Matt Mullins ha scritto:
> > I'm not really sure what to look at from here.  I suppose leaving kvmclock
> > disabled is a workaround for now, but is there a major disadvantage to doing
> > so?
> 
> Sorry for not following up.  I think we have QEMU patches to fix this
> issue.  I'll reply as soon as they are available in a git tree.

Do you have any more information on the fix?  Are there any downsides if I
disable kvmclock for my guests instead?


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Paolo Bonzini @ 2014-10-28  9:04 UTC
  To: kvm

On 10/27/2014 07:42 PM, Matt Mullins wrote:
>>> > > I'm not really sure what to look at from here.  I suppose leaving kvmclock
>>> > > disabled is a workaround for now, but is there a major disadvantage to doing
>>> > > so?
>> > 
>> > Sorry for not following up.  I think we have QEMU patches to fix this
>> > issue.  I'll reply as soon as they are available in a git tree.
> Do you have any more information on the fix?  Are there any downsides if I
> disable kvmclock for my guests instead?

Hi Matt,

can you test using QEMU from git://git.qemu-project.org/qemu.git?

Paolo


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Matt Mullins @ 2014-11-11 19:57 UTC
  To: Paolo Bonzini; +Cc: kvm

On Tue, Oct 28, 2014 at 10:04:10AM +0100, Paolo Bonzini wrote:
> can you test using QEMU from git://git.qemu-project.org/qemu.git?

That seems to work great, yes.

Looking through the commit history, I see:
      kvmclock: Ensure time in migration never goes backward
      kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation

Assuming those are the right fixes for this issue, are they suitable for
backport to distros' qemu 2.0 branches?  The merge commit for them seemed to
indicate they were problematic at first.

On second thought: I found the qemu-devel threads about reverting them in the
2.1 timeframe, so I'm going to do a little more research before I start
suggesting fixes for the existing install base.


* Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
From: Paolo Bonzini @ 2014-11-12  8:59 UTC
  To: kvm

On 11/11/2014 20:57, Matt Mullins wrote:
> That seems to work great, yes.
> 
> Looking through the commit history, I see:
>       kvmclock: Ensure time in migration never goes backward
>       kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> 
> Assuming those are the right fixes for this issue, are they suitable for
> backport to distros' qemu 2.0 branches?  The merge commit for them seemed to
> indicate they were problematic at first.
> 
> On second thought: I found the qemu-devel threads about reverting them in the
> 2.1 timeframe, so I'm going to do a little more research before I start
> suggesting fixes for the existing install base.

The right commits are

de9d61e83d43be9069e6646fa9d57a3f47779d28
317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
9a48bcd1b82494671c111109b0eefdb882581499

which are similar but not equivalent to the two commits you found.  They
should be in 2.1.3 if there will be one, and they are appropriate for
backporting to 2.0.
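
If you want to carry them on a 2.0 tree in the meantime, something like this
should work (untested sketch; the short hashes abbreviate the commits above,
and you may hit conflicts):

    $ git clone git://git.qemu-project.org/qemu.git && cd qemu
    $ git checkout -b kvmclock-fixes v2.0.0
    $ git cherry-pick de9d61e83d43 317b0a6d8ba4 9a48bcd1b824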

Thanks for confirming that they fix your problem!

Paolo
