* [Qemu-devel] troubleshooting live migration
@ 2014-01-14 15:31 Marcus Sorensen
  2014-01-15  6:08 ` Marcus Sorensen
  0 siblings, 1 reply; 4+ messages in thread
From: Marcus Sorensen @ 2014-01-14 15:31 UTC (permalink / raw)
  To: qemu-devel

Does anyone have tips on troubleshooting live migration? I've got
several E5-2650 servers running in a test environment, kernel 3.10.26
and qemu 1.7.0. If I start a VM guest (say Ubuntu, Debian, or CentOS),
I can migrate it from host to host to host just fine. But if I wait a
while (say 1 hour) and then migrate, the migration succeeds but the
guest is hosed: it no longer pings and its CPU is thrashing. I've tried
to strace it and don't see anything that the working hosts aren't
doing, and I've tried gdb but I'm not entirely sure what I'm doing. I
tried downgrading to qemu 1.6.1. I've found dozens of reports of such
behavior, but they're all due to other things (migrating between
different host CPUs, someone thinking it's virtio or memballoon only
to later find a fix like changing machine type, etc). I'm at a loss.
This seems to work just fine with stock CentOS builds.

I'd be happy to try to capture a core if someone is willing to look at it.

Here's an example xml:

<domain type='kvm'>
  <name>VM12</name>
  <uuid>dd25acfc-e24d-4de6-814c-72ac465bc208</uuid>
  <description></description>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>2000</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.7'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
  </cpu>
  <clock offset='utc'>
    <timer name='kvmclock' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/sdc'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='02:00:09:66:00:18'/>
      <source bridge='br1000192'/>
      <model type='virtio'/>
      <bandwidth>
        <inbound average='128000' peak='128000'/>
        <outbound average='128000' peak='128000'/>
      </bandwidth>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/VM12.agent'/>
      <target type='virtio' name='VM12.vport'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>


* Re: [Qemu-devel] troubleshooting live migration
  2014-01-14 15:31 [Qemu-devel] troubleshooting live migration Marcus Sorensen
@ 2014-01-15  6:08 ` Marcus Sorensen
  2014-01-15 14:27   ` Marcus Sorensen
  0 siblings, 1 reply; 4+ messages in thread
From: Marcus Sorensen @ 2014-01-15  6:08 UTC (permalink / raw)
  To: qemu-devel

Ok, more information. The console spews 'lapic increasing min_delta_ns
to ################' when this happens.



* Re: [Qemu-devel] troubleshooting live migration
  2014-01-15  6:08 ` Marcus Sorensen
@ 2014-01-15 14:27   ` Marcus Sorensen
  0 siblings, 0 replies; 4+ messages in thread
From: Marcus Sorensen @ 2014-01-15 14:27 UTC (permalink / raw)
  To: qemu-devel

I tried -no-hpet, was still able to replicate the 'lapic' issue. I
find it interesting that I can only trigger it if the vm has been
running awhile.
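
For anyone reproducing this through libvirt rather than a hand-built
command line: a minimal sketch of how the same test might be expressed
in the domain XML, assuming the installed libvirt supports the 'present'
attribute on the timer element. The kvmclock line is copied from the
example XML earlier in the thread; the hpet line is the illustrative
addition.

```xml
<clock offset='utc'>
  <timer name='kvmclock' tickpolicy='catchup'/>
  <!-- Assumption: hides the HPET from the guest, similar to -no-hpet -->
  <timer name='hpet' present='no'/>
</clock>
```

The guest has to be shut down and started again for a change to the
domain definition to take effect.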



* Re: [Qemu-devel] troubleshooting live migration
@ 2014-01-16 22:33 Marcin Gibuła
  0 siblings, 0 replies; 4+ messages in thread
From: Marcin Gibuła @ 2014-01-16 22:33 UTC (permalink / raw)
  To: qemu-devel

 > I tried -no-hpet, was still able to replicate the 'lapic' issue. I
 > find it interesting that I can only trigger it if the vm has been
 > running awhile.

Hi,

I've seen identical crashes with live migration in our environment. The
symptoms are the same: the VM has to be idle for some time, and after
migration the CPU is at 100% and the VM is dead. All migrations happen
between identical hardware.

I don't think I've ever had a Windows guest crash like this, and I think
this is somehow related to kvmclock. I've tried to debug the qemu guest
process and, from what I can tell, its kernel is busy looping in some
time-management-related functions. Could you try to reproduce this issue
with -no-kvmclock? Our testing environment is currently offline so I
can't test it myself.
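
If the guest is managed by libvirt, the kvmclock test could be sketched
as below. This assumes the installed libvirt accepts present='no' for
the kvmclock timer; the element replaces the kvmclock line in the
example XML at the top of the thread.

```xml
<clock offset='utc'>
  <!-- Assumption: present='no' stops kvmclock being offered to the guest -->
  <timer name='kvmclock' present='no'/>
</clock>
```

Inside a Linux guest,
/sys/devices/system/clocksource/clocksource0/current_clocksource should
then report something other than kvm-clock.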

We also use a 3.10 kernel (though 3.8 wasn't working either) and struggled
with this issue with qemu 1.4, 1.5 and 1.6. Didn't test 1.7. Also, we're
using AMD CPUs, so it seems to be platform independent.

-- 
mg
