* [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
@ 2017-03-31 20:12 Chris Friesen
  2017-04-03  9:30 ` Dr. David Alan Gilbert
  2017-04-03 19:11 ` Stefan Hajnoczi
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Friesen @ 2017-03-31 20:12 UTC
  To: qemu-devel

Hi,

I'm running into an issue with live-migrating a guest from a host running 
qemu-kvm-ev 2.3.0-31 to a host running qemu-kvm-ev 2.6.0-27.1.  This is a 
libvirt-tunnelled migration, in the context of upgrading an OpenStack install to 
newer software.  The source host is running CentOS 7.2.1511, while the dest host 
is running CentOS 7.3.1611.

I'll include the qemu commandlines for the source/dest at the bottom.

Initially we had a bunch of guests running on compute-2 (which is running 
qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to 
compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated 
successfully.  The fourth (which was essentially identical in configuration to 
the first three) failed, as shown by the following log entries in 
/var/log/libvirt/qemu/instance-0000000e.log:


2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - 
used_idx 0x47c
2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 
of device '0000:00:07.0/virtio-balloon'
2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not 
permitted
2017-03-29 06:38:37.896+0000: shutting down


Does anyone know of an existing bug report covering this issue?  (I took a look 
and didn't see anything obviously related.)


The qemu commandline on the source compute node is:


/usr/libexec/qemu-kvm -c 0x00000000000000000000000000000001 -n 4 
--proc-type=secondary --file-prefix=vs -- -enable-dpdk -name instance-0000000e 
-S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512 -realtime mlock=off 
-smp 1,sockets=1,cores=1,threads=1 -object 
memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=1,policy=bind 
-numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 
57ae849f-aa66-422a-90a2-62db6c59db29 -smbios type=1,manufacturer=Fedora 
Project,product=OpenStack 
Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual 
Machine -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-0000000e/monitor.sock,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot 
reboot-timeout=5000,strict=on -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,if=none,id=drive-virtio-disk0,format=raw,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
-chardev 
socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4 
-netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3 
-chardev 
socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e 
-netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4 
-chardev 
socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008 
-netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5 
-chardev 
file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log 
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 
-vnc 0.0.0.0:11 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 
-incoming fd:25 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg 
timestamp=on



The complete instance-0000000e.log file on the destination is:

2017-03-29 06:38:35.962+0000: starting up libvirt version: 2.0.0, package: 
10.el7_3.2.tis.24 (Unknown, 2017-03-15-14:59:22, yow-dsulliva-lx-vm1.wrs.com), 
qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-27.1.el7.tis.31), hostname: compute-0
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin 
QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm '-c 
0x00000000000000000000000000000001' '-n 4' --proc-type=secondary 
--file-prefix=vs -- -enable-dpdk -name guest=instance-0000000e,debug-threads=on 
-S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes 
-machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512 -realtime mlock=off -smp 
1,sockets=1,cores=1,threads=1 -object 
memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=0,policy=bind 
-numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 
57ae849f-aa66-422a-90a2-62db6c59db29 -smbios 'type=1,manufacturer=Fedora 
Project,product=OpenStack 
Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual 
Machine' -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-10-instance-0000000e/monitor.sock,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot 
reboot-timeout=5000,strict=on -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,format=raw,if=none,id=drive-virtio-disk0,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
-chardev 
socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4 
-netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3 
-chardev 
socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e 
-netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4 
-chardev 
socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008 
-netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5 
-add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on 
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 
-vnc 0.0.0.0:9 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 
-incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg 
timestamp=on
Domain id=10 is tainted: high-privileges
EAL:eal_memory.c:1591: WARNING: Address Space Layout Randomization (ASLR) is 
enabled in the kernel.
EAL:eal_memory.c:1593:    This may cause issues with mapping memory into 
secondary processes
char device redirected to /dev/pts/9 (label charserial1)
2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - 
used_idx 0x47c
2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 
of device '0000:00:07.0/virtio-balloon'
2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not 
permitted
2017-03-29 06:38:37.896+0000: shutting down


For what it's worth, the differences between the two qemu command lines are as 
follows:

source:
-name instance-0000000e -chardev 
file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log 
-vnc 0.0.0.0:11 -incoming fd:25

destination:
-name guest=instance-0000000e,debug-threads=on -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes 
-add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on 
-vnc 0.0.0.0:9 -incoming defer
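
A comparison like the one above can also be produced mechanically.  The
following is a minimal sketch, assuming both command lines are available as
plain strings.  option_diff is a hypothetical helper, and the two strings are
short excerpts of the full command lines, not the complete invocations.  It
compares individual tokens rather than option/value pairs, so it only hints
at where to look.

```python
import shlex


def option_diff(src_cmdline, dst_cmdline):
    """Return (tokens only on source, tokens only on destination).

    Hypothetical helper: splits each command line the way a shell would
    and compares the resulting token sets, ignoring token order.
    """
    src = set(shlex.split(src_cmdline))
    dst = set(shlex.split(dst_cmdline))
    return sorted(src - dst), sorted(dst - src)


# Shortened excerpts of the two command lines, for illustration only.
source = "-name instance-0000000e -vnc 0.0.0.0:11 -incoming fd:25 -msg timestamp=on"
dest = ("-name guest=instance-0000000e,debug-threads=on "
        "-vnc 0.0.0.0:9 -incoming defer -msg timestamp=on")

only_src, only_dst = option_diff(source, dest)
print("source only:     ", only_src)
print("destination only:", only_dst)
```

Tokens common to both sides (here "-name", "-vnc", "-incoming", "-msg" and
"timestamp=on") drop out, leaving only the values that differ.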

Thanks,
Chris


* Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
  2017-03-31 20:12 [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0 Chris Friesen
@ 2017-04-03  9:30 ` Dr. David Alan Gilbert
  2017-04-03 19:11 ` Stefan Hajnoczi
  1 sibling, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-03  9:30 UTC
  To: Chris Friesen; +Cc: qemu-devel, lprosek

* Chris Friesen (chris.friesen@windriver.com) wrote:
> Hi,
> 
> I'm running into an issue with live-migrating a guest from a host running
> qemu-kvm-ev 2.3.0-31 to a host running qemu-kvm-ev 2.6.0-27.1.  This is a
> libvirt-tunnelled migration, in the context of upgrading an OpenStack
> install to newer software.  The source host is running CentOS 7.2.1511,
> while the dest host is running CentOS 7.3.1611.
> 
> I'll include the qemu commandlines for the source/dest at the bottom.
> 
> Initially we have a bunch of guests running on compute-2 (which is running
> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
> successfully.  The fourth (which was essentially identical in configuration
> to the first three) failed, as per the following logs in
> /var/log/libvirt/qemu/instance-0000000e.log:
> 
> 
> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
> - used_idx 0x47c
> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
> 0x0 of device '0000:00:07.0/virtio-balloon'
> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
> not permitted
> 2017-03-29 06:38:37.896+0000: shutting down
> 
> 
> Does anyone know of an existing bug report covering this issue?  (I took a
> look and didn't see anything obviously related.)

There were several virtio bugs with similar errors; I think they were fixed in
the early 2.6.0-28 packages.  The latest one on CentOS seems to be
2.6.0-28 3.6.1, so it's worth a try.

I think the upstream fixes are 4a1e48 and 297a75.

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
  2017-03-31 20:12 [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0 Chris Friesen
  2017-04-03  9:30 ` Dr. David Alan Gilbert
@ 2017-04-03 19:11 ` Stefan Hajnoczi
  2017-04-04 13:56   ` Ladi Prosek
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2017-04-03 19:11 UTC
  To: Chris Friesen; +Cc: qemu-devel, Ladi Prosek, Dr. David Alan Gilbert


On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
> I'm running into an issue with live-migrating a guest from a host running
> qemu-kvm-ev 2.3.0-31 to a host running qemu-kvm-ev 2.6.0-27.1.  This is a
> libvirt-tunnelled migration, in the context of upgrading an OpenStack
> install to newer software.  The source host is running CentOS 7.2.1511,
> while the dest host is running CentOS 7.3.1611.
> 
> I'll include the qemu commandlines for the source/dest at the bottom.
> 
> Initially we have a bunch of guests running on compute-2 (which is running
> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
> successfully.  The fourth (which was essentially identical in configuration
> to the first three) failed, as per the following logs in
> /var/log/libvirt/qemu/instance-0000000e.log:
> 
> 
> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
> - used_idx 0x47c
> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
> 0x0 of device '0000:00:07.0/virtio-balloon'
> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
> not permitted
> 2017-03-29 06:38:37.896+0000: shutting down
> 
> 
> Does anyone know of an existing bug report covering this issue?  (I took a
> look and didn't see anything obviously related.)

This is the virtio-balloon device.  If you remove the device the live
migration should work reliably.

Alternatively, you can temporarily rmmod virtio_balloon inside the guest
for live migration.  After migration you can modprobe virtio_balloon
again.

last_avail_idx 0x47b with used_idx 0x47c is an invalid device state: the
device cannot have used more buffers than it has taken from the avail ring.
I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
qemu.git/master and do not see an obvious bug.  I also compared
qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
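
For illustration, the arithmetic behind the "VQ 2 size 0x80 < last_avail_idx
0x47b - used_idx 0x47c" message can be sketched as below.  This is a rough
Python model, not QEMU's actual code: the function names are made up, and it
assumes the ring indices are 16-bit counters that wrap.  Under that
assumption, a used_idx one ahead of last_avail_idx gives an in-flight count
of 0xffff, far larger than the 0x80-entry queue, so the state is rejected.

```python
# Hypothetical model of the load-time sanity check; the names are
# illustrative and are not QEMU's actual identifiers.

def vq_inuse(last_avail_idx: int, used_idx: int) -> int:
    """Buffers fetched by the device but not yet returned to the guest.

    Assumption: both indices are free-running 16-bit counters, so the
    difference is taken modulo 2**16.
    """
    return (last_avail_idx - used_idx) & 0xFFFF


def vq_state_is_valid(vq_size: int, last_avail_idx: int, used_idx: int) -> bool:
    """A queue cannot have more buffers in flight than it has entries."""
    return vq_inuse(last_avail_idx, used_idx) <= vq_size


# Values taken from the error message in the destination log.
inuse = vq_inuse(0x47B, 0x47C)
print(hex(inuse))                             # 0xffff
print(vq_state_is_valid(0x80, 0x47B, 0x47C))  # False
```

With the two indices swapped (last_avail_idx 0x47c, used_idx 0x47b) the same
model reports one buffer in flight, which would be a valid state.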




* Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
  2017-04-03 19:11 ` Stefan Hajnoczi
@ 2017-04-04 13:56   ` Ladi Prosek
  2017-04-04 14:28     ` Chris Friesen
  0 siblings, 1 reply; 6+ messages in thread
From: Ladi Prosek @ 2017-04-04 13:56 UTC
  To: Stefan Hajnoczi; +Cc: Chris Friesen, qemu-devel, Dr. David Alan Gilbert

On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>> I'm running into an issue with live-migrating a guest from a host running
>> qemu-kvm-ev 2.3.0-31 to a host running qemu-kvm-ev 2.6.0-27.1.  This is a
>> libvirt-tunnelled migration, in the context of upgrading an OpenStack
>> install to newer software.  The source host is running CentOS 7.2.1511,
>> while the dest host is running CentOS 7.3.1611.
>>
>> I'll include the qemu commandlines for the source/dest at the bottom.
>>
>> Initially we have a bunch of guests running on compute-2 (which is running
>> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
>> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
>> successfully.  The fourth (which was essentially identical in configuration
>> to the first three) failed, as per the following logs in
>> /var/log/libvirt/qemu/instance-0000000e.log:
>>
>>
>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
>> - used_idx 0x47c
>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
>> 0x0 of device '0000:00:07.0/virtio-balloon'
>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
>> not permitted
>> 2017-03-29 06:38:37.896+0000: shutting down
>>
>>
>> Does anyone know of an existing bug report covering this issue?  (I took a
>> look and didn't see anything obviously related.)
>
> This is the virtio-balloon device.  If you remove the device the live
> migration should work reliably.
>
> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
> for live migration.  After migration you can modprobe virtio_balloon
> again.
>
> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
> qemu.git/master and do not see an obvious bug.  I also compared
> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.

The device likely got into the invalid state as part of a previous
migration to an unfixed QEMU. I second Stefan's suggestion to
temporarily remove the device or unload the driver.

Thanks!
Ladi

>>
>>
>> The qemu commandline on the source compute node is:
>>
>>
>> /usr/libexec/qemu-kvm -c 0x00000000000000000000000000000001 -n 4
>> --proc-type=secondary --file-prefix=vs -- -enable-dpdk -name
>> instance-0000000e -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512
>> -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=1,policy=bind
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid
>> 57ae849f-aa66-422a-90a2-62db6c59db29 -smbios type=1,manufacturer=Fedora
>> Project,product=OpenStack Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual
>> Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-0000000e/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
>> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot
>> reboot-timeout=5000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,if=none,id=drive-virtio-disk0,format=raw,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -chardev socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4
>> -netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3
>> -chardev socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e
>> -netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4
>> -chardev socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5
>> -chardev file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log
>> -device isa-serial,chardev=charserial0,id=serial0 -chardev
>> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device
>> usb-tablet,id=input0 -vnc 0.0.0.0:11 -k en-us -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming fd:25 -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>>
>>
>> The complete instance-0000000e.log file on the destination is:
>>
>> 2017-03-29 06:38:35.962+0000: starting up libvirt version: 2.0.0, package:
>> 10.el7_3.2.tis.24 (Unknown, 2017-03-15-14:59:22,
>> yow-dsulliva-lx-vm1.wrs.com), qemu version: 2.6.0
>> (qemu-kvm-ev-2.6.0-27.1.el7.tis.31), hostname: compute-0
>> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
>> QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm '-c
>> 0x00000000000000000000000000000001' '-n 4' --proc-type=secondary
>> --file-prefix=vs -- -enable-dpdk -name
>> guest=instance-0000000e,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes
>> -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512 -realtime mlock=off
>> -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=0,policy=bind
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid
>> 57ae849f-aa66-422a-90a2-62db6c59db29 -smbios 'type=1,manufacturer=Fedora
>> Project,product=OpenStack Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual
>> Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-10-instance-0000000e/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
>> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot
>> reboot-timeout=5000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,format=raw,if=none,id=drive-virtio-disk0,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -chardev socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4
>> -netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3
>> -chardev socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e
>> -netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4
>> -chardev socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5
>> -add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on
>> -device isa-serial,chardev=charserial0,id=serial0 -chardev
>> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device
>> usb-tablet,id=input0 -vnc 0.0.0.0:9 -k en-us -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>> Domain id=10 is tainted: high-privileges
>> EAL:eal_memory.c:1591: WARNING: Address Space Layout Randomization (ASLR) is
>> enabled in the kernel.
>> EAL:eal_memory.c:1593:    This may cause issues with mapping memory into
>> secondary processes
>> char device redirected to /dev/pts/9 (label charserial1)
>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
>> - used_idx 0x47c
>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
>> 0x0 of device '0000:00:07.0/virtio-balloon'
>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
>> not permitted
>> 2017-03-29 06:38:37.896+0000: shutting down
>>
>>
>> For what it's worth, the differences between the two qemu command lines are
>> as follows:
>>
>> source:
>> -name instance-0000000e -chardev file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log
>> -vnc 0.0.0.0:11 -incoming fd:25
>>
>> destination:
>> -name guest=instance-0000000e,debug-threads=on -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes
>> -add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on
>> -vnc 0.0.0.0:9 -incoming defer
>>
>> Thanks,
>> Chris
>>
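
A comparison like the one quoted above can be produced mechanically rather than by eye. The following is a rough sketch, assuming every option starts with "-" and takes at most one value that does not itself start with "-"; `parse_options` and `diff_options` are hypothetical helper names, and the two short command lines are abbreviated from the ones in this thread.

```python
import shlex


def parse_options(cmdline: str) -> list[tuple[str, str]]:
    """Split a command line into (flag, value) pairs; bare flags get ''."""
    tokens = shlex.split(cmdline)
    pairs = []
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        # Treat the next token as this flag's value unless it looks like a flag.
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        pairs.append((flag, tokens[i + 1] if has_value else ""))
        i += 2 if has_value else 1
    return pairs


def diff_options(source: str, dest: str):
    """Return the option pairs that appear on only one of the two sides."""
    src, dst = parse_options(source), parse_options(dest)
    only_src = [p for p in src if p not in dst]
    only_dst = [p for p in dst if p not in src]
    return only_src, only_dst


source = "-name instance-0000000e -S -m 512 -incoming fd:25"
dest = "-name guest=instance-0000000e,debug-threads=on -S -m 512 -incoming defer"
only_src, only_dst = diff_options(source, dest)
print("source only:", only_src)
print("destination only:", only_dst)
```

The pairing rule is deliberately naive: a leading binary path or a value that begins with "-" would be misparsed, so the full command lines need trimming before being fed in.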


* Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
  2017-04-04 13:56   ` Ladi Prosek
@ 2017-04-04 14:28     ` Chris Friesen
  2017-04-04 15:07       ` Ladi Prosek
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Friesen @ 2017-04-04 14:28 UTC (permalink / raw)
  To: Ladi Prosek, Stefan Hajnoczi; +Cc: qemu-devel, Dr. David Alan Gilbert

On 04/04/2017 07:56 AM, Ladi Prosek wrote:
> On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:

>>> Initially we have a bunch of guests running on compute-2 (which is running
>>> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
>>> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
>>> successfully.  The fourth (which was essentially identical in configuration
>>> to the first three) failed, as per the following logs in
>>> /var/log/libvirt/qemu/instance-0000000e.log:
>>>
>>>
>>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
>>> - used_idx 0x47c
>>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
>>> 0x0 of device '0000:00:07.0/virtio-balloon'
>>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
>>> not permitted
>>> 2017-03-29 06:38:37.896+0000: shutting down
>>>
>>>
>>> Does anyone know of an existing bug report covering this issue?  (I took a
>>> look and didn't see anything obviously related.)
>>
>> This is the virtio-balloon device.  If you remove the device the live
>> migration should work reliably.
>>
>> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
>> for live migration.  After migration you can modprobe virtio_balloon
>> again.
>>
>> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
>> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
>> qemu.git/master and do not see an obvious bug.  I also compared
>> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
>
> The device likely got into the invalid state as part of a previous
> migration to an unfixed QEMU. I second Stefan's suggestion to
> temporarily remove the device or unload the driver.

I'll give that a try (been busy with a separate issue).

If I have a guest already running, can I unilaterally hot-remove the device from 
the host side or does the guest need to be involved as well?  (I'm just trying 
to figure out how to deal with existing guests.)

Thanks,
Chris


* Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
  2017-04-04 14:28     ` Chris Friesen
@ 2017-04-04 15:07       ` Ladi Prosek
  0 siblings, 0 replies; 6+ messages in thread
From: Ladi Prosek @ 2017-04-04 15:07 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert

On Tue, Apr 4, 2017 at 4:28 PM, Chris Friesen
<chris.friesen@windriver.com> wrote:
> On 04/04/2017 07:56 AM, Ladi Prosek wrote:
>>
>> On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <stefanha@gmail.com>
>> wrote:
>>>
>>> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>
>
>>>> Initially we have a bunch of guests running on compute-2 (which is running
>>>> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
>>>> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
>>>> successfully.  The fourth (which was essentially identical in configuration
>>>> to the first three) failed, as per the following logs in
>>>> /var/log/libvirt/qemu/instance-0000000e.log:
>>>>
>>>>
>>>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b
>>>> - used_idx 0x47c
>>>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance
>>>> 0x0 of device '0000:00:07.0/virtio-balloon'
>>>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
>>>> not permitted
>>>> 2017-03-29 06:38:37.896+0000: shutting down
>>>>
>>>>
>>>> Does anyone know of an existing bug report covering this issue?  (I took a
>>>> look and didn't see anything obviously related.)
>>>
>>>
>>> This is the virtio-balloon device.  If you remove the device the live
>>> migration should work reliably.
>>>
>>> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
>>> for live migration.  After migration you can modprobe virtio_balloon
>>> again.
>>>
>>> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
>>> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
>>> qemu.git/master and do not see an obvious bug.  I also compared
>>> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
>>
>>
>> The device likely got into the invalid state as part of a previous
>> migration to an unfixed QEMU. I second Stefan's suggestion to
>> temporarily remove the device or unload the driver.
>
>
> I'll give that a try (been busy with a separate issue).
>
> If I have a guest already running, can I unilaterally hot-remove the device
> from the host side or does the guest need to be involved as well?  (I'm just
> trying to figure out how to deal with existing guests.)

Hot-remove should be fine.
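
As an illustration of what a host-side hot-remove request could look like, the sketch below builds a QMP `device_del` command for the balloon device and the matching `virsh qemu-monitor-command` argument list. It assumes the device id `balloon0` and the domain name `instance-0000000e` from the command lines earlier in the thread, and it only constructs the command without running it. Whether sending QMP through libvirt's monitor passthrough is acceptable in a given deployment is an assumption to verify, since libvirt's own view of the domain may not be updated.

```python
import json


def build_device_del(device_id: str) -> str:
    """Serialize a QMP device_del request for the given device id."""
    return json.dumps({"execute": "device_del", "arguments": {"id": device_id}})


def build_virsh_argv(domain: str, device_id: str) -> list[str]:
    """Argument list for passing the QMP request through virsh (not executed here)."""
    return ["virsh", "qemu-monitor-command", domain, build_device_del(device_id)]


# Domain name and device id as they appear in the command lines in this thread.
argv = build_virsh_argv("instance-0000000e", "balloon0")
print(argv)
```

The argument list could be handed to `subprocess.run(argv, check=True)` on the source host before starting the migration. The removal is asynchronous, so it is worth confirming the device is actually gone before migrating.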

> Thanks,
> Chris


Thread overview: 6+ messages
2017-03-31 20:12 [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0 Chris Friesen
2017-04-03  9:30 ` Dr. David Alan Gilbert
2017-04-03 19:11 ` Stefan Hajnoczi
2017-04-04 13:56   ` Ladi Prosek
2017-04-04 14:28     ` Chris Friesen
2017-04-04 15:07       ` Ladi Prosek
