* [Qemu-devel] assert during internal snapshot
@ 2015-11-07 15:20 Denis V. Lunev
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-07 15:20 UTC (permalink / raw)
To: Liang Li, Paolo Bonzini, Juan Quintela, Amit Shah; +Cc: QEMU
Hello, All!
This commit
commit 94f5a43704129ca4995aa3385303c5ae225bde42
Author: Liang Li <liang.z.li@intel.com>
Date: Mon Nov 2 15:37:00 2015 +0800
migration: defer migration_end & blk_mig_cleanup
Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
lazy collapsing of small sptes into large sptes mechanism, now
migration_end() is a time consuming operation because it calls
memroy_global_dirty_log_stop(), which will trigger the dropping of
small
sptes operation and takes about dozens of milliseconds, so call
migration_end() before all the vmsate data has already been transferred
to the destination will prolong VM downtime. This operation should be
deferred after all the data has been transferred to the destination.
blk_mig_cleanup() can be deferred too.
For a VM with 8G RAM, this patch can reduce the VM downtime about
30 ms.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
introduces the following regression
(gdb) bt
#0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007fd5d314beca in __GI_abort () at abort.c:89
#2 0x00007fd5d314303d in __assert_fail_base (
fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
line=line@entry=1731,
function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
"memory_region_del_eventfd") at assert.c:92
#3 0x00007fd5d31430f2 in __GI___assert_fail (
assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
"memory_region_del_eventfd") at assert.c:101
#4 0x0000557288b108fa in memory_region_del_eventfd (mr=0x55728ad83700,
addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
at /home/den/src/qemu/memory.c:1731
#5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
at hw/virtio/virtio-pci.c:178
#6 0x0000557288da19a9 in virtio_pci_set_host_notifier
(d=0x55728ad82e80, n=0,
assign=false) at hw/virtio/virtio-pci.c:984
#7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
#8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
#9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
at /home/den/src/qemu/hw/virtio/virtio.c:966
#10 0x0000557288b67bbf in virtio_queue_host_notifier_read (n=0x55728b220010)
at /home/den/src/qemu/hw/virtio/virtio.c:1643
#11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
aio-posix.c:160
#12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
callback=0x0, user_data=0x0) at async.c:226
#13 0x00007fd5d409fff7 in g_main_context_dispatch ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
---Type <return> to continue, or q <return> to quit---
#14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
#15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
main-loop.c:256
#16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-loop.c:504
#17 0x0000557288c1c31c in main_loop () at vl.c:1890
#18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
envp=0x7ffca9a6fd58) at vl.c:4644
(gdb)
during 'virsh create-snapshot' operation over alive VM.
It happens 100% of time when VM is run using the following
command line:
7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
-S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024
-realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet
-no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
-device
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
-device
ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
-device
ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
-drive
file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel1,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
-chardev pty,id=charconsole0 -device
virtconsole,chardev=charconsole0,id=console0 -device
usb-tablet,id=input0 -spice
port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
spicevmc,id=charredir0,name=usbredir -device
usb-redir,chardev=charredir0,id=redir0 -chardev
spicevmc,id=charredir1,name=usbredir -device
usb-redir,chardev=charredir1,id=redir1 -chardev
spicevmc,id=charredir2,name=usbredir -device
usb-redir,chardev=charredir2,id=redir2 -chardev
spicevmc,id=charredir3,name=usbredir -device
usb-redir,chardev=charredir3,id=redir3 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
device.scsi0.iothread=iothread1 -msg timestamp=on
Minimal dumb (!!!) fix is the following:
hades ~/src/qemu $ cat 1.diff
diff --git a/migration/ram.c b/migration/ram.c
index 25e9eeb..25ebf0d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
*bmap)
g_free(bmap);
}
-static void migration_end(void)
+extern void migration_end(void);
+void migration_end(void)
{
/* caller have hold iothread lock or is in a bh, so there is
* no writing race against this migration_bitmap
diff --git a/migration/savevm.c b/migration/savevm.c
index dbcc39a..01da865 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
}
}
+extern void migration_end(void);
static int qemu_savevm_state(QEMUFile *f, Error **errp)
{
int ret;
@@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
qemu_savevm_state_complete(f);
ret = qemu_file_get_error(f);
}
+ migration_end();
+
if (ret != 0) {
qemu_savevm_state_cancel();
error_setg_errno(errp, -ret, "Error while writing VM state");
hades ~/src/qemu $
(patch attached to start a discussion).
For me it looks like commit is wrong and should be reverted and
reworked.
Den
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
@ 2015-11-07 15:40 ` Denis V. Lunev
2015-11-09 4:06 ` Amit Shah
2015-11-09 5:10 ` Li, Liang Z
2015-11-09 2:28 ` [Qemu-devel] assert during internal snapshot Li, Liang Z
` (3 subsequent siblings)
4 siblings, 2 replies; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-07 15:40 UTC (permalink / raw)
Cc: Amit Shah, Denis V. Lunev, Juan Quintela, qemu-devel, Paolo Bonzini
since commit
commit 94f5a43704129ca4995aa3385303c5ae225bde42
Author: Liang Li <liang.z.li@intel.com>
Date: Mon Nov 2 15:37:00 2015 +0800
migration: defer migration_end & blk_mig_cleanup
when actual .cleanup callbacks calling was removed from complete operations.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
migration/savevm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index e05158d..9f2230f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
qemu_savevm_state_complete(f);
ret = qemu_file_get_error(f);
}
+ qemu_savevm_state_cleanup();
if (ret != 0) {
- qemu_savevm_state_cleanup();
error_setg_errno(errp, -ret, "Error while writing VM state");
}
return ret;
--
2.5.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] assert during internal snapshot
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
@ 2015-11-09 2:28 ` Li, Liang Z
2015-11-09 3:12 ` Li, Liang Z
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 2:28 UTC (permalink / raw)
To: Denis V. Lunev, Paolo Bonzini, Juan Quintela, Amit Shah; +Cc: QEMU
> Hello, All!
>
> This commit
>
> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> Author: Liang Li <liang.z.li@intel.com>
> Date: Mon Nov 2 15:37:00 2015 +0800
>
> migration: defer migration_end & blk_mig_cleanup
>
> Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> lazy collapsing of small sptes into large sptes mechanism, now
> migration_end() is a time consuming operation because it calls
> memroy_global_dirty_log_stop(), which will trigger the dropping of small
> sptes operation and takes about dozens of milliseconds, so call
> migration_end() before all the vmsate data has already been transferred
> to the destination will prolong VM downtime. This operation should be
> deferred after all the data has been transferred to the destination.
>
> blk_mig_cleanup() can be deferred too.
>
> For a VM with 8G RAM, this patch can reduce the VM downtime about
> 30 ms.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>al3
> Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
> Signed-off-by: Juan Quintela <quintela@redhat.com>al3
>
> introduces the following regression
>
> (gdb) bt
> #0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
> at ../sysdeps/unix/sysv/linux/raise.c:55
> #1 0x00007fd5d314beca in __GI_abort () at abort.c:89
> #2 0x00007fd5d314303d in __assert_fail_base (
> fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
> line=line@entry=1731,
> function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:92
> #3 0x00007fd5d31430f2 in __GI___assert_fail (
> assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:101
> #4 0x0000557288b108fa in memory_region_del_eventfd
> (mr=0x55728ad83700,
> addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> at /home/den/src/qemu/memory.c:1731
> #5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
> proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> at hw/virtio/virtio-pci.c:178
> #6 0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> n=0,
> assign=false) at hw/virtio/virtio-pci.c:984
> #7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> #8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> #9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> at /home/den/src/qemu/hw/virtio/virtio.c:966
> #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
> (n=0x55728b220010)
> at /home/den/src/qemu/hw/virtio/virtio.c:1643
> #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> aio-posix.c:160
> #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> callback=0x0, user_data=0x0) at async.c:226
> #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
> from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> ---Type <return> to continue, or q <return> to quit---
> #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
> #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
> main-loop.c:256
> #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
> loop.c:504
> #17 0x0000557288c1c31c in main_loop () at vl.c:1890
> #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> envp=0x7ffca9a6fd58) at vl.c:4644
> (gdb)
>
> during 'virsh create-snapshot' operation over alive VM.
> It happens 100% of time when VM is run using the following command line:
>
> 7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-
> shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
> strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
> -device
> ich9-usb-
> uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
> -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
> -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
> file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
> 0,format=qcow2,cache=none,aio=native
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
> 0-0,bootindex=1
> -drive
> file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
> 0,readonly=on,format=raw
> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> virtio-net-
> pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
> mu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-
> serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
> t.0
> -chardev spicevmc,id=charchannel1,name=vdagent -device
> virtserialport,bus=virtio-
> serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> -chardev pty,id=charconsole0 -device
> virtconsole,chardev=charconsole0,id=console0 -device
> usb-tablet,id=input0 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> qxl-
> vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
> pci.0,addr=0x2
> -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
> spicevmc,id=charredir0,name=usbredir -device
> usb-redir,chardev=charredir0,id=redir0 -chardev
> spicevmc,id=charredir1,name=usbredir -device
> usb-redir,chardev=charredir1,id=redir1 -chardev
> spicevmc,id=charredir2,name=usbredir -device
> usb-redir,chardev=charredir2,id=redir2 -chardev
> spicevmc,id=charredir3,name=usbredir -device
> usb-redir,chardev=charredir3,id=redir3 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
> device.scsi0.iothread=iothread1 -msg timestamp=on
>
> Minimal dumb (!!!) fix is the following:
>
> hades ~/src/qemu $ cat 1.diff
> diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
> *bmap)
> g_free(bmap);
> }
>
> -static void migration_end(void)
> +extern void migration_end(void);
> +void migration_end(void)
> {
> /* caller have hold iothread lock or is in a bh, so there is
> * no writing race against this migration_bitmap diff --git
> a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
> }
> }
>
> +extern void migration_end(void);
> static int qemu_savevm_state(QEMUFile *f, Error **errp)
> {
> int ret;
> @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> **errp)
> qemu_savevm_state_complete(f);
> ret = qemu_file_get_error(f);
> }
> + migration_end();
> +
> if (ret != 0) {
> qemu_savevm_state_cancel();
> error_setg_errno(errp, -ret, "Error while writing VM state"); hades
> ~/src/qemu $
>
> (patch attached to start a discussion).
>
> For me it looks like commit is wrong and should be reverted and reworked.
>
> Den
Hi Den,
I don't know the reason for this issue, could you please point out how can I reproduce the issue just with QEMU? with the fewest command option, I do not have the libvirt environment.
Liang
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] assert during internal snapshot
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
2015-11-09 2:28 ` [Qemu-devel] assert during internal snapshot Li, Liang Z
@ 2015-11-09 3:12 ` Li, Liang Z
2015-11-09 3:29 ` Li, Liang Z
2015-11-09 7:24 ` [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
4 siblings, 0 replies; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 3:12 UTC (permalink / raw)
To: Denis V. Lunev, Paolo Bonzini, Juan Quintela, Amit Shah; +Cc: QEMU
[-- Attachment #1: Type: text/plain, Size: 8617 bytes --]
> migration: defer migration_end & blk_mig_cleanup
>
> Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> lazy collapsing of small sptes into large sptes mechanism, now
> migration_end() is a time consuming operation because it calls
> memroy_global_dirty_log_stop(), which will trigger the dropping of small
> sptes operation and takes about dozens of milliseconds, so call
> migration_end() before all the vmsate data has already been transferred
> to the destination will prolong VM downtime. This operation should be
> deferred after all the data has been transferred to the destination.
>
> blk_mig_cleanup() can be deferred too.
>
> For a VM with 8G RAM, this patch can reduce the VM downtime about
> 30 ms.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>al3
> Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
> Signed-off-by: Juan Quintela <quintela@redhat.com>al3
>
> introduces the following regression
>
> (gdb) bt
> #0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
> at ../sysdeps/unix/sysv/linux/raise.c:55
> #1 0x00007fd5d314beca in __GI_abort () at abort.c:89
> #2 0x00007fd5d314303d in __assert_fail_base (
> fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
> line=line@entry=1731,
> function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:92
> #3 0x00007fd5d31430f2 in __GI___assert_fail (
> assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:101
> #4 0x0000557288b108fa in memory_region_del_eventfd
> (mr=0x55728ad83700,
> addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> at /home/den/src/qemu/memory.c:1731
> #5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
> proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> at hw/virtio/virtio-pci.c:178
> #6 0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> n=0,
> assign=false) at hw/virtio/virtio-pci.c:984
> #7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> #8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> #9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> at /home/den/src/qemu/hw/virtio/virtio.c:966
> #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
> (n=0x55728b220010)
> at /home/den/src/qemu/hw/virtio/virtio.c:1643
> #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> aio-posix.c:160
> #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> callback=0x0, user_data=0x0) at async.c:226
> #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
> from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> ---Type <return> to continue, or q <return> to quit---
> #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
> #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
> main-loop.c:256
> #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
> loop.c:504
> #17 0x0000557288c1c31c in main_loop () at vl.c:1890
> #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> envp=0x7ffca9a6fd58) at vl.c:4644
> (gdb)
>
> during 'virsh create-snapshot' operation over alive VM.
> It happens 100% of time when VM is run using the following command line:
>
> 7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-
> shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
> strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
> -device
> ich9-usb-
> uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
> -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
> -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
> file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
> 0,format=qcow2,cache=none,aio=native
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
> 0-0,bootindex=1
> -drive
> file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
> 0,readonly=on,format=raw
> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> virtio-net-
> pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
> mu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-
> serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
> t.0
> -chardev spicevmc,id=charchannel1,name=vdagent -device
> virtserialport,bus=virtio-
> serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> -chardev pty,id=charconsole0 -device
> virtconsole,chardev=charconsole0,id=console0 -device
> usb-tablet,id=input0 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> qxl-
> vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
> pci.0,addr=0x2
> -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
> spicevmc,id=charredir0,name=usbredir -device
> usb-redir,chardev=charredir0,id=redir0 -chardev
> spicevmc,id=charredir1,name=usbredir -device
> usb-redir,chardev=charredir1,id=redir1 -chardev
> spicevmc,id=charredir2,name=usbredir -device
> usb-redir,chardev=charredir2,id=redir2 -chardev
> spicevmc,id=charredir3,name=usbredir -device
> usb-redir,chardev=charredir3,id=redir3 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
> device.scsi0.iothread=iothread1 -msg timestamp=on
>
> Minimal dumb (!!!) fix is the following:
>
> hades ~/src/qemu $ cat 1.diff
> diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
> *bmap)
> g_free(bmap);
> }
>
> -static void migration_end(void)
> +extern void migration_end(void);
> +void migration_end(void)
> {
> /* caller have hold iothread lock or is in a bh, so there is
> * no writing race against this migration_bitmap diff --git
> a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
> }
> }
>
> +extern void migration_end(void);
> static int qemu_savevm_state(QEMUFile *f, Error **errp)
> {
> int ret;
> @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> **errp)
> qemu_savevm_state_complete(f);
> ret = qemu_file_get_error(f);
> }
> + migration_end();
> +
> if (ret != 0) {
> qemu_savevm_state_cancel();
> error_setg_errno(errp, -ret, "Error while writing VM state"); hades
> ~/src/qemu $
>
> (patch attached to start a discussion).
>
> For me it looks like commit is wrong and should be reverted and reworked.
>
> Den
Hi Den,
Could you help to reset the commit to "79cf9fad341e", and apply this test patch to find out the issue is trigged by defer the 'migration_end()' or 'blk_mig_cleanup()'.
Thanks a lot.
Liang
[-- Attachment #2: test.patch --]
[-- Type: application/octet-stream, Size: 3264 bytes --]
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..a4117d6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -114,6 +114,10 @@ void migrate_fd_connect(MigrationState *s);
int migrate_fd_close(MigrationState *s);
+void migration_end(void);
+
+void blk_mig_cleanup(void);
+
void add_migration_state_change_notifier(Notifier *notify);
void remove_migration_state_change_notifier(Notifier *notify);
bool migration_in_setup(MigrationState *);
diff --git a/migration/block.c b/migration/block.c
index f7bb1e0..267953f 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -591,7 +591,7 @@ static int64_t get_remaining_dirty(void)
/* Called with iothread lock taken. */
-static void blk_mig_cleanup(void)
+void blk_mig_cleanup(void)
{
BlkMigDevState *bmds;
BlkMigBlock *blk;
@@ -751,6 +751,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
qemu_put_be64(f, BLK_MIG_FLAG_EOS);
blk_mig_cleanup();
+ //blk_mig_cleanup();
return 0;
}
diff --git a/migration/migration.c b/migration/migration.c
index b092f38..caad0e3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -613,12 +613,9 @@ static void migrate_fd_cleanup(void *opaque)
assert(s->state != MIGRATION_STATUS_ACTIVE);
- if (s->state != MIGRATION_STATUS_COMPLETED) {
- qemu_savevm_state_cancel();
- if (s->state == MIGRATION_STATUS_CANCELLING) {
- migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
- MIGRATION_STATUS_CANCELLED);
- }
+ if (s->state == MIGRATION_STATUS_CANCELLING) {
+ migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
+ MIGRATION_STATUS_CANCELLED);
}
notifier_list_notify(&migration_state_notifiers, s);
@@ -1028,6 +1025,7 @@ static void *migration_thread(void *opaque)
int64_t initial_bytes = 0;
int64_t max_size = 0;
int64_t start_time = initial_time;
+ int64_t end_time;
bool old_vm_running = false;
rcu_register_thread();
@@ -1089,10 +1087,12 @@ static void *migration_thread(void *opaque)
/* If we enabled cpu throttling for auto-converge, turn it off. */
cpu_throttle_stop();
+ end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
qemu_mutex_lock_iothread();
+
+ migration_end();
if (s->state == MIGRATION_STATUS_COMPLETED) {
- int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
uint64_t transferred_bytes = qemu_ftell(s->file);
s->total_time = end_time - s->total_time;
s->downtime = end_time - start_time;
diff --git a/migration/ram.c b/migration/ram.c
index a25bcc7..23cc30c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1100,7 +1100,7 @@ static void migration_bitmap_free(struct BitmapRcu *bmap)
g_free(bmap);
}
-static void migration_end(void)
+void migration_end(void)
{
/* caller have hold iothread lock or is in a bh, so there is
* no writing race against this migration_bitmap
@@ -1344,7 +1344,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
rcu_read_unlock();
- migration_end();
+ //migration_end();
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
return 0;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] assert during internal snapshot
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
` (2 preceding siblings ...)
2015-11-09 3:12 ` Li, Liang Z
@ 2015-11-09 3:29 ` Li, Liang Z
2015-11-10 11:00 ` Stefan Hajnoczi
2015-11-09 7:24 ` [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
4 siblings, 1 reply; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 3:29 UTC (permalink / raw)
To: Denis V. Lunev, Paolo Bonzini, Juan Quintela, Amit Shah
Cc: QEMU, Stefan Hajnoczi
> -----Original Message-----
> From: Denis V. Lunev [mailto:den@openvz.org]
> Sent: Saturday, November 07, 2015 11:20 PM
> To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
> Cc: QEMU
> Subject: assert during internal snapshot
>
> Hello, All!
>
> This commit
>
> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> Author: Liang Li <liang.z.li@intel.com>
> Date: Mon Nov 2 15:37:00 2015 +0800
>
> migration: defer migration_end & blk_mig_cleanup
>
> Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> lazy collapsing of small sptes into large sptes mechanism, now
> migration_end() is a time consuming operation because it calls
> memroy_global_dirty_log_stop(), which will trigger the dropping of small
> sptes operation and takes about dozens of milliseconds, so call
> migration_end() before all the vmsate data has already been transferred
> to the destination will prolong VM downtime. This operation should be
> deferred after all the data has been transferred to the destination.
>
> blk_mig_cleanup() can be deferred too.
>
> For a VM with 8G RAM, this patch can reduce the VM downtime about
> 30 ms.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>al3
> Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
> Signed-off-by: Juan Quintela <quintela@redhat.com>al3
>
> introduces the following regression
>
> (gdb) bt
> #0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
> at ../sysdeps/unix/sysv/linux/raise.c:55
> #1 0x00007fd5d314beca in __GI_abort () at abort.c:89
> #2 0x00007fd5d314303d in __assert_fail_base (
> fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
> line=line@entry=1731,
> function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:92
> #3 0x00007fd5d31430f2 in __GI___assert_fail (
> assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> "memory_region_del_eventfd") at assert.c:101
> #4 0x0000557288b108fa in memory_region_del_eventfd
> (mr=0x55728ad83700,
> addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> at /home/den/src/qemu/memory.c:1731
> #5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
> proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> at hw/virtio/virtio-pci.c:178
> #6 0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> n=0,
> assign=false) at hw/virtio/virtio-pci.c:984
> #7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> #8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> #9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> at /home/den/src/qemu/hw/virtio/virtio.c:966
> #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
> (n=0x55728b220010)
> at /home/den/src/qemu/hw/virtio/virtio.c:1643
> #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> aio-posix.c:160
> #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> callback=0x0, user_data=0x0) at async.c:226
> #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
> from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> ---Type <return> to continue, or q <return> to quit---
> #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
> #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
> main-loop.c:256
> #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
> loop.c:504
> #17 0x0000557288c1c31c in main_loop () at vl.c:1890
> #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> envp=0x7ffca9a6fd58) at vl.c:4644
> (gdb)
>
> during 'virsh create-snapshot' operation over alive VM.
> It happens 100% of time when VM is run using the following command line:
>
> 7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-
> shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
> strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
> -device
> ich9-usb-
> uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
> -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
> -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
> file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
> 0,format=qcow2,cache=none,aio=native
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
> 0-0,bootindex=1
> -drive
> file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
> 0,readonly=on,format=raw
> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> virtio-net-
> pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
> mu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-
> serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
> t.0
> -chardev spicevmc,id=charchannel1,name=vdagent -device
> virtserialport,bus=virtio-
> serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> -chardev pty,id=charconsole0 -device
> virtconsole,chardev=charconsole0,id=console0 -device
> usb-tablet,id=input0 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> qxl-
> vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
> pci.0,addr=0x2
> -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
> spicevmc,id=charredir0,name=usbredir -device
> usb-redir,chardev=charredir0,id=redir0 -chardev
> spicevmc,id=charredir1,name=usbredir -device
> usb-redir,chardev=charredir1,id=redir1 -chardev
> spicevmc,id=charredir2,name=usbredir -device
> usb-redir,chardev=charredir2,id=redir2 -chardev
> spicevmc,id=charredir3,name=usbredir -device
> usb-redir,chardev=charredir3,id=redir3 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
> device.scsi0.iothread=iothread1 -msg timestamp=on
>
> Minimal dumb (!!!) fix is the following:
>
> hades ~/src/qemu $ cat 1.diff
> diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
> *bmap)
> g_free(bmap);
> }
>
> -static void migration_end(void)
> +extern void migration_end(void);
> +void migration_end(void)
> {
> /* caller have hold iothread lock or is in a bh, so there is
> * no writing race against this migration_bitmap diff --git
> a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
> }
> }
>
> +extern void migration_end(void);
> static int qemu_savevm_state(QEMUFile *f, Error **errp)
> {
> int ret;
> @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> **errp)
> qemu_savevm_state_complete(f);
> ret = qemu_file_get_error(f);
> }
> + migration_end();
> +
> if (ret != 0) {
> qemu_savevm_state_cancel();
> error_setg_errno(errp, -ret, "Error while writing VM state"); hades
> ~/src/qemu $
>
> (patch attached to start a discussion).
>
> For me it looks like commit is wrong and should be reverted and reworked.
>
> Den
Cc to : Stefan
It seems the 'bdrv_drain_all()' in ' blk_mig_cleanup ()' should not be deferred. Stefan, is that right?
Liang
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
@ 2015-11-09 4:06 ` Amit Shah
2015-11-09 5:10 ` Li, Liang Z
1 sibling, 0 replies; 15+ messages in thread
From: Amit Shah @ 2015-11-09 4:06 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Paolo Bonzini, liang.z.li, qemu-devel, Juan Quintela
CC'ing Liang Li, author of the patch.
On (Sat) 07 Nov 2015 [18:40:12], Denis V. Lunev wrote:
> since commit
> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> Author: Liang Li <liang.z.li@intel.com>
> Date: Mon Nov 2 15:37:00 2015 +0800
>
> migration: defer migration_end & blk_mig_cleanup
>
> when actual .cleanup callbacks calling was removed from complete operations.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
> migration/savevm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e05158d..9f2230f 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
> qemu_savevm_state_complete(f);
> ret = qemu_file_get_error(f);
> }
> + qemu_savevm_state_cleanup();
> if (ret != 0) {
> - qemu_savevm_state_cleanup();
> error_setg_errno(errp, -ret, "Error while writing VM state");
> }
> return ret;
> --
> 2.5.0
>
Amit
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
2015-11-09 4:06 ` Amit Shah
@ 2015-11-09 5:10 ` Li, Liang Z
2015-11-09 6:11 ` Denis V. Lunev
1 sibling, 1 reply; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 5:10 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Amit Shah, Paolo Bonzini, qemu-devel, Juan Quintela
> since commit
> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> Author: Liang Li <liang.z.li@intel.com>
> Date: Mon Nov 2 15:37:00 2015 +0800
>
> migration: defer migration_end & blk_mig_cleanup
>
> when actual .cleanup callbacks calling was removed from complete operations.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
> migration/savevm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c index e05158d..9f2230f
> 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> **errp)
> qemu_savevm_state_complete(f);
> ret = qemu_file_get_error(f);
> }
> + qemu_savevm_state_cleanup();
> if (ret != 0) {
> - qemu_savevm_state_cleanup();
> error_setg_errno(errp, -ret, "Error while writing VM state");
> }
> return ret;
> --
> 2.5.0
>
Yes, you are right. Thanks a lot.
BTW, can this patch fix the regression you reported?
Reviewed-by: Liang Li <liang.z.li@intel.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-09 5:10 ` Li, Liang Z
@ 2015-11-09 6:11 ` Denis V. Lunev
2015-11-09 6:16 ` Li, Liang Z
0 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-09 6:11 UTC (permalink / raw)
To: Li, Liang Z; +Cc: Amit Shah, Paolo Bonzini, qemu-devel, Juan Quintela
On 11/09/2015 08:10 AM, Li, Liang Z wrote:
>> since commit
>> commit 94f5a43704129ca4995aa3385303c5ae225bde42
>> Author: Liang Li <liang.z.li@intel.com>
>> Date: Mon Nov 2 15:37:00 2015 +0800
>>
>> migration: defer migration_end & blk_mig_cleanup
>>
>> when actual .cleanup callbacks calling was removed from complete operations.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Amit Shah <amit.shah@redhat.com>
>> ---
>> migration/savevm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c index e05158d..9f2230f
>> 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
>> **errp)
>> qemu_savevm_state_complete(f);
>> ret = qemu_file_get_error(f);
>> }
>> + qemu_savevm_state_cleanup();
>> if (ret != 0) {
>> - qemu_savevm_state_cleanup();
>> error_setg_errno(errp, -ret, "Error while writing VM state");
>> }
>> return ret;
>> --
>> 2.5.0
>>
>
> Yes, you are right. Thanks a lot.
>
> BTW, can this patch fix the regression you reported?
>
> Reviewed-by: Liang Li <liang.z.li@intel.com>
>
yes
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-09 6:11 ` Denis V. Lunev
@ 2015-11-09 6:16 ` Li, Liang Z
2015-11-09 6:18 ` Denis V. Lunev
0 siblings, 1 reply; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 6:16 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Amit Shah, Paolo Bonzini, qemu-devel, Juan Quintela
> On 11/09/2015 08:10 AM, Li, Liang Z wrote:
> >> since commit
> >> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> >> Author: Liang Li <liang.z.li@intel.com>
> >> Date: Mon Nov 2 15:37:00 2015 +0800
> >>
> >> migration: defer migration_end & blk_mig_cleanup
> >>
> >> when actual .cleanup callbacks calling was removed from complete
> operations.
> >>
> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >> CC: Paolo Bonzini <pbonzini@redhat.com>
> >> CC: Juan Quintela <quintela@redhat.com>
> >> CC: Amit Shah <amit.shah@redhat.com>
> >> ---
> >> migration/savevm.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/migration/savevm.c b/migration/savevm.c index
> e05158d..9f2230f
> >> 100644
> >> --- a/migration/savevm.c
> >> +++ b/migration/savevm.c
> >> @@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> >> **errp)
> >> qemu_savevm_state_complete(f);
> >> ret = qemu_file_get_error(f);
> >> }
> >> + qemu_savevm_state_cleanup();
> >> if (ret != 0) {
> >> - qemu_savevm_state_cleanup();
> >> error_setg_errno(errp, -ret, "Error while writing VM state");
> >> }
> >> return ret;
> >> --
> >> 2.5.0
> >>
> >
> > Yes, you are right. Thanks a lot.
> >
> > BTW, can this patch fix the regression you reported?
> >
> > Reviewed-by: Liang Li <liang.z.li@intel.com>
> >
> yes
Great. You'd better change the commit message to make it more clear.
Liang
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-09 6:16 ` Li, Liang Z
@ 2015-11-09 6:18 ` Denis V. Lunev
2015-11-09 8:22 ` Li, Liang Z
0 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-09 6:18 UTC (permalink / raw)
To: Li, Liang Z; +Cc: Amit Shah, Paolo Bonzini, qemu-devel, Juan Quintela
On 11/09/2015 09:16 AM, Li, Liang Z wrote:
>> On 11/09/2015 08:10 AM, Li, Liang Z wrote:
>>>> since commit
>>>> commit 94f5a43704129ca4995aa3385303c5ae225bde42
>>>> Author: Liang Li <liang.z.li@intel.com>
>>>> Date: Mon Nov 2 15:37:00 2015 +0800
>>>>
>>>> migration: defer migration_end & blk_mig_cleanup
>>>>
>>>> when actual .cleanup callbacks calling was removed from complete
>> operations.
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Paolo Bonzini <pbonzini@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Amit Shah <amit.shah@redhat.com>
>>>> ---
>>>> migration/savevm.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c index
>> e05158d..9f2230f
>>>> 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
>>>> **errp)
>>>> qemu_savevm_state_complete(f);
>>>> ret = qemu_file_get_error(f);
>>>> }
>>>> + qemu_savevm_state_cleanup();
>>>> if (ret != 0) {
>>>> - qemu_savevm_state_cleanup();
>>>> error_setg_errno(errp, -ret, "Error while writing VM state");
>>>> }
>>>> return ret;
>>>> --
>>>> 2.5.0
>>>>
>>> Yes, you are right. Thanks a lot.
>>>
>>> BTW, can this patch fix the regression you reported?
>>>
>>> Reviewed-by: Liang Li <liang.z.li@intel.com>
>>>
>> yes
> Great. You'd better change the commit message to make it more clear.
>
> Liang
argh.. you are right...
This problem has appeared in the end of big rework
of another problem with snapshots and dataplane.
Sorry that this is not clear that regression is fixed.
I'll resend the patch with better commit message
Den
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
` (3 preceding siblings ...)
2015-11-09 3:29 ` Li, Liang Z
@ 2015-11-09 7:24 ` Denis V. Lunev
2015-11-09 16:20 ` Juan Quintela
4 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-09 7:24 UTC (permalink / raw)
Cc: Amit Shah, Denis V. Lunev, Juan Quintela, qemu-devel, Paolo Bonzini
since commit
commit 94f5a43704129ca4995aa3385303c5ae225bde42
Author: Liang Li <liang.z.li@intel.com>
Date: Mon Nov 2 15:37:00 2015 +0800
migration: defer migration_end & blk_mig_cleanup
when actual .cleanup callbacks calling was removed from complete operations.
The patch fixes regression introduced by the commit above results in
100% reliable assert for virtio-scsi VM with iothreads enabled during
'virsh create-snapshot' operation:
assert(i != mr->ioeventfd_nb);
memory_region_del_eventfd
virtio_pci_set_host_notifier_internal
virtio_pci_set_host_notifier
virtio_scsi_dataplane_start
virtio_scsi_handle_cmd
virtio_queue_notify_vq
virtio_queue_host_notifier_read
aio_dispatch
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Liang Li <liang.z.li@intel.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
migration/savevm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index e05158d..9f2230f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -942,8 +942,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
qemu_savevm_state_complete(f);
ret = qemu_file_get_error(f);
}
+ qemu_savevm_state_cleanup();
if (ret != 0) {
- qemu_savevm_state_cleanup();
error_setg_errno(errp, -ret, "Error while writing VM state");
}
return ret;
--
2.1.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-09 6:18 ` Denis V. Lunev
@ 2015-11-09 8:22 ` Li, Liang Z
0 siblings, 0 replies; 15+ messages in thread
From: Li, Liang Z @ 2015-11-09 8:22 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Amit Shah, Paolo Bonzini, qemu-devel, Juan Quintela
> >>> Yes, you are right. Thanks a lot.
> >>>
> >>> BTW, can this patch fix the regression you reported?
> >>>
> >>> Reviewed-by: Liang Li <liang.z.li@intel.com>
> >>>
> >> yes
> > Great. You'd better change the commit message to make it more clear.
> >
> > Liang
> argh.. you are right...
>
> This problem has appeared in the end of big rework of another problem with
> snapshots and dataplane.
> Sorry that this is not clear that regression is fixed.
> I'll resend the patch with better commit message
>
> Den
If the regression still exists with you patch, could please try the test.patch I sent to you plus your patch to find out what's the root cause? I am not sure about the bdrv_drain_all() in blk_mig_cleanup().
Liang
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation
2015-11-09 7:24 ` [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
@ 2015-11-09 16:20 ` Juan Quintela
0 siblings, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2015-11-09 16:20 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Amit Shah, Paolo Bonzini, qemu-devel
"Denis V. Lunev" <den@openvz.org> wrote:
> since commit
> commit 94f5a43704129ca4995aa3385303c5ae225bde42
> Author: Liang Li <liang.z.li@intel.com>
> Date: Mon Nov 2 15:37:00 2015 +0800
>
> migration: defer migration_end & blk_mig_cleanup
>
> when actual .cleanup callbacks calling was removed from complete operations.
>
> The patch fixes regression introduced by the commit above results in
> 100% reliable assert for virtio-scsi VM with iothreads enabled during
> 'virsh create-snapshot' operation:
> assert(i != mr->ioeventfd_nb);
> memory_region_del_eventfd
> virtio_pci_set_host_notifier_internal
> virtio_pci_set_host_notifier
> virtio_scsi_dataplane_start
> virtio_scsi_handle_cmd
> virtio_queue_notify_vq
> virtio_queue_host_notifier_read
> aio_dispatch
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Reviewed-by: Liang Li <liang.z.li@intel.com>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
And applied, thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] assert during internal snapshot
2015-11-09 3:29 ` Li, Liang Z
@ 2015-11-10 11:00 ` Stefan Hajnoczi
2015-11-10 11:44 ` Denis V. Lunev
0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2015-11-10 11:00 UTC (permalink / raw)
To: Li, Liang Z; +Cc: Amit Shah, Denis V. Lunev, Juan Quintela, QEMU, Paolo Bonzini
[-- Attachment #1: Type: text/plain, Size: 9623 bytes --]
On Mon, Nov 09, 2015 at 03:29:13AM +0000, Li, Liang Z wrote:
> > -----Original Message-----
> > From: Denis V. Lunev [mailto:den@openvz.org]
> > Sent: Saturday, November 07, 2015 11:20 PM
> > To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
> > Cc: QEMU
> > Subject: assert during internal snapshot
> >
> > Hello, All!
> >
> > This commit
> >
> > commit 94f5a43704129ca4995aa3385303c5ae225bde42
> > Author: Liang Li <liang.z.li@intel.com>
> > Date: Mon Nov 2 15:37:00 2015 +0800
> >
> > migration: defer migration_end & blk_mig_cleanup
> >
> > Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> > lazy collapsing of small sptes into large sptes mechanism, now
> > migration_end() is a time consuming operation because it calls
> > memroy_global_dirty_log_stop(), which will trigger the dropping of small
> > sptes operation and takes about dozens of milliseconds, so call
> > migration_end() before all the vmsate data has already been transferred
> > to the destination will prolong VM downtime. This operation should be
> > deferred after all the data has been transferred to the destination.
> >
> > blk_mig_cleanup() can be deferred too.
> >
> > For a VM with 8G RAM, this patch can reduce the VM downtime about
> > 30 ms.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>al3
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
> > Signed-off-by: Juan Quintela <quintela@redhat.com>al3
> >
> > introduces the following regression
> >
> > (gdb) bt
> > #0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
> > at ../sysdeps/unix/sysv/linux/raise.c:55
> > #1 0x00007fd5d314beca in __GI_abort () at abort.c:89
> > #2 0x00007fd5d314303d in __assert_fail_base (
> > fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> > assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
> > file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
> > line=line@entry=1731,
> > function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:92
> > #3 0x00007fd5d31430f2 in __GI___assert_fail (
> > assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> > file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> > function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:101
> > #4 0x0000557288b108fa in memory_region_del_eventfd
> > (mr=0x55728ad83700,
> > addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> > at /home/den/src/qemu/memory.c:1731
> > #5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
> > proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> > at hw/virtio/virtio-pci.c:178
> > #6 0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> > n=0,
> > assign=false) at hw/virtio/virtio-pci.c:984
> > #7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> > at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> > #8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> > vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> > #9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> > at /home/den/src/qemu/hw/virtio/virtio.c:966
> > #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
> > (n=0x55728b220010)
> > at /home/den/src/qemu/hw/virtio/virtio.c:1643
> > #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> > aio-posix.c:160
> > #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> > callback=0x0, user_data=0x0) at async.c:226
> > #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
> > from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > ---Type <return> to continue, or q <return> to quit---
> > #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
> > #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
> > main-loop.c:256
> > #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
> > loop.c:504
> > #17 0x0000557288c1c31c in main_loop () at vl.c:1890
> > #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> > envp=0x7ffca9a6fd58) at vl.c:4644
> > (gdb)
> >
> > during 'virsh create-snapshot' operation over alive VM.
> > It happens 100% of time when VM is run using the following command line:
> >
> > 7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> > -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> > realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> > iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> > -no-user-config -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> > -mon chardev=charmonitor,id=monitor,mode=control -rtc
> > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-
> > shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
> > strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
> > -device
> > ich9-usb-
> > uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
> > -device
> > ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
> > -device
> > ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
> > file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
> > 0,format=qcow2,cache=none,aio=native
> > -device
> > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
> > 0-0,bootindex=1
> > -drive
> > file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
> > 0,readonly=on,format=raw
> > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> > tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> > virtio-net-
> > pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
> > -chardev
> > socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
> > mu.guest_agent.0,server,nowait
> > -device
> > virtserialport,bus=virtio-
> > serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
> > t.0
> > -chardev spicevmc,id=charchannel1,name=vdagent -device
> > virtserialport,bus=virtio-
> > serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> > -chardev pty,id=charconsole0 -device
> > virtconsole,chardev=charconsole0,id=console0 -device
> > usb-tablet,id=input0 -spice
> > port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> > qxl-
> > vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
> > pci.0,addr=0x2
> > -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
> > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
> > spicevmc,id=charredir0,name=usbredir -device
> > usb-redir,chardev=charredir0,id=redir0 -chardev
> > spicevmc,id=charredir1,name=usbredir -device
> > usb-redir,chardev=charredir1,id=redir1 -chardev
> > spicevmc,id=charredir2,name=usbredir -device
> > usb-redir,chardev=charredir2,id=redir2 -chardev
> > spicevmc,id=charredir3,name=usbredir -device
> > usb-redir,chardev=charredir3,id=redir3 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
> > device.scsi0.iothread=iothread1 -msg timestamp=on
> >
> > Minimal dumb (!!!) fix is the following:
> >
> > hades ~/src/qemu $ cat 1.diff
> > diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
> > *bmap)
> > g_free(bmap);
> > }
> >
> > -static void migration_end(void)
> > +extern void migration_end(void);
> > +void migration_end(void)
> > {
> > /* caller have hold iothread lock or is in a bh, so there is
> > * no writing race against this migration_bitmap diff --git
> > a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
> > }
> > }
> >
> > +extern void migration_end(void);
> > static int qemu_savevm_state(QEMUFile *f, Error **errp)
> > {
> > int ret;
> > @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> > **errp)
> > qemu_savevm_state_complete(f);
> > ret = qemu_file_get_error(f);
> > }
> > + migration_end();
> > +
> > if (ret != 0) {
> > qemu_savevm_state_cancel();
> > error_setg_errno(errp, -ret, "Error while writing VM state"); hades
> > ~/src/qemu $
> >
> > (patch attached to start a discussion).
> >
> > For me it looks like commit is wrong and should be reverted and reworked.
> >
> > Den
>
> Cc to : Stefan
>
> It seems the 'bdrv_drain_all()' in ' blk_mig_cleanup ()' should not be deferred. Stefan, is that right?
Denis, do you still experience this crash with your own savevm patch
series applied? Since that series does AioContext acquire/release where
missing in savevm, it might fix this bug.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] assert during internal snapshot
2015-11-10 11:00 ` Stefan Hajnoczi
@ 2015-11-10 11:44 ` Denis V. Lunev
0 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2015-11-10 11:44 UTC (permalink / raw)
To: Stefan Hajnoczi, Li, Liang Z
Cc: Amit Shah, Paolo Bonzini, QEMU, Juan Quintela
On 11/10/2015 02:00 PM, Stefan Hajnoczi wrote:
> On Mon, Nov 09, 2015 at 03:29:13AM +0000, Li, Liang Z wrote:
>>> -----Original Message-----
>>> From: Denis V. Lunev [mailto:den@openvz.org]
>>> Sent: Saturday, November 07, 2015 11:20 PM
>>> To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
>>> Cc: QEMU
>>> Subject: assert during internal snapshot
>>>
>>> Hello, All!
>>>
>>> This commit
>>>
>>> commit 94f5a43704129ca4995aa3385303c5ae225bde42
>>> Author: Liang Li <liang.z.li@intel.com>
>>> Date: Mon Nov 2 15:37:00 2015 +0800
>>>
>>> migration: defer migration_end & blk_mig_cleanup
>>>
>>> Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
>>> lazy collapsing of small sptes into large sptes mechanism, now
>>> migration_end() is a time consuming operation because it calls
>>> memroy_global_dirty_log_stop(), which will trigger the dropping of small
>>> sptes operation and takes about dozens of milliseconds, so call
>>> migration_end() before all the vmsate data has already been transferred
>>> to the destination will prolong VM downtime. This operation should be
>>> deferred after all the data has been transferred to the destination.
>>>
>>> blk_mig_cleanup() can be deferred too.
>>>
>>> For a VM with 8G RAM, this patch can reduce the VM downtime about
>>> 30 ms.
>>>
>>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Juan Quintela <quintela@redhat.com>al3
>>> Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
>>> Signed-off-by: Juan Quintela <quintela@redhat.com>al3
>>>
>>> introduces the following regression
>>>
>>> (gdb) bt
>>> #0 0x00007fd5d314a267 in __GI_raise (sig=sig@entry=6)
>>> at ../sysdeps/unix/sysv/linux/raise.c:55
>>> #1 0x00007fd5d314beca in __GI_abort () at abort.c:89
>>> #2 0x00007fd5d314303d in __assert_fail_base (
>>> fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>>> assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
>>> file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
>>> line=line@entry=1731,
>>> function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
>>> "memory_region_del_eventfd") at assert.c:92
>>> #3 0x00007fd5d31430f2 in __GI___assert_fail (
>>> assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
>>> file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
>>> function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
>>> "memory_region_del_eventfd") at assert.c:101
>>> #4 0x0000557288b108fa in memory_region_del_eventfd
>>> (mr=0x55728ad83700,
>>> addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
>>> at /home/den/src/qemu/memory.c:1731
>>> #5 0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
>>> proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
>>> at hw/virtio/virtio-pci.c:178
>>> #6 0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
>>> n=0,
>>> assign=false) at hw/virtio/virtio-pci.c:984
>>> #7 0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
>>> at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
>>> #8 0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
>>> vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
>>> #9 0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
>>> at /home/den/src/qemu/hw/virtio/virtio.c:966
>>> #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
>>> (n=0x55728b220010)
>>> at /home/den/src/qemu/hw/virtio/virtio.c:1643
>>> #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
>>> aio-posix.c:160
>>> #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
>>> callback=0x0, user_data=0x0) at async.c:226
>>> #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
>>> from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> ---Type <return> to continue, or q <return> to quit---
>>> #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
>>> #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
>>> main-loop.c:256
>>> #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
>>> loop.c:504
>>> #17 0x0000557288c1c31c in main_loop () at vl.c:1890
>>> #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
>>> envp=0x7ffca9a6fd58) at vl.c:4644
>>> (gdb)
>>>
>>> during 'virsh create-snapshot' operation over alive VM.
>>> It happens 100% of time when VM is run using the following command line:
>>>
>>> 7498 ? tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
>>> -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
>>> realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
>>> iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
>>> -no-user-config -nodefaults -chardev
>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
>>> -mon chardev=charmonitor,id=monitor,mode=control -rtc
>>> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-
>>> shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
>>> strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
>>> -device
>>> ich9-usb-
>>> uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
>>> -device
>>> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
>>> -device
>>> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
>>> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
>>> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
>>> file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
>>> 0,format=qcow2,cache=none,aio=native
>>> -device
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
>>> 0-0,bootindex=1
>>> -drive
>>> file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
>>> 0,readonly=on,format=raw
>>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
>>> tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
>>> virtio-net-
>>> pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
>>> -chardev
>>> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
>>> mu.guest_agent.0,server,nowait
>>> -device
>>> virtserialport,bus=virtio-
>>> serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
>>> t.0
>>> -chardev spicevmc,id=charchannel1,name=vdagent -device
>>> virtserialport,bus=virtio-
>>> serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
>>> -chardev pty,id=charconsole0 -device
>>> virtconsole,chardev=charconsole0,id=console0 -device
>>> usb-tablet,id=input0 -spice
>>> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
>>> qxl-
>>> vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
>>> pci.0,addr=0x2
>>> -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
>>> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
>>> spicevmc,id=charredir0,name=usbredir -device
>>> usb-redir,chardev=charredir0,id=redir0 -chardev
>>> spicevmc,id=charredir1,name=usbredir -device
>>> usb-redir,chardev=charredir1,id=redir1 -chardev
>>> spicevmc,id=charredir2,name=usbredir -device
>>> usb-redir,chardev=charredir2,id=redir2 -chardev
>>> spicevmc,id=charredir3,name=usbredir -device
>>> usb-redir,chardev=charredir3,id=redir3 -device
>>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
>>> device.scsi0.iothread=iothread1 -msg timestamp=on
>>>
>>> Minimal dumb (!!!) fix is the following:
>>>
>>> hades ~/src/qemu $ cat 1.diff
>>> diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
>>> *bmap)
>>> g_free(bmap);
>>> }
>>>
>>> -static void migration_end(void)
>>> +extern void migration_end(void);
>>> +void migration_end(void)
>>> {
>>> /* caller have hold iothread lock or is in a bh, so there is
>>> * no writing race against this migration_bitmap diff --git
>>> a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
>>> }
>>> }
>>>
>>> +extern void migration_end(void);
>>> static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>> {
>>> int ret;
>>> @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
>>> **errp)
>>> qemu_savevm_state_complete(f);
>>> ret = qemu_file_get_error(f);
>>> }
>>> + migration_end();
>>> +
>>> if (ret != 0) {
>>> qemu_savevm_state_cancel();
>>> error_setg_errno(errp, -ret, "Error while writing VM state"); hades
>>> ~/src/qemu $
>>>
>>> (patch attached to start a discussion).
>>>
>>> For me it looks like commit is wrong and should be reverted and reworked.
>>>
>>> Den
>> Cc to : Stefan
>>
>> It seems the 'bdrv_drain_all()' in ' blk_mig_cleanup ()' should not be deferred. Stefan, is that right?
> Denis, do you still experience this crash with your own savevm patch
> series applied? Since that series does AioContext acquire/release where
> missing in savevm, it might fix this bug.
>
> Stefan
this patch
[Qemu-devel] [PULL 57/57] migration: qemu_savevm_state_cleanup becomes
mandatory operation
should be applied before or on top of my patches.
This problem is completely orthogonal and comes from the different source.
Anyway, thank you for paying attention :)
Den
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-11-10 11:45 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-07 15:20 [Qemu-devel] assert during internal snapshot Denis V. Lunev
2015-11-07 15:40 ` [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
2015-11-09 4:06 ` Amit Shah
2015-11-09 5:10 ` Li, Liang Z
2015-11-09 6:11 ` Denis V. Lunev
2015-11-09 6:16 ` Li, Liang Z
2015-11-09 6:18 ` Denis V. Lunev
2015-11-09 8:22 ` Li, Liang Z
2015-11-09 2:28 ` [Qemu-devel] assert during internal snapshot Li, Liang Z
2015-11-09 3:12 ` Li, Liang Z
2015-11-09 3:29 ` Li, Liang Z
2015-11-10 11:00 ` Stefan Hajnoczi
2015-11-10 11:44 ` Denis V. Lunev
2015-11-09 7:24 ` [Qemu-devel] [PATCH for 2.5 v2 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation Denis V. Lunev
2015-11-09 16:20 ` Juan Quintela
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.