* [Qemu-devel] postcopy migration hangs while loading virtio state
@ 2017-04-24 10:31 Christian Borntraeger
  2017-04-24 10:53 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-24 10:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Juan Quintela, Michael S. Tsirkin; +Cc: qemu-devel

David, Juan,

I can trigger a hang of postcopy migration (when I do it early) so that
both sides are in a paused state. It looks like one thread is still loading
the vmstate for virtio, and this load accesses guest memory and triggers
a userfault.

Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)):
#0  0x0000000001017130 in lduw_he_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:317
		<---- according to the host kernel this thread hangs in handle_userfault.
#1  0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:359
#2  0x0000000001025840 in address_space_lduw_internal_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /root/qemu/memory_ldst.inc.c:284
#3  0x00000000010259a6 in address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0) at /root/qemu/memory_ldst.inc.c:315
#4  0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0, addr=2) at /root/qemu/memory_ldst.inc.c:334
#5  0x0000000001116c10 in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0, pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166
#6  0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at /root/qemu/hw/virtio/virtio.c:263
#7  0x000000000111daea in virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at /root/qemu/hw/virtio/virtio.c:2168
#8  0x000000000111d0cc in virtio_device_get (f=0x28353600, opaque=0x28275090, size=0, field=0x16adf38 <__compound_literal.0>) at /root/qemu/hw/virtio/virtio.c:1975
#9  0x00000000012a7f50 in vmstate_load_state (f=0x28353600, vmsd=0x14db480 <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at /root/qemu/migration/vmstate.c:128
#10 0x00000000010cbc08 in vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at /root/qemu/migration/savevm.c:732
#11 0x00000000010ced6e in qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1890
#12 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947
#13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1707
#14 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at /root/qemu/migration/savevm.c:1770
#15 0x00000000010cf056 in qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1960
#16 0x00000000010cf2b6 in qemu_loadvm_state (f=0x282c7d20) at /root/qemu/migration/savevm.c:2020
#17 0x000000000129d234 in process_incoming_migration_co (opaque=0x282c7d20) at /root/qemu/migration/migration.c:404
#18 0x00000000014b6ca2 in coroutine_trampoline (i0=0, i1=674533168) at /root/qemu/util/coroutine-ucontext.c:79
#19 0x000003ff8de530ea in __makecontext_ret () from target:/lib64/libc.so.6

Thread 8 (Thread 0x3ff40b38910 (LWP 21165)):
#0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
#1  0x00000000014933ce in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
#2  0x00000000014935d0 in qemu_event_wait (ev=0x17169b0 <mis_current+8>) at /root/qemu/util/qemu-thread-posix.c:399
#3  0x00000000010ce094 in postcopy_ram_listen_thread (opaque=0x282c7d20) at /root/qemu/migration/savevm.c:1532
#4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 7 (Thread 0x3ff41338910 (LWP 21164)):
#0  0x000003ff8defd72c in ?? () from target:/lib64/libc.so.6
#1  0x00000000012af920 in postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at /root/qemu/migration/postcopy-ram.c:437
#2  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#3  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 6 (Thread 0x3ff41b38910 (LWP 21163)):
#0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at /root/qemu/util/qemu-thread-posix.c:133
#2  0x00000000010c6080 in do_data_decompress (opaque=0x283023d8) at /root/qemu/migration/ram.c:2277
#3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 5 (Thread 0x3ff42338910 (LWP 21162)):
#0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at /root/qemu/util/qemu-thread-posix.c:133
#2  0x00000000010c6080 in do_data_decompress (opaque=0x28302360) at /root/qemu/migration/ram.c:2277
#3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 4 (Thread 0x3ff89287910 (LWP 21148)):
#0  0x000003ff8e010abc in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from target:/lib64/libpthread.so.0
#2  0x000003ff8e00d4ee in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#3  0x0000000001492d66 in qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688 <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133
#4  0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at /root/qemu/cpus.c:1085
#5  0x00000000010773aa in qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123
#6  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#7  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)):
#0  0x000003ff8defd832 in ppoll () from target:/lib64/libc.so.6
#1  0x000000000148b092 in qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at /root/qemu/util/qemu-timer.c:322
#2  0x000000000148f46a in aio_poll (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622
#3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at /root/qemu/iothread.c:59
#4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 2 (Thread 0x3ff8afff910 (LWP 21142)):
#0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
#1  0x00000000014933ce in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
#2  0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4 <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399
#3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at /root/qemu/util/rcu.c:249
#4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6


It looks like this
    <qemu:arg value='-drive'/>
    <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>

is necessary to trigger the hang; with num-queues=1 everything seems to work fine, so it might
be a virtio issue after all?

Any idea?


* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 10:31 [Qemu-devel] postcopy migration hangs while loading virtio state Christian Borntraeger
@ 2017-04-24 10:53 ` Dr. David Alan Gilbert
  2017-04-24 13:02   ` Christian Borntraeger
  2017-04-24 13:11   ` Christian Borntraeger
  0 siblings, 2 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 10:53 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> David, Juan,
> 
> I can trigger a hang of postcopy migration (when I do it early) so that
> both sides are in paused state. Looks like one thread is still loading
> vmstates for virtio and this load accesses guest memory and triggers
> a userfault. 

It's perfectly legal for the destination to cause a userfault as it loads the
virtio queue - the virtio queue should be being loaded by the main thread
from the buffer while the 'listening' thread is waiting for the incoming
page data.
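The arrangement above can be sketched as a toy model (plain Python threading, illustrative only, not QEMU code): the load thread blocks on the missing page at the same point where the real thread blocks in handle_userfault, and only resumes once the page is placed.

```python
import threading

# Toy model of the postcopy destination (illustrative, not QEMU code):
# the main/load thread reads the vring and blocks on a not-yet-migrated
# page; a second thread stands in for the fault/listen path that places it.
page_ready = threading.Event()
loaded = []

def load_virtio_queue():
    # Reading used->idx touches the missing page: the real thread blocks
    # in the kernel's handle_userfault at this point.
    page_ready.wait()
    loaded.append("used_idx")

def place_page():
    # The requested page arrives from the source and is placed, which
    # resolves the fault and lets the load thread continue.
    page_ready.set()

loader = threading.Thread(target=load_virtio_queue)
placer = threading.Thread(target=place_page)
loader.start(); placer.start()
loader.join(); placer.join()
print(loaded)
```

In the hang reported here, the equivalent of place_page() for the faulted address apparently never runs, so the load thread waits forever.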

Can you turn on the following tracing please:
  destination:
    postcopy_ram_fault_thread_request
    postcopy_place_page
    postcopy_place_page_zero

  source:
    migrate_handle_rp_req_pages
    ram_save_queue_pages

You should see:
    virtio does the access
    userfault generates a fault
    postcopy_ram_fault_thread_request sends the request to the source

    the source sees migrate_handle_rp_req_pages
    queues it with ram_save_queue_pages

    the destination sees the page arrive and
    postcopy_place_page or postcopy_place_page_zero

Some of that might disappear if the page was already on its way.

Dave

> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 10:53 ` Dr. David Alan Gilbert
@ 2017-04-24 13:02   ` Christian Borntraeger
  2017-04-24 14:35     ` Dr. David Alan Gilbert
  2017-04-24 13:11   ` Christian Borntraeger
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-24 13:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> David, Juan,
>> 
>> I can trigger a hang of postcopy migration (when I do it early) so
>> that both sides are in paused state. Looks like one thread is still
>> loading vmstates for virtio and this load accesses guest memory and
>> triggers a userfault.
> 
> It's perfectly legal for the destination to cause a userfault as it
> loads the virtio queue - the virtio queue should be being loaded by
> the main thread from the buffer while the 'listening' thread is
> waiting for the incoming page data.
> 
> Can you turn on the following tracing please:
>   destination:
>     postcopy_ram_fault_thread_request
>     postcopy_place_page
>     postcopy_place_page_zero
> 
>   source:
>     migrate_handle_rp_req_pages
>     ram_save_queue_pages
> 
> You should see:
>     virtio does the access
>     userfault generates a fault
>     postcopy_ram_fault_thread_request sends the request to the source
> 
>     the source sees migrate_handle_rp_req_pages
>     queues it with ram_save_queue_pages
> 
>     the destination sees the page arrive and
>     postcopy_place_page or postcopy_place_page_zero
> 
> some of that might disappear if the page was already on its way.


The last events on the source are:
[..]
58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100


On the target I see lots of:

39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000

So we have about 2 seconds of traffic going on after that request;
I assume it's precopy-related.

Looking back on the target history there was
39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000

In fact it seems to be the first and only request:

# cat /tmp/test0.trace | grep -v postcopy_place_page

39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
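A trace like this can be cross-checked mechanically. The following is a small hypothetical helper (not part of QEMU; the function name and the matching rule are my own): it flags fault-thread requests whose host address never shows up in a later postcopy_place_page / postcopy_place_page_zero event, which is exactly the signature of the stuck page above.

```python
import re

# Hypothetical helper (not part of QEMU): scan destination trace output
# and report fault-thread requests that were never satisfied by a matching
# postcopy_place_page / postcopy_place_page_zero at the same host address.
REQ_RE = re.compile(r'postcopy_ram_fault_thread_request Request for HVA=([0-9a-f]+)')
PLACE_RE = re.compile(r'postcopy_place_page(?:_zero)? host=0x([0-9a-f]+)')

def unsatisfied_requests(trace_lines):
    requested = []
    placed = set()
    for line in trace_lines:
        m = REQ_RE.search(line)
        if m:
            requested.append(int(m.group(1), 16))
            continue
        m = PLACE_RE.search(line)
        if m:
            placed.add(int(m.group(1), 16))
    # A request counts as satisfied once some page is placed at its address.
    return [hva for hva in requested if hva not in placed]

trace = [
    "39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000",
    "39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000",
    "39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000",
]
print([hex(h) for h in unsatisfied_requests(trace)])  # the stuck request
```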

> 
> Dave
> 
>> [...]


* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 10:53 ` Dr. David Alan Gilbert
  2017-04-24 13:02   ` Christian Borntraeger
@ 2017-04-24 13:11   ` Christian Borntraeger
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-24 13:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:

FWIW, the sending side calltrace looks like:

Thread 8 (Thread 0x3ff509aa910 (LWP 58594)):
#0  0x000003ff9d588ca4 in pthread_join () from target:/lib64/libpthread.so.0
#1  0x0000000001493d4a in qemu_thread_join (thread=0x16c7e78 <current_migration+216>) at /root/qemu/util/qemu-thread-posix.c:504
#2  0x00000000012a0d0e in await_return_path_close_on_source (ms=0x16c7da0 <current_migration>) at /root/qemu/migration/migration.c:1641
#3  0x00000000012a1312 in migration_completion (s=0x16c7da0 <current_migration>, current_active_state=5, old_vm_running=0x3ff509a7789, start_time=0x3ff509a7790)
    at /root/qemu/migration/migration.c:1860
#4  0x00000000012a17b2 in migration_thread (opaque=0x16c7da0 <current_migration>) at /root/qemu/migration/migration.c:1987
#5  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#6  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 7 (Thread 0x3ff4bfff910 (LWP 58593)):
#0  0x000003ff9d5915b6 in recvmsg () from target:/lib64/libpthread.so.0
#1  0x000000000141771a in qio_channel_socket_readv (ioc=0xc9986b0, iov=0x3ff4bffc170, niov=1, fds=0x0, nfds=0x0, errp=0x0) at /root/qemu/io/channel-socket.c:476
#2  0x0000000001411668 in qio_channel_readv_full (ioc=0xc9986b0, iov=0x3ff4bffc170, niov=1, fds=0x0, nfds=0x0, errp=0x0) at /root/qemu/io/channel.c:64
#3  0x000000000141190a in qio_channel_read (ioc=0xc9986b0, buf=0xca23888 "", buflen=32768, errp=0x0) at /root/qemu/io/channel.c:112
#4  0x00000000012ad384 in channel_get_buffer (opaque=0xc9986b0, buf=0xca23888 "", pos=49, size=32768) at /root/qemu/migration/qemu-file-channel.c:78
#5  0x00000000012abb64 in qemu_fill_buffer (f=0xca23850) at /root/qemu/migration/qemu-file.c:295
#6  0x00000000012ac7b2 in qemu_peek_byte (f=0xca23850, offset=0) at /root/qemu/migration/qemu-file.c:555
#7  0x00000000012ac82e in qemu_get_byte (f=0xca23850) at /root/qemu/migration/qemu-file.c:568
#8  0x00000000012acbfa in qemu_get_be16 (f=0xca23850) at /root/qemu/migration/qemu-file.c:640
#9  0x00000000012a0756 in source_return_path_thread (opaque=0x16c7da0 <current_migration>) at /root/qemu/migration/migration.c:1512
#10 0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#11 0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 6 (Thread 0x3ff511aa910 (LWP 58454)):
#0  0x000003ff9d58d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000000001492fce in qemu_cond_wait (cond=0xc9ca238, mutex=0xc9ca210) at /root/qemu/util/qemu-thread-posix.c:133
#2  0x00000000010c6080 in do_data_decompress (opaque=0xc9ca208) at /root/qemu/migration/ram.c:2277
#3  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 5 (Thread 0x3ff519aa910 (LWP 58453)):
#0  0x000003ff9d58d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000000001492fce in qemu_cond_wait (cond=0xc9ca1c0, mutex=0xc9ca198) at /root/qemu/util/qemu-thread-posix.c:133
#2  0x00000000010c6080 in do_data_decompress (opaque=0xc9ca190) at /root/qemu/migration/ram.c:2277
#3  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 4 (Thread 0x3ff98807910 (LWP 58440)):
#0  0x000003ff9d58d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000000001492fce in qemu_cond_wait (cond=0xc9e6930, mutex=0x17116b0 <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133
#2  0x000000000107720e in qemu_kvm_wait_io_event (cpu=0xc9a87d0) at /root/qemu/cpus.c:1085
#3  0x00000000010773aa in qemu_kvm_cpu_thread_fn (arg=0xc9a87d0) at /root/qemu/cpus.c:1123
#4  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 3 (Thread 0x3ff99d7f910 (LWP 58437)):
#0  0x000003ff9d47d832 in ppoll () from target:/lib64/libc.so.6
#1  0x000000000148b2fa in qemu_poll_ns (fds=0x3ff940008c0, nfds=1, timeout=-1) at /root/qemu/util/qemu-timer.c:322
#2  0x000000000148f6d2 in aio_poll (ctx=0xc9473c0, blocking=true) at /root/qemu/util/aio-posix.c:622
#3  0x00000000011a7b5e in iothread_run (opaque=0xc947230) at /root/qemu/iothread.c:59
#4  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 2 (Thread 0x3ff9a57f910 (LWP 58434)):
#0  0x000003ff9d484336 in syscall () from target:/lib64/libc.so.6
#1  0x0000000001493636 in qemu_futex_wait (f=0x1b2b9dc <rcu_call_ready_event>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
#2  0x0000000001493838 in qemu_event_wait (ev=0x1b2b9dc <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399
#3  0x00000000014b40a8 in call_rcu_thread (opaque=0x0) at /root/qemu/util/rcu.c:249
#4  0x000003ff9d587bc4 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff9d48a9f2 in thread_start () from target:/lib64/libc.so.6

Thread 1 (Thread 0x3ffb24c5f00 (LWP 58412)):
#0  0x000003ff9d47d832 in ppoll () from target:/lib64/libc.so.6
#1  0x000000000148b414 in qemu_poll_ns (fds=0xc98f6e0, nfds=6, timeout=999000000) at /root/qemu/util/qemu-timer.c:334
#2  0x000000000148ccca in os_host_main_loop_wait (timeout=999000000) at /root/qemu/util/main-loop.c:255
#3  0x000000000148ce0a in main_loop_wait (nonblocking=0) at /root/qemu/util/main-loop.c:517
#4  0x00000000011b3cfa in main_loop () at /root/qemu/vl.c:1898
#5  0x00000000011bc3fc in main (argc=91, argv=0x3ffdc87eed8, envp=0x3ffdc87f1b8) at /root/qemu/vl.c:4709

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 13:02   ` Christian Borntraeger
@ 2017-04-24 14:35     ` Dr. David Alan Gilbert
  2017-04-24 18:17       ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 14:35 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >> David, Juan,
> >> 
> >> I can trigger a hang of postcopy migration (when I do it early) so
> >> that both sides are in paused state. Looks like one thread is still
> >> loading vmstates for virtio and this load accesses guest memory and
> >> triggers a userfault.
> > 
> > It's perfectly legal for the destination to cause a userfault as it
> > loads the virtio queue - the virtio queue should be being loaded by
> > the main thread from the buffer while the 'listening' thread is
> > waiting for the incoming page data.
> > 
> > Can you turn on the following tracing please:
> >
> > destination:
> >   postcopy_ram_fault_thread_request
> >   postcopy_place_page
> >   postcopy_place_page_zero
> >
> > source:
> >   migrate_handle_rp_req_pages
> >   ram_save_queue_pages
> >
> > You should see:
> >   virtio does the access
> >   userfault generates a fault
> >   postcopy_ram_fault_thread_request sends the request to the source
> > 
> > the source sees migrate_handle_rp_req_pages queues it with
> > ram_save_queue_pages
> > 
> > the destination sees the page arrive and postcopy_place_page or
> > postcopy_place_page_zero
> > 
> > some of that might disappear if the page was already on its way.
> 
> 
> the last events on the source are
> [..]
> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000

How do you see those on the source???

> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
> 

Is that a typo? I'm expecting those two 'len' fields to be the same?

> On the target a see lots of
> 
> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
> 
> So we have about 2 seconds of traffic going on after that request,
> I assume it's precopy related.
> 
> Looking back on the target history there was
> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> 
> In fact it seems to be the first and only request:
> 
> # cat /tmp/test0.trace | grep -v postcopy_place_page
> 
> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000

OK, does the HVA there correspond to the address that your virtio device is blocking on?
(or at least the start of the page)
Do you see a postcopy_place_page with a host= the same HVA ?

From the source backtrace, I think the source thinks it's sent everything
and is waiting for the return-path-thread to close.
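
One way to answer that mechanically is to cross-reference the two trace events. A throwaway sketch (the sample lines are pasted from the traces quoted above; in practice you would read them from the trace file, e.g. /tmp/test0.trace):

```python
import re

# Sample trace lines (taken from this thread); normally read from the trace file.
trace = """\
39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
"""

# HVAs the fault thread asked for, vs. HVAs where a page was actually placed.
requested = set(re.findall(r'Request for HVA=([0-9a-f]+)', trace))
placed = set(re.findall(r'postcopy_place_page(?:_zero)? host=0x([0-9a-f]+)', trace))

for hva in sorted(requested):
    print(f"HVA 0x{hva}: {'placed' if hva in placed else 'never placed'}")
# prints: HVA 0x3ff618d9000: never placed
```

If a requested HVA never shows up as placed, the request was lost or ignored somewhere on the source side.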

Dave

> > 
> > Dave
> > 
> >> Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)):
> >> #0  0x0000000001017130 in lduw_he_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:317
> >>      <---- according to the host kernel this thread hangs in handle_userfault.
> >> #1  0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:359
> >> #2  0x0000000001025840 in address_space_lduw_internal_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /root/qemu/memory_ldst.inc.c:284
> >> #3  0x00000000010259a6 in address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0) at /root/qemu/memory_ldst.inc.c:315
> >> #4  0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0, addr=2) at /root/qemu/memory_ldst.inc.c:334
> >> #5  0x0000000001116c10 in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0, pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166
> >> #6  0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at /root/qemu/hw/virtio/virtio.c:263
> >> #7  0x000000000111daea in virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at /root/qemu/hw/virtio/virtio.c:2168
> >> #8  0x000000000111d0cc in virtio_device_get (f=0x28353600, opaque=0x28275090, size=0, field=0x16adf38 <__compound_literal.0>) at /root/qemu/hw/virtio/virtio.c:1975
> >> #9  0x00000000012a7f50 in vmstate_load_state (f=0x28353600, vmsd=0x14db480 <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at /root/qemu/migration/vmstate.c:128
> >> #10 0x00000000010cbc08 in vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at /root/qemu/migration/savevm.c:732
> >> #11 0x00000000010ced6e in qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1890
> >> #12 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947
> >> #13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1707
> >> #14 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at /root/qemu/migration/savevm.c:1770
> >> #15 0x00000000010cf056 in qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1960
> >> #16 0x00000000010cf2b6 in qemu_loadvm_state (f=0x282c7d20) at /root/qemu/migration/savevm.c:2020
> >> #17 0x000000000129d234 in process_incoming_migration_co (opaque=0x282c7d20) at /root/qemu/migration/migration.c:404
> >> #18 0x00000000014b6ca2 in coroutine_trampoline (i0=0, i1=674533168) at /root/qemu/util/coroutine-ucontext.c:79
> >> #19 0x000003ff8de530ea in __makecontext_ret () from target:/lib64/libc.so.6
> >> 
> >> Thread 8 (Thread 0x3ff40b38910 (LWP 21165)):
> >> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
> >> #1  0x00000000014933ce in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
> >> #2  0x00000000014935d0 in qemu_event_wait (ev=0x17169b0 <mis_current+8>) at /root/qemu/util/qemu-thread-posix.c:399
> >> #3  0x00000000010ce094 in postcopy_ram_listen_thread (opaque=0x282c7d20) at /root/qemu/migration/savevm.c:1532
> >> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 7 (Thread 0x3ff41338910 (LWP 21164)):
> >> #0  0x000003ff8defd72c in ?? () from target:/lib64/libc.so.6
> >> #1  0x00000000012af920 in postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at /root/qemu/migration/postcopy-ram.c:437
> >> #2  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #3  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 6 (Thread 0x3ff41b38910 (LWP 21163)):
> >> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> >> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at /root/qemu/util/qemu-thread-posix.c:133
> >> #2  0x00000000010c6080 in do_data_decompress (opaque=0x283023d8) at /root/qemu/migration/ram.c:2277
> >> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 5 (Thread 0x3ff42338910 (LWP 21162)):
> >> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> >> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at /root/qemu/util/qemu-thread-posix.c:133
> >> #2  0x00000000010c6080 in do_data_decompress (opaque=0x28302360) at /root/qemu/migration/ram.c:2277
> >> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 4 (Thread 0x3ff89287910 (LWP 21148)):
> >> #0  0x000003ff8e010abc in __lll_lock_wait () from target:/lib64/libpthread.so.0
> >> #1  0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from target:/lib64/libpthread.so.0
> >> #2  0x000003ff8e00d4ee in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
> >> #3  0x0000000001492d66 in qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688 <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133
> >> #4  0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at /root/qemu/cpus.c:1085
> >> #5  0x00000000010773aa in qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123
> >> #6  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #7  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)):
> >> #0  0x000003ff8defd832 in ppoll () from target:/lib64/libc.so.6
> >> #1  0x000000000148b092 in qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at /root/qemu/util/qemu-timer.c:322
> >> #2  0x000000000148f46a in aio_poll (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622
> >> #3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at /root/qemu/iothread.c:59
> >> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> Thread 2 (Thread 0x3ff8afff910 (LWP 21142)):
> >> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
> >> #1  0x00000000014933ce in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
> >> #2  0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4 <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399
> >> #3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at /root/qemu/util/rcu.c:249
> >> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
> >> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >> 
> >> 
> >> Looks like this
> >>   <qemu:arg value='-drive'/>
> >>   <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
> >>   <qemu:arg value='-device'/>
> >>   <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>
> >> is necessary to trigger the hang; with num-queues=1 everything seems to work fine, so it might
> >> be a virtio issue after all?
> >> 
> >> Any idea?
> >> 
> > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 14:35     ` Dr. David Alan Gilbert
@ 2017-04-24 18:17       ` Christian Borntraeger
  2017-04-24 19:12         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-24 18:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/24/2017 04:35 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> David, Juan,
>>>>
>>>> I can trigger a hang of postcopy migration (when I do it early) so
>>>> that both sides are in paused state. Looks like one thread is still
>>>> loading vmstates for virtio and this load accesses guest memory and
>>>> triggers a userfault.
>>>
>>> It's perfectly legal for the destination to cause a userfault as it
>>> loads the virtio queue - the virtio queue should be being loaded by
>>> the main thread from the buffer while the 'listening' thread is
>>> waiting for the incoming page data.
>>>
>>> Can you turn on the following tracing please:
>>>
>>> destination:
>>>   postcopy_ram_fault_thread_request
>>>   postcopy_place_page
>>>   postcopy_place_page_zero
>>>
>>> source:
>>>   migrate_handle_rp_req_pages
>>>   ram_save_queue_pages
>>>
>>> You should see:
>>>   virtio does the access
>>>   userfault generates a fault
>>>   postcopy_ram_fault_thread_request sends the request to the source
>>>
>>> the source sees migrate_handle_rp_req_pages queues it with
>>> ram_save_queue_pages
>>>
>>> the destination sees the page arrive and postcopy_place_page or
>>> postcopy_place_page_zero
>>>
>>> some of that might disappear if the page was already on its way.
>>
>>
>> the last events on the source are
>> [..]
>> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
>> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
> 
> How do you see those on the source???

It was the previous migration (I did it in a loop, back and forth).
The problem happens on the migration back.
> 
>> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
>> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
>>
> 
> Is that a typo? I'm expecting those two 'len' fields to be the same?

Yes, it's a cut-and-paste error that missed the last byte.


> 
>> On the target a see lots of
>>
>> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
>> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
>> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
>> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
>> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
>> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
>>
>> So we have about 2 seconds of traffic going on after that request,
>> I assume it's precopy related.
>>
>> Looking back on the target history there was
>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
>>
>> In fact it seems to be the first and only request:
>>
>> # cat /tmp/test0.trace | grep -v postcopy_place_page
>>
>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> 
> OK, does the HVA there correspond to the address that your virtio device is blocking on?

Yes, it is the same page.


> (or at least the start of the page)
> Do you see a postcopy_place_page with a host= the same HVA ?

no

> 
> From the source backtrace, I think the source thinks it's sent everything
> and is waiting for the return-path-thread to close.
> 
> Dave
> 
>>>
>>> Dave
>>>
>>>> [destination-side backtrace and device configuration snipped; quoted in full earlier in the thread]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 18:17       ` Christian Borntraeger
@ 2017-04-24 19:12         ` Dr. David Alan Gilbert
  2017-04-25 13:27           ` Christian Borntraeger
  2017-04-26  9:59           ` Christian Borntraeger
  0 siblings, 2 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 19:12 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/24/2017 04:35 PM, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
> >>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >>>> David, Juan,
> >>>>
> >>>> I can trigger a hang of postcopy migration (when I do it early) so
> >>>> that both sides are in paused state. Looks like one thread is still
> >>>> loading vmstates for virtio and this load accesses guest memory and
> >>>> triggers a userfault.
> >>>
> >>> It's perfectly legal for the destination to cause a userfault as it
> >>> loads the virtio queue - the virtio queue should be being loaded by
> >>> the main thread from the buffer while the 'listening' thread is
> >>> waiting for the incoming page data.
> >>>
> >>> Can you turn on the following tracing please:
> >>>
> >>> destination:
> >>>   postcopy_ram_fault_thread_request
> >>>   postcopy_place_page
> >>>   postcopy_place_page_zero
> >>>
> >>> source:
> >>>   migrate_handle_rp_req_pages
> >>>   ram_save_queue_pages
> >>>
> >>> You should see:
> >>>   virtio does the access
> >>>   userfault generates a fault
> >>>   postcopy_ram_fault_thread_request sends the request to the source
> >>>
> >>> the source sees migrate_handle_rp_req_pages queues it with
> >>> ram_save_queue_pages
> >>>
> >>> the destination sees the page arrive and postcopy_place_page or
> >>> postcopy_place_page_zero
> >>>
> >>> some of that might disappear if the page was already on its way.
> >>
> >>
> >> the last events on the source are
> >> [..]
> >> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
> >> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
> > 
> > How do you see those on the source???
> 
> It was the previous migration (I did it in a loop, back and forth).
> The problem happens on the migration back.

OK, good.

> > 
> >> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
> >> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
> >>
> > 
> > Is that a typo? I'm expecting those two 'len' fields to be the same?
> 
> Yes, it's a cut-and-paste error that missed the last byte.

Good.

Ok, before digging further, is this a new bug or does it happen on older
QEMU?  Have you got a rune to reproduce it on x86?

> 
> > 
> >> On the target a see lots of
> >>
> >> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
> >> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
> >> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
> >> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
> >> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
> >> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
> >>
> >> So we have about 2 seconds of traffic going on after that request,
> >> I assume it's precopy related.
> >>
> >> Looking back on the target history there was
> >> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> >>
> >> In fact it seems to be the first and only request:
> >>
> >> # cat /tmp/test0.trace | grep -v postcopy_place_page
> >>
> >> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> > 
> > OK, does the HVA there correspond to the address that your virtio device is blocking on?
> 
> yes it is the same page.
> 
> 
> > (or at least the start of the page)
> > Do you see a postcopy_place_page with a host= the same HVA ?
> 
> no

Hmm, that's bad.
The flow is:
    Precopy
       (a) Send pages
    Switch to postcopy
       (b) Send discards for pages that have been changed after
           (a)
    Postcopy
       (c) Keep sending pages until we run out
       (d) But send pages we're asked for first

 (d) might be ignored if the source thinks the page was already sent or not dirty.
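
That flow can be caricatured in a few lines (a deliberately simplified sketch for reasoning about the failure modes, not QEMU's actual code):

```python
def migrate_ram(pages, dirtied_during_precopy, requests):
    """Simplified source-side flow; returns the order pages are sent."""
    order = []
    sent = set()
    # (a) precopy: send every page once
    for p in pages:
        order.append(p); sent.add(p)
    # (b) switch to postcopy: discard pages dirtied after they were sent,
    #     so they must be sent again
    for p in dirtied_during_precopy:
        sent.discard(p)
    # (c)/(d) postcopy: serve urgent requests first, then the remainder
    queue = list(requests) + [p for p in pages if p not in sent]
    for p in queue:
        if p in sent:          # (d) request ignored: believed already sent
            continue
        order.append(p); sent.add(p)
    return order

# A requested page that was dirtied gets re-sent; one that wasn't is skipped.
print(migrate_ram(["A", "B", "C"], dirtied_during_precopy=["B"], requests=["B", "C"]))
# prints ['A', 'B', 'C', 'B']
```

The hang corresponds to the case where the source skips the requested page at step (d) even though the destination never received it.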

So we need to figure out:
  1) If the source sent the pages during (a)
  2) If the source discarded it during (b)
  3) If it sent it again any time in c/d
  4) If it ignored the request from d


So please turn on the traces:
    get_queued_page_not_dirty                should help with (4)
    get_queued_page                          should help with (4)
    ram_discard_range                        should help with (2)
    loadvm_postcopy_ram_handle_discard       should help with (2)
    qemu_savevm_send_postcopy_ram_discard    should help with (2)

add near the top of ram_save_page in ram.c:
  fprintf(stderr, "%s: %s:%zx\n", __func__, block->idstr, (size_t)offset);

   should help with 1, 3

So let's see if your page ever gets sent in ram_save_page, and whether it gets
discarded after it arrived but before the load hits the point where it hangs.
Another (slimmer) possibility is that the count of dirty pages is wrong,
so the source thinks it has finished too soon; but we'll only look at
that if all of the above doesn't help.
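
To digest the combined output afterwards, one option is to pull out every line that mentions the suspect offset and read them in order. A small sketch (the line formats here are assumptions based on the traces quoted in this thread and on the suggested fprintf; the sample log is hypothetical):

```python
def timeline(lines, target):
    """Keep only trace/stderr lines that mention the target page offset."""
    return [line.rstrip() for line in lines if target in line]

# Hypothetical mixed log; real input would be the source's stderr plus trace file.
log = [
    "ram_save_page: s390.ram:41d9000",
    "58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000",
    "58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 1000",
    "39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000",
]
for line in timeline(log, "41d9000"):
    print(line)   # save -> request -> queue, but no discard or re-send
```

Reading the filtered lines chronologically makes it easy to spot which of steps (1)-(4) went missing for that page.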

Dave

> > 
> > From the source backtrace, I think the source thinks it's sent everything
> > and is waiting for the return-path-thread to close.
> > 
> > Dave
> > 
> >>>
> >>> Dave
> >>>
> >>>> Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)): #0  0x0000000001017130
> >>>> in lduw_he_p (ptr=0x3ff498d9002) at
> >>>> /root/qemu/include/qemu/bswap.h:317 <---- according to the host
> >>>> kernel this threads hangs in handle_userfault. #1
> >>>> 0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at
> >>>> /root/qemu/include/qemu/bswap.h:359 #2  0x0000000001025840 in
> >>>> address_space_lduw_internal_cached (cache=0x283491d0, addr=2,
> >>>> attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at
> >>>> /root/qemu/memory_ldst.inc.c:284 #3  0x00000000010259a6 in
> >>>> address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=...,
> >>>> result=0x0) at /root/qemu/memory_ldst.inc.c:315 #4
> >>>> 0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0,
> >>>> addr=2) at /root/qemu/memory_ldst.inc.c:334 #5  0x0000000001116c10
> >>>> in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0,
> >>>> pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166 #6
> >>>> 0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at
> >>>> /root/qemu/hw/virtio/virtio.c:263 #7  0x000000000111daea in
> >>>> virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at
> >>>> /root/qemu/hw/virtio/virtio.c:2168 #8  0x000000000111d0cc in
> >>>> virtio_device_get (f=0x28353600, opaque=0x28275090, size=0,
> >>>> field=0x16adf38 <__compound_literal.0>) at
> >>>> /root/qemu/hw/virtio/virtio.c:1975 #9  0x00000000012a7f50 in
> >>>> vmstate_load_state (f=0x28353600, vmsd=0x14db480
> >>>> <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at
> >>>> /root/qemu/migration/vmstate.c:128 #10 0x00000000010cbc08 in
> >>>> vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at
> >>>> /root/qemu/migration/savevm.c:732 #11 0x00000000010ced6e in
> >>>> qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8
> >>>> <mis_current>) at /root/qemu/migration/savevm.c:1890 #12
> >>>> 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600,
> >>>> mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947 
> >>>> #13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8
> >>>> <mis_current>) at /root/qemu/migration/savevm.c:1707 #14
> >>>> 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at
> >>>> /root/qemu/migration/savevm.c:1770 #15 0x00000000010cf056 in
> >>>> qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>)
> >>>> at /root/qemu/migration/savevm.c:1960 #16 0x00000000010cf2b6 in
> >>>> qemu_loadvm_state (f=0x282c7d20) at
> >>>> /root/qemu/migration/savevm.c:2020 #17 0x000000000129d234 in
> >>>> process_incoming_migration_co (opaque=0x282c7d20) at
> >>>> /root/qemu/migration/migration.c:404 #18 0x00000000014b6ca2 in
> >>>> coroutine_trampoline (i0=0, i1=674533168) at
> >>>> /root/qemu/util/coroutine-ucontext.c:79 #19 0x000003ff8de530ea in
> >>>> __makecontext_ret () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 8 (Thread 0x3ff40b38910 (LWP 21165)): #0  0x000003ff8df04336
> >>>> in syscall () from target:/lib64/libc.so.6 #1  0x00000000014933ce
> >>>> in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at
> >>>> /root/qemu/include/qemu/futex.h:26 #2  0x00000000014935d0 in
> >>>> qemu_event_wait (ev=0x17169b0 <mis_current+8>) at
> >>>> /root/qemu/util/qemu-thread-posix.c:399 #3  0x00000000010ce094 in
> >>>> postcopy_ram_listen_thread (opaque=0x282c7d20) at
> >>>> /root/qemu/migration/savevm.c:1532 #4  0x000003ff8e007bc4 in
> >>>> start_thread () from target:/lib64/libpthread.so.0 #5
> >>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 7 (Thread 0x3ff41338910 (LWP 21164)): #0  0x000003ff8defd72c
> >>>> in ?? () from target:/lib64/libc.so.6 #1  0x00000000012af920 in
> >>>> postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at
> >>>> /root/qemu/migration/postcopy-ram.c:437 #2  0x000003ff8e007bc4 in
> >>>> start_thread () from target:/lib64/libpthread.so.0 #3
> >>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 6 (Thread 0x3ff41b38910 (LWP 21163)): #0  0x000003ff8e00d450
> >>>> in pthread_cond_wait@@GLIBC_2.3.2 () from
> >>>> target:/lib64/libpthread.so.0 #1  0x0000000001492d66 in
> >>>> qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at
> >>>> /root/qemu/util/qemu-thread-posix.c:133 #2  0x00000000010c6080 in
> >>>> do_data_decompress (opaque=0x283023d8) at
> >>>> /root/qemu/migration/ram.c:2277 #3  0x000003ff8e007bc4 in
> >>>> start_thread () from target:/lib64/libpthread.so.0 #4
> >>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 5 (Thread 0x3ff42338910 (LWP 21162)): #0  0x000003ff8e00d450
> >>>> in pthread_cond_wait@@GLIBC_2.3.2 () from
> >>>> target:/lib64/libpthread.so.0 #1  0x0000000001492d66 in
> >>>> qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at
> >>>> /root/qemu/util/qemu-thread-posix.c:133 #2  0x00000000010c6080 in
> >>>> do_data_decompress (opaque=0x28302360) at
> >>>> /root/qemu/migration/ram.c:2277 #3  0x000003ff8e007bc4 in
> >>>> start_thread () from target:/lib64/libpthread.so.0 #4
> >>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 4 (Thread 0x3ff89287910 (LWP 21148)): #0  0x000003ff8e010abc
> >>>> in __lll_lock_wait () from target:/lib64/libpthread.so.0 #1
> >>>> 0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from
> >>>> target:/lib64/libpthread.so.0 #2  0x000003ff8e00d4ee in
> >>>> pthread_cond_wait@@GLIBC_2.3.2 () from
> >>>> target:/lib64/libpthread.so.0 #3  0x0000000001492d66 in
> >>>> qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688
> >>>> <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133 #4
> >>>> 0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at
> >>>> /root/qemu/cpus.c:1085 #5  0x00000000010773aa in
> >>>> qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123 
> >>>> #6  0x000003ff8e007bc4 in start_thread () from
> >>>> target:/lib64/libpthread.so.0 #7  0x000003ff8df0a9f2 in
> >>>> thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)): #0  0x000003ff8defd832
> >>>> in ppoll () from target:/lib64/libc.so.6 #1  0x000000000148b092 in
> >>>> qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at
> >>>> /root/qemu/util/qemu-timer.c:322 #2  0x000000000148f46a in aio_poll
> >>>> (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622 
> >>>> #3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at
> >>>> /root/qemu/iothread.c:59 #4  0x000003ff8e007bc4 in start_thread ()
> >>>> from target:/lib64/libpthread.so.0 #5  0x000003ff8df0a9f2 in
> >>>> thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>> Thread 2 (Thread 0x3ff8afff910 (LWP 21142)): #0  0x000003ff8df04336
> >>>> in syscall () from target:/lib64/libc.so.6 #1  0x00000000014933ce
> >>>> in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>,
> >>>> val=4294967295) at /root/qemu/include/qemu/futex.h:26 #2
> >>>> 0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4
> >>>> <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399 
> >>>> #3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at
> >>>> /root/qemu/util/rcu.c:249 #4  0x000003ff8e007bc4 in start_thread ()
> >>>> from target:/lib64/libpthread.so.0 #5  0x000003ff8df0a9f2 in
> >>>> thread_start () from target:/lib64/libc.so.6
> >>>>
> >>>>
> >>>> Looks like this
> >>>>   <qemu:arg value='-drive'/>
> >>>>   <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
> >>>>   <qemu:arg value='-device'/>
> >>>>   <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>
> >>>>
> >>>>
> >>>> is necessary to trigger the hang; with num-queues=1 everything seems to work fine, so it might
> >>>> be a virtio issue after all?
> >>>>
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 19:12         ` Dr. David Alan Gilbert
@ 2017-04-25 13:27           ` Christian Borntraeger
  2017-04-26  9:59           ` Christian Borntraeger
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-25 13:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/24/2017 09:12 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 04/24/2017 04:35 PM, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
>>>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>>>> David, Juan,
>>>>>>
>>>>>> I can trigger a hang of postcopy migration (when I do it early) so
>>>>>> that both sides are in paused state. Looks like one thread is still
>>>>>> loading vmstates for virtio and this load accesses guest memory and
>>>>>> triggers a userfault.
>>>>>
>>>>> It's perfectly legal for the destination to cause a userfault as it
>>>>> loads the virtio queue - the virtio queue should be being loaded by
>>>>> the main thread from the buffer while the 'listening' thread is
>>>>> waiting for the incoming page data.
>>>>>
>>>>> Can you turn on the following tracing please:
>>>>> destination:
>>>>>   postcopy_ram_fault_thread_request
>>>>>   postcopy_place_page
>>>>>   postcopy_place_page_zero
>>>>>
>>>>> source:
>>>>>   migrate_handle_rp_req_pages
>>>>>   ram_save_queue_pages
>>>>>
>>>>> You should see:
>>>>>   virtio does the access
>>>>>   userfault generates a fault
>>>>>   postcopy_ram_fault_thread_request sends the request to the source
>>>>>
>>>>> the source sees migrate_handle_rp_req_pages queues it with
>>>>> ram_save_queue_pages
>>>>>
>>>>> the destination sees the page arrive and postcopy_place_page or
>>>>> postcopy_place_page_zero
>>>>>
>>>>> some of that might disappear if the page was already on its way.
>>>>
>>>>
>>>> the last event on the source are
>>>> [..]
>>>> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
>>>> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
>>>
>>> How do you see those on the source???
>>
>> It was the previous migrate (I did it in a loop, back and forth).
>> The problem happens on the migrate back.
> 
> OK, good.
> 
>>>
>>>> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
>>>> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
>>>>
>>>
>>> Is that a typo? I'm expecting those two 'len' fields to be the same?
>>
>> Yes, it's a cut'n'paste "miss the last byte"
> 
> Good.
> 
> Ok, before digging further, is this a new bug or does it happen on older
> QEMU?  Have you got a rune to reproduce it on x86?

No x86 yet. It also seems to happen with 2.8, but only between two specific
machines (and one of those is down for maintenance right now).
Will do the experiments as soon as it is back up.



>>
>>>
>>>> On the target a see lots of
>>>>
>>>> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
>>>> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
>>>> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
>>>> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
>>>> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
>>>> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
>>>>
>>>> So we have about 2 seconds of traffic going on after that request,
>>>> I assume its precopy related.
>>>>
>>>> Looking back on the target history there was
>>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
>>>>
>>>> In fact it seems to be the first and only request:
>>>>
>>>> # cat /tmp/test0.trace | grep -v postcopy_place_page
>>>>
>>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
>>>
>>> OK, does the HVA there correspond to the address that your virtio device is blocking on?
>>
>> yes it is the same page.
>>
>>
>>> (or at least the start of the page)
>>> Do you see a postcopy_place_page with a host= the same HVA ?
>>
>> no
> 
> Hmm, that's bad.
> The flow is:
>     Precopy
>        (a) Send pages
>     Switch to postcopy
>        (b) Send discards for pages that have been changed after
>            (a)
>     Postcopy
>        (c) Keep sending pages until we run out
>        (d) But send pages we're asked for first
> 
>  (d) might be ignored if the source thinks the page was already sent or not dirty.
> 
> So we need to figure out:
>   1) If the source sent the pages during (a)
>   2) If the source discarded it during (b)
>   3) If it sent it again any time in c/d
>   4) If it ignored the request from d
> 
> 
> So please turn on the traces:
>     get_queued_page_not_dirty                should help with (4)
>     get_queued_page                          should help with (4)
>     ram_discard_range                        should help with (2)
>     loadvm_postcopy_ram_handle_discard       should help with (2)
>     qemu_savevm_send_postcopy_ram_discard    should help with (2)
> 
> add near the top of ram_save_page in ram.c:
>   fprintf(stderr, "%s: %s:%zx\n", __func__, block->idstr, (size_t)offset);
> 
>    should help with 1, 3
> 
> So let's see if your page ever gets sent in ram_save_page, and whether it gets
> discarded after the page arrived but before we hit the point where it hangs.
> Another (slimmer) possibility is that the dirty page count is wrong,
> so the source thinks it has finished too soon - but we'll only look at
> that if all the above doesn't help.
> 
> Dave
> 
>>>
>>> From the source backtrace, I think the source thinks it's sent everything
>>> and is waiting for the return-path-thread to close.
>>>
>>> Dave
>>>
>>>>>
>>>>> Dave
>>>>>
>>>>>> Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)): #0  0x0000000001017130
>>>>>> in lduw_he_p (ptr=0x3ff498d9002) at
>>>>>> /root/qemu/include/qemu/bswap.h:317 <---- according to the host
>>>>>> kernel this thread hangs in handle_userfault. #1
>>>>>> 0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at
>>>>>> /root/qemu/include/qemu/bswap.h:359 #2  0x0000000001025840 in
>>>>>> address_space_lduw_internal_cached (cache=0x283491d0, addr=2,
>>>>>> attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at
>>>>>> /root/qemu/memory_ldst.inc.c:284 #3  0x00000000010259a6 in
>>>>>> address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=...,
>>>>>> result=0x0) at /root/qemu/memory_ldst.inc.c:315 #4
>>>>>> 0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0,
>>>>>> addr=2) at /root/qemu/memory_ldst.inc.c:334 #5  0x0000000001116c10
>>>>>> in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0,
>>>>>> pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166 #6
>>>>>> 0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at
>>>>>> /root/qemu/hw/virtio/virtio.c:263 #7  0x000000000111daea in
>>>>>> virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at
>>>>>> /root/qemu/hw/virtio/virtio.c:2168 #8  0x000000000111d0cc in
>>>>>> virtio_device_get (f=0x28353600, opaque=0x28275090, size=0,
>>>>>> field=0x16adf38 <__compound_literal.0>) at
>>>>>> /root/qemu/hw/virtio/virtio.c:1975 #9  0x00000000012a7f50 in
>>>>>> vmstate_load_state (f=0x28353600, vmsd=0x14db480
>>>>>> <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at
>>>>>> /root/qemu/migration/vmstate.c:128 #10 0x00000000010cbc08 in
>>>>>> vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at
>>>>>> /root/qemu/migration/savevm.c:732 #11 0x00000000010ced6e in
>>>>>> qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8
>>>>>> <mis_current>) at /root/qemu/migration/savevm.c:1890 #12
>>>>>> 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600,
>>>>>> mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947 
>>>>>> #13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8
>>>>>> <mis_current>) at /root/qemu/migration/savevm.c:1707 #14
>>>>>> 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at
>>>>>> /root/qemu/migration/savevm.c:1770 #15 0x00000000010cf056 in
>>>>>> qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>)
>>>>>> at /root/qemu/migration/savevm.c:1960 #16 0x00000000010cf2b6 in
>>>>>> qemu_loadvm_state (f=0x282c7d20) at
>>>>>> /root/qemu/migration/savevm.c:2020 #17 0x000000000129d234 in
>>>>>> process_incoming_migration_co (opaque=0x282c7d20) at
>>>>>> /root/qemu/migration/migration.c:404 #18 0x00000000014b6ca2 in
>>>>>> coroutine_trampoline (i0=0, i1=674533168) at
>>>>>> /root/qemu/util/coroutine-ucontext.c:79 #19 0x000003ff8de530ea in
>>>>>> __makecontext_ret () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 8 (Thread 0x3ff40b38910 (LWP 21165)): #0  0x000003ff8df04336
>>>>>> in syscall () from target:/lib64/libc.so.6 #1  0x00000000014933ce
>>>>>> in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at
>>>>>> /root/qemu/include/qemu/futex.h:26 #2  0x00000000014935d0 in
>>>>>> qemu_event_wait (ev=0x17169b0 <mis_current+8>) at
>>>>>> /root/qemu/util/qemu-thread-posix.c:399 #3  0x00000000010ce094 in
>>>>>> postcopy_ram_listen_thread (opaque=0x282c7d20) at
>>>>>> /root/qemu/migration/savevm.c:1532 #4  0x000003ff8e007bc4 in
>>>>>> start_thread () from target:/lib64/libpthread.so.0 #5
>>>>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 7 (Thread 0x3ff41338910 (LWP 21164)): #0  0x000003ff8defd72c
>>>>>> in ?? () from target:/lib64/libc.so.6 #1  0x00000000012af920 in
>>>>>> postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at
>>>>>> /root/qemu/migration/postcopy-ram.c:437 #2  0x000003ff8e007bc4 in
>>>>>> start_thread () from target:/lib64/libpthread.so.0 #3
>>>>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 6 (Thread 0x3ff41b38910 (LWP 21163)): #0  0x000003ff8e00d450
>>>>>> in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>>> target:/lib64/libpthread.so.0 #1  0x0000000001492d66 in
>>>>>> qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at
>>>>>> /root/qemu/util/qemu-thread-posix.c:133 #2  0x00000000010c6080 in
>>>>>> do_data_decompress (opaque=0x283023d8) at
>>>>>> /root/qemu/migration/ram.c:2277 #3  0x000003ff8e007bc4 in
>>>>>> start_thread () from target:/lib64/libpthread.so.0 #4
>>>>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 5 (Thread 0x3ff42338910 (LWP 21162)): #0  0x000003ff8e00d450
>>>>>> in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>>> target:/lib64/libpthread.so.0 #1  0x0000000001492d66 in
>>>>>> qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at
>>>>>> /root/qemu/util/qemu-thread-posix.c:133 #2  0x00000000010c6080 in
>>>>>> do_data_decompress (opaque=0x28302360) at
>>>>>> /root/qemu/migration/ram.c:2277 #3  0x000003ff8e007bc4 in
>>>>>> start_thread () from target:/lib64/libpthread.so.0 #4
>>>>>> 0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 4 (Thread 0x3ff89287910 (LWP 21148)): #0  0x000003ff8e010abc
>>>>>> in __lll_lock_wait () from target:/lib64/libpthread.so.0 #1
>>>>>> 0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from
>>>>>> target:/lib64/libpthread.so.0 #2  0x000003ff8e00d4ee in
>>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>>> target:/lib64/libpthread.so.0 #3  0x0000000001492d66 in
>>>>>> qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688
>>>>>> <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133 #4
>>>>>> 0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at
>>>>>> /root/qemu/cpus.c:1085 #5  0x00000000010773aa in
>>>>>> qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123 
>>>>>> #6  0x000003ff8e007bc4 in start_thread () from
>>>>>> target:/lib64/libpthread.so.0 #7  0x000003ff8df0a9f2 in
>>>>>> thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)): #0  0x000003ff8defd832
>>>>>> in ppoll () from target:/lib64/libc.so.6 #1  0x000000000148b092 in
>>>>>> qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at
>>>>>> /root/qemu/util/qemu-timer.c:322 #2  0x000000000148f46a in aio_poll
>>>>>> (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622 
>>>>>> #3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at
>>>>>> /root/qemu/iothread.c:59 #4  0x000003ff8e007bc4 in start_thread ()
>>>>>> from target:/lib64/libpthread.so.0 #5  0x000003ff8df0a9f2 in
>>>>>> thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 2 (Thread 0x3ff8afff910 (LWP 21142)): #0  0x000003ff8df04336
>>>>>> in syscall () from target:/lib64/libc.so.6 #1  0x00000000014933ce
>>>>>> in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>,
>>>>>> val=4294967295) at /root/qemu/include/qemu/futex.h:26 #2
>>>>>> 0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4
>>>>>> <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399 
>>>>>> #3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at
>>>>>> /root/qemu/util/rcu.c:249 #4  0x000003ff8e007bc4 in start_thread ()
>>>>>> from target:/lib64/libpthread.so.0 #5  0x000003ff8df0a9f2 in
>>>>>> thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>>
>>>>>> Looks like this
>>>>>>   <qemu:arg value='-drive'/>
>>>>>>   <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
>>>>>>   <qemu:arg value='-device'/>
>>>>>>   <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>
>>>>>>
>>>>>>
>>>>>> is necessary to trigger the hang; with num-queues=1 everything seems to work fine, so it might
>>>>>> be a virtio issue after all?
>>>>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-24 19:12         ` Dr. David Alan Gilbert
  2017-04-25 13:27           ` Christian Borntraeger
@ 2017-04-26  9:59           ` Christian Borntraeger
  2017-04-26 10:13             ` Christian Borntraeger
  2017-04-26 11:01             ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-26  9:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/24/2017 09:12 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 04/24/2017 04:35 PM, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
>>>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>>>> David, Juan,
>>>>>>
>>>>>> I can trigger a hang of postcopy migration (when I do it early) so
>>>>>> that both sides are in paused state. Looks like one thread is still
>>>>>> loading vmstates for virtio and this load accesses guest memory and
>>>>>> triggers a userfault.
>>>>>
>>>>> It's perfectly legal for the destination to cause a userfault as it
>>>>> loads the virtio queue - the virtio queue should be being loaded by
>>>>> the main thread from the buffer while the 'listening' thread is
>>>>> waiting for the incoming page data.
>>>>>
>>>>> Can you turn on the following tracing please:
>>>>> destination:
>>>>>   postcopy_ram_fault_thread_request
>>>>>   postcopy_place_page
>>>>>   postcopy_place_page_zero
>>>>>
>>>>> source:
>>>>>   migrate_handle_rp_req_pages
>>>>>   ram_save_queue_pages
>>>>>
>>>>> You should see:
>>>>>   virtio does the access
>>>>>   userfault generates a fault
>>>>>   postcopy_ram_fault_thread_request sends the request to the source
>>>>>
>>>>> the source sees migrate_handle_rp_req_pages queues it with
>>>>> ram_save_queue_pages
>>>>>
>>>>> the destination sees the page arrive and postcopy_place_page or
>>>>> postcopy_place_page_zero
>>>>>
>>>>> some of that might disappear if the page was already on its way.
>>>>
>>>>
>>>> the last event on the source are
>>>> [..]
>>>> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
>>>> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
>>>
>>> How do you see those on the source???
>>
>> It was the previous migrate (I did it in a loop, back and forth).
>> The problem happens on the migrate back.
> 
> OK, good.
> 
>>>
>>>> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
>>>> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
>>>>
>>>
>>> Is that a typo? I'm expecting those two 'len' fields to be the same?
>>
>> Yes, it's a cut'n'paste "miss the last byte"
> 
> Good.
> 
> Ok, before digging further, is this a new bug or does it happen on older
> QEMU?  Have you got a rune to reproduce it on x86?
> 
>>
>>>
>>>> On the target a see lots of
>>>>
>>>> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
>>>> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
>>>> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
>>>> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
>>>> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
>>>> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
>>>>
>>>> So we have about 2 seconds of traffic going on after that request,
>>>> I assume its precopy related.
>>>>
>>>> Looking back on the target history there was
>>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
>>>>
>>>> In fact it seems to be the first and only request:
>>>>
>>>> # cat /tmp/test0.trace | grep -v postcopy_place_page
>>>>
>>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
>>>
>>> OK, does the HVA there correspond to the address that your virtio device is blocking on?
>>
>> yes it is the same page.
>>
>>
>>> (or at least the start of the page)
>>> Do you see a postcopy_place_page with a host= the same HVA ?
>>
>> no
> 
> Hmm, that's bad.
> The flow is:
>     Precopy
>        (a) Send pages
>     Switch to postcopy
>        (b) Send discards for pages that have been changed after
>            (a)
>     Postcopy
>        (c) Keep sending pages until we run out
>        (d) But send pages we're asked for first
> 
>  (d) might be ignored if the source thinks the page was already sent or not dirty.
> 
> So we need to figure out:
>   1) If the source sent the pages during (a)
>   2) If the source discarded it during (b)
>   3) If it sent it again any time in c/d
>   4) If it ignored the request from d
> 
> 
> So please turn on the traces:
>     get_queued_page_not_dirty                should help with (4)
>     get_queued_page                          should help with (4)
>     ram_discard_range                        should help with (2)
>     loadvm_postcopy_ram_handle_discard       should help with (2)
>     qemu_savevm_send_postcopy_ram_discard    should help with (2)
> 
> add near the top of ram_save_page in ram.c:
>   fprintf(stderr, "%s: %s:%zx\n", __func__, block->idstr, (size_t)offset);
> 
>    should help with 1, 3
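
(Aside: the excerpts below are filtered from much larger trace files. The
filtering is roughly the following kind of helper - the `pid@seconds.micros:event args`
line format is the one in the logs, but the helper itself, names included,
is only an illustrative sketch, not anything in the QEMU tree:)

```python
# Sketch: filter QEMU trace output down to the events that mention one
# guest-RAM offset.  The sample lines and the 3c6b000 offset are from this
# run; the helper is hypothetical.
import re

# pid@seconds.micros:event args  (format as seen in the quoted logs)
TRACE_RE = re.compile(r'\d+@(\d+\.\d+):(\S+)\s*(.*)')

def events_touching(lines, offset):
    """Return (timestamp, event) pairs for lines whose args mention offset."""
    out = []
    for line in lines:
        m = TRACE_RE.match(line)
        if m and offset in m.group(3):
            out.append((float(m.group(1)), m.group(2)))
    return out

sample = [
    "28863@1493200554.818195:migrate_handle_rp_req_pages in s390.ram at 3c6b000 len 1000",
    "28863@1493200554.818208:ram_save_queue_pages s390.ram: start: 3c6b000 len: 1000",
    "28863@1493200554.818469:get_queued_page_not_dirty s390.ram/3c6b000 ram_addr=3c6b000 (sent=0)",
    "28863@1493200554.819000:postcopy_place_page host=0x3ff92246000",
]

for ts, ev in events_touching(sample, "3c6b000"):
    print(ev)
```

Running it over a real capture would just be `events_touching(open("source.trace"), "3c6b000")`,
with the file name being whatever your trace run produced.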


source:
28863@1493200554.418978:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419047:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419056:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419064:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419071:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419077:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419083:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419088:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419094:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419100:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419106:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419112:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419117:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419123:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419130:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419138:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419145:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419151:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419156:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419163:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419169:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419174:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419179:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419184:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419189:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
28863@1493200554.419208:qemu_savevm_send_postcopy_ram_discard s390.ram: 9d
28863@1493200554.818195:migrate_handle_rp_req_pages in s390.ram at 3c6b000 len 1000
28863@1493200554.818208:ram_save_queue_pages s390.ram: start: 3c6b000 len: 1000
28863@1493200554.818469:get_queued_page_not_dirty s390.ram/3c6b000 ram_addr=3c6b000 (sent=0)

I also get 
ram_save_page: s390.ram:3c6b000
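
(To check Dave's question (2) mechanically: the target's ram_discard_range
lines give `start: <hex-start> <hex-len>` pairs, so one can test whether the
stuck offset ever fell inside a discarded range. A small sketch - the two
sample ranges are from the log below, the helper names are hypothetical:)

```python
# Sketch: parse ram_discard_range trace lines into (start, end) byte ranges
# and check whether a given guest-RAM offset was discarded.  Illustrative
# only; the 3c6b000 offset is the one this migration is stuck on.
import re

def parse_discards(lines):
    """Extract (start, end) ranges from ram_discard_range trace lines (hex)."""
    ranges = []
    for line in lines:
        m = re.search(r'ram_discard_range \S+ start: ([0-9a-f]+) ([0-9a-f]+)', line)
        if m:
            start, length = int(m.group(1), 16), int(m.group(2), 16)
            ranges.append((start, start + length))
    return ranges

def was_discarded(ranges, offset):
    """True if offset lies inside any half-open [start, end) discard range."""
    return any(lo <= offset < hi for lo, hi in ranges)

sample = [
    "10929@1493200554.428226:loadvm_postcopy_ram_handle_discard ",
    "10929@1493200554.428235:ram_discard_range s390.ram: start: 1000 a000",
    "10929@1493200554.437558:ram_discard_range s390.ram: start: 3475000 8b000",
]

# prints False: 0x3c6b000 is outside both sample ranges
print(was_discarded(parse_discards(sample), 0x3c6b000))
```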


target (postcopy_place lines filtered out)
10929@1493200553.657966:ram_discard_range s390.ram: start: 0 5dc00000
10929@1493200554.428226:loadvm_postcopy_ram_handle_discard 
10929@1493200554.428235:ram_discard_range s390.ram: start: 1000 a000
10929@1493200554.428247:ram_discard_range s390.ram: start: c000 4000
10929@1493200554.428252:ram_discard_range s390.ram: start: 12000 ee000
10929@1493200554.428441:ram_discard_range s390.ram: start: b72000 2000
10929@1493200554.428444:ram_discard_range s390.ram: start: b75000 2000
10929@1493200554.428447:ram_discard_range s390.ram: start: b79000 1000
10929@1493200554.428450:ram_discard_range s390.ram: start: b7e000 1000
10929@1493200554.428452:ram_discard_range s390.ram: start: b85000 1000
10929@1493200554.428455:ram_discard_range s390.ram: start: ba0000 1000
10929@1493200554.428457:ram_discard_range s390.ram: start: bad000 1000
10929@1493200554.428459:ram_discard_range s390.ram: start: bb7000 1000
10929@1493200554.428462:ram_discard_range s390.ram: start: bbe000 1000
10929@1493200554.428464:loadvm_postcopy_ram_handle_discard 
10929@1493200554.428465:ram_discard_range s390.ram: start: bc8000 1000
10929@1493200554.428468:ram_discard_range s390.ram: start: c16000 2000
10929@1493200554.428471:ram_discard_range s390.ram: start: ca3000 1000
10929@1493200554.428474:ram_discard_range s390.ram: start: cc3000 1000
10929@1493200554.428476:ram_discard_range s390.ram: start: cc9000 1000
10929@1493200554.428478:ram_discard_range s390.ram: start: ce9000 1000
10929@1493200554.428480:ram_discard_range s390.ram: start: cff000 1000
10929@1493200554.428483:ram_discard_range s390.ram: start: d05000 1000
10929@1493200554.428485:ram_discard_range s390.ram: start: d09000 6000
10929@1493200554.428491:ram_discard_range s390.ram: start: d11000 1000
10929@1493200554.428493:ram_discard_range s390.ram: start: d16000 1000
10929@1493200554.428496:ram_discard_range s390.ram: start: d1a000 1e000
10929@1493200554.428521:loadvm_postcopy_ram_handle_discard 
10929@1493200554.428522:ram_discard_range s390.ram: start: d39000 42000
10929@1493200554.428582:ram_discard_range s390.ram: start: d80000 3000
10929@1493200554.428587:ram_discard_range s390.ram: start: d84000 3000
10929@1493200554.428590:ram_discard_range s390.ram: start: d88000 2000
10929@1493200554.428593:ram_discard_range s390.ram: start: d8f000 1000
10929@1493200554.428596:ram_discard_range s390.ram: start: d91000 1000
10929@1493200554.428598:ram_discard_range s390.ram: start: d94000 1000
10929@1493200554.428600:ram_discard_range s390.ram: start: dbb000 1000
10929@1493200554.428603:ram_discard_range s390.ram: start: dbf000 7000
10929@1493200554.428610:ram_discard_range s390.ram: start: dc8000 1000
10929@1493200554.428612:ram_discard_range s390.ram: start: dca000 1000
10929@1493200554.428614:ram_discard_range s390.ram: start: dcd000 1000
10929@1493200554.428616:loadvm_postcopy_ram_handle_discard 
10929@1493200554.428617:ram_discard_range s390.ram: start: dd0000 5000
10929@1493200554.428623:ram_discard_range s390.ram: start: dd6000 7000
10929@1493200554.428630:ram_discard_range s390.ram: start: de0000 3000
10929@1493200554.428634:ram_discard_range s390.ram: start: de8000 3000
10929@1493200554.428637:ram_discard_range s390.ram: start: dec000 1000
10929@1493200554.428640:ram_discard_range s390.ram: start: dee000 12000
10929@1493200554.428655:ram_discard_range s390.ram: start: 1100000 2300000
10929@1493200554.437553:ram_discard_range s390.ram: start: 3423000 3000
10929@1493200554.437558:ram_discard_range s390.ram: start: 3475000 8b000
10929@1493200554.437697:ram_discard_range s390.ram: start: 35e3000 3000
10929@1493200554.437701:ram_discard_range s390.ram: start: 3600000 1000
10929@1493200554.437704:ram_discard_range s390.ram: start: 3606000 1000
10929@1493200554.437706:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437707:ram_discard_range s390.ram: start: 360d000 1000
10929@1493200554.437709:ram_discard_range s390.ram: start: 3610000 1000
10929@1493200554.437711:ram_discard_range s390.ram: start: 3616000 1000
10929@1493200554.437713:ram_discard_range s390.ram: start: 361d000 1000
10929@1493200554.437716:ram_discard_range s390.ram: start: 3620000 1000
10929@1493200554.437726:ram_discard_range s390.ram: start: 3626000 1000
10929@1493200554.437729:ram_discard_range s390.ram: start: 362d000 1000
10929@1493200554.437731:ram_discard_range s390.ram: start: 3630000 1000
10929@1493200554.437734:ram_discard_range s390.ram: start: 3636000 1000
10929@1493200554.437736:ram_discard_range s390.ram: start: 363d000 1000
10929@1493200554.437738:ram_discard_range s390.ram: start: 3640000 1000
10929@1493200554.437740:ram_discard_range s390.ram: start: 3646000 1000
10929@1493200554.437742:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437743:ram_discard_range s390.ram: start: 364d000 1000
10929@1493200554.437746:ram_discard_range s390.ram: start: 3650000 1000
10929@1493200554.437748:ram_discard_range s390.ram: start: 3656000 1000
10929@1493200554.437750:ram_discard_range s390.ram: start: 365d000 1000
10929@1493200554.437752:ram_discard_range s390.ram: start: 3660000 1000
10929@1493200554.437754:ram_discard_range s390.ram: start: 3666000 1000
10929@1493200554.437757:ram_discard_range s390.ram: start: 366d000 1000
10929@1493200554.437759:ram_discard_range s390.ram: start: 3670000 1000
10929@1493200554.437761:ram_discard_range s390.ram: start: 3676000 1000
10929@1493200554.437763:ram_discard_range s390.ram: start: 367d000 4000
10929@1493200554.437768:ram_discard_range s390.ram: start: 3686000 1000
10929@1493200554.437770:ram_discard_range s390.ram: start: 368d000 1000
10929@1493200554.437772:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437773:ram_discard_range s390.ram: start: 3690000 1000
10929@1493200554.437775:ram_discard_range s390.ram: start: 3696000 1000
10929@1493200554.437778:ram_discard_range s390.ram: start: 369d000 1000
10929@1493200554.437780:ram_discard_range s390.ram: start: 36a0000 1000
10929@1493200554.437782:ram_discard_range s390.ram: start: 36a6000 1000
10929@1493200554.437784:ram_discard_range s390.ram: start: 36ad000 1000
10929@1493200554.437786:ram_discard_range s390.ram: start: 36b0000 1000
10929@1493200554.437789:ram_discard_range s390.ram: start: 36b6000 1000
10929@1493200554.437791:ram_discard_range s390.ram: start: 36bd000 1000
10929@1493200554.437793:ram_discard_range s390.ram: start: 36c0000 1000
10929@1493200554.437795:ram_discard_range s390.ram: start: 36c6000 1000
10929@1493200554.437798:ram_discard_range s390.ram: start: 36cd000 1000
10929@1493200554.437800:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437801:ram_discard_range s390.ram: start: 36d0000 1000
10929@1493200554.437803:ram_discard_range s390.ram: start: 36d6000 1000
10929@1493200554.437805:ram_discard_range s390.ram: start: 36dd000 1000
10929@1493200554.437807:ram_discard_range s390.ram: start: 36e0000 1000
10929@1493200554.437810:ram_discard_range s390.ram: start: 36e6000 1000
10929@1493200554.437812:ram_discard_range s390.ram: start: 36ed000 1000
10929@1493200554.437814:ram_discard_range s390.ram: start: 36f0000 1000
10929@1493200554.437816:ram_discard_range s390.ram: start: 36f6000 1000
10929@1493200554.437819:ram_discard_range s390.ram: start: 36fd000 1000
10929@1493200554.437821:ram_discard_range s390.ram: start: 37e0000 1000
10929@1493200554.437824:ram_discard_range s390.ram: start: 37e6000 1000
10929@1493200554.437826:ram_discard_range s390.ram: start: 37ed000 1000
10929@1493200554.437828:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437829:ram_discard_range s390.ram: start: 37f0000 1000
10929@1493200554.437831:ram_discard_range s390.ram: start: 37f6000 1000
10929@1493200554.437833:ram_discard_range s390.ram: start: 37fd000 1000
10929@1493200554.437836:ram_discard_range s390.ram: start: 3800000 1000
10929@1493200554.437838:ram_discard_range s390.ram: start: 3806000 1000
10929@1493200554.437840:ram_discard_range s390.ram: start: 380d000 1000
10929@1493200554.437842:ram_discard_range s390.ram: start: 3810000 1000
10929@1493200554.437845:ram_discard_range s390.ram: start: 3816000 1000
10929@1493200554.437847:ram_discard_range s390.ram: start: 381d000 3000
10929@1493200554.437850:ram_discard_range s390.ram: start: 3850000 1000
10929@1493200554.437854:ram_discard_range s390.ram: start: 3856000 1000
10929@1493200554.437858:ram_discard_range s390.ram: start: 385d000 1000
10929@1493200554.437860:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437861:ram_discard_range s390.ram: start: 3860000 1000
10929@1493200554.437863:ram_discard_range s390.ram: start: 3866000 1000
10929@1493200554.437865:ram_discard_range s390.ram: start: 386d000 1000
10929@1493200554.437868:ram_discard_range s390.ram: start: 3870000 1000
10929@1493200554.437870:ram_discard_range s390.ram: start: 3876000 1000
10929@1493200554.437872:ram_discard_range s390.ram: start: 387d000 1000
10929@1493200554.437874:ram_discard_range s390.ram: start: 3880000 1000
10929@1493200554.437877:ram_discard_range s390.ram: start: 3886000 1000
10929@1493200554.437879:ram_discard_range s390.ram: start: 388d000 1000
10929@1493200554.437881:ram_discard_range s390.ram: start: 3890000 1000
10929@1493200554.437883:ram_discard_range s390.ram: start: 3896000 1000
10929@1493200554.437885:ram_discard_range s390.ram: start: 389d000 1000
10929@1493200554.437887:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437888:ram_discard_range s390.ram: start: 38a0000 1000
10929@1493200554.437891:ram_discard_range s390.ram: start: 38a6000 1000
10929@1493200554.437893:ram_discard_range s390.ram: start: 38ad000 1000
10929@1493200554.437895:ram_discard_range s390.ram: start: 38b0000 1000
10929@1493200554.437897:ram_discard_range s390.ram: start: 38b6000 1000
10929@1493200554.437899:ram_discard_range s390.ram: start: 38bd000 1000
10929@1493200554.437902:ram_discard_range s390.ram: start: 38c0000 1000
10929@1493200554.437904:ram_discard_range s390.ram: start: 38c6000 1000
10929@1493200554.437906:ram_discard_range s390.ram: start: 38cd000 1000
10929@1493200554.437908:ram_discard_range s390.ram: start: 38d0000 1000
10929@1493200554.437911:ram_discard_range s390.ram: start: 38d6000 1000
10929@1493200554.437913:ram_discard_range s390.ram: start: 38dd000 1000
10929@1493200554.437915:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437916:ram_discard_range s390.ram: start: 38e0000 1000
10929@1493200554.437919:ram_discard_range s390.ram: start: 38e6000 1000
10929@1493200554.437921:ram_discard_range s390.ram: start: 38ed000 3000
10929@1493200554.437925:ram_discard_range s390.ram: start: 3900000 1000
10929@1493200554.437927:ram_discard_range s390.ram: start: 3906000 1000
10929@1493200554.437930:ram_discard_range s390.ram: start: 390d000 1000
10929@1493200554.437932:ram_discard_range s390.ram: start: 3920000 1000
10929@1493200554.437934:ram_discard_range s390.ram: start: 3922000 1000
10929@1493200554.437936:ram_discard_range s390.ram: start: 3924000 1000
10929@1493200554.437938:ram_discard_range s390.ram: start: 3926000 1000
10929@1493200554.437941:ram_discard_range s390.ram: start: 392b000 1000
10929@1493200554.437943:ram_discard_range s390.ram: start: 3930000 1000
10929@1493200554.437945:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437946:ram_discard_range s390.ram: start: 3936000 1000
10929@1493200554.437948:ram_discard_range s390.ram: start: 393d000 1000
10929@1493200554.437950:ram_discard_range s390.ram: start: 3940000 1000
10929@1493200554.437953:ram_discard_range s390.ram: start: 3946000 1000
10929@1493200554.437955:ram_discard_range s390.ram: start: 394d000 1000
10929@1493200554.437957:ram_discard_range s390.ram: start: 3950000 1000
10929@1493200554.437960:ram_discard_range s390.ram: start: 3956000 1000
10929@1493200554.437962:ram_discard_range s390.ram: start: 395d000 1000
10929@1493200554.437964:ram_discard_range s390.ram: start: 3960000 1000
10929@1493200554.437966:ram_discard_range s390.ram: start: 3966000 1000
10929@1493200554.437969:ram_discard_range s390.ram: start: 396d000 1000
10929@1493200554.437971:ram_discard_range s390.ram: start: 3970000 1000
10929@1493200554.437973:loadvm_postcopy_ram_handle_discard 
10929@1493200554.437974:ram_discard_range s390.ram: start: 3976000 1000
10929@1493200554.437976:ram_discard_range s390.ram: start: 397d000 1000
10929@1493200554.437978:ram_discard_range s390.ram: start: 3980000 1000
10929@1493200554.437981:ram_discard_range s390.ram: start: 3986000 1000
10929@1493200554.437984:ram_discard_range s390.ram: start: 398d000 1000
10929@1493200554.437986:ram_discard_range s390.ram: start: 3990000 1000
10929@1493200554.437988:ram_discard_range s390.ram: start: 3996000 1000
10929@1493200554.437990:ram_discard_range s390.ram: start: 399d000 3000
10929@1493200554.437994:ram_discard_range s390.ram: start: 39ad000 8000
10929@1493200554.438002:ram_discard_range s390.ram: start: 39c8000 1000
10929@1493200554.438004:ram_discard_range s390.ram: start: 39ca000 1000
10929@1493200554.438006:ram_discard_range s390.ram: start: 39cc000 1000
10929@1493200554.438008:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438009:ram_discard_range s390.ram: start: 39ce000 1000
10929@1493200554.438011:ram_discard_range s390.ram: start: 39e0000 1000
10929@1493200554.438014:ram_discard_range s390.ram: start: 39e6000 1000
10929@1493200554.438016:ram_discard_range s390.ram: start: 39ed000 1000
10929@1493200554.438018:ram_discard_range s390.ram: start: 39f0000 1000
10929@1493200554.438021:ram_discard_range s390.ram: start: 39f6000 1000
10929@1493200554.438023:ram_discard_range s390.ram: start: 39fd000 1000
10929@1493200554.438025:ram_discard_range s390.ram: start: 3a00000 1000
10929@1493200554.438027:ram_discard_range s390.ram: start: 3a06000 1000
10929@1493200554.438029:ram_discard_range s390.ram: start: 3a0d000 1000
10929@1493200554.438031:ram_discard_range s390.ram: start: 3a10000 1000
10929@1493200554.438034:ram_discard_range s390.ram: start: 3a16000 1000
10929@1493200554.438036:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438037:ram_discard_range s390.ram: start: 3a1d000 1000
10929@1493200554.438039:ram_discard_range s390.ram: start: 3a20000 1000
10929@1493200554.438041:ram_discard_range s390.ram: start: 3a26000 1000
10929@1493200554.438043:ram_discard_range s390.ram: start: 3a2d000 1000
10929@1493200554.438045:ram_discard_range s390.ram: start: 3a30000 1000
10929@1493200554.438048:ram_discard_range s390.ram: start: 3a36000 1000
10929@1493200554.438050:ram_discard_range s390.ram: start: 3a3d000 1000
10929@1493200554.438052:ram_discard_range s390.ram: start: 3a40000 1000
10929@1493200554.438054:ram_discard_range s390.ram: start: 3a46000 1000
10929@1493200554.438056:ram_discard_range s390.ram: start: 3a4d000 1000
10929@1493200554.438059:ram_discard_range s390.ram: start: 3a50000 1000
10929@1493200554.438061:ram_discard_range s390.ram: start: 3a56000 1000
10929@1493200554.438063:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438067:ram_discard_range s390.ram: start: 3a5d000 1000
10929@1493200554.438069:ram_discard_range s390.ram: start: 3a60000 1000
10929@1493200554.438071:ram_discard_range s390.ram: start: 3a66000 1000
10929@1493200554.438073:ram_discard_range s390.ram: start: 3a6d000 1000
10929@1493200554.438075:ram_discard_range s390.ram: start: 3a70000 1000
10929@1493200554.438078:ram_discard_range s390.ram: start: 3a76000 1000
10929@1493200554.438080:ram_discard_range s390.ram: start: 3a7d000 4000
10929@1493200554.438084:ram_discard_range s390.ram: start: 3a86000 1000
10929@1493200554.438087:ram_discard_range s390.ram: start: 3a8d000 1000
10929@1493200554.438089:ram_discard_range s390.ram: start: 3a90000 1000
10929@1493200554.438091:ram_discard_range s390.ram: start: 3a96000 1000
10929@1493200554.438093:ram_discard_range s390.ram: start: 3a9d000 1000
10929@1493200554.438095:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438096:ram_discard_range s390.ram: start: 3aa0000 1000
10929@1493200554.438098:ram_discard_range s390.ram: start: 3aa6000 1000
10929@1493200554.438100:ram_discard_range s390.ram: start: 3aad000 1000
10929@1493200554.438103:ram_discard_range s390.ram: start: 3ab0000 1000
10929@1493200554.438105:ram_discard_range s390.ram: start: 3ab6000 1000
10929@1493200554.438107:ram_discard_range s390.ram: start: 3abd000 1000
10929@1493200554.438109:ram_discard_range s390.ram: start: 3ac0000 1000
10929@1493200554.438111:ram_discard_range s390.ram: start: 3ac6000 1000
10929@1493200554.438114:ram_discard_range s390.ram: start: 3acd000 1000
10929@1493200554.438116:ram_discard_range s390.ram: start: 3ad0000 1000
10929@1493200554.438119:ram_discard_range s390.ram: start: 3ad6000 1000
10929@1493200554.438121:ram_discard_range s390.ram: start: 3add000 1000
10929@1493200554.438123:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438124:ram_discard_range s390.ram: start: 3ae0000 1000
10929@1493200554.438126:ram_discard_range s390.ram: start: 3ae6000 1000
10929@1493200554.438128:ram_discard_range s390.ram: start: 3aed000 1000
10929@1493200554.438131:ram_discard_range s390.ram: start: 3af0000 1000
10929@1493200554.438133:ram_discard_range s390.ram: start: 3af6000 1000
10929@1493200554.438135:ram_discard_range s390.ram: start: 3afd000 1000
10929@1493200554.438137:ram_discard_range s390.ram: start: 3b00000 1000
10929@1493200554.438140:ram_discard_range s390.ram: start: 3b06000 1000
10929@1493200554.438142:ram_discard_range s390.ram: start: 3b0d000 1000
10929@1493200554.438144:ram_discard_range s390.ram: start: 3b10000 1000
10929@1493200554.438146:ram_discard_range s390.ram: start: 3b16000 1000
10929@1493200554.438149:ram_discard_range s390.ram: start: 3b1d000 4000
10929@1493200554.438154:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438155:ram_discard_range s390.ram: start: 3b26000 1000
10929@1493200554.438157:ram_discard_range s390.ram: start: 3b2d000 1000
10929@1493200554.438160:ram_discard_range s390.ram: start: 3b30000 1000
10929@1493200554.438162:ram_discard_range s390.ram: start: 3b36000 1000
10929@1493200554.438164:ram_discard_range s390.ram: start: 3b3d000 1000
10929@1493200554.438166:ram_discard_range s390.ram: start: 3b40000 1000
10929@1493200554.438168:ram_discard_range s390.ram: start: 3b46000 1000
10929@1493200554.438170:ram_discard_range s390.ram: start: 3b4d000 1000
10929@1493200554.438173:ram_discard_range s390.ram: start: 3b50000 1000
10929@1493200554.438175:ram_discard_range s390.ram: start: 3b56000 1000
10929@1493200554.438177:ram_discard_range s390.ram: start: 3b5d000 1000
10929@1493200554.438179:ram_discard_range s390.ram: start: 3b60000 1000
10929@1493200554.438181:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438182:ram_discard_range s390.ram: start: 3b66000 1000
10929@1493200554.438184:ram_discard_range s390.ram: start: 3b6d000 1000
10929@1493200554.438187:ram_discard_range s390.ram: start: 3b70000 1000
10929@1493200554.438189:ram_discard_range s390.ram: start: 3b76000 1000
10929@1493200554.438191:ram_discard_range s390.ram: start: 3b7d000 1000
10929@1493200554.438193:ram_discard_range s390.ram: start: 3b80000 1000
10929@1493200554.438196:ram_discard_range s390.ram: start: 3b86000 1000
10929@1493200554.438198:ram_discard_range s390.ram: start: 3b8d000 1000
10929@1493200554.438200:ram_discard_range s390.ram: start: 3b90000 1000
10929@1493200554.438202:ram_discard_range s390.ram: start: 3b96000 1000
10929@1493200554.438204:ram_discard_range s390.ram: start: 3b9d000 1000
10929@1493200554.438206:ram_discard_range s390.ram: start: 3ba0000 1000
10929@1493200554.438209:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438210:ram_discard_range s390.ram: start: 3ba6000 1000
10929@1493200554.438212:ram_discard_range s390.ram: start: 3bad000 3000
10929@1493200554.438215:ram_discard_range s390.ram: start: 3bd0000 1000
10929@1493200554.438218:ram_discard_range s390.ram: start: 3bd6000 1000
10929@1493200554.438220:ram_discard_range s390.ram: start: 3bdd000 1000
10929@1493200554.438222:ram_discard_range s390.ram: start: 3be0000 1000
10929@1493200554.438224:ram_discard_range s390.ram: start: 3be6000 1000
10929@1493200554.438226:ram_discard_range s390.ram: start: 3bed000 1000
10929@1493200554.438229:ram_discard_range s390.ram: start: 3bf0000 1000
10929@1493200554.438231:ram_discard_range s390.ram: start: 3bf6000 1000
10929@1493200554.438233:ram_discard_range s390.ram: start: 3bfd000 1000
10929@1493200554.438235:ram_discard_range s390.ram: start: 3c06000 4000
10929@1493200554.438240:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438241:ram_discard_range s390.ram: start: 3c79000 3000
10929@1493200554.438246:ram_discard_range s390.ram: start: 3c80000 1000
10929@1493200554.438248:ram_discard_range s390.ram: start: 3c86000 1000
10929@1493200554.438251:ram_discard_range s390.ram: start: 3c8d000 1000
10929@1493200554.438253:ram_discard_range s390.ram: start: 3c90000 1000
10929@1493200554.438255:ram_discard_range s390.ram: start: 3c96000 1000
10929@1493200554.438257:ram_discard_range s390.ram: start: 3c9d000 1000
10929@1493200554.438259:ram_discard_range s390.ram: start: 3ca0000 1000
10929@1493200554.438261:ram_discard_range s390.ram: start: 3ca6000 1000
10929@1493200554.438264:ram_discard_range s390.ram: start: 3cad000 1000
10929@1493200554.438266:ram_discard_range s390.ram: start: 3cb0000 1000
10929@1493200554.438268:ram_discard_range s390.ram: start: 3cb6000 1000
10929@1493200554.438270:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438271:ram_discard_range s390.ram: start: 3cbd000 1000
10929@1493200554.438274:ram_discard_range s390.ram: start: 3cc0000 1000
10929@1493200554.438276:ram_discard_range s390.ram: start: 3cc6000 1000
10929@1493200554.438278:ram_discard_range s390.ram: start: 3ccd000 1000
10929@1493200554.438281:ram_discard_range s390.ram: start: 3cd0000 1000
10929@1493200554.438283:ram_discard_range s390.ram: start: 3cd6000 1000
10929@1493200554.438285:ram_discard_range s390.ram: start: 3cdd000 1000
10929@1493200554.438287:ram_discard_range s390.ram: start: 3ce0000 1000
10929@1493200554.438289:ram_discard_range s390.ram: start: 3ce6000 1000
10929@1493200554.438292:ram_discard_range s390.ram: start: 3ced000 1000
10929@1493200554.438294:ram_discard_range s390.ram: start: 3cf0000 1000
10929@1493200554.438296:ram_discard_range s390.ram: start: 3cf6000 1000
10929@1493200554.438298:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438299:ram_discard_range s390.ram: start: 3cfd000 1000
10929@1493200554.438301:ram_discard_range s390.ram: start: 3d55000 2000
10929@1493200554.438304:ram_discard_range s390.ram: start: 3da3000 5000
10929@1493200554.438310:ram_discard_range s390.ram: start: 3dc0000 13000
10929@1493200554.438326:ram_discard_range s390.ram: start: 3dd4000 f000
10929@1493200554.438339:ram_discard_range s390.ram: start: 3de4000 3000
10929@1493200554.438343:ram_discard_range s390.ram: start: 3de8000 1000
10929@1493200554.438345:ram_discard_range s390.ram: start: 3dea000 1000
10929@1493200554.438347:ram_discard_range s390.ram: start: 3f4a000 1e000
10929@1493200554.438372:ram_discard_range s390.ram: start: 3fa8000 2000
10929@1493200554.438375:ram_discard_range s390.ram: start: 4000000 1000
10929@1493200554.438377:ram_discard_range s390.ram: start: 4006000 1000
10929@1493200554.438379:loadvm_postcopy_ram_handle_discard 
10929@1493200554.438380:ram_discard_range s390.ram: start: 400d000 1000
10929@1493200554.438383:ram_discard_range s390.ram: start: 4010000 1000
10929@1493200554.438385:ram_discard_range s390.ram: start: 4016000 1000
10929@1493200554.438387:ram_discard_range s390.ram: start: 401d000 3000
10929@1493200554.438391:ram_discard_range s390.ram: start: 4040000 1000
10929@1493200554.438393:ram_discard_range s390.ram: start: 4042000 1000
10929@1493200554.438395:ram_discard_range s390.ram: start: 40e3000 1000
10929@1493200554.438397:ram_discard_range s390.ram: start: 417e000 2215000
10929@1493200554.445269:ram_discard_range s390.ram: start: 2258d000 3b673000
10929@1493200554.817596:postcopy_ram_fault_thread_request Request for HVA=3ff3df6b000 rb=s390.ram offset=3c6b000


> 
> So let's see if your page ever gets sent in ram_save_page, and whether it
> gets discarded prior to hitting the point where it hangs but after the page
> arrived.
> Another (slimmer) possibility is that the number of dirty pages is wrong,
> so the source thinks it has finished too soon - but we'll only look at
> that if all the above doesn't help.
> 
> Dave
> 
>>>
>>> From the source backtrace, I think the source thinks it's sent everything
>>> and is waiting for the return-path-thread to close.
>>>
>>> Dave
>>>
>>>>>
>>>>> Dave
>>>>>
>>>>>> Thread 1 (Thread 0x3ffa2f45f00 (LWP 21122)):
>>>>>> #0  0x0000000001017130 in lduw_he_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:317   <---- according to the host kernel this thread hangs in handle_userfault.
>>>>>> #1  0x0000000001017342 in lduw_le_p (ptr=0x3ff498d9002) at /root/qemu/include/qemu/bswap.h:359
>>>>>> #2  0x0000000001025840 in address_space_lduw_internal_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /root/qemu/memory_ldst.inc.c:284
>>>>>> #3  0x00000000010259a6 in address_space_lduw_le_cached (cache=0x283491d0, addr=2, attrs=..., result=0x0) at /root/qemu/memory_ldst.inc.c:315
>>>>>> #4  0x0000000001025ad6 in lduw_le_phys_cached (cache=0x283491d0, addr=2) at /root/qemu/memory_ldst.inc.c:334
>>>>>> #5  0x0000000001116c10 in virtio_lduw_phys_cached (vdev=0x28275090, cache=0x283491d0, pa=2) at /root/qemu/include/hw/virtio/virtio-access.h:166
>>>>>> #6  0x0000000001117940 in vring_used_idx (vq=0x3ffa2e1c090) at /root/qemu/hw/virtio/virtio.c:263
>>>>>> #7  0x000000000111daea in virtio_load (vdev=0x28275090, f=0x28353600, version_id=2) at /root/qemu/hw/virtio/virtio.c:2168
>>>>>> #8  0x000000000111d0cc in virtio_device_get (f=0x28353600, opaque=0x28275090, size=0, field=0x16adf38 <__compound_literal.0>) at /root/qemu/hw/virtio/virtio.c:1975
>>>>>> #9  0x00000000012a7f50 in vmstate_load_state (f=0x28353600, vmsd=0x14db480 <vmstate_virtio_blk>, opaque=0x28275090, version_id=2) at /root/qemu/migration/vmstate.c:128
>>>>>> #10 0x00000000010cbc08 in vmstate_load (f=0x28353600, se=0x28279e60, version_id=2) at /root/qemu/migration/savevm.c:732
>>>>>> #11 0x00000000010ced6e in qemu_loadvm_section_start_full (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1890
>>>>>> #12 0x00000000010cf00c in qemu_loadvm_state_main (f=0x28353600, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1947
>>>>>> #13 0x00000000010ce5bc in loadvm_handle_cmd_packaged (mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1707
>>>>>> #14 0x00000000010ce87e in loadvm_process_command (f=0x282c7d20) at /root/qemu/migration/savevm.c:1770
>>>>>> #15 0x00000000010cf056 in qemu_loadvm_state_main (f=0x282c7d20, mis=0x17169a8 <mis_current>) at /root/qemu/migration/savevm.c:1960
>>>>>> #16 0x00000000010cf2b6 in qemu_loadvm_state (f=0x282c7d20) at /root/qemu/migration/savevm.c:2020
>>>>>> #17 0x000000000129d234 in process_incoming_migration_co (opaque=0x282c7d20) at /root/qemu/migration/migration.c:404
>>>>>> #18 0x00000000014b6ca2 in coroutine_trampoline (i0=0, i1=674533168) at /root/qemu/util/coroutine-ucontext.c:79
>>>>>> #19 0x000003ff8de530ea in __makecontext_ret () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 8 (Thread 0x3ff40b38910 (LWP 21165)):
>>>>>> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
>>>>>> #1  0x00000000014933ce in qemu_futex_wait (f=0x17169b0 <mis_current+8>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
>>>>>> #2  0x00000000014935d0 in qemu_event_wait (ev=0x17169b0 <mis_current+8>) at /root/qemu/util/qemu-thread-posix.c:399
>>>>>> #3  0x00000000010ce094 in postcopy_ram_listen_thread (opaque=0x282c7d20) at /root/qemu/migration/savevm.c:1532
>>>>>> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 7 (Thread 0x3ff41338910 (LWP 21164)):
>>>>>> #0  0x000003ff8defd72c in ?? () from target:/lib64/libc.so.6
>>>>>> #1  0x00000000012af920 in postcopy_ram_fault_thread (opaque=0x17169a8 <mis_current>) at /root/qemu/migration/postcopy-ram.c:437
>>>>>> #2  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #3  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 6 (Thread 0x3ff41b38910 (LWP 21163)):
>>>>>> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
>>>>>> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302408, mutex=0x283023e0) at /root/qemu/util/qemu-thread-posix.c:133
>>>>>> #2  0x00000000010c6080 in do_data_decompress (opaque=0x283023d8) at /root/qemu/migration/ram.c:2277
>>>>>> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 5 (Thread 0x3ff42338910 (LWP 21162)):
>>>>>> #0  0x000003ff8e00d450 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
>>>>>> #1  0x0000000001492d66 in qemu_cond_wait (cond=0x28302390, mutex=0x28302368) at /root/qemu/util/qemu-thread-posix.c:133
>>>>>> #2  0x00000000010c6080 in do_data_decompress (opaque=0x28302360) at /root/qemu/migration/ram.c:2277
>>>>>> #3  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #4  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 4 (Thread 0x3ff89287910 (LWP 21148)):
>>>>>> #0  0x000003ff8e010abc in __lll_lock_wait () from target:/lib64/libpthread.so.0
>>>>>> #1  0x000003ff8e00bcb6 in __pthread_mutex_cond_lock () from target:/lib64/libpthread.so.0
>>>>>> #2  0x000003ff8e00d4ee in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
>>>>>> #3  0x0000000001492d66 in qemu_cond_wait (cond=0x2831eb00, mutex=0x1711688 <qemu_global_mutex>) at /root/qemu/util/qemu-thread-posix.c:133
>>>>>> #4  0x000000000107720e in qemu_kvm_wait_io_event (cpu=0x282e09b0) at /root/qemu/cpus.c:1085
>>>>>> #5  0x00000000010773aa in qemu_kvm_cpu_thread_fn (arg=0x282e09b0) at /root/qemu/cpus.c:1123
>>>>>> #6  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #7  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 3 (Thread 0x3ff8a7ff910 (LWP 21145)):
>>>>>> #0  0x000003ff8defd832 in ppoll () from target:/lib64/libc.so.6
>>>>>> #1  0x000000000148b092 in qemu_poll_ns (fds=0x3ff840008c0, nfds=1, timeout=-1) at /root/qemu/util/qemu-timer.c:322
>>>>>> #2  0x000000000148f46a in aio_poll (ctx=0x2827e7b0, blocking=true) at /root/qemu/util/aio-posix.c:622
>>>>>> #3  0x00000000011a78f6 in iothread_run (opaque=0x2827d740) at /root/qemu/iothread.c:59
>>>>>> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>> Thread 2 (Thread 0x3ff8afff910 (LWP 21142)):
>>>>>> #0  0x000003ff8df04336 in syscall () from target:/lib64/libc.so.6
>>>>>> #1  0x00000000014933ce in qemu_futex_wait (f=0x1b2b9b4 <rcu_call_ready_event>, val=4294967295) at /root/qemu/include/qemu/futex.h:26
>>>>>> #2  0x00000000014935d0 in qemu_event_wait (ev=0x1b2b9b4 <rcu_call_ready_event>) at /root/qemu/util/qemu-thread-posix.c:399
>>>>>> #3  0x00000000014b3e40 in call_rcu_thread (opaque=0x0) at /root/qemu/util/rcu.c:249
>>>>>> #4  0x000003ff8e007bc4 in start_thread () from target:/lib64/libpthread.so.0
>>>>>> #5  0x000003ff8df0a9f2 in thread_start () from target:/lib64/libc.so.6
>>>>>>
>>>>>>
>>>>>> Looks like this
>>>>>>   <qemu:arg value='-drive'/>
>>>>>>   <qemu:arg value='driver=null-aio,id=null1,if=none,size=100G'/>
>>>>>>   <qemu:arg value='-device'/>
>>>>>>   <qemu:arg value='virtio-blk-ccw,scsi=off,drive=null1,serial=null1,num-queues=4'/>
>>>>>> is necessary to trigger the hang; with num-queues=1 everything seems to
>>>>>> work fine, so it might be a virtio issue after all?
>>>>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26  9:59           ` Christian Borntraeger
@ 2017-04-26 10:13             ` Christian Borntraeger
  2017-04-26 11:01             ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-26 10:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/26/2017 11:59 AM, Christian Borntraeger wrote:

> source:
> 28863@1493200554.418978:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419047:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419056:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419064:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419071:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419077:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419083:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419088:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419094:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419100:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419106:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419112:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419117:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419123:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419130:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419138:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419145:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419151:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419156:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419163:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419169:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419174:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419179:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419184:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419189:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419208:qemu_savevm_send_postcopy_ram_discard s390.ram: 9d
> 28863@1493200554.818195:migrate_handle_rp_req_pages in s390.ram at 3c6b000 len 1000
> 28863@1493200554.818208:ram_save_queue_pages s390.ram: start: 3c6b000 len: 1000
> 28863@1493200554.818469:get_queued_page_not_dirty s390.ram/3c6b000 ram_addr=3c6b000 (sent=0)
> 
> I also get 
> ram_save_page: s390.ram:3c6b000

Looks like this one happens before the discards.


* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26  9:59           ` Christian Borntraeger
  2017-04-26 10:13             ` Christian Borntraeger
@ 2017-04-26 11:01             ` Dr. David Alan Gilbert
  2017-04-26 11:37               ` Christian Borntraeger
  2017-04-26 11:45               ` Christian Borntraeger
  1 sibling, 2 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-26 11:01 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/24/2017 09:12 PM, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >> On 04/24/2017 04:35 PM, Dr. David Alan Gilbert wrote:
> >>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >>>> On 04/24/2017 12:53 PM, Dr. David Alan Gilbert wrote:
> >>>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >>>>>> David, Juan,
> >>>>>>
> >>>>>> I can trigger a hang of postcopy migration (when I do it early) so
> >>>>>> that both sides are in paused state. Looks like one thread is still
> >>>>>> loading vmstates for virtio and this load accesses guest memory and
> >>>>>> triggers a userfault.
> >>>>>
> >>>>> It's perfectly legal for the destination to cause a userfault as it
> >>>>> loads the virtio queue - the virtio queue should be being loaded by
> >>>>> the main thread from the buffer while the 'listening' thread is
> >>>>> waiting for the incoming page data.
> >>>>>
> >>>>> Can you turn on the following tracing please:
> >>>>> destination: postcopy_ram_fault_thread_request postcopy_place_page
> >>>>>              postcopy_place_page_zero
> >>>>>
> >>>>> source: migrate_handle_rp_req_pages ram_save_queue_pages
> >>>>>
> >>>>> You should see: virtio does the access, userfault generates a fault,
> >>>>> and postcopy_ram_fault_thread_request sends the request to the source;
> >>>>>
> >>>>> the source sees migrate_handle_rp_req_pages and queues it with
> >>>>> ram_save_queue_pages;
> >>>>>
> >>>>> the destination sees the page arrive and postcopy_place_page or
> >>>>> postcopy_place_page_zero.
> >>>>>
> >>>>> some of that might disappear if the page was already on its way.
> >>>>
> >>>>
> >>>> the last events on the source are
> >>>> [..]
> >>>> 58412@1493037953.747988:postcopy_place_page host=0x3ff92246000
> >>>> 58412@1493037953.747992:postcopy_place_page host=0x3ff92247000
> >>>
> >>> How do you see those on the source???
> >>
> >> It was the previous migrate (I did it in a loop, back and forth).
> >> The problem happens on the migrate back.
> > 
> > OK, good.
> > 
> >>>
> >>>> 58412@1493037956.804210:migrate_handle_rp_req_pages in s390.ram at 41d9000 len 1000
> >>>> 58412@1493037956.804216:ram_save_queue_pages s390.ram: start: 41d9000 len: 100
> >>>>
> >>>
> >>> Is that a typo? I'm expecting those two 'len' fields to be the same?
> >>
> >> Yes, it's a cut'n'paste "missed the last byte".
> > 
> > Good.
> > 
> > Ok, before digging further, is this a new bug or does it happen on older
> > QEMU?  Have you got a rune to reproduce it on x86?
> > 
> >>
> >>>
> >>>> On the target I see lots of
> >>>>
> >>>> 39741@1493037958.833710:postcopy_place_page_zero host=0x3ff9befa000
> >>>> 39741@1493037958.833716:postcopy_place_page host=0x3ff9befb000
> >>>> 39741@1493037958.833759:postcopy_place_page host=0x3ff9befc000
> >>>> 39741@1493037958.833818:postcopy_place_page host=0x3ff9befd000
> >>>> 39741@1493037958.833819:postcopy_place_page_zero host=0x3ff9befe000
> >>>> 39741@1493037958.833822:postcopy_place_page host=0x3ff9beff000
> >>>>
> >>>> So we have about 2 seconds of traffic going on after that request,
> >>>> I assume it's precopy-related.
> >>>>
> >>>> Looking back on the target history there was
> >>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> >>>>
> >>>> In fact it seems to be the first and only request:
> >>>>
> >>>> # cat /tmp/test0.trace | grep -v postcopy_place_page
> >>>>
> >>>> 39741@1493037956.804337:postcopy_ram_fault_thread_request Request for HVA=3ff618d9000 rb=s390.ram offset=41d9000
> >>>
> >>> OK, does the HVA there correspond to the address that your virtio device is blocking on?
> >>
> >> yes it is the same page.
> >>
> >>
> >>> (or at least the start of the page)
> >>> Do you see a postcopy_place_page with a host= the same HVA ?
> >>
> >> no
> > 
> > Hmm, that's bad.
> > The flow is:
> >     Precopy
> >        (a) Send pages
> >     Switch to postcopy
> >        (b) Send discards for pages that have been changed after
> >            (a)
> >     Postcopy
> >        (c) Keep sending pages until we run out
> >        (d) But send pages we're asked for first
> > 
> >  (d) might be ignored if the source thinks the page was already sent or not dirty.
> > 
> > So we need to figure out:
> >   1) If the source sent the pages during (a)
> >   2) If the source discarded it during (b)
> >   3) If it sent it again any time in c/d
> >   4) If it ignored the request from d
> > 
> > 
> > So please turn on the traces:
> >     get_queued_page_not_dirty                should help with (4)
> >     get_queued_page                          should help with (4)
> >     ram_discard_range                        should help with (2)
> >     loadvm_postcopy_ram_handle_discard       should help with (2)
> >     qemu_savevm_send_postcopy_ram_discard    should help with (2)
> > 
> > add near the top of ram_save_page in ram.c:
> >   fprintf(stderr, "%s: %s:%zx\n", __func__, block->idstr, (size_t)offset);
> > 
> >    should help with 1, 3
> 
> 
> source:
> 28863@1493200554.418978:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419047:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419056:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419064:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419071:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419077:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419083:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419088:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419094:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419100:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419106:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419112:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419117:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419123:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419130:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419138:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419145:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419151:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419156:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419163:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419169:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419174:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419179:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419184:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419189:qemu_savevm_send_postcopy_ram_discard s390.ram: 12d
> 28863@1493200554.419208:qemu_savevm_send_postcopy_ram_discard s390.ram: 9d
> 28863@1493200554.818195:migrate_handle_rp_req_pages in s390.ram at 3c6b000 len 1000
> 28863@1493200554.818208:ram_save_queue_pages s390.ram: start: 3c6b000 len: 1000
> 28863@1493200554.818469:get_queued_page_not_dirty s390.ram/3c6b000 ram_addr=3c6b000 (sent=0)

OK, the source is ignoring the request because it thinks the page has
already been sent and hasn't been redirtied.

> I also get 
> ram_save_page: s390.ram:3c6b000

and you said in your other message that happens before the discards, which
means it's precopy, and that's why you don't see the 'place_page'.

> target (without postcopy_place)
> 10929@1493200553.657966:ram_discard_range s390.ram: start: 0 5dc00000

That's start of day I think, so we can ignore that (I think!)

> 10929@1493200554.428226:loadvm_postcopy_ram_handle_discard 

....

> 10929@1493200554.438231:ram_discard_range s390.ram: start: 3bf6000 1000
> 10929@1493200554.438233:ram_discard_range s390.ram: start: 3bfd000 1000
> 10929@1493200554.438235:ram_discard_range s390.ram: start: 3c06000 4000
> 10929@1493200554.438240:loadvm_postcopy_ram_handle_discard 
> 10929@1493200554.438241:ram_discard_range s390.ram: start: 3c79000 3000

Well, it hasn't discarded the 3c6b000 page, so the page we received
during precopy should still be there, and it shouldn't fault.

In migration/ram.c in ram_load, before the line:
           switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {

can you add:
    fprintf(stderr, "%s: addr=%zx flags=%zx host=%p\n", __func__, addr, flags, host);

that should show the incoming blocks during precopy and we should be able
to see the page coming in.
Hmm, I have a theory: if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS,
then try changing ram_handle_compressed to always do the memset.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26 11:01             ` Dr. David Alan Gilbert
@ 2017-04-26 11:37               ` Christian Borntraeger
  2017-04-26 11:45               ` Christian Borntraeger
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-26 11:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/26/2017 01:01 PM, Dr. David Alan Gilbert wrote:
> [...]
> In migration/ram.c in ram_load, before the line:
>            switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> 
> can you add:
>     fprintf(stderr, "%s: addr=%zx flags=%zx host=%p\n", __func__, addr, flags, host);
> 
> that should show the incoming blocks during precopy and we should be able
> to see the page coming in.
> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> then try changing ram_handle_compressed to always do the memset.

Lots of output with the following flags (some have bit 1 set, others not):

flags=10
flags=22
flags=28
flags=4
flags=8

The interesting area is:

ram_load: addr=3c6a000 flags=28 host=0x3ff21f6a000
ram_load: addr=3c6b000 flags=28 host=0x3ff21f6b000
ram_load: addr=3c6c000 flags=28 host=0x3ff21f6c000
ram_load: addr=3c6d000 flags=22 host=0x3ff21f6d000
ram_load: addr=3c6e000 flags=28 host=0x3ff21f6e000
ram_load: addr=3c6f000 flags=22 host=0x3ff21f6f000

and this time it hung on address 3c6d000 (which has different flags
than the others).

I can send you the full file if you like but your theory
seems to fully match the log.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26 11:01             ` Dr. David Alan Gilbert
  2017-04-26 11:37               ` Christian Borntraeger
@ 2017-04-26 11:45               ` Christian Borntraeger
  2017-04-26 12:07                 ` Christian Borntraeger
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-26 11:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/26/2017 01:01 PM, Dr. David Alan Gilbert wrote:
> [...]
> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> then try changing ram_handle_compressed to always do the memset.

FWIW, changing ram_handle_compressed to always memset makes the problem go away.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26 11:45               ` Christian Borntraeger
@ 2017-04-26 12:07                 ` Christian Borntraeger
  2017-06-30 16:31                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-04-26 12:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel

On 04/26/2017 01:45 PM, Christian Borntraeger wrote:

>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
>> then try changing ram_handle_compressed to always do the memset.
> 
> FWIW, changing ram_handle_compressed to always memset makes the problem go away.

It is still running fine now with the "always memset" change.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-04-26 12:07                 ` Christian Borntraeger
@ 2017-06-30 16:31                   ` Dr. David Alan Gilbert
  2017-07-03 14:28                     ` Christian Borntraeger
  2017-07-03 16:25                     ` Michael S. Tsirkin
  0 siblings, 2 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-30 16:31 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel, thuth

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> 
> >> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> >> then try changing ram_handle_compressed to always do the memset.
> > 
> > FWIW, changing ram_handle_compressed to always memset makes the problem go away.
> 
> It is still running fine now with the "always memset change"

Did we ever nail down a fix for this; as I remember Andrea said
we shouldn't need to do that memset, but we came to the conclusion
it was something specific to how s390 protection keys worked.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-06-30 16:31                   ` Dr. David Alan Gilbert
@ 2017-07-03 14:28                     ` Christian Borntraeger
  2017-07-03 16:25                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-07-03 14:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Juan Quintela, Michael S. Tsirkin, qemu-devel, thuth,
	Andrea Arcangeli, Martin Schwidefsky

On 06/30/2017 06:31 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>
>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
>>>> then try changing ram_handle_compressed to always do the memset.
>>>
>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.
>>
>> It is still running fine now with the "always memset change"
> 
> Did we ever nail down a fix for this; as I remember Andrea said
> we shouldn't need to do that memset, but we came to the conclusion
> it was something specific to how s390 protection keys worked.

It was specific to s390. Newer Linux guests do not use storage keys,
so we enable them lazily. When a guest goes from keyless to keyed, the
kernel marks the VM as no longer using the shared zero page, then walks
all pages of that guest and zaps the empty_zero_page from the page
tables. Normal code will then fault in a COW copy of the zero page on
the next access.
The postcopy logic now at least reads all zero pages to prevent faults
on these addresses. But if a running guest switches to keyed mode
afterwards (while postcopy is still in progress), we can still get a
fault for a zero page.

Still not sure about the best solution (see s390_enable_skey in
arch/s390/mm/gmap.c).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-06-30 16:31                   ` Dr. David Alan Gilbert
  2017-07-03 14:28                     ` Christian Borntraeger
@ 2017-07-03 16:25                     ` Michael S. Tsirkin
  2017-07-03 19:07                       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 16:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Christian Borntraeger, Juan Quintela, qemu-devel, thuth

On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> > On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> > 
> > >> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> > >> then try changing ram_handle_compressed to always do the memset.
> > > 
> > > FWIW, changing ram_handle_compressed to always memset makes the problem go away.
> > 
> > It is still running fine now with the "always memset change"
> 
> Did we ever nail down a fix for this; as I remember Andrea said
> we shouldn't need to do that memset, but we came to the conclusion
> it was something specific to how s390 protection keys worked.
> 
> Dave

No we didn't. Let's merge that for now, with a comment that
we don't really understand what's going on?

> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-07-03 16:25                     ` Michael S. Tsirkin
@ 2017-07-03 19:07                       ` Dr. David Alan Gilbert
  2017-07-04  7:48                         ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2017-07-03 19:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, Juan Quintela, qemu-devel, thuth

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> > > On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> > > 
> > > >> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> > > >> then try changing ram_handle_compressed to always do the memset.
> > > > 
> > > > FWIW, changing ram_handle_compressed to always memset makes the problem go away.
> > > 
> > > It is still running fine now with the "always memset change"
> > 
> > Did we ever nail down a fix for this; as I remember Andrea said
> > we shouldn't need to do that memset, but we came to the conclusion
> > it was something specific to how s390 protection keys worked.
> > 
> > Dave
> 
> No we didn't. Let's merge that for now, with a comment that
> we don't really understand what's going on?

Hmm no, I don't really want to change the !s390 behaviour, especially
since it causes allocation that we otherwise avoid and Andrea's
reply to the original post pointed out it's not necessary.

Dave

> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-07-03 19:07                       ` Dr. David Alan Gilbert
@ 2017-07-04  7:48                         ` Christian Borntraeger
  2017-07-04  8:16                           ` Martin Schwidefsky
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2017-07-04  7:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Michael S. Tsirkin
  Cc: Juan Quintela, qemu-devel, thuth, Martin Schwidefsky, Andrea Arcangeli

On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>>>
>>>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
>>>>>> then try changing ram_handle_compressed to always do the memset.
>>>>>
>>>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.
>>>>
>>>> It is still running fine now with the "always memset change"
>>>
>>> Did we ever nail down a fix for this; as I remember Andrea said
>>> we shouldn't need to do that memset, but we came to the conclusion
>>> it was something specific to how s390 protection keys worked.
>>>
>>> Dave
>>
>> No we didn't. Let's merge that for now, with a comment that
>> we don't really understand what's going on?
> 
> Hmm no, I don't really want to change the !s390 behaviour, especially
> since it causes allocation that we otherwise avoid and Andrea's
> reply tothe original post pointed out it's not necessary.


Since storage keys are per physical page we must not have shared pages.
Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
in other words we already allocate pages on the keyless->keyed switch.

So why not do the same for zero pages - instead of invalidating the page
table entry, lets just do a proper COW.

Something like this maybe (Andrea, Martin any better way to do that?)


diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 4fb3d3c..11475c7 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
 static int __s390_enable_skey(pte_t *pte, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
 {
+       struct page *page;
        /*
-        * Remove all zero page mappings,
+        * Remove all zero page mappings with a COW
         * after establishing a policy to forbid zero page mappings
         * following faults for that page will get fresh anonymous pages
         */
-       if (is_zero_pfn(pte_pfn(*pte)))
-               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
+       if (is_zero_pfn(pte_pfn(*pte))) {
+               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
+                       put_page(page);
+               else
+                       return -ENOMEM;
+       }
        /* Clear storage key */
        ptep_zap_key(walk->mm, addr, pte);
        return 0;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-07-04  7:48                         ` Christian Borntraeger
@ 2017-07-04  8:16                           ` Martin Schwidefsky
  2017-07-04  8:27                             ` Christian Borntraeger
  2017-07-10  8:05                             ` Christian Borntraeger
  0 siblings, 2 replies; 22+ messages in thread
From: Martin Schwidefsky @ 2017-07-04  8:16 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Juan Quintela,
	qemu-devel, thuth, Andrea Arcangeli

On Tue, 4 Jul 2017 09:48:11 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:  
> >> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:  
> >>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:  
> >>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> >>>>  
> >>>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
> >>>>>> then try changing ram_handle_compressed to always do the memset.  
> >>>>>
> >>>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.  
> >>>>
> >>>> It is still running fine now with the "always memset change"  
> >>>
> >>> Did we ever nail down a fix for this; as I remember Andrea said
> >>> we shouldn't need to do that memset, but we came to the conclusion
> >>> it was something specific to how s390 protection keys worked.
> >>>
> >>> Dave  
> >>
> >> No we didn't. Let's merge that for now, with a comment that
> >> we don't really understand what's going on?  
> > 
> > Hmm no, I don't really want to change the !s390 behaviour, especially
> > since it causes allocation that we otherwise avoid and Andrea's
> > reply tothe original post pointed out it's not necessary.  
> 
> 
> Since storage keys are per physical page we must not have shared pages.
> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
> in other words we already allocate pages on the keyless->keyed switch.
> 
> So why not do the same for zero pages - instead of invalidating the page
> table entry, lets just do a proper COW.
> 
> Something like this maybe (Andrea, Martin any better way to do that?)
> 
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 4fb3d3c..11475c7 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
>  static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>                               unsigned long next, struct mm_walk *walk)
>  {
> +       struct page *page;
>         /*
> -        * Remove all zero page mappings,
> +        * Remove all zero page mappings with a COW
>          * after establishing a policy to forbid zero page mappings
>          * following faults for that page will get fresh anonymous pages
>          */
> -       if (is_zero_pfn(pte_pfn(*pte)))
> -               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
> +       if (is_zero_pfn(pte_pfn(*pte))) {
> +               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
> +                       put_page(page);
> +               else
> +                       return -ENOMEM;
> +       }
>         /* Clear storage key */
>         ptep_zap_key(walk->mm, addr, pte);
>         return 0;

I do not quite get the problem here. The zero-page mappings are always
marked as _PAGE_SPECIAL. These should be safe to replace with an empty
pte. We do mark all VMAs as unmergeable prior to the page table walk
that replaces all zero-page mappings. What is the get_user_pages() call
supposed to do?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-07-04  8:16                           ` Martin Schwidefsky
@ 2017-07-04  8:27                             ` Christian Borntraeger
  2017-07-10  8:05                             ` Christian Borntraeger
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-07-04  8:27 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Juan Quintela,
	qemu-devel, thuth, Andrea Arcangeli

On 07/04/2017 10:16 AM, Martin Schwidefsky wrote:
> On Tue, 4 Jul 2017 09:48:11 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
>>> * Michael S. Tsirkin (mst@redhat.com) wrote:  
>>>> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:  
>>>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:  
>>>>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>>>>>  
>>>>>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
>>>>>>>> then try changing ram_handle_compressed to always do the memset.  
>>>>>>>
>>>>>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.  
>>>>>>
>>>>>> It is still running fine now with the "always memset change"  
>>>>>
>>>>> Did we ever nail down a fix for this; as I remember Andrea said
>>>>> we shouldn't need to do that memset, but we came to the conclusion
>>>>> it was something specific to how s390 protection keys worked.
>>>>>
>>>>> Dave  
>>>>
>>>> No we didn't. Let's merge that for now, with a comment that
>>>> we don't really understand what's going on?  
>>>
>>> Hmm no, I don't really want to change the !s390 behaviour, especially
>>> since it causes allocation that we otherwise avoid and Andrea's
>>> reply tothe original post pointed out it's not necessary.  
>>
>>
>> Since storage keys are per physical page we must not have shared pages.
>> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
>> in other words we already allocate pages on the keyless->keyed switch.
>>
>> So why not do the same for zero pages - instead of invalidating the page
>> table entry, lets just do a proper COW.
>>
>> Something like this maybe (Andrea, Martin any better way to do that?)
>>
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index 4fb3d3c..11475c7 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
>>  static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>>                               unsigned long next, struct mm_walk *walk)
>>  {
>> +       struct page *page;
>>         /*
>> -        * Remove all zero page mappings,
>> +        * Remove all zero page mappings with a COW
>>          * after establishing a policy to forbid zero page mappings
>>          * following faults for that page will get fresh anonymous pages
>>          */
>> -       if (is_zero_pfn(pte_pfn(*pte)))
>> -               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
>> +       if (is_zero_pfn(pte_pfn(*pte))) {
>> +               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>> +                       put_page(page);
>> +               else
>> +                       return -ENOMEM;
>> +       }
>>         /* Clear storage key */
>>         ptep_zap_key(walk->mm, addr, pte);
>>         return 0;
> 
> I do not quite get the problem here. The zero-page mappings are always
> marked as _PAGE_SPECIAL. These should be safe to replace with an empty
> pte. We do mark all VMAs as unmergeable prior to the page table walk
> that replaces all zero-page mappings. What is the get_user_pages() call
> supposed to do?

The problem is that postcopy migration will do a read on the target system
for zero pages (to fault in the zero pages).
The postcopy logic then relies on such a page never delivering a
userfault again.

If we now enable storage keys on the target host, we break the zero pages
with an empty PTE (lazy faulting).
With an empty PTE we will get a userfault if this page is accessed
again on the target. Postcopy "knows" that this page was already transferred
and basically ignores this fault. Since postcopy will never resolve this
userfault, the guest hangs.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] postcopy migration hangs while loading virtio state
  2017-07-04  8:16                           ` Martin Schwidefsky
  2017-07-04  8:27                             ` Christian Borntraeger
@ 2017-07-10  8:05                             ` Christian Borntraeger
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2017-07-10  8:05 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Juan Quintela,
	qemu-devel, thuth, Andrea Arcangeli

On 07/04/2017 10:16 AM, Martin Schwidefsky wrote:
> On Tue, 4 Jul 2017 09:48:11 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
>>> * Michael S. Tsirkin (mst@redhat.com) wrote:  
>>>> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:  
>>>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:  
>>>>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>>>>>  
>>>>>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS
>>>>>>>> then try changing ram_handle_compressed to always do the memset.  
>>>>>>>
>>>>>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.  
>>>>>>
>>>>>> It is still running fine now with the "always memset change"  
>>>>>
>>>>> Did we ever nail down a fix for this; as I remember Andrea said
>>>>> we shouldn't need to do that memset, but we came to the conclusion
>>>>> it was something specific to how s390 protection keys worked.
>>>>>
>>>>> Dave  
>>>>
>>>> No we didn't. Let's merge that for now, with a comment that
>>>> we don't really understand what's going on?  
>>>
>>> Hmm no, I don't really want to change the !s390 behaviour, especially
>>> since it causes allocation that we otherwise avoid and Andrea's
>>> reply tothe original post pointed out it's not necessary.  
>>
>>
>> Since storage keys are per physical page we must not have shared pages.
>> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
>> in other words we already allocate pages on the keyless->keyed switch.
>>
>> So why not do the same for zero pages - instead of invalidating the page
>> table entry, lets just do a proper COW.
>>
>> Something like this maybe (Andrea, Martin any better way to do that?)
>>
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index 4fb3d3c..11475c7 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
>>  static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>>                               unsigned long next, struct mm_walk *walk)
>>  {
>> +       struct page *page;
>>         /*
>> -        * Remove all zero page mappings,
>> +        * Remove all zero page mappings with a COW
>>          * after establishing a policy to forbid zero page mappings
>>          * following faults for that page will get fresh anonymous pages
>>          */
>> -       if (is_zero_pfn(pte_pfn(*pte)))
>> -               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
>> +       if (is_zero_pfn(pte_pfn(*pte))) {
>> +               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>> +                       put_page(page);
>> +               else
>> +                       return -ENOMEM;
>> +       }
>>         /* Clear storage key */
>>         ptep_zap_key(walk->mm, addr, pte);
>>         return 0;
> 
> I do not quite get the problem here. The zero-page mappings are always
> marked as _PAGE_SPECIAL. These should be safe to replace with an empty
> pte. We do mark all VMAs as unmergeable prior to the page table walk
> that replaces all zero-page mappings. What is the get_user_pages() call
> supposed to do?
> 


After talking to Martin, we decided that it might be a good trial to simply
not use the empty zero page at all for KVM guests. Something like this


diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 57057fb..65ab11d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -505,7 +505,7 @@ static inline int mm_alloc_pgste(struct mm_struct *mm)
  * In the case that a guest uses storage keys
  * faults should no longer be backed by zero pages
  */
-#define mm_forbids_zeropage mm_use_skey
+#define mm_forbids_zeropage mm_has_pgste
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 4fb3d3c..88f502a 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2149,13 +2149,6 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
 static int __s390_enable_skey(pte_t *pte, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
 {
-       /*
-        * Remove all zero page mappings,
-        * after establishing a policy to forbid zero page mappings
-        * following faults for that page will get fresh anonymous pages
-        */
-       if (is_zero_pfn(pte_pfn(*pte)))
-               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
        /* Clear storage key */
        ptep_zap_key(walk->mm, addr, pte);
        return 0;



seems to do the trick. I will have a look at what that means for memory
usage in the usual cases.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-07-10  8:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-24 10:31 [Qemu-devel] postcopy migration hangs while loading virtio state Christian Borntraeger
2017-04-24 10:53 ` Dr. David Alan Gilbert
2017-04-24 13:02   ` Christian Borntraeger
2017-04-24 14:35     ` Dr. David Alan Gilbert
2017-04-24 18:17       ` Christian Borntraeger
2017-04-24 19:12         ` Dr. David Alan Gilbert
2017-04-25 13:27           ` Christian Borntraeger
2017-04-26  9:59           ` Christian Borntraeger
2017-04-26 10:13             ` Christian Borntraeger
2017-04-26 11:01             ` Dr. David Alan Gilbert
2017-04-26 11:37               ` Christian Borntraeger
2017-04-26 11:45               ` Christian Borntraeger
2017-04-26 12:07                 ` Christian Borntraeger
2017-06-30 16:31                   ` Dr. David Alan Gilbert
2017-07-03 14:28                     ` Christian Borntraeger
2017-07-03 16:25                     ` Michael S. Tsirkin
2017-07-03 19:07                       ` Dr. David Alan Gilbert
2017-07-04  7:48                         ` Christian Borntraeger
2017-07-04  8:16                           ` Martin Schwidefsky
2017-07-04  8:27                             ` Christian Borntraeger
2017-07-10  8:05                             ` Christian Borntraeger
2017-04-24 13:11   ` Christian Borntraeger
