* [Qemu-devel] qemu 2.0, deadlock in block-commit
@ 2014-05-22 20:49 Marcin Gibuła
  2014-05-22 21:05 ` Marcin Gibuła
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-22 20:49 UTC (permalink / raw)
  To: qemu-devel

Hi,

I've encountered a deadlock in qemu during some stress testing. The test 
makes snapshots, commits them, and constantly queries for block 
job info.
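
In QMP terms, one iteration of the test boils down to roughly the 
following (illustrative only - the paths are placeholders, and the test 
drives these operations through a management layer rather than raw QMP):

{ "execute": "blockdev-snapshot-sync",
  "arguments": { "device": "drive-virtio-disk2",
                 "snapshot-file": "/path/to/new-overlay.qcow2",
                 "format": "qcow2" } }

{ "execute": "block-commit",
  "arguments": { "device": "drive-virtio-disk2",
                 "top": "/path/to/intermediate.qcow2",
                 "base": "/path/to/base.qcow2" } }

{ "execute": "query-block-jobs" }    <- polled in a tight loop until the
                                        commit job disappears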

The version of QEMU is 2.0.0 rc3 (backtrace below says rc2, but it's 
manualy patched to rc3), but there seems to be no changes in block layer 
in final 2.0 (?).

This is the backtrace of the qemu process:

(gdb) thread apply all backtrace

Thread 22 (Thread 0x7f6994852700 (LWP 13651)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699ab4c4eb in ?? () from /usr/lib64/librados.so.2
#2  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f698ffff700 (LWP 13652)):
#0  0x00007f69982f5ff1 in sem_timedwait () from /lib64/libpthread.so.0
#1  0x00007f699ac3e1b8 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f698f7fe700 (LWP 13653)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699ab7b383 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abe625d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f698effd700 (LWP 13654)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699abe1c88 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abe6a6d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f698e7fc700 (LWP 13655)):
#0  0x00007f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f698dffb700 (LWP 13656)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f698d7fa700 (LWP 13657)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699abd288e in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f698d6f9700 (LWP 13658)):
#0  0x00007f699802007d in poll () from /lib64/libc.so.6
#1  0x00007f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3  0x00007f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4  0x00007f699abde03d in ?? () from /usr/lib64/librados.so.2
#5  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f698d5f8700 (LWP 13659)):
#0  0x00007f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f698cdf7700 (LWP 13660)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f697ffff700 (LWP 13661)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f697f7fe700 (LWP 13662)):
#0  0x00007f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699a5bc666 in ?? () from /usr/lib64/librbd.so.1
#2  0x00007f699a5cf76d in ?? () from /usr/lib64/librbd.so.1
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f698c5f6700 (LWP 13663)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699abd288e in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f698c4f5700 (LWP 13664)):
#0  0x00007f699802007d in poll () from /lib64/libc.so.6
#1  0x00007f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3  0x00007f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4  0x00007f699abde03d in ?? () from /usr/lib64/librados.so.2
#5  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f697effd700 (LWP 13665)):
#0  0x00007f69982f67a4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f69982f219c in _L_lock_518 () from /lib64/libpthread.so.0
#2  0x00007f69982f1feb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f699c332df9 in qemu_mutex_lock 
(mutex=mutex@entry=0x7f699cb87660 <qemu_global_mutex>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:76
#4  0x00007f699c238a10 in qemu_mutex_lock_iothread () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:1043
#5  0x00007f699c291f3d in kvm_cpu_exec (cpu=cpu@entry=0x7f699f03d090) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1683
#6  0x00007f699c23771c in qemu_kvm_cpu_thread_fn (arg=0x7f699f03d090) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#7  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#8  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f697e7fc700 (LWP 13666)):
#0  0x00007f69982f67a4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f69982f219c in _L_lock_518 () from /lib64/libpthread.so.0
#2  0x00007f69982f1feb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f699c332df9 in qemu_mutex_lock 
(mutex=mutex@entry=0x7f699cb87660 <qemu_global_mutex>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:76
#4  0x00007f699c238a10 in qemu_mutex_lock_iothread () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:1043
#5  0x00007f699c291f3d in kvm_cpu_exec (cpu=cpu@entry=0x7f699f0774a0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1683
#6  0x00007f699c23771c in qemu_kvm_cpu_thread_fn (arg=0x7f699f0774a0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#7  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#8  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f697cfff700 (LWP 13668)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699c333019 in qemu_cond_wait 
(cond=cond@entry=0x7f699f15fba0, mutex=mutex@entry=0x7f699f15fbd0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2  0x00007f699c22770b in vnc_worker_thread_loop 
(queue=queue@entry=0x7f699f15fba0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3  0x00007f699c227ae0 in vnc_worker_thread (arg=0x7f699f15fba0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f6995458700 (LWP 23551)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699abd288e in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f697dffb700 (LWP 23552)):
#0  0x00007f699802007d in poll () from /lib64/libc.so.6
#1  0x00007f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3  0x00007f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4  0x00007f699abde03d in ?? () from /usr/lib64/librados.so.2
#5  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f6995056700 (LWP 23553)):
#0  0x00007f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f699abd288e in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f697ddff700 (LWP 23554)):
#0  0x00007f699802007d in poll () from /lib64/libc.so.6
#1  0x00007f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2  0x00007f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3  0x00007f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4  0x00007f699abde03d in ?? () from /usr/lib64/librados.so.2
#5  0x00007f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6998029dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
#0  0x00007f6998020286 in ppoll () from /lib64/libc.so.6
#1  0x00007f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, 
__nfds=<optimized out>, __fds=<optimized out>) at 
/usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
timeout=<optimized out>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
#3  0x00007f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00, 
blocking=blocking@entry=true) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
#4  0x00007f699c095c0a in bdrv_drain_all () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5  0x00007f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6  0x00007f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7  bdrv_unref (bs=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8  0x00007f699c09c812 in bdrv_drop_intermediate 
(active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520, 
base=base@entry=0x7f699eec43d0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9  0x00007f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x00007f699c0e0dca in coroutine_trampoline (i0=<optimized out>, 
i1=<optimized out>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
#11 0x00007f6997f859f0 in ?? () from /lib64/libc.so.6
#12 0x00007fffdbe06750 in ?? ()
#13 0x0000000000000000 in ?? ()

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-22 20:49 [Qemu-devel] qemu 2.0, deadlock in block-commit Marcin Gibuła
@ 2014-05-22 21:05 ` Marcin Gibuła
  2014-05-23  8:19   ` Paolo Bonzini
  2014-05-22 21:25 ` Eric Blake
  2014-05-23 12:58 ` Stefan Hajnoczi
  2 siblings, 1 reply; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-22 21:05 UTC (permalink / raw)
  To: qemu-devel

On 2014-05-22 22:49, Marcin Gibuła wrote:
>
> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0  0x00007f6998020286 in ppoll () from /lib64/libc.so.6
> #1  0x00007f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0,
> __nfds=<optimized out>, __fds=<optimized out>) at
> /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
> timeout=<optimized out>) at
> /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
> #3  0x00007f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00,
> blocking=blocking@entry=true) at
> /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
> #4  0x00007f699c095c0a in bdrv_drain_all () at
> /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805

Some more info.
The VM was doing a lot of write I/O during this test.

ppoll() is listening for these descriptors (from strace):

ppoll([{fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=23, 
events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, 
{fd=4, events=POLLIN|POLLERR|POLLHUP}], 4, NULL, NULL, 8, ...)

fd # ls -l 25 23 17 4
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 17 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 23 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 25 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 4 -> anon_inode:[eventfd]

The VM is started via libvirt. No errors are reported in the logs.
The command line is:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 
68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine 
pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 
-realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 
68189c3c-02f6-4aae-88a2-5f13c5e6f53a -smbios type=0,vendor=HAL 9000 
-smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc 
base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard 
-no-shutdown -boot menu=off,strict=on -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/dev/cube2/5f751718-ff36-420f-b034-5f31230b5f23,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
-drive 
file=/dev/cube1/c5b7a6e3-11f8-4b08-ac3e-5ea054028221,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 
-drive 
file=/mnt/nfs/volumes/66346c1b-add5-4412-89d9-b00a3bb13e75/72be1b50-982e-458a-9a84-c0fbd48b4b3c.qcow2,if=none,id=drive-virtio-disk2,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 
-drive 
file=/mnt/nfs/volumes/a20c3b29-6f21-4b3d-a3fb-8b80599e50df/b84716ea-2564-47cc-bbbf-dea6029132b4.qcow2,if=none,id=drive-virtio-disk3,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk3,id=virtio-disk3 
-drive 
file=/mnt/nfs/volumes/0c2996b5-abec-47ea-9e88-ebd7ebf0c79d/453cb20a-1705-45e2-9f9e-bc1ea096d52a.qcow2,if=none,id=drive-virtio-disk4,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk4,id=virtio-disk4 
-drive 
file=/mnt/nfs/volumes/7dcbd9ba-f0bc-4d3c-9b5c-b2ac824584d5/a8bb7e11-a9b5-4613-9b63-b9722fba2166.qcow2,if=none,id=drive-virtio-disk5,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk5,id=virtio-disk5 
-drive 
file=rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw 
-device 
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 
-netdev tap,fd=19,id=hostnet0,vhost=on,vhostfd=20 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:82:41:c9,bus=pci.0,addr=0x3 
-netdev tap,fd=21,id=hostnet1,vhost=on,vhostfd=22 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:70:10:35,bus=pci.0,addr=0x4 
-chardev pty,id=charserial0 -device 
isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.agent,server,nowait 
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
-chardev 
socket,id=charchannel1,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.cloud.agent,server,nowait 
-device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.cloud.guest_agent.1 
-device usb-tablet,id=input0 -vnc 0.0.0.0:0,password -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc -sandbox on -device 
pvpanic

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-22 20:49 [Qemu-devel] qemu 2.0, deadlock in block-commit Marcin Gibuła
  2014-05-22 21:05 ` Marcin Gibuła
@ 2014-05-22 21:25 ` Eric Blake
  2014-05-22 21:43   ` Marcin Gibuła
  2014-05-23 12:58 ` Stefan Hajnoczi
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Blake @ 2014-05-22 21:25 UTC (permalink / raw)
  To: Marcin Gibuła, qemu-devel


On 05/22/2014 02:49 PM, Marcin Gibuła wrote:
> Hi,
> 
> I've encountered a deadlock in qemu during some stress testing. The test
> makes snapshots, commits them, and constantly queries for block
> job info.

What is the exact command you used for triggering the block-commit?  Was
it via direct HMP or QMP, or indirectly via libvirt? Were you trying to
commit the active layer? If so, libvirt doesn't yet support that (I'm
going to guarantee that libvirt 1.2.4 either supports it or rejects it
up front, depending on how hard the libvirt implementation turns out to
be), even though it was added in qemu 2.0 (it turns out that committing
the active layer is subtly different from committing intermediate
layers, and libvirt 1.2.3 wasn't prepared for the subtle difference and
thus gets stuck if you try it).
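
For reference, committing an intermediate image through libvirt looks 
roughly like this ($dom, vdb and the paths are placeholders):

virsh blockcommit $dom vdb \
    --base /path/to/base.qcow2 --top /path/to/intermediate.qcow2

An active-layer commit would be the same command with --top naming the 
current top image, and that is the case libvirt 1.2.3 can't drive 
correctly yet.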

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-22 21:25 ` Eric Blake
@ 2014-05-22 21:43   ` Marcin Gibuła
  0 siblings, 0 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-22 21:43 UTC (permalink / raw)
  To: qemu-devel

>> I've encountered a deadlock in qemu during some stress testing. The test
>> makes snapshots, commits them, and constantly queries for block
>> job info.
>
> What is the exact command you used for triggering the block-commit?  Was
> it via direct HMP or QMP, or indirectly via libvirt?

Via libvirt.

> Were you trying to
> commit the active layer?

No. The commit was to an intermediate file. I'm aware that libvirt does 
not support active-layer commit yet.

Plus, judging from the backtrace, the hang seems to be deep inside qemu. 
The VM is unresponsive after this.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-22 21:05 ` Marcin Gibuła
@ 2014-05-23  8:19   ` Paolo Bonzini
  2014-05-23  8:21     ` Marcin Gibuła
  2014-05-23  9:25     ` Marcin Gibuła
  0 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-05-23  8:19 UTC (permalink / raw)
  To: Marcin Gibuła, qemu-devel

On 22/05/2014 23:05, Marcin Gibuła wrote:
> Some more info.
> The VM was doing a lot of write I/O during this test.

QEMU is waiting for librados to complete I/O.  Can you reproduce it with 
a different driver?

Paolo


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23  8:19   ` Paolo Bonzini
@ 2014-05-23  8:21     ` Marcin Gibuła
  2014-05-23  9:25     ` Marcin Gibuła
  1 sibling, 0 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23  8:21 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info.
>> The VM was doing a lot of write I/O during this test.
>
> QEMU is waiting for librados to complete I/O.  Can you reproduce it with
> a different driver?


I'll try.

However, RBD is used only as a read-only ISO 
(rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw) 
- what I/O would it have to complete?


-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23  8:19   ` Paolo Bonzini
  2014-05-23  8:21     ` Marcin Gibuła
@ 2014-05-23  9:25     ` Marcin Gibuła
  2014-05-23 12:29       ` Kevin Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23  9:25 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini

On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info.
>> The VM was doing a lot of write I/O during this test.
>
> QEMU is waiting for librados to complete I/O.  Can you reproduce it with
> a different driver?

Hi,

I've reproduced it without RBD. Backtrace below:

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7f9c8cccd700 (LWP 2017)):
#0  0x00007f9c907717a4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9c9076d19c in _L_lock_518 () from /lib64/libpthread.so.0
#2  0x00007f9c9076cfeb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f9c947addf9 in qemu_mutex_lock 
(mutex=mutex@entry=0x7f9c95002660 <qemu_global_mutex>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:76
#4  0x00007f9c946b3a10 in qemu_mutex_lock_iothread () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:1043
#5  0x00007f9c9470cf3d in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968bf290) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1683
#6  0x00007f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968bf290) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#7  0x00007f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#8  0x00007f9c904a4dad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f9c87fff700 (LWP 2018)):
#0  0x00007f9c9049c897 in ioctl () from /lib64/libc.so.6
#1  0x00007f9c9470cdf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f9c968fa300, 
type=type@entry=44672) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2  0x00007f9c9470cf35 in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968fa300) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3  0x00007f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968fa300) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4  0x00007f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f9c904a4dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f9c869ff700 (LWP 2020)):
#0  0x00007f9c9076ed0c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00007f9c947ae019 in qemu_cond_wait 
(cond=cond@entry=0x7f9c9695a250, mutex=mutex@entry=0x7f9c9695a280)
     at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2  0x00007f9c946a270b in vnc_worker_thread_loop 
(queue=queue@entry=0x7f9c9695a250) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3  0x00007f9c946a2ae0 in vnc_worker_thread (arg=0x7f9c9695a250) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4  0x00007f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f9c904a4dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f9c94448900 (LWP 2013)):
#0  0x00007f9c9049b286 in ppoll () from /lib64/libc.so.6
#1  0x00007f9c9466ed9b in ppoll (__ss=0x0, __timeout=0x0, 
__nfds=<optimized out>, __fds=<optimized out>) at 
/usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
timeout=<optimized out>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
#3  0x00007f9c945027e0 in aio_poll (ctx=0x7f9c95d5bc00, 
blocking=blocking@entry=true) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
#4  0x00007f9c94510c0a in bdrv_drain_all () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5  0x00007f9c9451787e in bdrv_close (bs=bs@entry=0x7f9c969b7d90) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6  0x00007f9c945175fa in bdrv_delete (bs=0x7f9c969b7d90) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7  bdrv_unref (bs=0x7f9c969b7d90) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8  0x00007f9c94517812 in bdrv_drop_intermediate 
(active=active@entry=0x7f9c9648f490, top=top@entry=0x7f9c969b7d90, 
base=base@entry=0x7f9c96756500)
     at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9  0x00007f9c9451c963 in commit_run (opaque=0x7f9c96a1e280) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x00007f9c9455bdca in coroutine_trampoline (i0=<optimized out>, 
i1=<optimized out>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
#11 0x00007f9c904009f0 in ?? () from /lib64/libc.so.6
#12 0x00007fffe4bcfee0 in ?? ()
#13 0x0000000000000000 in ?? ()

I still have this process running (hanging ;)) if you need any more 
info. I also have no problem reproducing it.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23  9:25     ` Marcin Gibuła
@ 2014-05-23 12:29       ` Kevin Wolf
  2014-05-23 12:38         ` Marcin Gibuła
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2014-05-23 12:29 UTC (permalink / raw)
  To: Marcin Gibuła; +Cc: Paolo Bonzini, qemu-devel

On 23.05.2014 at 11:25, Marcin Gibuła wrote:
> On 23.05.2014 10:19, Paolo Bonzini wrote:
> >On 22/05/2014 23:05, Marcin Gibuła wrote:
> >>Some more info.
> >>The VM was doing a lot of write I/O during this test.
> >
> >QEMU is waiting for librados to complete I/O.  Can you reproduce it with
> >a different driver?
> 
> Hi,
> 
> I've reproduced it without RBD. Backtrace below:
> [...]

I see that you have a mix of aio=native and aio=threads. I can't say
much about the aio=native disks (perhaps try to reproduce without
them?), but there are definitely no worker threads for the other disks
that bdrv_drain_all() would have to wait for.

bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
function that determines for each of the disks in your VM if it still
has requests in flight that need to be completed. This function must
have returned true even though there is nothing to wait for.

Can you check which of its conditions led to this behaviour, and for
which disk it did? Either by setting a breakpoint there and
singlestepping through the function the next time it is called (if the
poll even has a timeout), or by inspecting the conditions manually in
gdb.
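
For example, something along these lines (untested) should show the 
state the function looks at when the breakpoint fires:

(gdb) break bdrv_requests_pending
(gdb) continue
...
(gdb) print bs->device_name
(gdb) print bs->tracked_requests
(gdb) print bs->throttled_reqs[0]
(gdb) print bs->throttled_reqs[1]
(gdb) print bs->file
(gdb) print bs->backing_hd
(gdb) continue

Whichever of those is non-empty (or points to a child that itself has 
pending requests) for one of your disks is the interesting one.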

Kevin


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23 12:29       ` Kevin Wolf
@ 2014-05-23 12:38         ` Marcin Gibuła
  2014-05-23 13:14           ` Marcin Gibuła
  0 siblings, 1 reply; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Paolo Bonzini

> I see that you have a mix of aio=native and aio=threads. I can't say
> much about the aio=native disks (perhaps try to reproduce without
> them?), but there are definitely no worker threads for the other disks
> that bdrv_drain_all() would have to wait for.

True. But I/O was being done only on the qcow2 disk with the threads 
backend, and the snapshot was made on that disk.

I'll try to reproduce with all 'threads'.

> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
> function that determines for each of the disks in your VM if it still
> has requests in flight that need to be completed. This function must
> have returned true even though there is nothing to wait for.
>
> Can you check which of its conditions led to this behaviour, and for
> which disk it did? Either by setting a breakpoint there and
> singlestepping through the function the next time it is called (if the
> poll even has a timeout), or by inspecting the conditions manually in
> gdb.

I'm on it.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-22 20:49 [Qemu-devel] qemu 2.0, deadlock in block-commit Marcin Gibuła
  2014-05-22 21:05 ` Marcin Gibuła
  2014-05-22 21:25 ` Eric Blake
@ 2014-05-23 12:58 ` Stefan Hajnoczi
  2014-05-23 16:25   ` Marcin Gibuła
  2 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2014-05-23 12:58 UTC (permalink / raw)
  To: Marcin Gibuła; +Cc: qemu-devel

On Thu, May 22, 2014 at 10:49:18PM +0200, Marcin Gibuła wrote:
> This is the backtrace of the qemu process:
> 
> (gdb) thread apply all backtrace

[...] a bunch of rbd threads, vnc worker thread, QEMU worker threads

> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0  0x00007f6998020286 in ppoll () from /lib64/libc.so.6
> #1  0x00007f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized
> out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
> timeout=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
> #3  0x00007f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00,
> blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
> #4  0x00007f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805

QEMU is waiting for all block I/O requests to complete.  I wonder if
there is some weird interaction with rbd here, which is why this never
completes.

In gdb you can iterate over bdrv_states to inspect the open
BlockDriverState structs.  Each BDS struct has a tracked_requests list
and bdrv_drain_all() is waiting for those pending-request lists to become
empty.
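
For example, something like this (untested, field names from memory) 
walks the list:

(gdb) set $bs = bdrv_states.tqh_first
(gdb) while $bs
 >  printf "%s: first tracked request = %p\n", $bs->device_name, $bs->tracked_requests.lh_first
 >  set $bs = $bs->device_list.tqe_next
 > end

Any device that prints a non-NULL request pointer there is what 
bdrv_drain_all() is still waiting for.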

If you see a pending request on a RADOS block device (rbd) then it would
be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
completing that request.

Are you using qcow2 on top of rbd?

> #5  0x00007f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
> #6  0x00007f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
> #7  bdrv_unref (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
> #8  0x00007f699c09c812 in bdrv_drop_intermediate
> (active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520,
> base=base@entry=0x7f699eec43d0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
> #9  0x00007f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
> #10 0x00007f699c0e0dca in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
> #11 0x00007f6997f859f0 in ?? () from /lib64/libc.so.6
> #12 0x00007fffdbe06750 in ?? ()
> #13 0x0000000000000000 in ?? ()


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23 12:38         ` Marcin Gibuła
@ 2014-05-23 13:14           ` Marcin Gibuła
  2014-05-23 13:18             ` Marcin Gibuła
  2014-05-23 22:02             ` Marcin Gibuła
  0 siblings, 2 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23 13:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Paolo Bonzini

>> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
>> function that determines for each of the disks in your VM if it still
>> has requests in flight that need to be completed. This function must
>> have returned true even though there is nothing to wait for.
>>
>> Can you check which of its conditions led to this behaviour, and for
>> which disk it did? Either by setting a breakpoint there and
>> singlestepping through the function the next time it is called (if the
>> poll even has a timeout), or by inspecting the conditions manually in
>> gdb.

The condition that is true is:

if (!QLIST_EMPTY(&bs->tracked_requests))

and it's returned for the intermediate qcow2 image, which is being committed.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23 13:14           ` Marcin Gibuła
@ 2014-05-23 13:18             ` Marcin Gibuła
  2014-05-23 22:02             ` Marcin Gibuła
  1 sibling, 0 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23 13:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Paolo Bonzini

> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 image, which is being committed.

Btw - it's also the disk that is being pounded with writes during the commit.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23 12:58 ` Stefan Hajnoczi
@ 2014-05-23 16:25   ` Marcin Gibuła
  0 siblings, 0 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23 16:25 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

> If you see a pending request on a RADOS block device (rbd) then it would
> be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
> completing that request.
>
> Are you using qcow2 on top of rbd?

Hi,
I've already recreated this without rbd and with stock qemu 2.0.

-- 
mg


* Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
  2014-05-23 13:14           ` Marcin Gibuła
  2014-05-23 13:18             ` Marcin Gibuła
@ 2014-05-23 22:02             ` Marcin Gibuła
  1 sibling, 0 replies; 14+ messages in thread
From: Marcin Gibuła @ 2014-05-23 22:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Paolo Bonzini

On 2014-05-23 15:14, Marcin Gibuła wrote:
>>> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
>>> function that determines for each of the disks in your VM if it still
>>> has requests in flight that need to be completed. This function must
>>> have returned true even though there is nothing to wait for.
>>>
>>> Can you check which of its conditions led to this behaviour, and for
>>> which disk it did? Either by setting a breakpoint there and
>>> singlestepping through the function the next time it is called (if the
>>> poll even has a timeout), or by inspecting the conditions manually in
>>> gdb.
>
> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 image, which is being committed.

My mistake, this condition is true not for the intermediate file but for 
the active one. Sorry for the confusion.

-- 
mg


