* bdrv_drained_begin deadlock with io-threads
@ 2020-03-31  8:46 Dietmar Maurer
  2020-03-31  9:17 ` Dietmar Maurer
  2020-03-31 12:58 ` Kevin Wolf
  0 siblings, 2 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31  8:46 UTC (permalink / raw)
  To: Sergio Lopez, Max Reitz, jsnow, Stefan Hajnoczi, qemu-devel,
	kwolf, vsementsov

I can see and reproduce this error with the latest code from today.
But I also see it on stable 4.1.1 (sometimes).

I guess this is a similar problem to the one reported earlier:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07363.html

To reproduce, you need a VM with a virtio-scsi-single drive using io-threads,
then run "stress -d 5" inside the VM (Debian Buster).

Then I start simple drive-backup jobs like:

{ "execute": "drive-backup", "arguments": { "device": "drive-scsi0", "sync": 
"full", "target": "backup-scsi0.raw" } }

abort them after a few seconds, then repeat that.
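
For reference, aborting is just a plain QMP block-job-cancel on the same job (I do not pass a job-id, so it should default to the device name - that part is an assumption about my setup):

{ "execute": "block-job-cancel", "arguments": { "device": "drive-scsi0" } }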

After a few iterations the VM freezes inside bdrv_drained_begin():

Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c60419 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c60419 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x0000555555c62c01 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd3a8, timeout=-1) at util/fdmon-poll.c:79
#4  0x0000555555c621e7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc2565 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff67240b80) at block/io.c:430
#6  0x0000555555bc2565 in bdrv_do_drained_begin (bs=0x7fff67240b80, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x0000555555bde252 in bdrv_backup_top_drop (bs=0x7fff67240b80) at block/backup-top.c:273
#8  0x0000555555bd995c in backup_clean (job=0x7fffe5609200) at block/backup.c:114
#9  0x0000555555b6e08d in job_clean (job=0x7fffe5609200) at job.c:657
#10 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:673
#11 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:661
#12 0x0000555555b6ea3a in job_completed_txn_abort (job=<optimized out>) at job.c:749
#13 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:843
#14 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:836
#15 0x0000555555b6edf0 in job_exit (opaque=0x7fffe5609200) at job.c:864
#16 0x0000555555c5e975 in aio_bh_call (bh=0x7fffe721a2d0) at util/async.c:164
#17 0x0000555555c5e975 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
#18 0x0000555555c6202e in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
#19 0x0000555555c5e85e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#20 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x0000555555c61298 in glib_pollfds_poll () at util/main-loop.c:219
#22 0x0000555555c61298 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#23 0x0000555555c61298 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#24 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#25 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49

Thread 1 holds the BQL, while thread 7 wants to acquire it to complete the read in prepare_mmio_access():

(gdb) source ./scripts/qemu-gdb.py 
(gdb) qemu tcg-lock-status 
Thread, BQL (iothread_mutex), Replay, Blocked?
10/31149, false, false, not blocked
8/30998, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
7/30997, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
6/30996, false, false, not blocked
4/30955, false, false, not blocked
3/30954, false, false, __lll_lock_wait waiting on 0x7fffe89151e0 from 30949
2/30953, false, false, not blocked
1/30949, true, false, not blocked


(gdb) thread apply 7 bt
Thread 7 (Thread 0x7fff669ff700 (LWP 30997)):
#0  0x00007ffff5d9729c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007ffff5d90714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x555556142e60 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000555555c65013 in qemu_mutex_lock_impl (mutex=0x555556142e60 <qemu_global_mutex>, file=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=3089) at util/qemu-thread-posix.c:78
#3  0x00005555558480ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=line@entry=3089)
    at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1831
#4  0x0000555555802425 in prepare_mmio_access (mr=<optimized out>, mr=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3089
#5  0x000055555580759f in flatview_read_continue
    (fv=fv@entry=0x7fff5f9baf00, addr=addr@entry=375, attrs=..., ptr=ptr@entry=0x7ffff7fbe000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x7fffe8e66e80)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3188
#6  0x0000555555807803 in flatview_read (fv=0x7fff5f9baf00, addr=375, attrs=..., buf=0x7ffff7fbe000, len=1) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3229
#7  0x000055555580791b in address_space_read_full (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3242
#8  0x0000555555807a25 in address_space_rw (as=<optimized out>, addr=addr@entry=375, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3270
#9  0x0000555555865024 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=375) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2140
#10 0x0000555555865024 in kvm_cpu_exec (cpu=cpu@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2386
#11 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1246
#12 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1218
#13 0x0000555555c64e1a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#14 0x00007ffff5d8dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#15 0x00007ffff5cbe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The above trace is triggered by a cancelled backup, but it sometimes also triggers from drive_backup_prepare(),
for example:

(gdb) bt
#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c60419 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c60419 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x0000555555c62c01 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd188, timeout=-1) at util/fdmon-poll.c:79
#4  0x0000555555c621e7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc2565 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0) at block/io.c:430
#6  0x0000555555bc2565 in bdrv_do_drained_begin (bs=0x7fffe8954bc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x000055555595f753 in drive_backup_prepare (common=0x7fffe8910780, errp=0x7fffffffd2c8) at blockdev.c:1759
#8  0x0000555555961f0a in qmp_transaction (dev_list=dev_list@entry=0x7fffffffd340, has_props=has_props@entry=false, props=0x7fffe560e200, props@entry=0x0, errp=errp@entry=0x7fffffffd378) at blockdev.c:2406
#9  0x00005555559633c2 in blockdev_do_action (errp=0x7fffffffd378, action=0x7fffffffd330) at blockdev.c:1054
#10 0x00005555559633c2 in qmp_drive_backup (backup=backup@entry=0x7fffffffd380, errp=errp@entry=0x7fffffffd378) at blockdev.c:3134
#11 0x0000555555b1e1d1 in qmp_marshal_drive_backup (args=<optimized out>, ret=<optimized out>, errp=0x7fffffffd458) at qapi/qapi-commands-block-core.c:555
#12 0x0000555555c15aa8 in qmp_dispatch (cmds=0x555556162000 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:155
#13 0x0000555555affca1 in monitor_qmp_dispatch (mon=0x7fffe8915d80, req=<optimized out>) at monitor/qmp.c:145
#14 0x0000555555b0048a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#15 0x0000555555c5e975 in aio_bh_call (bh=0x7fffe8e8f960) at util/async.c:164
#16 0x0000555555c5e975 in aio_bh_poll (ctx=ctx@entry=0x7fffe8f3ef80) at util/async.c:164
#17 0x0000555555c6202e in aio_dispatch (ctx=0x7fffe8f3ef80) at util/aio-posix.c:380
#18 0x0000555555c5e85e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#19 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x0000555555c61298 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x0000555555c61298 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#22 0x0000555555c61298 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#23 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#24 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49


(gdb) source ./scripts/qemu-gdb.py 
(gdb) qemu tcg-lock-status 
Thread, BQL (iothread_mutex), Replay, Blocked?
10/7233, false, false, not blocked
8/7082, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 7037
7/7081, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 7037
6/7080, false, false, not blocked
4/7043, false, false, not blocked
3/7042, false, false, not blocked
2/7041, false, false, not blocked
1/7037, true, false, not blocked

(gdb) thread apply 7 bt

Thread 7 (Thread 0x7fff669ff700 (LWP 7081)):
#0  0x00007ffff5d9729c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007ffff5d90714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x555556142e60 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000555555c65013 in qemu_mutex_lock_impl (mutex=0x555556142e60 <qemu_global_mutex>, file=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=3089) at util/qemu-thread-posix.c:78
#3  0x00005555558480ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=line@entry=3089)
    at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1831
#4  0x0000555555802425 in prepare_mmio_access (mr=<optimized out>, mr=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3089
#5  0x000055555580759f in flatview_read_continue
    (fv=fv@entry=0x7fff5f635f00, addr=addr@entry=375, attrs=..., ptr=ptr@entry=0x7ffff7fbe000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x7fffe8e66e80)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3188
#6  0x0000555555807803 in flatview_read (fv=0x7fff5f635f00, addr=375, attrs=..., buf=0x7ffff7fbe000, len=1) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3229
#7  0x000055555580791b in address_space_read_full (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3242
#8  0x0000555555807a25 in address_space_rw (as=<optimized out>, addr=addr@entry=375, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3270
#9  0x0000555555865024 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=375) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2140
#10 0x0000555555865024 in kvm_cpu_exec (cpu=cpu@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2386
#11 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1246
#12 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1218
#13 0x0000555555c64e1a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#14 0x00007ffff5d8dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#15 0x00007ffff5cbe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

So the pattern is:

1.) Main thread has the BQL
2.) Main thread calls bdrv_drained_begin(bs);
3.) CPU thread wants to complete a read in prepare_mmio_access(), but needs the BQL
4.) deadlock

Any ideas how to prevent that?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31  8:46 bdrv_drained_begin deadlock with io-threads Dietmar Maurer
@ 2020-03-31  9:17 ` Dietmar Maurer
  2020-03-31  9:33   ` Dietmar Maurer
  2020-03-31 12:58 ` Kevin Wolf
  1 sibling, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31  9:17 UTC (permalink / raw)
  To: Sergio Lopez, Max Reitz, jsnow, Stefan Hajnoczi, qemu-devel,
	kwolf, vsementsov

Inside exec.c, there is a race:

-------
static bool prepare_mmio_access(MemoryRegion *mr)
{
    bool unlocked = !qemu_mutex_iothread_locked();
    bool release_lock = false;

    if (unlocked && mr->global_locking) {
        qemu_mutex_lock_iothread();
------

IMHO, checking for unlocked that way is wrong. Should we use some kind
of _trylock instead?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31  9:17 ` Dietmar Maurer
@ 2020-03-31  9:33   ` Dietmar Maurer
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31  9:33 UTC (permalink / raw)
  To: Sergio Lopez, Max Reitz, jsnow, Stefan Hajnoczi, qemu-devel,
	kwolf, vsementsov

> Inside exec.c, there is a race:
> 
> -------
> static bool prepare_mmio_access(MemoryRegion *mr)
> {
>     bool unlocked = !qemu_mutex_iothread_locked();
>     bool release_lock = false;
> 
>     if (unlocked && mr->global_locking) {
>         qemu_mutex_lock_iothread();
> ------
> 
> IMHO, checking for unlocked that way is wrong. Should we use some kind
> of _trylock instead?

Answering this myself: the above code is correct, because "iothread_locked" is a thread-local variable,
and we only want to take the mutex if this thread does not hold it already...
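
To illustrate the pattern (a minimal standalone sketch of the idea, not the actual QEMU code): the flag is
thread-local, so it can only be true if the current thread itself already took the global mutex, which makes
reading it without further synchronization safe:

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* per-thread flag, analogous to what qemu_mutex_iothread_locked() reads */
static __thread bool iothread_locked;

static void lock_iothread(void)
{
    assert(!iothread_locked);           /* this thread must not hold it yet */
    pthread_mutex_lock(&global_mutex);  /* may block while another thread holds it */
    iothread_locked = true;
}

static void unlock_iothread(void)
{
    iothread_locked = false;
    pthread_mutex_unlock(&global_mutex);
}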




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31  8:46 bdrv_drained_begin deadlock with io-threads Dietmar Maurer
  2020-03-31  9:17 ` Dietmar Maurer
@ 2020-03-31 12:58 ` Kevin Wolf
  2020-03-31 14:32   ` Dietmar Maurer
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-03-31 12:58 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: vsementsov, Sergio Lopez, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, jsnow

[ CC qemu-block ]

On 31.03.2020 at 10:46, Dietmar Maurer wrote:
> I can see and reproduce this error with latest code from today. 
> But I also see it on stable 4.1.1 (sometimes).
> 
> I guess this is a similar problem as reported earlier:
> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07363.html
> 
> To reproduce, you need a VM with a virtio-scsi-single drive using io-threads,
> then run "stress -d 5" inside the VM (Debian Buster).
> 
> Then I start simple drive-backup jobs like:
> 
> { "execute": "drive-backup", "arguments": { "device": "drive-scsi0", "sync": 
> "full", "target": "backup-scsi0.raw" } }
> 
> abort them after a few seconds, then repeat that.
> 
> After a few iterations the VM freezes inside bdrv_drained_begin():
> 
> Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
> #0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1  0x0000555555c60419 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
> #2  0x0000555555c60419 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
> #3  0x0000555555c62c01 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd3a8, timeout=-1) at util/fdmon-poll.c:79
> #4  0x0000555555c621e7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
> #5  0x0000555555bc2565 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff67240b80) at block/io.c:430
> #6  0x0000555555bc2565 in bdrv_do_drained_begin (bs=0x7fff67240b80, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
> #7  0x0000555555bde252 in bdrv_backup_top_drop (bs=0x7fff67240b80) at block/backup-top.c:273
> #8  0x0000555555bd995c in backup_clean (job=0x7fffe5609200) at block/backup.c:114
> #9  0x0000555555b6e08d in job_clean (job=0x7fffe5609200) at job.c:657
> #10 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:673
> #11 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:661
> #12 0x0000555555b6ea3a in job_completed_txn_abort (job=<optimized out>) at job.c:749
> #13 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:843
> #14 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:836
> #15 0x0000555555b6edf0 in job_exit (opaque=0x7fffe5609200) at job.c:864
> #16 0x0000555555c5e975 in aio_bh_call (bh=0x7fffe721a2d0) at util/async.c:164
> #17 0x0000555555c5e975 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
> #18 0x0000555555c6202e in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
> #19 0x0000555555c5e85e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
> #20 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> #21 0x0000555555c61298 in glib_pollfds_poll () at util/main-loop.c:219
> #22 0x0000555555c61298 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
> #23 0x0000555555c61298 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
> #24 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
> #25 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49

The thing that we need to figure out is probably what operation we are
(or were) waiting for and why it doesn't complete.

A typical cause of a hang like this is that the operation we're waiting
for actually did already complete in a different thread, but forgot to
call aio_wait_kick().

To confirm, if you have gdb attached to a hanging process, you could
manually call bdrv_drain_poll_top_level(bs, 0, 0) and check its result.
If it returns false, then a kick is missing somewhere. If it returns
true, some request is still waiting for completion somewhere and can be
inspected in gdb.
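
For example, from the hanging main thread (the frame number is taken from your backtrace above, and calling
the function from gdb only works as long as it has not been inlined away):

(gdb) frame 5
(gdb) p bdrv_drain_poll_top_level(bs, 0, 0)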

> Thread 1 holds the BQL, while thread 7 wants to acquire it to complete the read in prepare_mmio_access():
> 
> (gdb) source ./scripts/qemu-gdb.py 
> (gdb) qemu tcg-lock-status 
> Thread, BQL (iothread_mutex), Replay, Blocked?
> 10/31149, false, false, not blocked
> 8/30998, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
> 7/30997, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
> 6/30996, false, false, not blocked
> 4/30955, false, false, not blocked
> 3/30954, false, false, __lll_lock_wait waiting on 0x7fffe89151e0 from 30949
> 2/30953, false, false, not blocked
> 1/30949, true, false, not blocked
> 
> 
> (gdb) thread apply 7 bt
> Thread 7 (Thread 0x7fff669ff700 (LWP 30997)):
> #0  0x00007ffff5d9729c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> #1  0x00007ffff5d90714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x555556142e60 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x0000555555c65013 in qemu_mutex_lock_impl (mutex=0x555556142e60 <qemu_global_mutex>, file=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=3089) at util/qemu-thread-posix.c:78
> #3  0x00005555558480ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=line@entry=3089)
>     at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1831
> #4  0x0000555555802425 in prepare_mmio_access (mr=<optimized out>, mr=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3089
> #5  0x000055555580759f in flatview_read_continue
>     (fv=fv@entry=0x7fff5f9baf00, addr=addr@entry=375, attrs=..., ptr=ptr@entry=0x7ffff7fbe000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x7fffe8e66e80)
>     at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3188
> #6  0x0000555555807803 in flatview_read (fv=0x7fff5f9baf00, addr=375, attrs=..., buf=0x7ffff7fbe000, len=1) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3229
> #7  0x000055555580791b in address_space_read_full (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3242
> #8  0x0000555555807a25 in address_space_rw (as=<optimized out>, addr=addr@entry=375, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
>     at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3270
> #9  0x0000555555865024 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=375) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2140
> #10 0x0000555555865024 in kvm_cpu_exec (cpu=cpu@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2386
> #11 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1246
> #12 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1218
> #13 0x0000555555c64e1a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
> #14 0x00007ffff5d8dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #15 0x00007ffff5cbe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> [...]

> So the pattern is:
> 
> 1.) Main thread has the BQL
> 2.) Main thread calls bdrv_drained_begin(bs);
> > 3.) CPU thread wants to complete a read in prepare_mmio_access(), but needs the BQL
> 4.) deadlock
> 
> Any ideas how to prevent that?

What's going on in the vcpu thread (thread 7) is probably not part of
the problem. It does have to wait for the main thread, but it would only
really be part of the deadlock if the main thread also waited for the
vcpu thread, which I don't think it does.

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 12:58 ` Kevin Wolf
@ 2020-03-31 14:32   ` Dietmar Maurer
  2020-03-31 14:53     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31 14:32 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: vsementsov, Sergio Lopez, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, jsnow

> > After a few iterations the VM freezes inside bdrv_drained_begin():
> > 
> > Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
> > #0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
> > #1  0x0000555555c60419 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
> > #2  0x0000555555c60419 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
> > #3  0x0000555555c62c01 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd3a8, timeout=-1) at util/fdmon-poll.c:79
> > #4  0x0000555555c621e7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
> > #5  0x0000555555bc2565 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff67240b80) at block/io.c:430
> > #6  0x0000555555bc2565 in bdrv_do_drained_begin (bs=0x7fff67240b80, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
> > #7  0x0000555555bde252 in bdrv_backup_top_drop (bs=0x7fff67240b80) at block/backup-top.c:273
> > #8  0x0000555555bd995c in backup_clean (job=0x7fffe5609200) at block/backup.c:114
> > #9  0x0000555555b6e08d in job_clean (job=0x7fffe5609200) at job.c:657
> > #10 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:673
> > #11 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:661
> > #12 0x0000555555b6ea3a in job_completed_txn_abort (job=<optimized out>) at job.c:749
> > #13 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:843
> > #14 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:836
> > #15 0x0000555555b6edf0 in job_exit (opaque=0x7fffe5609200) at job.c:864
> > #16 0x0000555555c5e975 in aio_bh_call (bh=0x7fffe721a2d0) at util/async.c:164
> > #17 0x0000555555c5e975 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
> > #18 0x0000555555c6202e in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
> > #19 0x0000555555c5e85e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
> > #20 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #21 0x0000555555c61298 in glib_pollfds_poll () at util/main-loop.c:219
> > #22 0x0000555555c61298 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
> > #23 0x0000555555c61298 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
> > #24 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
> > #25 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
> 
> The thing that we need to figure out is probably what operation we are
> (or were) waiting for and why it doesn't complete.
> 
> A typical cause of a hang like this is that the operation we're waiting
> for actually did already complete in a different thread, but forgot to
> call aio_wait_kick().

no, that is not the case.

> To confirm, if you have gdb attached to a hanging process, you could
> manually call bdrv_drain_poll_top_level(bs, 0, 0) and check its result.
> If it returns false, 

it returns true, because there are about 30 in_flight requests (block_backend). But poll does not
get any events ...

> then a kick is missing somewhere. If it returns
> true, some request is still waiting for completion somewhere and can be
> inspected in gdb.

How can I see/debug those waiting requests?

> > Thread 1 holds the BQL, while thread 7 wants to acquire it to complete the read in prepare_mmio_access():
> > 
> > (gdb) source ./scripts/qemu-gdb.py 
> > (gdb) qemu tcg-lock-status 
> > Thread, BQL (iothread_mutex), Replay, Blocked?
> > 10/31149, false, false, not blocked
> > 8/30998, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
> > 7/30997, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
> > 6/30996, false, false, not blocked
> > 4/30955, false, false, not blocked
> > 3/30954, false, false, __lll_lock_wait waiting on 0x7fffe89151e0 from 30949
> > 2/30953, false, false, not blocked
> > 1/30949, true, false, not blocked
> > 
> > 
> > (gdb) thread apply 7 bt
> > Thread 7 (Thread 0x7fff669ff700 (LWP 30997)):
> > #0  0x00007ffff5d9729c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> > #1  0x00007ffff5d90714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x555556142e60 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> > #2  0x0000555555c65013 in qemu_mutex_lock_impl (mutex=0x555556142e60 <qemu_global_mutex>, file=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=3089) at util/qemu-thread-posix.c:78
> > #3  0x00005555558480ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=line@entry=3089)
> >     at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1831
> > #4  0x0000555555802425 in prepare_mmio_access (mr=<optimized out>, mr=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3089
> > #5  0x000055555580759f in flatview_read_continue
> >     (fv=fv@entry=0x7fff5f9baf00, addr=addr@entry=375, attrs=..., ptr=ptr@entry=0x7ffff7fbe000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x7fffe8e66e80)
> >     at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3188
> > #6  0x0000555555807803 in flatview_read (fv=0x7fff5f9baf00, addr=375, attrs=..., buf=0x7ffff7fbe000, len=1) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3229
> > #7  0x000055555580791b in address_space_read_full (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3242
> > #8  0x0000555555807a25 in address_space_rw (as=<optimized out>, addr=addr@entry=375, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
> >     at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3270
> > #9  0x0000555555865024 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=375) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2140
> > #10 0x0000555555865024 in kvm_cpu_exec (cpu=cpu@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2386
> > #11 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1246
> > #12 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1218
> > #13 0x0000555555c64e1a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
> > #14 0x00007ffff5d8dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> > #15 0x00007ffff5cbe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > [...]
> 
> > So the pattern is:
> > 
> > 1.) Main thread has the BQL
> > 2.) Main thread calls bdrv_drained_begin(bs);
> > 3.) CPU thread wants to complete a read in prepare_mmio_access(), but needs the BQL
> > 4.) deadlock
> > 
> > Any ideas how to prevent that?
> 
> What's going on in the vcpu thread (thread 7) is probably not part of
> the problem. It does have to wait for the main thread, but it would only
> really be part of the deadlock if the main thread would also wait for
> the vcpu thread, which I don't think it does.

Yes, I guess you are right. This read is triggered by the guest (not by bdrv_drained_begin).




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 14:32   ` Dietmar Maurer
@ 2020-03-31 14:53     ` Vladimir Sementsov-Ogievskiy
  2020-03-31 15:24       ` Dietmar Maurer
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-31 14:53 UTC (permalink / raw)
  To: Dietmar Maurer, Kevin Wolf
  Cc: Sergio Lopez, qemu-block, qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

31.03.2020 17:32, Dietmar Maurer wrote:
>>> After a few iterations the VM freezes inside bdrv_drained_begin():
>>>
>>> Thread 1 (Thread 0x7fffe9291080 (LWP 30949)):
>>> #0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
>>> #1  0x0000555555c60419 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
>>> #2  0x0000555555c60419 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
>>> #3  0x0000555555c62c01 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd3a8, timeout=-1) at util/fdmon-poll.c:79
>>> #4  0x0000555555c621e7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
>>> #5  0x0000555555bc2565 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff67240b80) at block/io.c:430
>>> #6  0x0000555555bc2565 in bdrv_do_drained_begin (bs=0x7fff67240b80, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
>>> #7  0x0000555555bde252 in bdrv_backup_top_drop (bs=0x7fff67240b80) at block/backup-top.c:273
>>> #8  0x0000555555bd995c in backup_clean (job=0x7fffe5609200) at block/backup.c:114
>>> #9  0x0000555555b6e08d in job_clean (job=0x7fffe5609200) at job.c:657
>>> #10 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:673
>>> #11 0x0000555555b6e08d in job_finalize_single (job=0x7fffe5609200) at job.c:661
>>> #12 0x0000555555b6ea3a in job_completed_txn_abort (job=<optimized out>) at job.c:749
>>> #13 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:843
>>> #14 0x0000555555b6ec42 in job_completed (job=0x7fffe5609200) at job.c:836
>>> #15 0x0000555555b6edf0 in job_exit (opaque=0x7fffe5609200) at job.c:864
>>> #16 0x0000555555c5e975 in aio_bh_call (bh=0x7fffe721a2d0) at util/async.c:164
>>> #17 0x0000555555c5e975 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
>>> #18 0x0000555555c6202e in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
>>> #19 0x0000555555c5e85e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
>>> #20 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #21 0x0000555555c61298 in glib_pollfds_poll () at util/main-loop.c:219
>>> #22 0x0000555555c61298 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
>>> #23 0x0000555555c61298 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
>>> #24 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
>>> #25 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
>>
>> The thing that we need to figure out is probably what operation we are
>> (or were) waiting for and why it doesn't complete.
>>
>> A typical cause of a hang like this is that the operation we're waiting
>> for actually did already complete in a different thread, but forgot to
>> call aio_wait_kick().
> 
> no, that is not the case.
> 
>> To confirm, if you have gdb attached to a hanging process, you could
>> manually call bdrv_drain_poll_top_level(bs, 0, 0) and check its result.
>> If it returns false,
> 
> it returns true, because there are about 30 in_flight requests (block_backend). But poll does not
> get any events ...
> 
>> then a kick is missing somewhere. If it returns
>> true, some request is still waiting for completion somewhere and can be
>> inspected in gdb.
> 
> How can I see/debug those waiting requests?

Examine the bs->tracked_requests list.

BdrvTrackedRequest has a "Coroutine *co" field. It's a pointer to this request's coroutine. You may use the qemu-gdb script to print the request's coroutine backtrace:

gdb> source qemu_source/scripts/qemu-gdb.py

gdb> qemu coroutine CO_POINTER

- this will show what exactly the request is doing now/waiting for.
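
For example, to get the CO_POINTER of the first tracked request (assuming the list is not empty):

gdb> p bs->tracked_requests.lh_first
gdb> p bs->tracked_requests.lh_first->co

and then pass the printed pointer to "qemu coroutine".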

> 
>>> Thread 1 holds the BQL, while thread 7 wants to acquire it to complete the read in prepare_mmio_access():
>>>
>>> (gdb) source ./scripts/qemu-gdb.py
>>> (gdb) qemu tcg-lock-status
>>> Thread, BQL (iothread_mutex), Replay, Blocked?
>>> 10/31149, false, false, not blocked
>>> 8/30998, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
>>> 7/30997, false, false, __lll_lock_wait waiting on 0x555556142e60 <qemu_global_mutex> from 30949
>>> 6/30996, false, false, not blocked
>>> 4/30955, false, false, not blocked
>>> 3/30954, false, false, __lll_lock_wait waiting on 0x7fffe89151e0 from 30949
>>> 2/30953, false, false, not blocked
>>> 1/30949, true, false, not blocked
>>>
>>>
>>> (gdb) thread apply 7 bt
>>> Thread 7 (Thread 0x7fff669ff700 (LWP 30997)):
>>> #0  0x00007ffff5d9729c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
>>> #1  0x00007ffff5d90714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x555556142e60 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
>>> #2  0x0000555555c65013 in qemu_mutex_lock_impl (mutex=0x555556142e60 <qemu_global_mutex>, file=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=3089) at util/qemu-thread-posix.c:78
>>> #3  0x00005555558480ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x555555c94008 "/home/dietmar/pve5-devel/mirror_qemu/exec.c", line=line@entry=3089)
>>>      at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1831
>>> #4  0x0000555555802425 in prepare_mmio_access (mr=<optimized out>, mr=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3089
>>> #5  0x000055555580759f in flatview_read_continue
>>>      (fv=fv@entry=0x7fff5f9baf00, addr=addr@entry=375, attrs=..., ptr=ptr@entry=0x7ffff7fbe000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x7fffe8e66e80)
>>>      at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3188
>>> #6  0x0000555555807803 in flatview_read (fv=0x7fff5f9baf00, addr=375, attrs=..., buf=0x7ffff7fbe000, len=1) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3229
>>> #7  0x000055555580791b in address_space_read_full (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3242
>>> #8  0x0000555555807a25 in address_space_rw (as=<optimized out>, addr=addr@entry=375, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
>>>      at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3270
>>> #9  0x0000555555865024 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=375) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2140
>>> #10 0x0000555555865024 in kvm_cpu_exec (cpu=cpu@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/accel/kvm/kvm-all.c:2386
>>> #11 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1246
>>> #12 0x000055555584835e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x7fffe89efb00) at /home/dietmar/pve5-devel/mirror_qemu/cpus.c:1218
>>> #13 0x0000555555c64e1a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
>>> #14 0x00007ffff5d8dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
>>> #15 0x00007ffff5cbe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>>> [...]
>>
>>> So the pattern is:
>>>
>>> 1.) Main thread has the BQL
>>> 2.) Main thread calls bdrv_drained_begin(bs);
>>> 3.) CPU thread wants to complete a read in prepare_mmio_access(), but needs the BQL
>>> 4.) deadlock
>>>
>>> Any ideas how to prevent that?
>>
>> What's going on in the vcpu thread (thread 7) is probably not part of
>> the problem. It does have to wait for the main thread, but it would only
>> really be part of the deadlock if the main thread would also wait for
>> the vcpu thread, which I don't think it does.
> 
> Yes, I guess you are right. This read is triggered by the guest (not by bdrv_drained_begin).
> 


-- 
Best regards,
Vladimir



* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 14:53     ` Vladimir Sementsov-Ogievskiy
@ 2020-03-31 15:24       ` Dietmar Maurer
  2020-03-31 15:37         ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31 15:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: Sergio Lopez, qemu-block, qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow


> > How can I see/debug those waiting requests?
> 
> Examine bs->tracked_requests list.
> 
> BdrvTrackedRequest has "Coroutine *co" field. It's a pointer of coroutine of this request. You may use qemu-gdb script to print request's coroutine back-trace:

I would, but there are no tracked requests at all.

print bs->tracked_requests
$2 = {lh_first = 0x0}


> gdb> source qemu_source/scripts/qemu-gdb.py
> 
> gdb> qemu coroutine CO_POINTER
> 
> - this will show, what exactly the request is doing now/waiting for.

(gdb) up
#1  0x0000555555c60489 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
77	  return __ppoll_alias (__fds, __nfds, __timeout, __ss);
(gdb) up
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
335	        return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
(gdb) up
#3  0x0000555555c62c71 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffdce8, timeout=-1) at util/fdmon-poll.c:79
79	    ret = qemu_poll_ns(pollfds, npfd, timeout);
(gdb) up
#4  0x0000555555c62257 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
589	        ret = ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
(gdb) up
#5  0x0000555555bc25e5 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0)
    at block/io.c:430
430	        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));

(gdb) print *bs
$1 = {open_flags = 139426, read_only = false, encrypted = false, sg = false, probed = false, force_share = false, implicit = false, 
  drv = 0x555556115080 <bdrv_raw>, opaque = 0x7fffe8918400, aio_context = 0x7fffe8915180, aio_notifiers = {lh_first = 0x0}, walking_aio_notifiers = false, 
  filename = "/dev/pve/vm-101-disk-0", '\000' <repeats 4073 times>, backing_file = '\000' <repeats 4095 times>, 
  auto_backing_file = '\000' <repeats 4095 times>, backing_format = '\000' <repeats 15 times>, full_open_options = 0x7fffe562c000, 
  exact_filename = "/dev/pve/vm-101-disk-0", '\000' <repeats 4073 times>, backing = 0x0, file = 0x7fffe88e9b60, bl = {request_alignment = 1, 
    max_pdiscard = 0, pdiscard_alignment = 0, max_pwrite_zeroes = 0, pwrite_zeroes_alignment = 0, opt_transfer = 0, max_transfer = 0, 
    min_mem_alignment = 512, opt_mem_alignment = 4096, max_iov = 1024}, supported_write_flags = 64, supported_zero_flags = 324, 
  node_name = "#block163", '\000' <repeats 22 times>, node_list = {tqe_next = 0x7fffe8975180, tqe_circ = {tql_next = 0x7fffe8975180, 
      tql_prev = 0x7fffe8963540}}, bs_list = {tqe_next = 0x7fffe895f480, tqe_circ = {tql_next = 0x7fffe895f480, 
      tql_prev = 0x555556114f10 <all_bdrv_states>}}, monitor_list = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x0}}, refcnt = 2, op_blockers = {
    {lh_first = 0x0} <repeats 16 times>}, inherits_from = 0x0, children = {lh_first = 0x7fffe88e9b60}, parents = {lh_first = 0x7fffe88ea180}, 
  options = 0x7fffe8933400, explicit_options = 0x7fffe8934800, detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_ON, backing_blocker = 0x0, 
  total_sectors = 67108864, before_write_notifiers = {notifiers = {lh_first = 0x0}}, write_threshold_offset = 0, write_threshold_notifier = {notify = 0x0, 
    node = {le_next = 0x0, le_prev = 0x0}}, dirty_bitmap_mutex = {lock = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, 
      Protocol = None}, initialized = true}, dirty_bitmaps = {lh_first = 0x0}, wr_highest_offset = {value = 28049412096}, copy_on_read = 0, in_flight = 0, 
  serialising_in_flight = 0, io_plugged = 0, enable_write_cache = 0, quiesce_counter = 1, recursive_quiesce_counter = 0, write_gen = 113581, reqs_lock = {
    locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, tracked_requests = {
    lh_first = 0x0}, flush_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7fffe8958e38}}, active_flush_req = false, flushed_gen = 112020, 
  never_freeze = false}


Looks like bdrv_parent_drained_poll_single() calls blk_root_drained_poll(), which returns true in my case (in_flight > 5). Looks like I am losing poll events somewhere?
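
For reference, this is roughly how I look at that parent in gdb (the cast assumes the first parent is the
root child of a BlockBackend, i.e. that child->opaque points to the BlockBackend - that may not hold for
other parent types):

(gdb) p bs->parents.lh_first
(gdb) p ((BlockBackend *)bs->parents.lh_first->opaque)->in_flight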




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 15:24       ` Dietmar Maurer
@ 2020-03-31 15:37         ` Kevin Wolf
  2020-03-31 16:18           ` Dietmar Maurer
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-03-31 15:37 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

On 31.03.2020 at 17:24, Dietmar Maurer wrote:
> 
> > > How can I see/debug those waiting requests?
> > 
> > Examine bs->tracked_requests list.
> > 
> > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer of coroutine of this request. You may use qemu-gdb script to print request's coroutine back-trace:
> 
> I would, but there are no tracked requests at all.
> 
> print bs->tracked_requests
> $2 = {lh_first = 0x0}

Then it's one of the recursively checked parents, as you already figured
out.

> Looks like bdrv_parent_drained_poll_single() calls
> blk_root_drained_poll(), which returns true in my case (in_flight > 5).

Can you identify which BlockBackend this is? Specifically, whether it's the
one attached to a guest device or whether it belongs to the block job.

Maybe have a look at the job coroutine, too. You can probably find it
most easily in the 'jobs' list, and then print the coroutine backtrace
for job->co.
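
Something like this should work (assuming the job in question is the first entry in the list; 'jobs' is
static in job.c, so it may need to be qualified):

(gdb) p 'job.c'::jobs.lh_first
(gdb) p 'job.c'::jobs.lh_first->co

and then print the backtrace with the qemu-gdb "qemu coroutine" command on that pointer.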

> Looks like I am losing poll events somewhere?

I don't think we've lost any event if in_flight > 0. It means that
something is still supposedly active. Maybe the job deadlocked.

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 15:37         ` Kevin Wolf
@ 2020-03-31 16:18           ` Dietmar Maurer
  2020-04-01 10:37             ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-03-31 16:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow


> On March 31, 2020 5:37 PM Kevin Wolf <kwolf@redhat.com> wrote:
> 
>  
> On 31.03.2020 at 17:24, Dietmar Maurer wrote:
> > 
> > > > How can I see/debug those waiting requests?
> > > 
> > > Examine bs->tracked_requests list.
> > > 
> > > BdrvTrackedRequest has "Coroutine *co" field. It's a pointer of coroutine of this request. You may use qemu-gdb script to print request's coroutine back-trace:
> > 
> > I would, but there are no tracked requests at all.
> > 
> > print bs->tracked_requests
> > $2 = {lh_first = 0x0}
> 
> Then it's one of the recursively checked parents, as you already figured
> out.
> 
> > Looks like bdrv_parent_drained_poll_single() calls
> > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> 
> Can you identify which BlockBackend is this? Specifically if it's the
> one attached to a guest device or whether it belongs to the block job.

This can trigger from various different places, but the simplest case is when
it's called from drive_backup_prepare():

>   bdrv_drained_begin(bs);

which is the backup source drive.

Note: I have added printf debug statements to the source, so the line numbers may not match exactly.

#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d30c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c60479 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c60479 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x0000555555c62c61 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffda68, timeout=-1) at util/fdmon-poll.c:79
#4  0x0000555555c62247 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc25d5 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0)
    at block/io.c:430
#6  0x0000555555bc25d5 in bdrv_do_drained_begin
    (bs=0x7fffe8954bc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x000055555595f753 in drive_backup_prepare (common=0x7fffe560c200, errp=0x7fffffffdba8) at blockdev.c:1759
#8  0x0000555555961f0a in qmp_transaction (dev_list=dev_list@entry=0x7fffffffdc20, has_props=has_props@entry=false, props=0x7fffe560e0e0, 
    props@entry=0x0, errp=errp@entry=0x7fffffffdc58) at blockdev.c:2406
#9  0x00005555559633c2 in blockdev_do_action (errp=0x7fffffffdc58, action=0x7fffffffdc10) at blockdev.c:1054
#10 0x00005555559633c2 in qmp_drive_backup (backup=backup@entry=0x7fffffffdc60, errp=errp@entry=0x7fffffffdc58) at blockdev.c:3134
#11 0x0000555555b1e1d1 in qmp_marshal_drive_backup (args=<optimized out>, ret=<optimized out>, errp=0x7fffffffdd38) at qapi/qapi-commands-block-core.c:555
#12 0x0000555555c15b08 in qmp_dispatch (cmds=0x555556162000 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:155
#13 0x0000555555affca1 in monitor_qmp_dispatch (mon=0x7fffe8915d80, req=<optimized out>) at monitor/qmp.c:145
#14 0x0000555555b0048a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#15 0x0000555555c5e9d5 in aio_bh_call (bh=0x7fffe8e8f960) at util/async.c:164
#16 0x0000555555c5e9d5 in aio_bh_poll (ctx=ctx@entry=0x7fffe8f3ef80) at util/async.c:164
#17 0x0000555555c6208e in aio_dispatch (ctx=0x7fffe8f3ef80) at util/aio-posix.c:380
#18 0x0000555555c5e8be in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#19 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x0000555555c612f8 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x0000555555c612f8 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#22 0x0000555555c612f8 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#23 0x00005555558fc5a9 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#24 0x0000555555800c3e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49

> 
> Maybe have a look at the job coroutine, too. You can probably easiest
> find it in the 'jobs' list, and then print the coroutine backtrace for
> job->co.

This happens in drive_backup_prepare(), before the job gets created.

> > Looks like I am losing poll events somewhere?
> 
> I don't think we've lost any event if in_flight > 0. It means that
> something is still supposedly active. Maybe the job deadlocked.

This is a simple call to bdrv_drained_begin(bs) (before we setup the job).

Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-03-31 16:18           ` Dietmar Maurer
@ 2020-04-01 10:37             ` Kevin Wolf
  2020-04-01 15:37               ` Dietmar Maurer
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-04-01 10:37 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

On 31.03.2020 at 18:18, Dietmar Maurer wrote:
> > > Looks like bdrv_parent_drained_poll_single() calls
> > > blk_root_drained_poll(), which returns true in my case (in_flight > 5).
> > 
> > Can you identify which BlockBackend is this? Specifically if it's the
> > one attached to a guest device or whether it belongs to the block job.
> 
> This can trigger from various different places, but the simplest case is when
> it's called from drive_backup_prepare():
> 
> >   bdrv_drained_begin(bs);
> 
> which is the backup source drive.

I mean the BlockBackend for which blk_root_drained_poll() is called.

> > Maybe have a look at the job coroutine, too. You can probably easiest
> > find it in the 'jobs' list, and then print the coroutine backtrace for
> > job->co.
> 
> This happens in drive_backup_prepare(), before the job gets created.

Oh, I see. Then it can't be the job's BlockBackend, of course.

> > > Looks like I am losing poll events somewhere?
> > 
> > I don't think we've lost any event if in_flight > 0. It means that
> > something is still supposedly active. Maybe the job deadlocked.
> 
> This is a simple call to bdrv_drained_begin(bs) (before we setup the job).
> 
> Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?

I can get hangs, but that's for job_completed(), not for starting the
job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
like a different case to me.

In my case, the hanging request looks like this:

(gdb) qemu coroutine 0x556e055750e0
#0  0x0000556e03999150 in qemu_coroutine_switch (from_=from_@entry=0x556e055750e0, to_=to_@entry=0x7fd34bbeb5b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1  0x0000556e03997e31 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2  0x0000556e0397fc88 in thread_pool_submit_co (pool=0x7fd33c003120, func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440) at util/thread-pool.c:289
#3  0x0000556e038d511d in raw_thread_pool_submit (bs=bs@entry=0x556e04e459b0, func=func@entry=0x556e038d59a0 <handle_aiocb_rw>, arg=arg@entry=0x7fd2d2b96440) at block/file-posix.c:1894
#4  0x0000556e038d58c3 in raw_co_prw (bs=0x556e04e459b0, offset=230957056, bytes=4096, qiov=0x7fd33c006fe0, type=1) at block/file-posix.c:1941

Checking the thread pool request:

(gdb) p *((ThreadPool*)0x7fd33c003120).head .lh_first
$9 = {common = {aiocb_info = 0x556e03f43f80 <thread_pool_aiocb_info>, bs = 0x0, cb = 0x556e0397f670 <thread_pool_co_cb>, opaque = 0x7fd2d2b96400, refcnt = 1}, pool = 0x7fd33c003120,
  func = 0x556e038d59a0 <handle_aiocb_rw>, arg = 0x7fd2d2b96440, state = THREAD_DONE, ret = 0, reqs = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x0}}, all = {le_next = 0x0,
    le_prev = 0x7fd33c0031d0}}

So apparently the request is THREAD_DONE, but the coroutine was never
reentered. I saw one case where ctx.bh_list was empty, but I also have a
case where a BH sits there scheduled and apparently just doesn't get
run:

(gdb) p *((ThreadPool*)0x7fd33c003120).ctx.bh_list .slh_first
$13 = {ctx = 0x556e04e41a10, cb = 0x556e0397f8e0 <thread_pool_completion_bh>, opaque = 0x7fd33c003120, next = {sle_next = 0x0}, flags = 3}

Stefan, I wonder if this is related to the recent changes to the BH
implementation.

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 10:37             ` Kevin Wolf
@ 2020-04-01 15:37               ` Dietmar Maurer
  2020-04-01 15:50                 ` Dietmar Maurer
  2020-04-01 18:12                 ` Kevin Wolf
  0 siblings, 2 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-01 15:37 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

> > Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?
> 
> I can get hangs, but that's for job_completed(), not for starting the
> job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> like a different case to me.

Please can you post the command line args of your VM? I use something like

./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 -object 'iothread,id=iothread-virtioscsi0' -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -machine "type=pc,accel=kvm"

Do you also run "stress-ng -d 5" inside the VM?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 15:37               ` Dietmar Maurer
@ 2020-04-01 15:50                 ` Dietmar Maurer
  2020-04-01 18:12                 ` Kevin Wolf
  1 sibling, 0 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-01 15:50 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow


> On April 1, 2020 5:37 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> 
>  
> > > Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 -object 'iothread,id=iothread-virtioscsi0' -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -machine "type=pc,accel=kvm"

BTW, I get a segfault if I start the above VM without "accel=kvm":

gdb --args ./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 -object 'iothread,id=iothread-virtioscsi0' -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -machine "type=pc"

after a few seconds:

Thread 3 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe857e700 (LWP 22257)]
0x000055555587c130 in do_tb_phys_invalidate (tb=tb@entry=0x7fffa7b40500 <code_gen_buffer+3409107>, 
    rm_from_page_list=rm_from_page_list@entry=true)
    at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1483
1483	    atomic_set(&tcg_ctx->tb_phys_invalidate_count,
(gdb) bt
#0  0x000055555587c130 in do_tb_phys_invalidate
    (tb=tb@entry=0x7fffa7b40500 <code_gen_buffer+3409107>, rm_from_page_list=rm_from_page_list@entry=true)
    at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1483
#1  0x000055555587c53b in tb_phys_invalidate__locked (tb=0x7fffa7b40500 <code_gen_buffer+3409107>)
    at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1960
#2  0x000055555587c53b in tb_invalidate_phys_page_range__locked
    (pages=pages@entry=0x7fffe780d400, p=0x7fff651066a0, start=start@entry=1072709632, end=end@entry=1072713728, retaddr=retaddr@entry=0) at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1960
#3  0x000055555587dad1 in tb_invalidate_phys_range (start=1072709632, end=1072771072)
    at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:2036
#4  0x0000555555801c12 in invalidate_and_set_dirty
    (mr=<optimized out>, addr=<optimized out>, length=<optimized out>)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3036
#5  0x00005555558072df in address_space_unmap
    (as=<optimized out>, buffer=<optimized out>, len=<optimized out>, is_write=<optimized out>, access_len=65536)
    at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3571
#6  0x0000555555967ff6 in dma_memory_unmap
    (access_len=<optimized out>, dir=<optimized out>, len=<optimized out>, buffer=<optimized out>, as=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/include/sysemu/dma.h:145
#7  0x0000555555967ff6 in dma_blk_unmap (dbs=dbs@entry=0x7fffe7839220) at dma-helpers.c:104
#8  0x0000555555968394 in dma_complete (ret=0, dbs=0x7fffe7839220) at dma-helpers.c:116
#9  0x0000555555968394 in dma_blk_cb (opaque=0x7fffe7839220, ret=0) at dma-helpers.c:136
#10 0x0000555555bac78e in blk_aio_complete (acb=0x7fffe783da00) at block/block-backend.c:1339
#11 0x0000555555c7280b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:115
#12 0x00007ffff6176b50 in __correctly_grouped_prefixwc
    (begin=0x7fffa7b40240 <code_gen_buffer+3408403> L"\x3ff0497b", end=0x12 <error: Cannot access memory at address 0x12>, thousands=0 L'\000', grouping=0x7fffa7b40590 <code_gen_buffer+3409251> "\001") at grouping.c:171
#13 0x0000000000000000 in  ()


It runs fine without iothreads.

But I guess this is a totally different problem?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 15:37               ` Dietmar Maurer
  2020-04-01 15:50                 ` Dietmar Maurer
@ 2020-04-01 18:12                 ` Kevin Wolf
  2020-04-01 18:28                   ` Dietmar Maurer
                                     ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-01 18:12 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

Am 01.04.2020 um 17:37 hat Dietmar Maurer geschrieben:
> > > Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> -device
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> -machine "type=pc,accel=kvm"
> 
> Do you also run "stress-ng -d 5" inside the VM?

I'm not using the exact same test case, but something that I thought
would be similar enough. Specifically, I run the script below, which
boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
of=/dev/sda' while the script keeps starting and cancelling backup jobs
in the background.

Anyway, I finally managed to bisect my problem now (did it wrong the
first time) and got this result:

00e30f05de1d19586345ec373970ef4c192c6270 is the first bad commit
commit 00e30f05de1d19586345ec373970ef4c192c6270
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Tue Oct 1 16:14:09 2019 +0300

    block/backup: use backup-top instead of write notifiers

    Drop write notifiers and use filter node instead.

    = Changes =

    1. Add filter-node-name argument for backup qmp api. We have to do it
    in this commit, as 257 needs to be fixed.

    2. There are no more write notifiers here, so is_write_notifier
    parameter is dropped from block-copy paths.

    3. To sync with in-flight requests at job finish we now have drained
    removing of the filter, we don't need rw-lock.

    4. Block-copy is now using BdrvChildren instead of BlockBackends

    5. As backup-top owns these children, we also move block-copy state
    into backup-top's ownership.

    [...]


That's a pretty big change, and I'm not sure how it's related to
completed requests hanging in the thread pool instead of reentering the
file-posix coroutine. But I also tested it enough that I'm confident
it's really the first bad commit.

Maybe you want to try if your problem starts at the same commit?

Kevin


#!/bin/bash

qmp() {
cat <<EOF
{'execute':'qmp_capabilities'}
EOF

while true; do
cat <<EOF
{ "execute": "drive-backup", "arguments": {
  "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }
EOF
sleep 1
cat <<EOF
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
EOF
sleep 2
done
}

./qemu-img create -f qcow2 /tmp/test.qcow2 4G
for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io /tmp/test.qcow2

qmp | x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -machine pc \
    -m 1G \
    -object 'iothread,id=iothread-virtioscsi0' \
    -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
    -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
    -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
    -device scsi-hd,drive=drive_image1,id=image1 \
    -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
    -boot d \
    -qmp stdio -monitor vc




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:12                 ` Kevin Wolf
@ 2020-04-01 18:28                   ` Dietmar Maurer
  2020-04-01 18:44                     ` Kevin Wolf
  2020-04-01 18:35                   ` Kevin Wolf
  2020-04-02  9:21                   ` Dietmar Maurer
  2 siblings, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-01 18:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

> That's a pretty big change, and I'm not sure how it's related to
> completed requests hanging in the thread pool instead of reentering the
> file-posix coroutine. But I also tested it enough that I'm confident
> it's really the first bad commit.
> 
> Maybe you want to try if your problem starts at the same commit?

Stefan already found that by bisecting last week:

See: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07629.html

But, IMHO the commit is not the reason for (my) bug - It just makes
it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
I have no easy way to reproduce it reliably.

Also, Stefan sent some patches to the list to fix some of the problems.

https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00022.html

Does that fix your problem?

I will run further tests with your script, thanks.

> Kevin
> 
> 
> #!/bin/bash
> 
> qmp() {
> cat <<EOF
> {'execute':'qmp_capabilities'}
> EOF
> 
> while true; do
> cat <<EOF
> { "execute": "drive-backup", "arguments": {
>   "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }
> EOF
> sleep 1
> cat <<EOF
> { "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
> EOF
> sleep 2
> done
> }
> 
> ./qemu-img create -f qcow2 /tmp/test.qcow2 4G
> for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io /tmp/test.qcow2
> 
> qmp | x86_64-softmmu/qemu-system-x86_64 \
>     -enable-kvm \
>     -machine pc \
>     -m 1G \
>     -object 'iothread,id=iothread-virtioscsi0' \
>     -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
>     -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
>     -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
>     -device scsi-hd,drive=drive_image1,id=image1 \
>     -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
>     -boot d \
>     -qmp stdio -monitor vc




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:12                 ` Kevin Wolf
  2020-04-01 18:28                   ` Dietmar Maurer
@ 2020-04-01 18:35                   ` Kevin Wolf
  2020-04-02  9:21                   ` Dietmar Maurer
  2 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-01 18:35 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

Am 01.04.2020 um 20:12 hat Kevin Wolf geschrieben:
> Am 01.04.2020 um 17:37 hat Dietmar Maurer geschrieben:
> > > > Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?
> > > 
> > > I can get hangs, but that's for job_completed(), not for starting the
> > > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > > like a different case to me.
> > 
> > Please can you post the command line args of your VM? I use something like
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 -chardev
> > 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> > 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> > 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> > 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> > 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> > -device
> > 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> > -machine "type=pc,accel=kvm"
> > 
> > Do you also run "stress-ng -d 5" inside the VM?
> 
> I'm not using the exact same test case, but something that I thought
> would be similar enough. Specifically, I run the script below, which
> boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
> of=/dev/sda' while the script keeps starting and cancelling backup jobs
> in the background.
> 
> Anyway, I finally managed to bisect my problem now (did it wrong the
> first time) and got this result:
> [...]

So back on current git master, my deadlock is between the main thread and
the iothread. For some reason, the main thread holds the mutex of the
iothread's thread pool. This means that the iothread can't complete its
requests and the drain operation in the main thread can't make progress.

I think there is no reason why the main thread should ever take the mutex
of the thread pool of a different thread, so I'm not sure what is going on
here. But maybe that backup commit changed something in the way nodes are
moved between AioContexts that would cause this to happen.

Kevin


Thread 3 (Thread 0x7f53c7438700 (LWP 3967)):
#0  0x00007f53d23449ad in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f53d233dd94 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x000055dcc331bdb3 in qemu_mutex_lock_impl (mutex=0x55dcc4ff1c80, file=0x55dcc3512bff "util/async.c", line=557) at util/qemu-thread-posix.c:78
#3  0x000055dcc33167ae in thread_pool_completion_bh (opaque=0x7f53b8003120) at util/thread-pool.c:167
#4  0x000055dcc331597e in aio_bh_call (bh=0x55dcc5b94680) at util/async.c:117
#5  0x000055dcc331597e in aio_bh_poll (ctx=ctx@entry=0x55dcc4ff1c20) at util/async.c:117
#6  0x000055dcc3318ee7 in aio_poll (ctx=0x55dcc4ff1c20, blocking=blocking@entry=true) at util/aio-posix.c:638
#7  0x000055dcc2ff7df0 in iothread_run (opaque=opaque@entry=0x55dcc4cfdac0) at iothread.c:75
#8  0x000055dcc331bbba in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#9  0x00007f53d233b4aa in start_thread () at /lib64/libpthread.so.0
#10 0x00007f53d226b3f3 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f53c7dab680 (LWP 3962)):
#0  0x00007f53d2260526 in ppoll () at /lib64/libc.so.6
#1  0x000055dcc33171c9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055dcc33171c9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x000055dcc33199a1 in fdmon_poll_wait (ctx=0x55dcc4f78470, ready_list=0x7fffb3506768, timeout=-1) at util/fdmon-poll.c:79
#4  0x000055dcc3318f87 in aio_poll (ctx=0x55dcc4f78470, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x000055dcc3276763 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55dcc5b66010) at block/io.c:429
#6  0x000055dcc3276763 in bdrv_do_drained_begin (bs=0x55dcc5b66010, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x000055dcc3291422 in bdrv_backup_top_drop (bs=0x55dcc5b66010) at block/backup-top.c:273
#8  0x000055dcc328cb4c in backup_clean (job=0x55dcc64ab800) at block/backup.c:132
#9  0x000055dcc322324d in job_clean (job=0x55dcc64ab800) at job.c:656
#10 0x000055dcc322324d in job_finalize_single (job=0x55dcc64ab800) at job.c:672
#11 0x000055dcc322324d in job_finalize_single (job=0x55dcc64ab800) at job.c:660
#12 0x000055dcc3223baa in job_completed_txn_abort (job=<optimized out>) at job.c:748
#13 0x000055dcc3223db2 in job_completed (job=0x55dcc64ab800) at job.c:842
#14 0x000055dcc3223db2 in job_completed (job=0x55dcc64ab800) at job.c:835
#15 0x000055dcc3223f60 in job_exit (opaque=0x55dcc64ab800) at job.c:863
#16 0x000055dcc331597e in aio_bh_call (bh=0x7f53b8010eb0) at util/async.c:117
#17 0x000055dcc331597e in aio_bh_poll (ctx=ctx@entry=0x55dcc4f78470) at util/async.c:117
#18 0x000055dcc3318dde in aio_dispatch (ctx=0x55dcc4f78470) at util/aio-posix.c:380
#19 0x000055dcc331585e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#20 0x00007f53d4ccd06d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#21 0x000055dcc3318048 in glib_pollfds_poll () at util/main-loop.c:219
#22 0x000055dcc3318048 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#23 0x000055dcc3318048 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#24 0x000055dcc2f8e3f9 in qemu_main_loop () at /home/kwolf/source/qemu/softmmu/vl.c:1665
#25 0x000055dcc2e7793e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/kwolf/source/qemu/softmmu/main.c:49

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f53c7438700 (LWP 3967))]
#0  0x00007f53d23449ad in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) up
#1  0x00007f53d233dd94 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb)
#2  0x000055dcc331bdb3 in qemu_mutex_lock_impl (mutex=0x55dcc4ff1c80, file=0x55dcc3512bff "util/async.c", line=557) at util/qemu-thread-posix.c:78
78          err = pthread_mutex_lock(&mutex->lock);
(gdb) p *mutex
$1 = {lock = {__data = {__lock = 2, __count = 1, __owner = 3962, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\001\000\000\000z\017\000\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 4294967298}, initialized = true}
(gdb) info threads
  Id   Target Id                                          Frame
  1    Thread 0x7f53c7dab680 (LWP 3962) "qemu-system-x86" 0x00007f53d2260526 in ppoll () from /lib64/libc.so.6
  2    Thread 0x7f53c7c39700 (LWP 3966) "qemu-system-x86" 0x00007f53d2265c4d in syscall () from /lib64/libc.so.6
* 3    Thread 0x7f53c7438700 (LWP 3967) "qemu-system-x86" 0x00007f53d23449ad in __lll_lock_wait () from /lib64/libpthread.so.0
  4    Thread 0x7f53c61b2700 (LWP 3970) "qemu-system-x86" 0x00007f53d2260431 in poll () from /lib64/libc.so.6
  5    Thread 0x7f53c59b1700 (LWP 3972) "qemu-system-x86" 0x00007f53d23449ad in __lll_lock_wait () from /lib64/libpthread.so.0
  6    Thread 0x7f53c49ff700 (LWP 3974) "gmain"           0x00007f53d2260431 in poll () from /lib64/libc.so.6
  7    Thread 0x7f536ebff700 (LWP 3975) "gdbus"           0x00007f53d2260431 in poll () from /lib64/libc.so.6




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:28                   ` Dietmar Maurer
@ 2020-04-01 18:44                     ` Kevin Wolf
  2020-04-02  6:48                       ` Dietmar Maurer
  2020-04-02  9:10                       ` Dietmar Maurer
  0 siblings, 2 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-01 18:44 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

Am 01.04.2020 um 20:28 hat Dietmar Maurer geschrieben:
> > That's a pretty big change, and I'm not sure how it's related to
> > completed requests hanging in the thread pool instead of reentering the
> > file-posix coroutine. But I also tested it enough that I'm confident
> > it's really the first bad commit.
> > 
> > Maybe you want to try if your problem starts at the same commit?
> 
> Stefan already found that by bisecting last week:
> 
> See: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07629.html
> 
> But, IMHO the commit is not the reason for (my) bug - It just makes
> it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
> I have no easy way to reproduce it reliably.
> 
> Also, Stefan sent some patches to the list to fix some of the problems.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00022.html
> 
> Does that fix your problem?

It seems to fix it, yes. Now I don't get any hangs any more. (Also, I
guess this means that this day was essentially wasted because I worked
on a problem that already has a fix... *sigh*)

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:44                     ` Kevin Wolf
@ 2020-04-02  6:48                       ` Dietmar Maurer
  2020-04-02  9:10                       ` Dietmar Maurer
  1 sibling, 0 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-02  6:48 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

> > But, IMHO the commit is not the reason for (my) bug - It just makes
> > it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
> > I have no easy way to reproduce it reliably.
> > 
> > Also, Stefan sent some patches to the list to fix some of the problems.
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00022.html
> > 
> > Does that fix your problem?
> 
> It seems to fix it, yes. Now I don't get any hangs any more. (Also, I
> guess this means that this day was essentially wasted because I worked
> on a problem that already has a fix... *sigh*)

It is not wasted, because we now know that you can't reproduce the bug we observe here. 

We have been hunting these AIO bugs for about three weeks now, and it looks like
we still have not found all of them ...

Also, it would be great if somebody reviewed the patches from Stefan, so that
we can get them in before 5.0.0.




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:44                     ` Kevin Wolf
  2020-04-02  6:48                       ` Dietmar Maurer
@ 2020-04-02  9:10                       ` Dietmar Maurer
  2020-04-02 12:14                         ` Kevin Wolf
  1 sibling, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-02  9:10 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

> It seems to fix it, yes. Now I don't get any hangs any more. 

I just tested using your configuration, with a recent CentOS 8 image
running a dd loop inside it:

# while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done

With that, I am unable to trigger the bug.

Would you mind running the test using a Debian Buster image running "stress-ng -d 5" inside?
I (and two other people here) can trigger the bug quite reliably with that.

On Debian, you can easily install stress-ng using apt:

# apt update
# apt install stress-ng

Seems stress-ng uses a different write pattern which can trigger the bug 
more reliably.
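
For completeness, the invocation inside the guest is then just:

# stress-ng -d 5     # -d N starts N hdd stressor workers

i.e. five workers continuously writing, reading and removing temporary files,
which produces a rather different I/O pattern than a single dd.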




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-01 18:12                 ` Kevin Wolf
  2020-04-01 18:28                   ` Dietmar Maurer
  2020-04-01 18:35                   ` Kevin Wolf
@ 2020-04-02  9:21                   ` Dietmar Maurer
  2 siblings, 0 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-02  9:21 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

> > Do you also run "stress-ng -d 5" inside the VM?
> 
> I'm not using the exact same test case, but something that I thought
> would be similar enough. Specifically, I run the script below, which
> boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
> of=/dev/sda' 

This test does not trigger the bug - you need to run "stress-ng -d 5" inside the VM.




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02  9:10                       ` Dietmar Maurer
@ 2020-04-02 12:14                         ` Kevin Wolf
  2020-04-02 14:25                           ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-04-02 12:14 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Sergio Lopez,
	qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow

Am 02.04.2020 um 11:10 hat Dietmar Maurer geschrieben:
> > It seems to fix it, yes. Now I don't get any hangs any more. 
> 
> I just tested using your configuration, and a recent centos8 image
> running dd loop inside it:
> 
> # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done
> 
> With that, I am unable to trigger the bug.
> 
> Would you mind running the test using a Debian Buster image running "stress-ng -d 5" inside?
> I (and two other people here) can trigger the bug quite reliably with that.
> 
> On Debian, you can easily install stress-ng using apt:
> 
> # apt update
> # apt install stress-ng
> 
> Seems stress-ng uses a different write pattern which can trigger the bug 
> more reliably.

I was going to, just give me some time...

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 12:14                         ` Kevin Wolf
@ 2020-04-02 14:25                           ` Kevin Wolf
  2020-04-02 15:40                             ` Dietmar Maurer
  2020-04-02 15:44                             ` Dietmar Maurer
  0 siblings, 2 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-02 14:25 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

Am 02.04.2020 um 14:14 hat Kevin Wolf geschrieben:
> Am 02.04.2020 um 11:10 hat Dietmar Maurer geschrieben:
> > > It seems to fix it, yes. Now I don't get any hangs any more. 
> > 
> > I just tested using your configuration, and a recent centos8 image
> > running dd loop inside it:
> > 
> > # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done
> > 
> > With that, I am unable to trigger the bug.
> > 
> > Would you mind running the test using a Debian Buster image running "stress-ng -d 5" inside?
> > I (and two other people here) can trigger the bug quite reliably with that.
> > 
> > On Debian, you can easily install stress-ng using apt:
> > 
> > # apt update
> > # apt install stress-ng
> > 
> > Seems stress-ng uses a different write pattern which can trigger the bug 
> > more reliably.
> 
> I was going to, just give me some time...

Can you reproduce the problem with my script, but pointing it to your
Debian image and running stress-ng instead of dd? If so, how long does
it take to reproduce for you?

I was just going to write that I couldn't reproduce it in my first attempt
(which is still with the image on tmpfs as in my script, and therefore
without O_DIRECT or Linux AIO) when it finally did hang. However, this
is still while completing a job, not while starting it:

(gdb) bt
#0  0x00007f8b6b4e9526 in ppoll () at /lib64/libc.so.6
#1  0x00005619fc090919 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x00005619fc090919 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x00005619fc0930f1 in fdmon_poll_wait (ctx=0x5619fe79ae00, ready_list=0x7fff4006cf58, timeout=-1) at util/fdmon-poll.c:79
#4  0x00005619fc0926d7 in aio_poll (ctx=0x5619fe79ae00, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x00005619fbfefd83 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x5619fe81e490) at block/io.c:429
#6  0x00005619fbfefd83 in bdrv_do_drained_begin (bs=0x5619fe81e490, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x00005619fbfe0ce7 in blk_drain (blk=0x5619ffd35c00) at block/block-backend.c:1617
#8  0x00005619fbfe18cd in blk_unref (blk=0x5619ffd35c00) at block/block-backend.c:473
#9  0x00005619fbf9b185 in block_job_free (job=0x5619ffd0b800) at blockjob.c:89
#10 0x00005619fbf9c769 in job_unref (job=0x5619ffd0b800) at job.c:378
#11 0x00005619fbf9c769 in job_unref (job=0x5619ffd0b800) at job.c:370
#12 0x00005619fbf9d57d in job_exit (opaque=0x5619ffd0b800) at job.c:892
#13 0x00005619fc08eea5 in aio_bh_call (bh=0x7f8b5406f410) at util/async.c:164
#14 0x00005619fc08eea5 in aio_bh_poll (ctx=ctx@entry=0x5619fe79ae00) at util/async.c:164
#15 0x00005619fc09252e in aio_dispatch (ctx=0x5619fe79ae00) at util/aio-posix.c:380
#16 0x00005619fc08ed8e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#17 0x00007f8b6df5606d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#18 0x00005619fc091798 in glib_pollfds_poll () at util/main-loop.c:219
#19 0x00005619fc091798 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#20 0x00005619fc091798 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#21 0x00005619fbd07559 in qemu_main_loop () at /home/kwolf/source/qemu/softmmu/vl.c:1664
#22 0x00005619fbbf093e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/kwolf/source/qemu/softmmu/main.c:49

It does look more like your case because I now have bs.in_flight == 0
and the BlockBackend of the scsi-hd device has in_flight == 8. Of
course, this still doesn't answer why it happens, and I'm not sure if we
can tell without adding some debug code.
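
For reference, the counters can be read with something along these lines,
where BS_ADDR and BLK_ADDR stand for the respective pointers from the
backtrace (purely illustrative placeholders):

(gdb) p ((BlockDriverState *) BS_ADDR)->in_flight
(gdb) p ((BlockBackend *) BLK_ADDR)->in_flight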

I'm testing on my current block branch with Stefan's fixes on top.

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 14:25                           ` Kevin Wolf
@ 2020-04-02 15:40                             ` Dietmar Maurer
  2020-04-02 16:47                               ` Kevin Wolf
  2020-04-02 15:44                             ` Dietmar Maurer
  1 sibling, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-02 15:40 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

> Can you reproduce the problem with my script, but pointing it to your
> Debian image and running stress-ng instead of dd? 

yes

> If so, how long does
> it take to reproduce for you?

I sometimes need up to 130 iterations ...

Worse, several times I thought the bug was gone, but then it triggered again (sigh).

But most of the time it takes fewer than 30 iterations (if you run "stress-ng -d 5").

Also note that it can happen at different places, but always inside a drained section ...




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 14:25                           ` Kevin Wolf
  2020-04-02 15:40                             ` Dietmar Maurer
@ 2020-04-02 15:44                             ` Dietmar Maurer
  1 sibling, 0 replies; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-02 15:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

> It does look more like your case because I now have bs.in_flight == 0
> and the BlockBackend of the scsi-hd device has in_flight == 8. 

yes, this looks very familiar.

> Of course, this still doesn't answer why it happens, and I'm not sure if we
> can tell without adding some debug code.
> 
> I'm testing on my current block branch with Stefan's fixes on top.

Note: I can trigger it with and without Stefan's patches on top.




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 15:40                             ` Dietmar Maurer
@ 2020-04-02 16:47                               ` Kevin Wolf
  2020-04-02 17:10                                 ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-04-02 16:47 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben:
> > Can you reproduce the problem with my script, but pointing it to your
> > Debian image and running stress-ng instead of dd? 
> 
> yes
> 
> > If so, how long does
> > it take to reproduce for you?
> 
> I sometimes need up to 130 iterations ...
> 
> Worse, several times I thought the bug was gone, but then it triggered again (sigh).
> 
> But most of the time it takes fewer than 30 iterations (if you run "stress-ng -d 5").
> 
> Also note that it can happen at different places, but always inside a drained section ...

I got a stack trace of a hanging coroutine:

(gdb) qemu coroutine 0x7fd8c00132c0
#0  0x00005630e16e9840 in qemu_coroutine_switch (from_=from_@entry=0x7fd8c00132c0, to_=to_@entry=0x7fd8cda865b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1  0x00005630e16e8521 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2  0x00005630e16e8ba5 in qemu_co_queue_wait_impl (queue=queue@entry=0x5630e48ab580, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
#3  0x00005630e161f1ae in blk_wait_while_drained (blk=blk@entry=0x5630e48ab260) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
#4  0x00005630e1620600 in blk_wait_while_drained (blk=0x5630e48ab260) at block/block-backend.c:1189
#5  0x00005630e1620600 in blk_co_pwritev_part (blk=0x5630e48ab260, offset=2922381312, bytes=856064, qiov=qiov@entry=0x7fd8c002cd70, qiov_offset=qiov_offset@entry=0, flags=0)
    at block/block-backend.c:1189
#6  0x00005630e16207ce in blk_co_pwritev (flags=<optimized out>, qiov=0x7fd8c002cd70, bytes=<optimized out>, offset=<optimized out>, blk=<optimized out>) at block/block-backend.c:1429
#7  0x00005630e16207ce in blk_aio_write_entry (opaque=0x7fd8c002cdc0) at block/block-backend.c:1429
#8  0x00005630e16e98bb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115

So I think this is the bug: Calling blk_wait_while_drained() from
anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
because it will deadlock the drain operation.

blk_aio_read/write_entry() take care of this and drop their reference
around blk_wait_while_drained(). But if we hit the race condition that
drain hasn't yet started there, but it has when we get to
blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
path.
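
To illustrate, here is a heavily condensed sketch of the ordering (not the
literal code, but the function names are the ones from block/block-backend.c):

/* AIO path: takes the in-flight reference up front */
static BlockAIOCB *blk_aio_prwv(...)
{
    blk_inc_in_flight(blk);               /* drain now waits for us */
    acb->co = qemu_coroutine_create(blk_aio_write_entry, acb);
    ...
}

static void blk_aio_write_entry(void *opaque)
{
    if (rwco->blk->quiesce_counter) {     /* drain not started yet, so    */
        blk_dec_in_flight(rwco->blk);     /* this dec/wait/inc is skipped */
        blk_wait_while_drained(rwco->blk);
        blk_inc_in_flight(rwco->blk);
    }
    rwco->ret = blk_co_pwritev(...);      /* drain begins in between */
}

int coroutine_fn blk_co_pwritev_part(...)
{
    blk_wait_while_drained(blk);          /* parks on blk->queued_requests
                                           * while still holding the
                                           * in-flight reference that
                                           * bdrv_drained_begin() polls on */
    ...
}

So the request coroutine and the drain in the main loop wait for each other.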

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 16:47                               ` Kevin Wolf
@ 2020-04-02 17:10                                 ` Kevin Wolf
  2020-04-03  6:48                                   ` Thomas Lamprecht
  2020-04-03  8:26                                   ` Dietmar Maurer
  0 siblings, 2 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-02 17:10 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: qemu-block, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Stefan Hajnoczi, Max Reitz

Am 02.04.2020 um 18:47 hat Kevin Wolf geschrieben:
> Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben:
> > > Can you reproduce the problem with my script, but pointing it to your
> > > Debian image and running stress-ng instead of dd? 
> > 
> > yes
> > 
> > > If so, how long does
> > > it take to reproduce for you?
> > 
> > I sometimes need up to 130 iterations ...
> > 
> > Worse, several times I thought the bug was gone, but then it triggered again (sigh).
> > 
> > But most of the time it takes fewer than 30 iterations (if you run "stress-ng -d 5").
> > 
> > Also note that it can happen at different places, but always inside a drained section ...
> 
> I got a stack trace of a hanging coroutine:
> 
> (gdb) qemu coroutine 0x7fd8c00132c0
> #0  0x00005630e16e9840 in qemu_coroutine_switch (from_=from_@entry=0x7fd8c00132c0, to_=to_@entry=0x7fd8cda865b8, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
> #1  0x00005630e16e8521 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
> #2  0x00005630e16e8ba5 in qemu_co_queue_wait_impl (queue=queue@entry=0x5630e48ab580, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
> #3  0x00005630e161f1ae in blk_wait_while_drained (blk=blk@entry=0x5630e48ab260) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
> #4  0x00005630e1620600 in blk_wait_while_drained (blk=0x5630e48ab260) at block/block-backend.c:1189
> #5  0x00005630e1620600 in blk_co_pwritev_part (blk=0x5630e48ab260, offset=2922381312, bytes=856064, qiov=qiov@entry=0x7fd8c002cd70, qiov_offset=qiov_offset@entry=0, flags=0)
>     at block/block-backend.c:1189
> #6  0x00005630e16207ce in blk_co_pwritev (flags=<optimized out>, qiov=0x7fd8c002cd70, bytes=<optimized out>, offset=<optimized out>, blk=<optimized out>) at block/block-backend.c:1429
> #7  0x00005630e16207ce in blk_aio_write_entry (opaque=0x7fd8c002cdc0) at block/block-backend.c:1429
> #8  0x00005630e16e98bb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
> 
> So I think this is the bug: Calling blk_wait_while_drained() from
> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
> because it will deadlock the drain operation.
> 
> blk_aio_read/write_entry() take care of this and drop their reference
> around blk_wait_while_drained(). But if we hit the race condition that
> drain hasn't yet started there, but it has when we get to
> blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
> path.

With the following patch, it seems to survive for now. I'll give it some
more testing tomorrow (also qemu-iotests to check that I didn't
accidentally break something else.)

Kevin


diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0..634acd1541 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1143,7 +1143,9 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
 {
     if (blk->quiesce_counter && !blk->disable_request_queuing) {
+        blk_dec_in_flight(blk);
         qemu_co_queue_wait(&blk->queued_requests, NULL);
+        blk_inc_in_flight(blk);
     }
 }
 
@@ -1154,8 +1156,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
     int ret;
     BlockDriverState *bs;
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_preadv(blk, bs, offset, bytes, flags);
@@ -1186,8 +1186,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
     int ret;
     BlockDriverState *bs;
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
@@ -1262,6 +1260,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         .ret    = NOT_DONE,
     };
 
+    blk_inc_in_flight(blk);
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
         co_entry(&rwco);
@@ -1270,6 +1269,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         bdrv_coroutine_enter(blk_bs(blk), co);
         BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
     }
+    blk_dec_in_flight(blk);
 
     return rwco.ret;
 }
@@ -1388,9 +1388,7 @@ static void blk_aio_read_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     if (rwco->blk->quiesce_counter) {
-        blk_dec_in_flight(rwco->blk);
         blk_wait_while_drained(rwco->blk);
-        blk_inc_in_flight(rwco->blk);
     }
 
     assert(qiov->size == acb->bytes);
@@ -1406,9 +1404,7 @@ static void blk_aio_write_entry(void *opaque)
     QEMUIOVector *qiov = rwco->iobuf;
 
     if (rwco->blk->quiesce_counter) {
-        blk_dec_in_flight(rwco->blk);
         blk_wait_while_drained(rwco->blk);
-        blk_inc_in_flight(rwco->blk);
     }
 
     assert(!qiov || qiov->size == acb->bytes);




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 17:10                                 ` Kevin Wolf
@ 2020-04-03  6:48                                   ` Thomas Lamprecht
  2020-04-03  8:26                                   ` Dietmar Maurer
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Lamprecht @ 2020-04-03  6:48 UTC (permalink / raw)
  To: Kevin Wolf, Dietmar Maurer
  Cc: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	qemu-block, Max Reitz

On 4/2/20 7:10 PM, Kevin Wolf wrote:
> Am 02.04.2020 um 18:47 hat Kevin Wolf geschrieben:
>> So I think this is the bug: Calling blk_wait_while_drained() from
>> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
>> because it will deadlock the drain operation.
>>
>> blk_aio_read/write_entry() take care of this and drop their reference
>> around blk_wait_while_drained(). But if we hit the race condition that
>> drain hasn't yet started there, but it has when we get to
>> blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
>> path.
> 
> With the following patch, it seems to survive for now. I'll give it some
> more testing tomorrow (also qemu-iotests to check that I didn't
> accidentally break something else.)
> 

So I only followed the discussion loosely, but tried some simple reproduction
to make sure the issue is independent of any artifacts of Dietmar's setup.

Before that patch I always got a hang before reaching the fifth drive-backup
+ block-job-cancel cycle. With your patch applied I have had no hang so far,
currently more than 885 cycles in (and yes, I confirmed stress -d 5 was really
running).

So, FWIW, the patch definitively fixes the issue, or at least the symptoms,
here; I cannot comment on its correctness or the like at all, as I'm
currently missing too much background.

cheers,
Thomas




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-02 17:10                                 ` Kevin Wolf
  2020-04-03  6:48                                   ` Thomas Lamprecht
@ 2020-04-03  8:26                                   ` Dietmar Maurer
  2020-04-03  8:47                                     ` Kevin Wolf
  1 sibling, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-03  8:26 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Stefan Hajnoczi, Max Reitz

> With the following patch, it seems to survive for now. I'll give it some
> more testing tomorrow (also qemu-iotests to check that I didn't
> accidentally break something else.)

Wow, that was fast! Seems your patch fixes the bug!

I wonder what commit introduced that problem, maybe:

https://github.com/qemu/qemu/commit/cf3129323f900ef5ddbccbe86e4fa801e88c566e#diff-7cb66df56045598b75a219eebc27efb6

If so, version 4.1.X is not affected by this bug, but 4.2.0 and later are?




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-03  8:26                                   ` Dietmar Maurer
@ 2020-04-03  8:47                                     ` Kevin Wolf
  2020-04-03 16:31                                       ` Dietmar Maurer
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2020-04-03  8:47 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: qemu-block, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Stefan Hajnoczi, Max Reitz

Am 03.04.2020 um 10:26 hat Dietmar Maurer geschrieben:
> > With the following patch, it seems to survive for now. I'll give it some
> > more testing tomorrow (also qemu-iotests to check that I didn't
> > accidentally break something else.)
> 
> Wow, that was fast! Seems your patch fixes the bug!
> 
> I wonder what commit introduced that problem, maybe:
> 
> https://github.com/qemu/qemu/commit/cf3129323f900ef5ddbccbe86e4fa801e88c566e#diff-7cb66df56045598b75a219eebc27efb6
> 
> If so, version 4.1.X is not affected by this bug, but 4.2.0 and later are?

Yes, I'm pretty sure that's the one.

Kevin




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-03  8:47                                     ` Kevin Wolf
@ 2020-04-03 16:31                                       ` Dietmar Maurer
  2020-04-06  8:31                                         ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Maurer @ 2020-04-03 16:31 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Stefan Hajnoczi, Max Reitz


> On April 3, 2020 10:47 AM Kevin Wolf <kwolf@redhat.com> wrote:
> 
>  
> Am 03.04.2020 um 10:26 hat Dietmar Maurer geschrieben:
> > > With the following patch, it seems to survive for now. I'll give it some
> > > more testing tomorrow (also qemu-iotests to check that I didn't
> > > accidentally break something else.)
> > 
> > Wow, that was fast! Seems your patch fixes the bug!
> > 
> > I wonder what commit introduced that problem, maybe:
> > 
> > https://github.com/qemu/qemu/commit/cf3129323f900ef5ddbccbe86e4fa801e88c566e#diff-7cb66df56045598b75a219eebc27efb6
> > 
> > If so, version 4.1.X is not affected by this bug, but 4.2.0 and later are?
> 
> Yes, I'm pretty sure that's the one.

I also wonder if we can lose an aio_wait_kick() by directly accessing blk->in_flight.
I thought this should use atomic_mb_read()?

diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0..48f3721505 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2275,7 +2275,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
     assert(blk->quiesce_counter);
-    return !!blk->in_flight;
+    return !!atomic_mb_read(&blk->in_flight);
 }
 
 static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)




* Re: bdrv_drained_begin deadlock with io-threads
  2020-04-03 16:31                                       ` Dietmar Maurer
@ 2020-04-06  8:31                                         ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2020-04-06  8:31 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: qemu-block, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Stefan Hajnoczi, Max Reitz

Am 03.04.2020 um 18:31 hat Dietmar Maurer geschrieben:
> 
> > On April 3, 2020 10:47 AM Kevin Wolf <kwolf@redhat.com> wrote:
> > 
> >  
> > Am 03.04.2020 um 10:26 hat Dietmar Maurer geschrieben:
> > > > With the following patch, it seems to survive for now. I'll give it some
> > > > more testing tomorrow (also qemu-iotests to check that I didn't
> > > > accidentally break something else.)
> > > 
> > > Wow, that was fast! Seems your patch fixes the bug!
> > > 
> > > I wonder what commit introduced that problem, maybe:
> > > 
> > > https://github.com/qemu/qemu/commit/cf3129323f900ef5ddbccbe86e4fa801e88c566e#diff-7cb66df56045598b75a219eebc27efb6
> > > 
> > > If so, version 4.1.X is not affected by this bug, but 4.2.0 and later are?
> > 
> > Yes, I'm pretty sure that's the one.
> 
> I also wonder if we can lose an aio_wait_kick() by directly accessing blk->in_flight.
> I thought this should use atomic_mb_read()?
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 8b8f2a80a0..48f3721505 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2275,7 +2275,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>      assert(blk->quiesce_counter);
> -    return !!blk->in_flight;
> +    return !!atomic_mb_read(&blk->in_flight);
>  }

I'm not sure if it actually makes a difference (and especially what the
barrier is supposed to achieve here), but it would be more consistent
with other users of the variable, so I think I agree.
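
For comparison, the inc/dec side looks like this (quoting from memory, so
please double-check against the tree):

void blk_inc_in_flight(BlockBackend *blk)
{
    atomic_inc(&blk->in_flight);
}

void blk_dec_in_flight(BlockBackend *blk)
{
    atomic_dec(&blk->in_flight);
    aio_wait_kick();
}

so an atomic read on the poll side would at least be symmetric with that.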

Kevin



