* backup transaction with io-thread core dumps
@ 2020-03-24 11:13 Dietmar Maurer
  2020-03-24 13:30 ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 11:13 UTC (permalink / raw)
  To: jsnow, qemu-devel

I get a core dump with backup transactions when using io-threads.

To reproduce, create and start a VM with:

# qemu-img create disk1.raw 100M
# qemu-img create disk2.raw 100M
#./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/108.pid  -m 512 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1'  -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'

Then connect socat to the QMP socket:
# socat /var/run/qemu-test.qmp -

And run the following QMP commands:

{ "execute": "qmp_capabilities", "arguments": {} }
{ "execute": "transaction", "arguments":  { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-sysi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }
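If it helps to script the reproducer, the two QMP lines can also be pushed over the socket by a small standalone C helper like the sketch below (an illustration only, not part of QEMU; it assumes the socket path from the command line above and sends exactly the commands shown):

/*
 * Sketch of a tiny QMP client for the reproducer above (not part of QEMU).
 * It writes the two QMP lines and echoes replies until QEMU closes the
 * socket, which, with this bug, happens when the process aborts.
 *
 * Build: gcc -o qmp-repro qmp-repro.c
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static const char *cmds =
    "{ \"execute\": \"qmp_capabilities\", \"arguments\": {} }\n"
    "{ \"execute\": \"transaction\", \"arguments\": { \"actions\": ["
    "{ \"type\": \"drive-backup\", \"data\": { \"device\": \"drive-scsi0\", "
    "\"sync\": \"full\", \"target\": \"backup-sysi0.raw\" }}, "
    "{ \"type\": \"drive-backup\", \"data\": { \"device\": \"drive-scsi1\", "
    "\"sync\": \"full\", \"target\": \"backup-scsi1.raw\" }}], "
    "\"properties\": { \"completion-mode\": \"grouped\" } } }\n";

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char buf[4096];
    ssize_t n;
    int fd;

    strncpy(addr.sun_path, "/var/run/qemu-test.qmp", sizeof(addr.sun_path) - 1);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("qmp socket");
        return 1;
    }

    /* QMP accepts the capabilities negotiation and the transaction back to
     * back; send both lines, then print whatever comes back. */
    if (write(fd, cmds, strlen(cmds)) < 0) {
        perror("write");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        fputs(buf, stdout);
    }
    close(fd);
    return 0;
}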

The VM will core dump:

qemu: qemu_mutex_unlock_impl: Operation not permitted
Aborted (core dumped)
(gdb) bt
#0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
#2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601)
    at util/qemu-thread-posix.c:108
#4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
#5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
    at block.c:6238
#6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
    at block.c:6211
#7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
    at block.c:6324
#8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
#9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
#10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
#11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
#12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
#13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
#14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
#15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
#16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
#17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
#18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
#19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
#20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
#21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
#22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
#23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
#24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
#25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
#26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
#27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
#28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
#31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
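For reference: "Operation not permitted" is strerror(EPERM), and EPERM is what pthread_mutex_unlock() returns when an error-checking (or robust) mutex is released by a thread that does not own it, so the abort in qemu_mutex_unlock_impl() looks like an AioContext lock being unlocked by a thread that never acquired it (my reading of the trace, not a verified diagnosis). A standalone illustration of just that failure mode, plain pthreads rather than QEMU code:

/*
 * Standalone demo of the error message above: unlocking an error-checking
 * pthread mutex from a thread that does not own it returns EPERM, whose
 * strerror() text is "Operation not permitted".
 *
 * Build: gcc -pthread -o eperm-demo eperm-demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t mutex;

static void *wrong_thread(void *arg)
{
    int err;

    (void)arg;
    err = pthread_mutex_unlock(&mutex);       /* this thread never locked it */

    /* Prints: unlock failed: Operation not permitted */
    printf("unlock failed: %s\n", strerror(err));
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t thread;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);

    pthread_mutex_lock(&mutex);               /* main thread owns the lock */
    pthread_create(&thread, NULL, wrong_thread, NULL);
    pthread_join(thread, NULL);

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    pthread_mutexattr_destroy(&attr);
    return 0;
}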



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 11:13 backup transaction with io-thread core dumps Dietmar Maurer
@ 2020-03-24 13:30 ` Dietmar Maurer
  2020-03-24 13:33   ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 13:30 UTC (permalink / raw)
  To: jsnow, qemu-devel, mreitz

This is fixed with this patch:

https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html

thanks!

> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> 
>  
> I get a core dump with backup transactions when using io-threads.
> 
> [...]



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:30 ` Dietmar Maurer
@ 2020-03-24 13:33   ` Dietmar Maurer
  2020-03-24 13:44     ` John Snow
  2020-03-24 13:47     ` Max Reitz
  0 siblings, 2 replies; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 13:33 UTC (permalink / raw)
  To: jsnow, qemu-devel, mreitz

spoke too soon - the error is still there, sigh
 
> This is fixed with this patch:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
> 
> thanks!
> 
> > On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> > 
> >  
> > I get a core dump with backup transactions when using io-threads.
> > 
> > [...]



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:33   ` Dietmar Maurer
@ 2020-03-24 13:44     ` John Snow
  2020-03-24 14:00       ` Dietmar Maurer
  2020-03-24 13:47     ` Max Reitz
  1 sibling, 1 reply; 23+ messages in thread
From: John Snow @ 2020-03-24 13:44 UTC (permalink / raw)
  To: Dietmar Maurer, qemu-devel, mreitz

Well, that was a rollercoaster of emotions as I read your mails.

On 3/24/20 9:33 AM, Dietmar Maurer wrote:
> spoke too soon - the error is still there, sigh
>  
>> This is fixed with this patch:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
>>
>> thanks!
>>
>>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
>>>
>>>  
>>> I get a core dump with backup transactions when using io-threads.
>>>
>>> [...]
> 

Alright, let's start the field autopsy.

What version of QEMU are you running? Sergio contributed a lot of AIO
context (and iothread) fixes this release; this looks (at a glance) to
be similar and related.

If you are running 4.2 and not the latest HEAD, it's a strong candidate.

Check this out:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg01440.html

hth,
--js



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:33   ` Dietmar Maurer
  2020-03-24 13:44     ` John Snow
@ 2020-03-24 13:47     ` Max Reitz
  2020-03-24 14:02       ` Dietmar Maurer
                         ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Max Reitz @ 2020-03-24 13:47 UTC (permalink / raw)
  To: Dietmar Maurer, jsnow, qemu-devel, Stefan Hajnoczi



Hi Dietmar,

I assume this is with master and has popped up only recently?

Maybe it has something to do with the recent mutex patches by Stefan, so
I’m Cc-ing him.

Max

On 24.03.20 14:33, Dietmar Maurer wrote:
> spoke too soon - the error is still there, sigh
>  
>> This is fixed with this patch:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
>>
>> thanks!
>>
>>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
>>>
>>>  
>>> I get a core dump with backup transactions when using io-threads.
>>>
>>> [...]
> 




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:44     ` John Snow
@ 2020-03-24 14:00       ` Dietmar Maurer
  0 siblings, 0 replies; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 14:00 UTC (permalink / raw)
  To: John Snow, qemu-devel, mreitz


> What version of QEMU are you running? Sergio contributed a lot of AIO
> context (and iothread) fixes this release; this looks (at a glance) to
> be similar and related.

I'm testing with the latest code from git:

commit 09a98dd988c715157c0b80af16fa5baa80101eed 
Merge: f1e748d279 1583794b9b
Author: Peter Maydell <peter.maydell@linaro.org>
Date:   Tue Mar 24 09:50:46 2020 +0000

    Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200324' into staging
    
    ppc patch queue for 2020-03-24



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:47     ` Max Reitz
@ 2020-03-24 14:02       ` Dietmar Maurer
  2020-03-24 16:49       ` Dietmar Maurer
  2020-03-25  8:13       ` backup transaction with io-thread core dumps Sergio Lopez
  2 siblings, 0 replies; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 14:02 UTC (permalink / raw)
  To: Max Reitz, jsnow, qemu-devel, Stefan Hajnoczi

> I assume this is with master and has popped up only recently?

sure
 
> Maybe it has something to do with the recent mutex patches by Stefan, so
> I’m Cc-ing him.

Thanks.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:47     ` Max Reitz
  2020-03-24 14:02       ` Dietmar Maurer
@ 2020-03-24 16:49       ` Dietmar Maurer
  2020-03-25 11:40         ` Stefan Reiter
  2020-03-25  8:13       ` backup transaction with io-thread core dumps Sergio Lopez
  2 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-24 16:49 UTC (permalink / raw)
  To: Max Reitz, jsnow, qemu-devel, Stefan Hajnoczi

A more serious issue is that I also get a hang inside the poll loop
when the VM is under load. For example, running "stress -d 5" inside
the VM (Debian Buster).

Then running a simple drive-backup like:

{ "execute": "drive-backup", "arguments": { "device": "drive-scsi0", "sync": "full", "target": "backup-scsi0.raw" } }

At the end of the backup, the VM hangs; gdb says:

(gdb) bt
#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d35c40, nfds=2, timeout=<optimized out>, 
    timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c5f2c9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c5f2c9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at util/qemu-timer.c:335
#3  0x0000555555c61ab1 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffdc68, timeout=-1)
    at util/fdmon-poll.c:79
#4  0x0000555555c61097 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc2243 in bdrv_do_drained_begin
    (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff671c11c0) at block/io.c:429
#6  0x0000555555bc2243 in bdrv_do_drained_begin
    (bs=0x7fff671c11c0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>)
    at block/io.c:395
#7  0x0000555555bdd132 in bdrv_backup_top_drop (bs=0x7fff671c11c0) at block/backup-top.c:273
#8  0x0000555555bd883c in backup_clean (job=0x7fffe5609a00) at block/backup.c:114
#9  0x0000555555b6d46d in job_clean (job=0x7fffe5609a00) at job.c:657
#10 0x0000555555b6d46d in job_finalize_single (job=0x7fffe5609a00) at job.c:673
#11 0x0000555555b6d46d in job_finalize_single (job=0x7fffe5609a00) at job.c:661
#12 0x0000555555b6dedc in job_txn_apply (txn=<optimized out>, fn=0x555555b6d420 <job_finalize_single>) at job.c:145
#13 0x0000555555b6dedc in job_do_finalize (job=0x7fffe5609a00) at job.c:782
#14 0x0000555555b6e131 in job_completed_txn_success (job=0x7fffe5609a00) at job.c:832
#15 0x0000555555b6e131 in job_completed (job=0x7fffe5609a00) at job.c:845
#16 0x0000555555b6e131 in job_completed (job=0x7fffe5609a00) at job.c:836
#17 0x0000555555b6e190 in job_exit (opaque=0x7fffe5609a00) at job.c:864
#18 0x0000555555c5d855 in aio_bh_call (bh=0x7fffe721a3c0) at util/async.c:164
#19 0x0000555555c5d855 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
#20 0x0000555555c60ede in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
#21 0x0000555555c5d73e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at util/async.c:298
#22 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x0000555555c60148 in glib_pollfds_poll () at util/main-loop.c:219
#24 0x0000555555c60148 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#25 0x0000555555c60148 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#26 0x00005555558fc579 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#27 0x0000555555800c2e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49

The VM uses a hardware setup with io-threads similar to the previous example.

Is somebody able to reproduce that? Any ideas what's wrong here?

> On March 24, 2020 2:47 PM Max Reitz <mreitz@redhat.com> wrote:
> 
>  
> Hi Dietmar,
> 
> I assume this is with master and has popped up only recently?
> 
> Maybe it has something to do with the recent mutex patches by Stefan, so
> I’m Cc-ing him.
> 
> Max
> 
> On 24.03.20 14:33, Dietmar Maurer wrote:
> > spoke too soon - the error is still there, sigh
> >  
> >> This is fixed with this patch:
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
> >>
> >> thanks!
> >>
> >>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> >>>
> >>>  
> >>> I get a core dump with backup transactions when using io-threads.
> >>>
> >>> [...]
> >



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 13:47     ` Max Reitz
  2020-03-24 14:02       ` Dietmar Maurer
  2020-03-24 16:49       ` Dietmar Maurer
@ 2020-03-25  8:13       ` Sergio Lopez
  2020-03-25 11:46         ` Sergio Lopez
  2 siblings, 1 reply; 23+ messages in thread
From: Sergio Lopez @ 2020-03-25  8:13 UTC (permalink / raw)
  To: Max Reitz; +Cc: jsnow, Dietmar Maurer, Stefan Hajnoczi, qemu-devel


On Tue, Mar 24, 2020 at 02:47:43PM +0100, Max Reitz wrote:
> Hi Dietmar,
>
> I assume this is with master and has popped up only recently?
>
> Maybe it has something to do with the recent mutex patches by Stefan, so
> I’m Cc-ing him.
>

Hi,

I was able to reproduce the issue with a build after the last batch of
AIO fixes and before Stefan's optimizations. This seems to be a new
issue related to { "completion-mode": "grouped" }. Without that
property, the transaction finishes without a crash.

I'm going to take a look at this.

Sergio.

>
> On 24.03.20 14:33, Dietmar Maurer wrote:
> > spoke too soon - the error is still there, sigh
> >
> >> This is fixed with this patch:
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
> >>
> >> thanks!
> >>
> >>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> >>>
> >>>
> >>> I get a core dump with backup transactions when using io-threads.
> >>>
> >>> [...]
> >
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: backup transaction with io-thread core dumps
  2020-03-24 16:49       ` Dietmar Maurer
@ 2020-03-25 11:40         ` Stefan Reiter
  2020-03-25 12:23           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Reiter @ 2020-03-25 11:40 UTC (permalink / raw)
  Cc: vsementsov, qemu-devel, Max Reitz, Stefan Hajnoczi, jsnow,
	Dietmar Maurer

On 24/03/2020 17:49, Dietmar Maurer wrote:
> A more serious issue is that I also get a hang inside the poll loop
> when the VM is under load. For example, running "stress -d 5" inside
> the VM (Debian Buster).
> 
> Then running a simply drive-backup like:
> 
> { "execute": "drive-backup", "arguments": { "device": "drive-scsi0", "sync": "full", "target": "backup-scsi0.raw" } }
> 
> At the end of the backup, VM hangs, gdb says:
> 
> [...]
> 
> The VM uses a similar hardware setup with io-threads as the previous
> example.
> 
> Is somebody able to reproduce that? And ideas whats wrong here?
> 

Can't speak for the original issue, but this one I can reproduce exactly 
as you say: main thread hangs at the end of a backup job of a drive that 
uses IO threads, when said drive is under load. Single disk, raw-file 
backed.

I've bisected the issue to commit 00e30f05de1d ("block/backup: use 
backup-top instead of write notifiers") - CC'ing Vladimir, since he 
wrote that one. Before that patch, backup works fine.


For completeness, here's the trace I get on my machine (latest master) when everything is stuck; Thread 1 is the main thread, Thread 3 the IO thread for the drive:

Thread 1 (Thread 0x7ffff5d20680 (LWP 958009)):
#0  0x00007ffff6cf2916 in __GI_ppoll (fds=0x7ffff4a0c2c0, nfds=0x2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555dba2dd in qemu_poll_ns (fds=0x7ffff4a0c2c0, nfds=0x2, timeout=0xffffffffffffffff) at util/qemu-timer.c:335
#2  0x0000555555dbdb1d in fdmon_poll_wait (ctx=0x7ffff49e3380, ready_list=0x7fffffffdab0, timeout=0xffffffffffffffff) at util/fdmon-poll.c:79
#3  0x0000555555dbd48d in aio_poll (ctx=0x7ffff49e3380, blocking=0x1) at util/aio-posix.c:589
#4  0x0000555555cf5a83 in bdrv_do_drained_begin (bs=0x7fffd1577180, recursive=0x0, parent=0x0, ignore_bds_parents=0x0, poll=0x1) at block/io.c:429
#5  0x0000555555cf5aef in bdrv_drained_begin (bs=0x7fffd1577180) at block/io.c:435
#6  0x0000555555d1b135 in bdrv_backup_top_drop (bs=0x7fffd1577180) at block/backup-top.c:273
#7  0x0000555555d14734 in backup_clean (job=0x7ffff49fba00) at block/backup.c:132
#8  0x0000555555c81be5 in job_clean (job=0x7ffff49fba00) at job.c:656
#9  0x0000555555c81c60 in job_finalize_single (job=0x7ffff49fba00) at job.c:672
#10 0x0000555555c80830 in job_txn_apply (txn=0x7fffd1576a80, fn=0x555555c81be8 <job_finalize_single>) at job.c:145
#11 0x0000555555c82017 in job_do_finalize (job=0x7ffff49fba00) at job.c:781
#12 0x0000555555c821b4 in job_completed_txn_success (job=0x7ffff49fba00) at job.c:831
#13 0x0000555555c82254 in job_completed (job=0x7ffff49fba00) at job.c:844
#14 0x0000555555c822a3 in job_exit (opaque=0x7ffff49fba00) at job.c:863
#15 0x0000555555db78a0 in aio_bh_call (bh=0x7ffff3c0d240) at util/async.c:136
#16 0x0000555555db79aa in aio_bh_poll (ctx=0x7ffff49e3380) at util/async.c:164
#17 0x0000555555dbcdae in aio_dispatch (ctx=0x7ffff49e3380) at util/aio-posix.c:380
#18 0x0000555555db7dbc in aio_ctx_dispatch (source=0x7ffff49e3380, callback=0x0, user_data=0x0) at util/async.c:298
#19 0x00007ffff7c8ff2e in g_main_context_dispatch + 0x2ae () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x0000555555dbb360 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x0000555555dbb3da in os_host_main_loop_wait (timeout=0x3b9aca00) at util/main-loop.c:242
#22 0x0000555555dbb4df in main_loop_wait (nonblocking=0x0) at util/main-loop.c:518
#23 0x0000555555947478 in qemu_main_loop () at /root/qemu/softmmu/vl.c:1665
#24 0x0000555555d58ddf in main (argc=0x15, argv=0x7fffffffe058, envp=0x7fffffffe108) at /root/qemu/softmmu/main.c:49

Thread 3 (Thread 0x7ffff497e700 (LWP 958039)):
#0  0x00007ffff6dd629c in __lll_lock_wait + 0x1c () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007ffff6dcf7d1 in __GI___pthread_mutex_lock (mutex=0x7ffff49e36e0) at ../nptl/pthread_mutex_lock.c:115
#2  0x0000555555dc0ad7 in qemu_mutex_lock_impl (mutex=0x7ffff49e36e0, file=0x555555f66501 "util/async.c", line=0x254) at util/qemu-thread-posix.c:78
#3  0x0000555555db872e in aio_context_acquire (ctx=0x7ffff49e3680) at util/async.c:596
#4  0x0000555555db905e in thread_pool_completion_bh (opaque=0x7ffff3c0c000) at util/thread-pool.c:167
#5  0x0000555555db78a0 in aio_bh_call (bh=0x7ffff49d7f20) at util/async.c:136
#6  0x0000555555db79aa in aio_bh_poll (ctx=0x7ffff49e3680) at util/async.c:164
#7  0x0000555555dbd673 in aio_poll (ctx=0x7ffff49e3680, blocking=0x1) at util/aio-posix.c:638
#8  0x00005555559d7611 in iothread_run (opaque=0x7ffff49b1c80) at iothread.c:75
#9  0x0000555555dc1650 in qemu_thread_start (args=0x7ffff58ea480) at util/qemu-thread-posix.c:519
#10 0x00007ffff6dccfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007ffff6cfd4cf in clone + 0x3f () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

>> On March 24, 2020 2:47 PM Max Reitz <mreitz@redhat.com> wrote:
>>
>>   
>> Hi Dietmar,
>>
>> I assume this is with master and has popped up only recently?
>>
>> Maybe it has something to do with the recent mutex patches by Stefan, so
>> I’m Cc-ing him.
>>
>> Max
>>
>> On 24.03.20 14:33, Dietmar Maurer wrote:
>>> spoke too soon - the error is still there, sigh
>>>   
>>>> This is fixed with this patch:
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
>>>>
>>>> thanks!
>>>>
>>>>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
>>>>>
>>>>>   
>>>>> I get a core dump with backup transactions when using io-threads.
>>>>>
>>>>> To reproduce, create and start a VM with:
>>>>>
>>>>> # qemu-img create disk1.raw 100M
>>>>> # qemu-img create disk2.raw 100M
>>>>> #./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/108.pid  -m 512 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1'  -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'
>>>>>
>>>>> Then open socat to the qmp socket
>>>>> # socat /var/run/qemu-test.qmp -
>>>>>
>>>>> And run the following qmp command:
>>>>>
>>>>> { "execute": "qmp_capabilities", "arguments": {} }
>>>>> { "execute": "transaction", "arguments":  { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-sysi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }
>>>>>
>>>>> The VM will core dump:
>>>>>
>>>>> qemu: qemu_mutex_unlock_impl: Operation not permitted
>>>>> Aborted (core dumped)
>>>>> (gdb) bt
>>>>> #0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>>>> #1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
>>>>> #2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
>>>>> #3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601)
>>>>>      at util/qemu-thread-posix.c:108
>>>>> #4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
>>>>> #5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
>>>>>      at block.c:6238
>>>>> #6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
>>>>>      at block.c:6211
>>>>> #7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
>>>>>      at block.c:6324
>>>>> #8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
>>>>> #9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
>>>>> #10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
>>>>> #11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
>>>>> #12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
>>>>> #13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
>>>>> #14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
>>>>> #15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
>>>>> #16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
>>>>> #17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
>>>>> #18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
>>>>> #19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
>>>>> #20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
>>>>> #21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
>>>>> #22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
>>>>> #23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
>>>>> #24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
>>>>> #25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
>>>>> #26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
>>>>> #27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
>>>>> #28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
>>>>> #29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>>> #30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
>>>>> #31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
>>>>> #32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
>>>>> #33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
>>>>> #34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49




* Re: backup transaction with io-thread core dumps
  2020-03-25  8:13       ` backup transaction with io-thread core dumps Sergio Lopez
@ 2020-03-25 11:46         ` Sergio Lopez
  2020-03-25 12:29           ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Sergio Lopez @ 2020-03-25 11:46 UTC (permalink / raw)
  To: Max Reitz; +Cc: kwolf, jsnow, Dietmar Maurer, Stefan Hajnoczi, qemu-devel


On Wed, Mar 25, 2020 at 09:13:12AM +0100, Sergio Lopez wrote:
> On Tue, Mar 24, 2020 at 02:47:43PM +0100, Max Reitz wrote:
> > Hi Dietmar,
> >
> > I assume this is with master and has popped up only recently?
> >
> > Maybe it has something to do with the recent mutex patches by Stefan, so
> > I’m Cc-ing him.
> >
>
> Hi,
>
> I was able to reproduce the issue with a build after the last batch of
> AIO fixes and before Stefan's optimizations. This seems to be a new
> issue related to { "completion-mode": "grouped" }. Without that
> property, the transaction finishes without a crash.
>
> I'm going to take a look at this.
>

The problem is that, with completion-mode !=
ACTION_COMPLETION_MODE_INDIVIDUAL, both jobs and their related
BDS's are accessed while running on the AioContext of just one of
them. So, when there's an attempt to work on the "foreign BDS" (the
one that's being accessed while running in the AioContext of the other
BDS), QEMU crashes because its context wasn't acquired.

As expected, if both BDS are running on the same IOThread (and thus,
the same AioContext), the problem is not reproducible.

In a general sense, we could say that completion modes other than
"individual" are not supported for a transaction that may access
different AioContexts. I don't see a safe and easy way to fix this. We
could opt for simply detecting and forbidding such completion modes when
the BDSes are assigned to different AioContexts. Perhaps Kevin (in CC)
has a better idea.
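
If we went the detect-and-forbid route, the check itself would be small.
A minimal sketch (the helper name and the bs_list/n parameters are made
up here; the real code would collect the nodes while walking the
transaction's actions):

/*
 * Sketch only: refuse completion-mode=grouped when the nodes touched by
 * the transaction do not all share a single AioContext.
 */
static bool txn_check_same_aio_context(BlockDriverState **bs_list, int n,
                                       Error **errp)
{
    AioContext *ctx = n > 0 ? bdrv_get_aio_context(bs_list[0]) : NULL;
    int i;

    for (i = 1; i < n; i++) {
        if (bdrv_get_aio_context(bs_list[i]) != ctx) {
            error_setg(errp, "completion-mode 'grouped' requires all "
                       "devices to share one iothread");
            return false;
        }
    }

    return true;
}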

Sergio.

>
> >
> > On 24.03.20 14:33, Dietmar Maurer wrote:
> > > spoke too soon - the error is still there, sigh
> > >
> > >> This is fixed with this patch:
> > >>
> > >> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
> > >>
> > >> thanks!
> > >>
> > >>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
> > >>>
> > >>>
> > >>> I get a core dump with backup transactions when using io-threads.
> > >>>
> > >>> To reproduce, create and start a VM with:
> > >>>
> > >>> # qemu-img create disk1.raw 100M
> > >>> # qemu-img create disk2.raw 100M
> > >>> #./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/108.pid  -m 512 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1'  -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'
> > >>>
> > >>> Then open socat to the qmp socket
> > >>> # socat /var/run/qemu-test.qmp -
> > >>>
> > >>> And run the following qmp command:
> > >>>
> > >>> { "execute": "qmp_capabilities", "arguments": {} }
> > >>> { "execute": "transaction", "arguments":  { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-sysi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }
> > >>>
> > >>> The VM will core dump:
> > >>>
> > >>> qemu: qemu_mutex_unlock_impl: Operation not permitted
> > >>> Aborted (core dumped)
> > >>> (gdb) bt
> > >>> #0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > >>> #1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
> > >>> #2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
> > >>> #3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601)
> > >>>     at util/qemu-thread-posix.c:108
> > >>> #4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
> > >>> #5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
> > >>>     at block.c:6238
> > >>> #6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
> > >>>     at block.c:6211
> > >>> #7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
> > >>>     at block.c:6324
> > >>> #8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
> > >>> #9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
> > >>> #10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
> > >>> #11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
> > >>> #12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
> > >>> #13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
> > >>> #14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
> > >>> #15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
> > >>> #16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
> > >>> #17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
> > >>> #18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
> > >>> #19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
> > >>> #20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
> > >>> #21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
> > >>> #22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
> > >>> #23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
> > >>> #24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
> > >>> #25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
> > >>> #26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
> > >>> #27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
> > >>> #28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
> > >>> #29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> > >>> #30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
> > >>> #31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
> > >>> #32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
> > >>> #33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
> > >>> #34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
> > >
> >
> >


* Re: backup transaction with io-thread core dumps
  2020-03-25 11:40         ` Stefan Reiter
@ 2020-03-25 12:23           ` Vladimir Sementsov-Ogievskiy
  2020-03-25 15:50             ` [PATCH] backup: don't acquire aio_context in backup_clean Stefan Reiter
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-25 12:23 UTC (permalink / raw)
  To: Stefan Reiter
  Cc: qemu-devel, jsnow, Dietmar Maurer, Stefan Hajnoczi, Max Reitz

25.03.2020 14:40, Stefan Reiter wrote:
> On 24/03/2020 17:49, Dietmar Maurer wrote:
>> A more serious issue is that I also get a hang inside the poll loop
>> when the VM is under load. For example, running "stress -d 5" inside
>> the VM (Debian Buster).
>>
>> Then running a simply drive-backup like:
>>
>> { "execute": "drive-backup", "arguments": { "device": "drive-scsi0", "sync": "full", "target": "backup-scsi0.raw" } }
>>
>> At the end of the backup, VM hangs, gdb says:
>>
>> (gdb) bt
>> #0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d35c40, nfds=2, timeout=<optimized out>,
>>      timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
>> #1  0x0000555555c5f2c9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
>>      at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
>> #2  0x0000555555c5f2c9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
>>      at util/qemu-timer.c:335
>> #3  0x0000555555c61ab1 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffdc68, timeout=-1)
>>      at util/fdmon-poll.c:79
>> #4  0x0000555555c61097 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
>> #5  0x0000555555bc2243 in bdrv_do_drained_begin
>>      (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fff671c11c0) at block/io.c:429
>> #6  0x0000555555bc2243 in bdrv_do_drained_begin
>>      (bs=0x7fff671c11c0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>)
>>      at block/io.c:395
>> #7  0x0000555555bdd132 in bdrv_backup_top_drop (bs=0x7fff671c11c0) at block/backup-top.c:273
>> #8  0x0000555555bd883c in backup_clean (job=0x7fffe5609a00) at block/backup.c:114
>> #9  0x0000555555b6d46d in job_clean (job=0x7fffe5609a00) at job.c:657
>> #10 0x0000555555b6d46d in job_finalize_single (job=0x7fffe5609a00) at job.c:673
>> #11 0x0000555555b6d46d in job_finalize_single (job=0x7fffe5609a00) at job.c:661
>> #12 0x0000555555b6dedc in job_txn_apply (txn=<optimized out>, fn=0x555555b6d420 <job_finalize_single>) at job.c:145
>> #13 0x0000555555b6dedc in job_do_finalize (job=0x7fffe5609a00) at job.c:782
>> #14 0x0000555555b6e131 in job_completed_txn_success (job=0x7fffe5609a00) at job.c:832
>> #15 0x0000555555b6e131 in job_completed (job=0x7fffe5609a00) at job.c:845
>> #16 0x0000555555b6e131 in job_completed (job=0x7fffe5609a00) at job.c:836
>> #17 0x0000555555b6e190 in job_exit (opaque=0x7fffe5609a00) at job.c:864
>> #18 0x0000555555c5d855 in aio_bh_call (bh=0x7fffe721a3c0) at util/async.c:164
>> #19 0x0000555555c5d855 in aio_bh_poll (ctx=ctx@entry=0x7fffe8905e80) at util/async.c:164
>> #20 0x0000555555c60ede in aio_dispatch (ctx=0x7fffe8905e80) at util/aio-posix.c:380
>> #21 0x0000555555c5d73e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
>>      at util/async.c:298
>> #22 0x00007ffff7c7ef2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #23 0x0000555555c60148 in glib_pollfds_poll () at util/main-loop.c:219
>> #24 0x0000555555c60148 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
>> #25 0x0000555555c60148 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
>> #26 0x00005555558fc579 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
>> #27 0x0000555555800c2e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>      at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
>>
>> The VM uses a similar hardware setup with io-threads as the previous
>> example.
>>
>> Is somebody able to reproduce that? And ideas whats wrong here?
>>
> 
> Can't speak for the original issue, but this one I can reproduce exactly as you say: main thread hangs at the end of a backup job of a drive that uses IO threads, when said drive is under load. Single disk, raw-file backed.
> 
> I've bisected the issue to commit 00e30f05de1d ("block/backup: use backup-top instead of write notifiers") - CC'ing Vladimir, since he wrote that one. Before that patch, backup works fine.
> 
> 
> For completeness, here's the trace I get on my machine (latest master) when everything's stuck, Thread 1 is the main thread, 3 the IO thread for the drive:
> 
> Thread 1 (Thread 0x7ffff5d20680 (LWP 958009)):
> #0    0x00007ffff6cf2916 in __GI_ppoll (fds=0x7ffff4a0c2c0, nfds=0x2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1    0x0000555555dba2dd in qemu_poll_ns (fds=0x7ffff4a0c2c0, nfds=0x2, timeout=0xffffffffffffffff) at util/qemu-timer.c:335
> #2    0x0000555555dbdb1d in fdmon_poll_wait (ctx=0x7ffff49e3380, ready_list=0x7fffffffdab0, timeout=0xffffffffffffffff) at util/fdmon-poll.c:79
> #3    0x0000555555dbd48d in aio_poll (ctx=0x7ffff49e3380, blocking=0x1) at util/aio-posix.c:589
> #4    0x0000555555cf5a83 in bdrv_do_drained_begin (bs=0x7fffd1577180, recursive=0x0, parent=0x0, ignore_bds_parents=0x0, poll=0x1) at block/io.c:429

So we are waiting for some request, as we want to drain... And we are in the BDRV_POLL_WHILE loop, which should release the aio context before the aio_poll() call, so it's strange that thread 3 hangs.
Do we loop in BDRV_POLL_WHILE, or are we really hung here?

You may print bs->tracked_requests to see which requests we are waiting for. Requests have co pointers to their coroutines, which may be examined with

gdb> source scripts/qemu-gdb.py
gdb> qemu coroutine <coroutine-pointer>


> #5    0x0000555555cf5aef in bdrv_drained_begin (bs=0x7fffd1577180) at block/io.c:435
> #6    0x0000555555d1b135 in bdrv_backup_top_drop (bs=0x7fffd1577180) at block/backup-top.c:273
> #7    0x0000555555d14734 in backup_clean (job=0x7ffff49fba00) at block/backup.c:132
> #8    0x0000555555c81be5 in job_clean (job=0x7ffff49fba00) at job.c:656
> #9    0x0000555555c81c60 in job_finalize_single (job=0x7ffff49fba00) at job.c:672
> #10   0x0000555555c80830 in job_txn_apply (txn=0x7fffd1576a80, fn=0x555555c81be8 <job_finalize_single>) at job.c:145
> #11   0x0000555555c82017 in job_do_finalize (job=0x7ffff49fba00) at job.c:781
> #12   0x0000555555c821b4 in job_completed_txn_success (job=0x7ffff49fba00) at job.c:831
> #13   0x0000555555c82254 in job_completed (job=0x7ffff49fba00) at job.c:844
> #14   0x0000555555c822a3 in job_exit (opaque=0x7ffff49fba00) at job.c:863
> #15   0x0000555555db78a0 in aio_bh_call (bh=0x7ffff3c0d240) at util/async.c:136
> #16   0x0000555555db79aa in aio_bh_poll (ctx=0x7ffff49e3380) at util/async.c:164
> #17   0x0000555555dbcdae in aio_dispatch (ctx=0x7ffff49e3380) at util/aio-posix.c:380
> #18   0x0000555555db7dbc in aio_ctx_dispatch (source=0x7ffff49e3380, callback=0x0, user_data=0x0) at util/async.c:298
> #19   0x00007ffff7c8ff2e in g_main_context_dispatch + 0x2ae () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #20   0x0000555555dbb360 in glib_pollfds_poll () at util/main-loop.c:219
> #21   0x0000555555dbb3da in os_host_main_loop_wait (timeout=0x3b9aca00) at util/main-loop.c:242
> #22   0x0000555555dbb4df in main_loop_wait (nonblocking=0x0) at util/main-loop.c:518
> #23   0x0000555555947478 in qemu_main_loop () at /root/qemu/softmmu/vl.c:1665
> #24   0x0000555555d58ddf in main (argc=0x15, argv=0x7fffffffe058, envp=0x7fffffffe108) at /root/qemu/softmmu/main.c:49
> 
> Thread 3 (Thread 0x7ffff497e700 (LWP 958039)):

Print the mutex content, to see who holds it.

> #0    0x00007ffff6dd629c in __lll_lock_wait + 0x1c () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> #1    0x00007ffff6dcf7d1 in __GI___pthread_mutex_lock (mutex=0x7ffff49e36e0) at ../nptl/pthread_mutex_lock.c:115
> #2    0x0000555555dc0ad7 in qemu_mutex_lock_impl (mutex=0x7ffff49e36e0, file=0x555555f66501 "util/async.c", line=0x254) at util/qemu-thread-posix.c:78
> #3    0x0000555555db872e in aio_context_acquire (ctx=0x7ffff49e3680) at util/async.c:596
> #4    0x0000555555db905e in thread_pool_completion_bh (opaque=0x7ffff3c0c000) at util/thread-pool.c:167
> #5    0x0000555555db78a0 in aio_bh_call (bh=0x7ffff49d7f20) at util/async.c:136
> #6    0x0000555555db79aa in aio_bh_poll (ctx=0x7ffff49e3680) at util/async.c:164
> #7    0x0000555555dbd673 in aio_poll (ctx=0x7ffff49e3680, blocking=0x1) at util/aio-posix.c:638
> #8    0x00005555559d7611 in iothread_run (opaque=0x7ffff49b1c80) at iothread.c:75
> #9    0x0000555555dc1650 in qemu_thread_start (args=0x7ffff58ea480) at util/qemu-thread-posix.c:519
> #10   0x00007ffff6dccfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #11   0x00007ffff6cfd4cf in clone + 0x3f () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
>>> On March 24, 2020 2:47 PM Max Reitz <mreitz@redhat.com> wrote:
>>>
>>> Hi Dietmar,
>>>
>>> I assume this is with master and has popped up only recently?
>>>
>>> Maybe it has something to do with the recent mutex patches by Stefan, so
>>> I’m Cc-ing him.
>>>
>>> Max
>>>
>>> On 24.03.20 14:33, Dietmar Maurer wrote:
>>>> spoke too soon - the error is still there, sigh
>>>>> This is fixed with this patch:
>>>>>
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
>>>>>
>>>>> thanks!
>>>>>
>>>>>> On March 24, 2020 12:13 PM Dietmar Maurer <dietmar@proxmox.com> wrote:
>>>>>>
>>>>>> I get a core dump with backup transactions when using io-threads.
>>>>>>
>>>>>> To reproduce, create and start a VM with:
>>>>>>
>>>>>> # qemu-img create disk1.raw 100M
>>>>>> # qemu-img create disk2.raw 100M
>>>>>> #./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/108.pid  -m 512 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1'  -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'
>>>>>>
>>>>>> Then open socat to the qmp socket
>>>>>> # socat /var/run/qemu-test.qmp -
>>>>>>
>>>>>> And run the following qmp command:
>>>>>>
>>>>>> { "execute": "qmp_capabilities", "arguments": {} }
>>>>>> { "execute": "transaction", "arguments":  { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-sysi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }
>>>>>>
>>>>>> The VM will core dump:
>>>>>>
>>>>>> qemu: qemu_mutex_unlock_impl: Operation not permitted
>>>>>> Aborted (core dumped)
>>>>>> (gdb) bt
>>>>>> #0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>>>>> #1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
>>>>>> #2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
>>>>>> #3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601)
>>>>>>      at util/qemu-thread-posix.c:108
>>>>>> #4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
>>>>>> #5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
>>>>>>      at block.c:6238
>>>>>> #6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
>>>>>>      at block.c:6211
>>>>>> #7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
>>>>>>      at block.c:6324
>>>>>> #8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
>>>>>> #9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
>>>>>> #10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
>>>>>> #11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
>>>>>> #12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
>>>>>> #13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
>>>>>> #14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
>>>>>> #15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
>>>>>> #16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
>>>>>> #17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
>>>>>> #18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
>>>>>> #19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
>>>>>> #20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
>>>>>> #21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
>>>>>> #22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
>>>>>> #23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
>>>>>> #24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
>>>>>> #25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
>>>>>> #26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
>>>>>> #27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
>>>>>> #28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
>>>>>> #29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>>>> #30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
>>>>>> #31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
>>>>>> #32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
>>>>>> #33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
>>>>>> #34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
> 


-- 
Best regards,
Vladimir



* Re: backup transaction with io-thread core dumps
  2020-03-25 11:46         ` Sergio Lopez
@ 2020-03-25 12:29           ` Dietmar Maurer
  2020-03-25 12:39             ` Sergio Lopez
  0 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-25 12:29 UTC (permalink / raw)
  To: Sergio Lopez, Max Reitz; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi

> As expected, if both BDS are running on the same IOThread (and thus,
> the same AioContext), the problem is not reproducible.
> 
> In a general sense, we could say that completion modes other than
> "individual" are not supported for a transaction that may access
> different AioContexts. I don't see a safe an easy way to fix this. We
> could opt for simply detect and forbid such completion modes when the
> BDS's are assigned to different AioContexts. Perhaps Kevin (in CC) has
> a better idea.

So the solution is to disable backups when using io-threads?




* Re: backup transaction with io-thread core dumps
  2020-03-25 12:29           ` Dietmar Maurer
@ 2020-03-25 12:39             ` Sergio Lopez
  2020-03-25 15:40               ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Sergio Lopez @ 2020-03-25 12:39 UTC (permalink / raw)
  To: Dietmar Maurer; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi, Max Reitz


On Wed, Mar 25, 2020 at 01:29:48PM +0100, Dietmar Maurer wrote:
> > As expected, if both BDS are running on the same IOThread (and thus,
> > the same AioContext), the problem is not reproducible.
> >
> > In a general sense, we could say that completion modes other than
> > "individual" are not supported for a transaction that may access
> > different AioContexts. I don't see a safe an easy way to fix this. We
> > could opt for simply detect and forbid such completion modes when the
> > BDS's are assigned to different AioContexts. Perhaps Kevin (in CC) has
> > a better idea.
>
> So the solution is to disable backups when using io-threads?
>

I meant forbidding transactions with completion-mode == grouped. It
would still be possible to run transactions (and thus backups) with
completion-mode == individual, which is the default.

Sergio.


* Re: backup transaction with io-thread core dumps
  2020-03-25 12:39             ` Sergio Lopez
@ 2020-03-25 15:40               ` Dietmar Maurer
  2020-03-26  7:50                 ` Sergio Lopez
  0 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-25 15:40 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi, Max Reitz

> On March 25, 2020 1:39 PM Sergio Lopez <slp@redhat.com> wrote:
> 
>  
> On Wed, Mar 25, 2020 at 01:29:48PM +0100, Dietmar Maurer wrote:
> > > As expected, if both BDS are running on the same IOThread (and thus,
> > > the same AioContext), the problem is not reproducible.
> > >
> > > In a general sense, we could say that completion modes other than
> > > "individual" are not supported for a transaction that may access
> > > different AioContexts. I don't see a safe an easy way to fix this. We
> > > could opt for simply detect and forbid such completion modes when the
> > > BDS's are assigned to different AioContexts. Perhaps Kevin (in CC) has
> > > a better idea.
> >
> > So the solution is to disable backups when using io-threads?
> >
> 
> I meant forbidding transactions with completion-mode == grouped. It
> would be still possible running transactions (and thus, backups) with
> completion-mode == individual, which is the default.

As mentioned earlier, even a totally simple/normal backup job fails when
using io-threads and the VM is under load. It results in a total
VM freeze!




* [PATCH] backup: don't acquire aio_context in backup_clean
  2020-03-25 12:23           ` Vladimir Sementsov-Ogievskiy
@ 2020-03-25 15:50             ` Stefan Reiter
  2020-03-26  5:54               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Reiter @ 2020-03-25 15:50 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: kwolf, vsementsov, mreitz, stefanha, jsnow, dietmar

backup_clean is only ever called as a handler via job_exit, which
already acquires the job's context. The job's context is guaranteed to
be the same as the one used by backup_top via backup_job_create.

Since the previous logic effectively acquired the lock twice, this
broke cleanup of backups for disks using IO threads: the BDRV_POLL_WHILE
in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the
lock once, thus deadlocking with the IO thread.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

This is a fix for the issue discussed in this part of the thread:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
...not the original problem (core dump) posted by Dietmar.

I've still seen it occasionally hang during a backup abort. I'm trying to figure
out why that happens; the stack trace indicates a similar problem, with the main
thread hanging at bdrv_do_drained_begin, though I have no clue why as of yet.

 block/backup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7430ca5883..a7a7dcaf4c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -126,11 +126,7 @@ static void backup_abort(Job *job)
 static void backup_clean(Job *job)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
-
-    aio_context_acquire(aio_context);
     bdrv_backup_top_drop(s->backup_top);
-    aio_context_release(aio_context);
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
-- 
2.25.2





* Re: [PATCH] backup: don't acquire aio_context in backup_clean
  2020-03-25 15:50             ` [PATCH] backup: don't acquire aio_context in backup_clean Stefan Reiter
@ 2020-03-26  5:54               ` Vladimir Sementsov-Ogievskiy
  2020-03-26  9:43                 ` Stefan Reiter
  2020-03-26 11:53                 ` Sergio Lopez
  0 siblings, 2 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-26  5:54 UTC (permalink / raw)
  To: Stefan Reiter, qemu-devel, qemu-block
  Cc: kwolf, slp, mreitz, stefanha, jsnow, dietmar

25.03.2020 18:50, Stefan Reiter wrote:
> backup_clean is only ever called as a handler via job_exit, which

Hmm.. I'm afraid it's not quite correct.

job_clean

   job_finalize_single

      job_completed_txn_abort (lock aio context)

      job_do_finalize


Hmm. job_do_finalize calls job_completed_txn_abort, which takes care to lock the aio context..
At the same time, it directly calls job_txn_apply(job->txn, job_finalize_single)
without locking. Is that a bug?

And, even if job_do_finalize is always called with the context locked, where is the guarantee
that the contexts of all jobs in the txn are locked?

Still, let's look through its callers.

	  job_finalize

                qmp_block_job_finalize (lock aio context)
                qmp_job_finalize (lock aio context)
                test_cancel_concluded (doesn't lock, but it's a test)

           job_completed_txn_success

                job_completed

                     job_exit (lock aio context)

                     job_cancel
	 	
                          blockdev_mark_auto_del (lock aio context)

                          job_user_cancel

                              qmp_block_job_cancel (locks context)
                              qmp_job_cancel  (locks context)

                          job_cancel_err

                               job_cancel_sync (return job_finish_sync(job, &job_cancel_err, NULL);, job_finish_sync just calls callback)

                                    replication_close (it's .bdrv_close.. Hmm, I don't see context locking, where is it ?)

                                    replication_stop (locks context)

                                    drive_backup_abort (locks context)

                                    blockdev_backup_abort (locks context)

                                    job_cancel_sync_all (locks context)

                                    cancel_common (locks context)

                          test_* (I don't care)

> already acquires the job's context. The job's context is guaranteed to
> be the same as the one used by backup_top via backup_job_create.
> 
> Since the previous logic effectively acquired the lock twice, this
> broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
> in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
> once, thus deadlocking with the IO thread.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>

Just note that this code was recently touched by 0abf2581717a19, so I'm adding Sergio (its author) to CC.

> ---
> 
> This is a fix for the issue discussed in this part of the thread:
> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
> ...not the original problem (core dump) posted by Dietmar.
> 
> I've still seen it occasionally hang during a backup abort. I'm trying to figure
> out why that happens, stack trace indicates a similar problem with the main
> thread hanging at bdrv_do_drained_begin, though I have no clue why as of yet.
> 
>   block/backup.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 7430ca5883..a7a7dcaf4c 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -126,11 +126,7 @@ static void backup_abort(Job *job)
>   static void backup_clean(Job *job)
>   {
>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> -    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
> -
> -    aio_context_acquire(aio_context);
>       bdrv_backup_top_drop(s->backup_top);
> -    aio_context_release(aio_context);
>   }
>   
>   void backup_do_checkpoint(BlockJob *job, Error **errp)
> 


-- 
Best regards,
Vladimir



* Re: backup transaction with io-thread core dumps
  2020-03-25 15:40               ` Dietmar Maurer
@ 2020-03-26  7:50                 ` Sergio Lopez
  2020-03-26  8:14                   ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Sergio Lopez @ 2020-03-26  7:50 UTC (permalink / raw)
  To: Dietmar Maurer; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi, Max Reitz


On Wed, Mar 25, 2020 at 04:40:48PM +0100, Dietmar Maurer wrote:
> > On March 25, 2020 1:39 PM Sergio Lopez <slp@redhat.com> wrote:
> > 
> >  
> > On Wed, Mar 25, 2020 at 01:29:48PM +0100, Dietmar Maurer wrote:
> > > > As expected, if both BDS are running on the same IOThread (and thus,
> > > > the same AioContext), the problem is not reproducible.
> > > >
> > > > In a general sense, we could say that completion modes other than
> > > > "individual" are not supported for a transaction that may access
> > > > different AioContexts. I don't see a safe an easy way to fix this. We
> > > > could opt for simply detect and forbid such completion modes when the
> > > > BDS's are assigned to different AioContexts. Perhaps Kevin (in CC) has
> > > > a better idea.
> > >
> > > So the solution is to disable backups when using io-threads?
> > >
> > 
> > I meant forbidding transactions with completion-mode == grouped. It
> > would be still possible running transactions (and thus, backups) with
> > completion-mode == individual, which is the default.
> 
> As mentioned earlier, even a totally simple/normal backup job fails when
> using io-threads and the VM is under load. It results in a total
> VM freeze!
> 

This is definitely a different issue. I'll take a look at it today.

Cheers,
Sergio.


* Re: backup transaction with io-thread core dumps
  2020-03-26  7:50                 ` Sergio Lopez
@ 2020-03-26  8:14                   ` Dietmar Maurer
  2020-03-26  9:23                     ` Dietmar Maurer
  0 siblings, 1 reply; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-26  8:14 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi, Max Reitz


> > > > So the solution is to disable backups when using io-threads?
> > > >
> > > 
> > > I meant forbidding transactions with completion-mode == grouped. It
> > > would be still possible running transactions (and thus, backups) with
> > > completion-mode == individual, which is the default.
> > 
> > As mentioned earlier, even a totally simple/normal backup job fails when
> > using io-threads and the VM is under load. It results in a total
> > VM freeze!
> > 
> 
> This is definitely a different issue. I'll take a look at it today.

Thanks. Stefan found a way to avoid that bug with:

https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07749.html

But there are doubts that this is the correct way to fix it ...




* Re: backup transaction with io-thread core dumps
  2020-03-26  8:14                   ` Dietmar Maurer
@ 2020-03-26  9:23                     ` Dietmar Maurer
  0 siblings, 0 replies; 23+ messages in thread
From: Dietmar Maurer @ 2020-03-26  9:23 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: kwolf, jsnow, qemu-devel, Stefan Hajnoczi, Max Reitz

> > > As mentioned earlier, even a totally simple/normal backup job fails when
> > > using io-threads and the VM is under load. It results in a total
> > > VM freeze!
> > > 
> > 
> > This is definitely a different issue. I'll take a look at it today.
> 
> Thanks. Stefan found a way to avoid that bug with:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07749.html
> 
> But there are doubts that this is the correct way to fix it ...

And I just ran into another freeze (with Stefan's patch applied). This time
when I cancel a running backup:

#0  0x00007ffff5cb3916 in __GI_ppoll (fds=0x7fff63d35c40, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000555555c5fcd9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x0000555555c5fcd9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x0000555555c624c1 in fdmon_poll_wait (ctx=0x7fffe8905e80, ready_list=0x7fffffffd2a8, timeout=-1) at util/fdmon-poll.c:79
#4  0x0000555555c61aa7 in aio_poll (ctx=0x7fffe8905e80, blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x0000555555bc2c83 in bdrv_do_drained_begin (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0) at block/io.c:429
#6  0x0000555555bc2c83 in bdrv_do_drained_begin (bs=0x7fffe8954bc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:395
#7  0x0000555555bb3c37 in blk_drain (blk=0x7fffe8ebcf80) at block/block-backend.c:1617
#8  0x0000555555bb481d in blk_unref (blk=0x7fffe8ebcf80) at block/block-backend.c:473
#9  0x0000555555b6c835 in block_job_free (job=0x7fff64505000) at blockjob.c:89
#10 0x0000555555b6dd19 in job_unref (job=0x7fff64505000) at job.c:360
#11 0x0000555555b6dd19 in job_unref (job=0x7fff64505000) at job.c:352
#12 0x0000555555b6e69a in job_finish_sync (job=job@entry=0x7fff64505000, finish=finish@entry=0x555555b6ec80 <job_cancel_err>, errp=errp@entry=0x0) at job.c:988
#13 0x0000555555b6ec9e in job_cancel_sync (job=job@entry=0x7fff64505000) at job.c:931
...

Any ideas?




* Re: [PATCH] backup: don't acquire aio_context in backup_clean
  2020-03-26  5:54               ` Vladimir Sementsov-Ogievskiy
@ 2020-03-26  9:43                 ` Stefan Reiter
  2020-03-26 12:46                   ` Vladimir Sementsov-Ogievskiy
  2020-03-26 11:53                 ` Sergio Lopez
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Reiter @ 2020-03-26  9:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, slp, mreitz, stefanha, jsnow, dietmar

On 26/03/2020 06:54, Vladimir Sementsov-Ogievskiy wrote:
> 25.03.2020 18:50, Stefan Reiter wrote:
>> backup_clean is only ever called as a handler via job_exit, which
> 
> Hmm.. I'm afraid it's not quite correct.
> 
> job_clean
> 
>    job_finalize_single
> 
>       job_completed_txn_abort (lock aio context)
> 
>       job_do_finalize
> 
> 
> Hmm. job_do_finalize calls job_completed_txn_abort, which cares to lock 
> aio context..
> And on the same time, it directaly calls job_txn_apply(job->txn, 
> job_finalize_single)
> without locking. Is it a bug?
> 

I think, as you say, the idea is that job_do_finalize is always called 
with the lock acquired. That's why job_completed_txn_abort takes care to 
release the caller's lock (the "outer_ctx", as it calls it) before taking 
the per-job locks, and reacquires it again at the end.

> And, even if job_do_finalize called always with locked context, where is 
> guarantee that all
> context of all jobs in txn are locked?
> 

I also don't see anything that guarantees that... I guess it could be 
adapted to handle locks like job_completed_txn_abort does?

Haven't looked into transactions too much, but does it even make sense 
to have jobs in different contexts in one transaction?

> Still, let's look through its callers.
> 
>        job_finalize
> 
>                 qmp_block_job_finalize (lock aio context)
>                 qmp_job_finalize (lock aio context)
>                 test_cancel_concluded (doesn't lock, but it's a test)
> 
>            job_completed_txn_success
> 
>                 job_completed
> 
>                      job_exit (lock aio context)
> 
>                      job_cancel
> 
>                           blockdev_mark_auto_del (lock aio context)
> 
>                           job_user_cancel
> 
>                               qmp_block_job_cancel (locks context)
>                               qmp_job_cancel  (locks context)
> 
>                           job_cancel_err
> 
>                                job_cancel_sync (return 
> job_finish_sync(job, &job_cancel_err, NULL);, job_finish_sync just calls 
> callback)
> 
>                                     replication_close (it's 
> .bdrv_close.. Hmm, I don't see context locking, where is it ?)
Hm, don't see it either. This might indeed be a way to get to job_clean 
without a lock held.

I don't have any testing set up for replication atm, but if you believe 
this would be correct I can send a patch for that as well (just acquire 
the lock in replication_close before job_cancel_async?).

> 
>                                     replication_stop (locks context)
> 
>                                     drive_backup_abort (locks context)
> 
>                                     blockdev_backup_abort (locks context)
> 
>                                     job_cancel_sync_all (locks context)
> 
>                                     cancel_common (locks context)
> 
>                           test_* (I don't care)
> 

To clarify: aside from the commit message, the patch itself does not 
appear to be wrong, right? All paths (aside from replication_close 
mentioned above) guarantee that the job's lock is held.

>> already acquires the job's context. The job's context is guaranteed to
>> be the same as the one used by backup_top via backup_job_create.
>>
>> Since the previous logic effectively acquired the lock twice, this
>> broke cleanup of backups for disks using IO threads, since the 
>> BDRV_POLL_WHILE
>> in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release 
>> the lock
>> once, thus deadlocking with the IO thread.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> 
> Just note, that this thing were recently touched by 0abf2581717a19 , so 
> add Sergio (its author) to CC.
> 
>> ---
>>
>> This is a fix for the issue discussed in this part of the thread:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
>> ...not the original problem (core dump) posted by Dietmar.
>>
>> I've still seen it occasionally hang during a backup abort. I'm trying 
>> to figure
>> out why that happens, stack trace indicates a similar problem with the 
>> main
>> thread hanging at bdrv_do_drained_begin, though I have no clue why as 
>> of yet.
>>
>>   block/backup.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 7430ca5883..a7a7dcaf4c 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -126,11 +126,7 @@ static void backup_abort(Job *job)
>>   static void backup_clean(Job *job)
>>   {
>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>> -    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
>> -
>> -    aio_context_acquire(aio_context);
>>       bdrv_backup_top_drop(s->backup_top);
>> -    aio_context_release(aio_context);
>>   }
>>   void backup_do_checkpoint(BlockJob *job, Error **errp)
>>
> 
> 




* Re: [PATCH] backup: don't acquire aio_context in backup_clean
  2020-03-26  5:54               ` Vladimir Sementsov-Ogievskiy
  2020-03-26  9:43                 ` Stefan Reiter
@ 2020-03-26 11:53                 ` Sergio Lopez
  1 sibling, 0 replies; 23+ messages in thread
From: Sergio Lopez @ 2020-03-26 11:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, Stefan Reiter, qemu-devel, mreitz, stefanha,
	jsnow, dietmar


On Thu, Mar 26, 2020 at 08:54:53AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 25.03.2020 18:50, Stefan Reiter wrote:
> > backup_clean is only ever called as a handler via job_exit, which
> 
> Hmm.. I'm afraid it's not quite correct.
> 
> job_clean
> 
>   job_finalize_single
> 
>      job_completed_txn_abort (lock aio context)
> 
>      job_do_finalize
> 
> 
> Hmm. job_do_finalize calls job_completed_txn_abort, which cares to lock aio context..
> And on the same time, it directaly calls job_txn_apply(job->txn, job_finalize_single)
> without locking. Is it a bug?

Indeed, looks like a bug to me. In fact, that's the one causing the
issue that Dietmar initially reported.

I think the proper fix is to drop the context acquisition/release in
backup_clean that I added in 0abf2581717a19, as Stefan proposed, and
also acquire the context of "foreign" jobs in job_txn_apply, just as
job_completed_txn_abort does.
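
Only as a sketch of that idea (not a tested patch, and ignoring for
brevity the case where fn() moves a job to another AioContext while we
iterate), job_txn_apply would take each job's own context around the
callback. Passing the triggering Job instead of the JobTxn is part of
the sketch:

/*
 * Sketch only: take each job's own AioContext around the callback, the
 * way job_completed_txn_abort() already does.  The caller's context is
 * released first so no lock is ever held recursively here -- nested
 * acquisition is exactly what breaks the single unlock done by
 * AIO_WAIT_WHILE in the drain paths.
 */
static void job_txn_apply(Job *caller, void fn(Job *))
{
    AioContext *outer_ctx = caller->aio_context;
    Job *job, *next;

    aio_context_release(outer_ctx);

    QLIST_FOREACH_SAFE(job, &caller->txn->jobs, txn_list, next) {
        AioContext *ctx = job->aio_context;

        aio_context_acquire(ctx);
        fn(job);
        aio_context_release(ctx);
    }

    aio_context_acquire(outer_ctx);
}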

Thanks,
Sergio.

> And, even if job_do_finalize called always with locked context, where is guarantee that all
> context of all jobs in txn are locked?
> 
> Still, let's look through its callers.
> 
> 	  job_finalize
> 
>                qmp_block_job_finalize (lock aio context)
>                qmp_job_finalize (lock aio context)
>                test_cancel_concluded (doesn't lock, but it's a test)
> 
>           job_completed_txn_success
> 
>                job_completed
> 
>                     job_exit (lock aio context)
> 
>                     job_cancel
> 	 	
>                          blockdev_mark_auto_del (lock aio context)
> 
>                          job_user_cancel
> 
>                              qmp_block_job_cancel (locks context)
>                              qmp_job_cancel  (locks context)
> 
>                          job_cancel_err
> 
>                               job_cancel_sync (return job_finish_sync(job, &job_cancel_err, NULL);, job_finish_sync just calls callback)
> 
>                                    replication_close (it's .bdrv_close.. Hmm, I don't see context locking, where is it ?)
> 
>                                    replication_stop (locks context)
> 
>                                    drive_backup_abort (locks context)
> 
>                                    blockdev_backup_abort (locks context)
> 
>                                    job_cancel_sync_all (locks context)
> 
>                                    cancel_common (locks context)
> 
>                          test_* (I don't care)
> 
> > already acquires the job's context. The job's context is guaranteed to
> > be the same as the one used by backup_top via backup_job_create.
> > 
> > Since the previous logic effectively acquired the lock twice, this
> > broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
> > in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
> > once, thus deadlocking with the IO thread.
> > 
> > Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> 
> Just note, that this thing were recently touched by 0abf2581717a19 , so add Sergio (its author) to CC.
> 
> > ---
> > 
> > This is a fix for the issue discussed in this part of the thread:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
> > ...not the original problem (core dump) posted by Dietmar.
> > 
> > I've still seen it occasionally hang during a backup abort. I'm trying to figure
> > out why that happens, stack trace indicates a similar problem with the main
> > thread hanging at bdrv_do_drained_begin, though I have no clue why as of yet.
> > 
> >   block/backup.c | 4 ----
> >   1 file changed, 4 deletions(-)
> > 
> > diff --git a/block/backup.c b/block/backup.c
> > index 7430ca5883..a7a7dcaf4c 100644
> > --- a/block/backup.c
> > +++ b/block/backup.c
> > @@ -126,11 +126,7 @@ static void backup_abort(Job *job)
> >   static void backup_clean(Job *job)
> >   {
> >       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> > -    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
> > -
> > -    aio_context_acquire(aio_context);
> >       bdrv_backup_top_drop(s->backup_top);
> > -    aio_context_release(aio_context);
> >   }
> >   void backup_do_checkpoint(BlockJob *job, Error **errp)
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 
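
As a side note, the double-acquire situation the quoted commit message
describes can be reproduced outside QEMU with a plain recursive pthread
mutex. This is only a toy to make the mechanism concrete (the file name
toy_deadlock.c and all names are made up; it hangs by design):

/* Toy reproduction of the pattern described above: a recursive lock taken
 * twice by the "main loop" thread but released only once before it polls
 * for work that the "IO thread" can only do while holding that same lock.
 *
 * Build: gcc -pthread toy_deadlock.c -o toy_deadlock
 * Running it hangs, which is the point.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t ctx;      /* stands in for the AioContext lock */
static atomic_bool drained;      /* stands in for the drained condition */

static void *io_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ctx);    /* never succeeds: main still holds one level */
    atomic_store(&drained, true);
    pthread_mutex_unlock(&ctx);
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx, &attr);

    pthread_mutex_lock(&ctx);    /* depth 1: the lock held around job_exit() */
    pthread_mutex_lock(&ctx);    /* depth 2: the extra acquire in backup_clean() */

    pthread_create(&tid, NULL, io_thread, NULL);

    /* "BDRV_POLL_WHILE": release once, then poll for completion. */
    pthread_mutex_unlock(&ctx);  /* back to depth 1; the lock is still held */
    while (!atomic_load(&drained)) {
        usleep(1000);            /* spins forever; io_thread is stuck on the lock */
    }

    puts("never reached");
    pthread_join(tid, NULL);
    pthread_mutex_unlock(&ctx);
    pthread_mutex_destroy(&ctx);
    return 0;
}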

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] backup: don't acquire aio_context in backup_clean
  2020-03-26  9:43                 ` Stefan Reiter
@ 2020-03-26 12:46                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-03-26 12:46 UTC (permalink / raw)
  To: Stefan Reiter, qemu-devel, qemu-block
  Cc: kwolf, slp, mreitz, stefanha, jsnow, dietmar

26.03.2020 12:43, Stefan Reiter wrote:
> On 26/03/2020 06:54, Vladimir Sementsov-Ogievskiy wrote:
>> 25.03.2020 18:50, Stefan Reiter wrote:
>>> backup_clean is only ever called as a handler via job_exit, which
>>
>> Hmm.. I'm afraid it's not quite correct.
>>
>> job_clean
>>
>>    job_finalize_single
>>
>>       job_completed_txn_abort (lock aio context)
>>
>>       job_do_finalize
>>
>>
>> Hmm. job_do_finalize calls job_completed_txn_abort, which takes care to lock the aio context.
>> At the same time, it directly calls job_txn_apply(job->txn, job_finalize_single)
>> without locking. Is that a bug?
>>
> 
> I think, as you say, the idea is that job_do_finalize is always called with the lock acquired. That's why job_completed_txn_abort takes care to release the lock (at least of the "outer_ctx" as it calls it) before reacquiring it.
> 
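
For reference, the shape of the code being described is roughly this
(paraphrasing job_completed_txn_abort from job.c from memory, heavily
simplified; the cancel/finalize loops and refcounting are omitted):

/* Paraphrased sketch: drop the caller's ("outer") context first, take each
 * transaction member's context only around the work done on that member,
 * and retake the outer context before returning. */
static void txn_abort_shape(Job *job)
{
    AioContext *outer_ctx = job->aio_context;
    JobTxn *txn = job->txn;
    Job *other_job;

    aio_context_release(outer_ctx);

    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
        AioContext *ctx = other_job->aio_context;

        aio_context_acquire(ctx);
        job_cancel_async(other_job, false);   /* or job_finalize_single() */
        aio_context_release(ctx);
    }

    aio_context_acquire(outer_ctx);
}
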
>> And, even if job_do_finalize is always called with its context locked, where is the guarantee
>> that the contexts of all jobs in the txn are locked?
>>
> 
> I also don't see anything that guarantees that... I guess it could be adapted to handle locks like job_completed_txn_abort does?
> 
> Haven't looked into transactions too much, but does it even make sense to have jobs in different contexts in one transaction?

Why not? Assume backing up two disks in one transaction, each in a separate io thread. (Honestly, I don't know whether that works.)

> 
>> Still, let's look through its callers.
>>
>>        job_finalize
>>
>>                 qmp_block_job_finalize (lock aio context)
>>                 qmp_job_finalize (lock aio context)
>>                 test_cancel_concluded (doesn't lock, but it's a test)
>>
>>            job_completed_txn_success
>>
>>                 job_completed
>>
>>                      job_exit (lock aio context)
>>
>>                      job_cancel
>>
>>                           blockdev_mark_auto_del (lock aio context)
>>
>>                           job_user_cancel
>>
>>                               qmp_block_job_cancel (locks context)
>>                               qmp_job_cancel  (locks context)
>>
>>                           job_cancel_err
>>
>>                                job_cancel_sync (returns job_finish_sync(job, &job_cancel_err, NULL); job_finish_sync just calls the callback)
>>
>>                                     replication_close (it's .bdrv_close... Hmm, I don't see any context locking; where is it?)
> Hm, I don't see it either. This might indeed be a way to reach job_clean without the lock held.
> 
> I don't have any testing set up for replication atm, but if you believe this would be correct I can send a patch for that as well (just acquire the lock in replication_close before job_cancel_async?).

I don't know, but sending a patch is a good way to start a discussion. :)
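
Something along these lines, perhaps (untested sketch only; how the Job
pointer is reached from BDRVReplicationState is deliberately left out, and
whether job_cancel_sync or job_cancel_async is the right call there is
exactly the open question above):

/* Untested sketch of the suggestion: wrap the cancel done from
 * replication_close() in the job's AioContext lock, as the other callers
 * in the tree above already do. */
static void replication_cancel_job_locked(Job *job)
{
    AioContext *ctx = job->aio_context;

    aio_context_acquire(ctx);
    job_cancel_sync(job);
    aio_context_release(ctx);
}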

> 
>>
>>                                     replication_stop (locks context)
>>
>>                                     drive_backup_abort (locks context)
>>
>>                                     blockdev_backup_abort (locks context)
>>
>>                                     job_cancel_sync_all (locks context)
>>
>>                                     cancel_common (locks context)
>>
>>                           test_* (I don't care)
>>
> 
> To clarify: aside from the commit message, the patch itself does not appear to be wrong? All paths (aside from replication_close mentioned above) guarantee that the job's context lock is held.

I worry more about the case of a transaction with jobs from different aio contexts than about replication.

Anyway, I hope that someone with a better understanding of these things will look at this.

It's usually not a good idea to send a [PATCH] inside a discussion thread; it'd be better as a separate thread, to be more visible.

Maybe you could send a separate series that includes this patch, a fix for replication, and an attempt to fix job_do_finalize in some way, and we can continue the discussion from that new series?

> 
>>> already acquires the job's context. The job's context is guaranteed to
>>> be the same as the one used by backup_top via backup_job_create.
>>>
>>> Since the previous logic effectively acquired the lock twice, this
>>> broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
>>> in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
>>> once, thus deadlocking with the IO thread.
>>>
>>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>>
>> Just note that this code was recently touched by 0abf2581717a19, so I'm adding Sergio (its author) to CC.
>>
>>> ---
>>>
>>> This is a fix for the issue discussed in this part of the thread:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07639.html
>>> ...not the original problem (core dump) posted by Dietmar.
>>>
>>> I've still seen it occasionally hang during a backup abort. I'm trying to figure
>>> out why that happens, stack trace indicates a similar problem with the main
>>> thread hanging at bdrv_do_drained_begin, though I have no clue why as of yet.
>>>
>>>   block/backup.c | 4 ----
>>>   1 file changed, 4 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 7430ca5883..a7a7dcaf4c 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -126,11 +126,7 @@ static void backup_abort(Job *job)
>>>   static void backup_clean(Job *job)
>>>   {
>>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>>> -    AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
>>> -
>>> -    aio_context_acquire(aio_context);
>>>       bdrv_backup_top_drop(s->backup_top);
>>> -    aio_context_release(aio_context);
>>>   }
>>>   void backup_do_checkpoint(BlockJob *job, Error **errp)
>>>
>>
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-03-26 12:47 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-24 11:13 backup transaction with io-thread core dumps Dietmar Maurer
2020-03-24 13:30 ` Dietmar Maurer
2020-03-24 13:33   ` Dietmar Maurer
2020-03-24 13:44     ` John Snow
2020-03-24 14:00       ` Dietmar Maurer
2020-03-24 13:47     ` Max Reitz
2020-03-24 14:02       ` Dietmar Maurer
2020-03-24 16:49       ` Dietmar Maurer
2020-03-25 11:40         ` Stefan Reiter
2020-03-25 12:23           ` Vladimir Sementsov-Ogievskiy
2020-03-25 15:50             ` [PATCH] backup: don't acquire aio_context in backup_clean Stefan Reiter
2020-03-26  5:54               ` Vladimir Sementsov-Ogievskiy
2020-03-26  9:43                 ` Stefan Reiter
2020-03-26 12:46                   ` Vladimir Sementsov-Ogievskiy
2020-03-26 11:53                 ` Sergio Lopez
2020-03-25  8:13       ` backup transaction with io-thread core dumps Sergio Lopez
2020-03-25 11:46         ` Sergio Lopez
2020-03-25 12:29           ` Dietmar Maurer
2020-03-25 12:39             ` Sergio Lopez
2020-03-25 15:40               ` Dietmar Maurer
2020-03-26  7:50                 ` Sergio Lopez
2020-03-26  8:14                   ` Dietmar Maurer
2020-03-26  9:23                     ` Dietmar Maurer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.