On Wed, Mar 25, 2020 at 09:13:12AM +0100, Sergio Lopez wrote:
> On Tue, Mar 24, 2020 at 02:47:43PM +0100, Max Reitz wrote:
> > Hi Dietmar,
> >
> > I assume this is with master and has popped up only recently?
> >
> > Maybe it has something to do with the recent mutex patches by Stefan, so
> > I’m Cc-ing him.
>
> Hi,
>
> I was able to reproduce the issue with a build after the last batch of
> AIO fixes and before Stefan's optimizations. This seems to be a new
> issue related to { "completion-mode": "grouped" }. Without that
> property, the transaction finishes without a crash.
>
> I'm going to take a look at this.

The problem is that, with completion-mode !=
ACTION_COMPLETION_MODE_INDIVIDUAL, both jobs and their related BDSs
are accessed while running in the AioContext of just one of them. So,
when there's an attempt to work on the "foreign" BDS (the one that is
being accessed while running in the AioContext of the other BDS), QEMU
crashes because its context wasn't acquired.

As expected, if both BDSs are running on the same IOThread (and thus
in the same AioContext), the problem is not reproducible.

In a general sense, we could say that completion modes other than
"individual" are not supported for a transaction that may access
different AioContexts.

I don't see a safe and easy way to fix this. We could opt for simply
detecting and forbidding such completion modes when the BDSs are
assigned to different AioContexts. Perhaps Kevin (in CC) has a better
idea.
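Something along these lines is what I have in mind. This is only an
untested sketch: the helper name and the way the caller would collect
the BDSs of all actions are made up for illustration, while
bdrv_get_aio_context() and error_setg() are the existing APIs:

/* Sketch: reject cross-AioContext transactions for completion modes
 * other than "individual". A caller such as qmp_transaction() would
 * run this over the BDSs of all actions before creating any job.
 */
static bool txn_check_single_aio_context(BlockDriverState **bs_list,
                                         int n_bs, Error **errp)
{
    AioContext *ctx = NULL;
    int i;

    for (i = 0; i < n_bs; i++) {
        AioContext *bs_ctx = bdrv_get_aio_context(bs_list[i]);

        if (ctx && bs_ctx != ctx) {
            error_setg(errp, "completion-mode 'grouped' is not supported"
                       " when the affected nodes are in different"
                       " AioContexts");
            return false;
        }
        ctx = bs_ctx;
    }

    return true;
}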
Sergio.

> > On 24.03.20 14:33, Dietmar Maurer wrote:
> > > spoke too soon - the error is still there, sigh
> > >
> > >> This is fixed with this patch:
> > >>
> > >> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html
> > >>
> > >> thanks!
> > >>
> > >>> On March 24, 2020 12:13 PM Dietmar Maurer wrote:
> > >>>
> > >>> I get a core dump with backup transactions when using io-threads.
> > >>>
> > >>> To reproduce, create and start a VM with:
> > >>>
> > >>> # qemu-img create disk1.raw 100M
> > >>> # qemu-img create disk2.raw 100M
> > >>> # ./x86_64-softmmu/qemu-system-x86_64 \
> > >>>     -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' \
> > >>>     -mon 'chardev=qmp,mode=control' \
> > >>>     -pidfile /var/run/qemu-server/108.pid \
> > >>>     -m 512 \
> > >>>     -object 'iothread,id=iothread-virtioscsi0' \
> > >>>     -object 'iothread,id=iothread-virtioscsi1' \
> > >>>     -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
> > >>>     -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' \
> > >>>     -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
> > >>>     -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' \
> > >>>     -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' \
> > >>>     -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'
> > >>>
> > >>> Then connect socat to the QMP socket:
> > >>>
> > >>> # socat /var/run/qemu-test.qmp -
> > >>>
> > >>> And run the following QMP commands:
> > >>>
> > >>> { "execute": "qmp_capabilities", "arguments": {} }
> > >>> { "execute": "transaction", "arguments": { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-scsi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }
> > >>>
> > >>> The VM will core dump:
> > >>>
> > >>> qemu: qemu_mutex_unlock_impl: Operation not permitted
> > >>> Aborted (core dumped)
> > >>>
> > >>> (gdb) bt
> > >>> #0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > >>> #1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
> > >>> #2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
> > >>> #3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601) at util/qemu-thread-posix.c:108
> > >>> #4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
> > >>> #5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400) at block.c:6238
> > >>> #6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400) at block.c:6211
> > >>> #7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0) at block.c:6324
> > >>> #8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
> > >>> #9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
> > >>> #10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
> > >>> #11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
> > >>> #12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
> > >>> #13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
> > >>> #14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
> > >>> #15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
> > >>> #16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
> > >>> #17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
> > >>> #18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
> > >>> #19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
> > >>> #20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
> > >>> #21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
> > >>> #22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
> > >>> #23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
> > >>> #24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
> > >>> #25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
> > >>> #26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
> > >>> #27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
> > >>> #28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
> > >>> #29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> > >>> #30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
> > >>> #31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
> > >>> #32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
> > >>> #33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
> > >>> #34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
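To make the backtrace above a bit more concrete: the abort in frames
#2-#4 is aio_context_release() unlocking an AioContext mutex that the
current thread never acquired. The snippet below only illustrates the
locking rule being violated, it is not a proposed fix;
bdrv_get_aio_context(), aio_context_acquire() and aio_context_release()
are the existing APIs, and 'bs' stands for the "foreign" BDS:

/* Whoever touches a BDS owned by another iothread must hold its
 * AioContext lock, so that a later aio_context_release() (as in
 * frame #4) unlocks a mutex the calling thread actually holds.
 */
AioContext *ctx = bdrv_get_aio_context(bs);

aio_context_acquire(ctx);
/* ... detach children, move the BDS to another context, ... */
aio_context_release(ctx);

Doing this from the job completion code is not trivial, though, since
the lock ordering between multiple AioContexts has to stay consistent
to avoid deadlocks, which is why just forbidding "grouped" completion
across AioContexts may be the safer option.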