Eric Blake writes:

> On 12/9/19 10:06 AM, Kevin Wolf wrote:
>> On 28.11.2019 at 11:41, Sergio Lopez wrote:
>>> bdrv_try_set_aio_context() requires that the old context is held, and
>>> the new context is not held. Fix all the occurrences where it's not
>>> done this way.
>>>
>>> Suggested-by: Max Reitz
>>> Signed-off-by: Sergio Lopez
>>> ---
>
>> Or in fact, I think you need to hold the AioContext of a bs to
>> bdrv_unref() it, so maybe 'goto out' is right, but you need to unref
>> target_bs while you still hold old_context.
>
> I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1779036 is also
> a symptom of this. The v5 patch did not fix this simple test case:
>
>
> $ qemu-img create -f qcow2 f1 100m
> $ qemu-img create -f qcow2 f2 100m
> $ ./qemu-kvm -nodefaults -nographic -qmp stdio -object iothread,id=io0 \
> -drive driver=qcow2,id=drive1,file=f1,if=none -device
> virtio-scsi-pci,id=scsi0,iothread=io0 -device
> scsi-hd,id=image1,drive=drive1 \
> -drive driver=qcow2,id=drive2,file=f2,if=none -device
> virtio-blk-pci,id=image2,drive=drive2,iothread=io0
>
> {'execute':'qmp_capabilities'}
>
> {'execute':'transaction','arguments':{'actions':[
> {'type':'blockdev-snapshot-sync','data':{'device':'drive1',
> 'snapshot-file':'sn1','mode':'absolute-paths','format':'qcow2'}},
> {'type':'blockdev-snapshot-sync','data':{'device':'drive2',
> 'snapshot-file':'/aa/sn1','mode':'absolute-paths','format':'qcow2'}}]}}
>
> which is an aio context bug somewhere on the error path of
> blockdev-snapshot-sync (the first one has to be rolled back because
> the second part of the transaction fails early on a nonexistent
> directory)

This is slightly different.
The problem resides in external_snapshot_abort():

    static void external_snapshot_abort(BlkActionState *common)
    {
        ExternalSnapshotState *state =
                                 DO_UPCAST(ExternalSnapshotState, common, common);
        if (state->new_bs) {
            if (state->overlay_appended) {
                AioContext *aio_context;

                aio_context = bdrv_get_aio_context(state->old_bs);
                aio_context_acquire(aio_context);

                bdrv_ref(state->old_bs);   /* we can't let bdrv_set_backind_hd()
                                              close state->old_bs; we need it */
                bdrv_set_backing_hd(state->new_bs, NULL, &error_abort);
                bdrv_replace_node(state->new_bs, state->old_bs, &error_abort);
                bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */

                aio_context_release(aio_context);
            }
        }
    }

bdrv_set_backing_hd() returns state->old_bs to the main AioContext,
while bdrv_replace_node() expects state->new_bs and state->old_bs to be
using the same AioContext.

I'm thinking of sending this as a separate patch:

diff --git a/blockdev.c b/blockdev.c
index e33abd7fd2..6c73ac4e32 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1731,6 +1731,8 @@ static void external_snapshot_abort(BlkActionState *common)
     if (state->new_bs) {
         if (state->overlay_appended) {
             AioContext *aio_context;
+            AioContext *tmp_context;
+            int ret;
 
             aio_context = bdrv_get_aio_context(state->old_bs);
             aio_context_acquire(aio_context);
@@ -1738,6 +1740,25 @@ static void external_snapshot_abort(BlkActionState *common)
             bdrv_ref(state->old_bs);   /* we can't let bdrv_set_backind_hd()
                                           close state->old_bs; we need it */
             bdrv_set_backing_hd(state->new_bs, NULL, &error_abort);
+
+            /*
+             * The call to bdrv_set_backing_hd() above returns state->old_bs to
+             * the main AioContext. As we're still going to be using it, return
+             * it to the AioContext it was before.
+             */
+            tmp_context = bdrv_get_aio_context(state->old_bs);
+            if (aio_context != tmp_context) {
+                aio_context_release(aio_context);
+                aio_context_acquire(tmp_context);
+
+                ret = bdrv_try_set_aio_context(state->old_bs,
+                                               aio_context, NULL);
+                assert(ret == 0);
+
+                aio_context_release(tmp_context);
+                aio_context_acquire(aio_context);
+            }
+
             bdrv_replace_node(state->new_bs, state->old_bs, &error_abort);
             bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */
 

What do you think?

Sergio.