* [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
@ 2017-03-17 16:55 Ed Swierk
2017-03-17 17:11 ` Paolo Bonzini
2017-03-21 5:26 ` Fam Zheng
0 siblings, 2 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-17 16:55 UTC (permalink / raw)
To: Fam Zheng, Kevin Wolf; +Cc: qemu-devel, Paolo Bonzini
I'm running into the same problem taking an external snapshot with a
virtio-blk drive with iothread, so it's not specific to virtio-scsi.
Run a Linux guest on qemu master
qemu-system-x86_64 -nographic -enable-kvm -monitor
telnet:0.0.0.0:1234,server,nowait -m 1024 -object
iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
-device virtio-blk-pci,iothread=iothread1,drive=drive0
Then in the monitor
snapshot_blkdev drive0 /x/snap1.qcow2
qemu bombs with
qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
whereas without the iothread the assertion failure does not occur.
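For context on what the assertion is checking: aio_disable_external() and
aio_enable_external() essentially just increment and decrement a per-AioContext
counter, and aio_enable_external() asserts the counter is still positive, so the
abort means an enable arrived on a context that never saw a matching disable. A
minimal standalone model of that bookkeeping (illustrative only, not QEMU's
actual code; the check returns false here instead of aborting):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for AioContext: only the counter the assertion inspects. */
typedef struct CtxModel {
    int external_disable_cnt;
} CtxModel;

static void disable_external(CtxModel *ctx)
{
    ctx->external_disable_cnt++;
}

/* Returns false where the real aio_enable_external() would trip
 * "Assertion `ctx->external_disable_cnt > 0' failed". */
static bool enable_external(CtxModel *ctx)
{
    if (ctx->external_disable_cnt <= 0) {
        return false;
    }
    ctx->external_disable_cnt--;
    return true;
}
```

Balanced begin/end pairs on one context keep the counter consistent; but if a
node's AioContext changes between the disable and the enable, the enable lands
on a context whose counter is already zero.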
--Ed
On Thu, Mar 16, 2017 at 5:26 PM, Ed Swierk <eswierk@skyportsystems.com> wrote:
> With this change on top of 2.9.0-rc0, I am able to boot a Linux guest
> from a virtio-scsi drive with an iothread, e.g.
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -device
> virtio-scsi-pci,iothread=iothread1,id=scsi0 -drive
> file=/x/drive.qcow2,format=qcow2,if=none,id=drive0,cache=directsync,aio=native
> -device scsi-hd,drive=drive0,bootindex=1
>
> But when I try to take a snapshot by running this in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> This does not occur if I don't use the iothread.
>
> I instrumented the code a bit, printing the value of bs,
> bdrv_get_aio_context(bs), and
> bdrv_get_aio_context(bs)->external_disable_cnt before and after
> aio_{disable,enable}_external() in bdrv_drained_{begin,end}().
>
> Without the iothread, nested calls to these functions cause the
> counter to increase and decrease as you'd expect, and the context is
> the same in each call.
>
> bdrv_drained_begin 0 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=0
>
> But with the iothread, there are at least two different context
> pointers, and there is one extra call to bdrv_drained_end() without a
> matching bdrv_drained_begin(). That last call comes from
> external_snapshot_clean().
>
> bdrv_drained_begin 0 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=0
> bdrv_drained_begin 1 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=1
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe443749a00 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe443749a00 cnt=0
> bdrv_drained_end 0 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=0
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> I didn't have much luck bisecting the bug, since about 200 commits
> prior to 2.9.0-rc0 qemu bombs immediately on boot, and after that I
> get the assertion addressed by your patch. I have to go farther back
> to find a working version.
>
> Any help would be appreciated.
>
> --Ed
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
@ 2017-03-17 17:11 ` Paolo Bonzini
2017-03-17 17:15 ` Paolo Bonzini
2017-03-21 5:26 ` Fam Zheng
1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 17:11 UTC (permalink / raw)
To: Ed Swierk, Fam Zheng, Kevin Wolf; +Cc: qemu-devel

On 17/03/2017 17:55, Ed Swierk wrote:
> I'm running into the same problem taking an external snapshot with a
> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> Run a Linux guest on qemu master
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>
> Then in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> whereas without the iothread the assertion failure does not occur.

Please try this patch:

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..1d95879 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1876,6 +1876,7 @@ static void blk_root_drained_begin(BdrvChild *child)
     if (blk->public.io_limits_disabled++ == 0) {
         throttle_group_restart_blk(blk);
     }
+    aio_disable_external(bdrv_get_aio_context(bs));
 }

 static void blk_root_drained_end(BdrvChild *child)
@@ -1883,5 +1884,6 @@ static void blk_root_drained_end(BdrvChild *child)
     BlockBackend *blk = child->opaque;

     assert(blk->public.io_limits_disabled);
+    aio_enable_external(bdrv_get_aio_context(bs));
     --blk->public.io_limits_disabled;
 }
diff --git a/block/io.c b/block/io.c
index 2709a70..a6dcef5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
     }

     if (!bs->quiesce_counter++) {
-        aio_disable_external(bdrv_get_aio_context(bs));
         bdrv_parent_drained_begin(bs);
     }

@@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
     }

     bdrv_parent_drained_end(bs);
-    aio_enable_external(bdrv_get_aio_context(bs));
 }

 /*

This is not a proper fix, the right one would move the calls into
virtio-blk and virtio-scsi, but it might be a start. I think the
issue is that you have one call to aio_disable_external for drive.qcow2
before the snapshot, and two calls to aio_enable_external (one for
drive.qcow2 and one for snap1.qcow2) after.

Paolo

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:11 ` Paolo Bonzini
@ 2017-03-17 17:15 ` Paolo Bonzini
2017-03-17 17:32 ` Ed Swierk
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 17:15 UTC (permalink / raw)
To: Ed Swierk, Fam Zheng, Kevin Wolf; +Cc: qemu-devel

On 17/03/2017 18:11, Paolo Bonzini wrote:
>
>
> On 17/03/2017 17:55, Ed Swierk wrote:
>> I'm running into the same problem taking an external snapshot with a
>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> Run a Linux guest on qemu master
>>
>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>
>> Then in the monitor
>>
>> snapshot_blkdev drive0 /x/snap1.qcow2
>>
>> qemu bombs with
>>
>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>
>> whereas without the iothread the assertion failure does not occur.
>
> Please try this patch:

Hmm, no. I'll post the full fix on top of John Snow's patches.

Paolo

> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..1d95879 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1876,6 +1876,7 @@ static void blk_root_drained_begin(BdrvChild *child)
>      if (blk->public.io_limits_disabled++ == 0) {
>          throttle_group_restart_blk(blk);
>      }
> +    aio_disable_external(bdrv_get_aio_context(bs));
>  }
>
>  static void blk_root_drained_end(BdrvChild *child)
> @@ -1883,5 +1884,6 @@ static void blk_root_drained_end(BdrvChild *child)
>      BlockBackend *blk = child->opaque;
>
>      assert(blk->public.io_limits_disabled);
> +    aio_enable_external(bdrv_get_aio_context(bs));
>      --blk->public.io_limits_disabled;
>  }
> diff --git a/block/io.c b/block/io.c
> index 2709a70..a6dcef5 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
>      }
>
>      if (!bs->quiesce_counter++) {
> -        aio_disable_external(bdrv_get_aio_context(bs));
>          bdrv_parent_drained_begin(bs);
>      }
>
> @@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
>      }
>
>      bdrv_parent_drained_end(bs);
> -    aio_enable_external(bdrv_get_aio_context(bs));
>  }
>
>  /*
>
> This is not a proper fix, the right one would move the calls into
> virtio-blk and virtio-scsi, but it might be a start. I think the
> issue is that you have one call to aio_disable_external for drive.qcow2
> before the snapshot, and two calls to aio_enable_external (one for
> drive.qcow2 and one for snap1.qcow2) after.
>
> Paolo

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:15 ` Paolo Bonzini
@ 2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
0 siblings, 2 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-17 17:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel

On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>
>>
>> On 17/03/2017 17:55, Ed Swierk wrote:
>>> I'm running into the same problem taking an external snapshot with a
>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>> Run a Linux guest on qemu master
>>>
>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>
>>> Then in the monitor
>>>
>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>
>>> qemu bombs with
>>>
>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>
>>> whereas without the iothread the assertion failure does not occur.
>>
>> Please try this patch:
>
> Hmm, no. I'll post the full fix on top of John Snow's patches.

OK. Incidentally, testing with virtio-blk I bisected the assertion
failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
parameter to bdrv_append()").

--Ed

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:32 ` Ed Swierk
@ 2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 18:10 UTC (permalink / raw)
To: Ed Swierk; +Cc: Fam Zheng, Kevin Wolf, qemu-devel

On 17/03/2017 18:32, Ed Swierk wrote:
> On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>>
>>>
>>> On 17/03/2017 17:55, Ed Swierk wrote:
>>>> I'm running into the same problem taking an external snapshot with a
>>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>>> Run a Linux guest on qemu master
>>>>
>>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>>
>>>> Then in the monitor
>>>>
>>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>>
>>>> qemu bombs with
>>>>
>>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>>
>>>> whereas without the iothread the assertion failure does not occur.
>>>
>>> Please try this patch:
>>
>> Hmm, no. I'll post the full fix on top of John Snow's patches.
>
> OK. Incidentally, testing with virtio-blk I bisected the assertion
> failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
> parameter to bdrv_append()").

And in particular to this:

+    bdrv_set_backing_hd(bs_new, bs_top, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto out;
+    }

     change_parent_backing_link(bs_top, bs_new);
-    /* FIXME Error handling */
-    bdrv_set_backing_hd(bs_new, bs_top, &error_abort);

Paolo

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
@ 2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
2017-03-21 1:48 ` Ed Swierk
1 sibling, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 19:27 UTC (permalink / raw)
To: Ed Swierk; +Cc: Fam Zheng, Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2427 bytes --]

On 17/03/2017 18:32, Ed Swierk wrote:
> On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>>
>>>
>>> On 17/03/2017 17:55, Ed Swierk wrote:
>>>> I'm running into the same problem taking an external snapshot with a
>>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>>> Run a Linux guest on qemu master
>>>>
>>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>>
>>>> Then in the monitor
>>>>
>>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>>
>>>> qemu bombs with
>>>>
>>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>>
>>>> whereas without the iothread the assertion failure does not occur.
>>>
>>> Please try this patch:
>>
>> Hmm, no. I'll post the full fix on top of John Snow's patches.
>
> OK. Incidentally, testing with virtio-blk I bisected the assertion
> failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
> parameter to bdrv_append()").

And this is a fix, but I have no idea why/how it works and what else it
may break.

Patches 1 and 2 are pretty obvious and would be the first step towards
eliminating aio_disable/enable_external altogether.

However I got patch 3 more or less by trial and error, and when I
thought I had the reasoning right I noticed this:

bdrv_drained_end(state->old_bs);

in external_snapshot_clean which makes no sense given the

bdrv_drained_begin(bs_new);

that I added to bdrv_append. So take this with a ton of salt.

The basic idea is that calling child->role->drained_begin and
child->role->drained_end is not necessary and in fact actively wrong
when both the old and the new child should be in a drained section.
But maybe instead it should be asserted that they are, except for the
special case of adding or removing a child. i.e. after

int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);

add

assert(!(drain && old_bs && new_bs));

Throwing this out because it's Friday evening... Maybe Fam can pick
it up on Monday.

Paolo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ff.patch --]
[-- Type: text/x-patch; name="ff.patch", Size: 9282 bytes --]

From f399388896c49fae4fd3f4837520d58b704c024a Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 19:05:44 +0100
Subject: [PATCH 1/3] scsi: add drained_begin/drained_end callbacks to bus

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-bus.c     | 14 ++++++++++++++
 hw/scsi/scsi-disk.c    | 24 ++++++++++++++++++++++++
 include/hw/scsi/scsi.h |  5 +++++
 3 files changed, 43 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f557446..fcad82b 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -97,6 +97,20 @@ void scsi_bus_new(SCSIBus *bus, size_t bus_size, DeviceState *host,
     qbus_set_bus_hotplug_handler(BUS(bus), &error_abort);
 }

+void scsi_bus_drained_begin(SCSIBus *bus)
+{
+    if (bus->info->drained_begin) {
+        bus->info->drained_begin(bus);
+    }
+}
+
+void scsi_bus_drained_end(SCSIBus *bus)
+{
+    if (bus->info->drained_end) {
+        bus->info->drained_end(bus);
+    }
+}
+
 static void scsi_dma_restart_bh(void *opaque)
 {
     SCSIDevice *s = opaque;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index a53f058..faca77c 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2281,6 +2281,24 @@ static bool scsi_cd_is_medium_locked(void *opaque)
     return ((SCSIDiskState *)opaque)->tray_locked;
 }

+static void scsi_disk_drained_begin(void *opaque)
+{
+    SCSIDiskState *s = opaque;
+    SCSIDevice *sdev = SCSI_DEVICE(s);
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+
+    scsi_bus_drained_begin(bus);
+}
+
+static void scsi_disk_drained_end(void *opaque)
+{
+    SCSIDiskState *s = opaque;
+    SCSIDevice *sdev = SCSI_DEVICE(s);
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+
+    scsi_bus_drained_end(bus);
+}
+
 static const BlockDevOps scsi_disk_removable_block_ops = {
     .change_media_cb = scsi_cd_change_media_cb,
     .eject_request_cb = scsi_cd_eject_request_cb,
@@ -2288,10 +2306,16 @@ static const BlockDevOps scsi_disk_removable_block_ops = {
     .is_medium_locked = scsi_cd_is_medium_locked,

     .resize_cb = scsi_disk_resize_cb,
+
+    .drained_begin = scsi_disk_drained_begin,
+    .drained_end = scsi_disk_drained_end,
 };

 static const BlockDevOps scsi_disk_block_ops = {
     .resize_cb = scsi_disk_resize_cb,
+
+    .drained_begin = scsi_disk_drained_begin,
+    .drained_end = scsi_disk_drained_end,
 };

 static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 6b85786..915c1bb 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -153,6 +153,9 @@ struct SCSIBusInfo {
     void (*save_request)(QEMUFile *f, SCSIRequest *req);
     void *(*load_request)(QEMUFile *f, SCSIRequest *req);
     void (*free_request)(SCSIBus *bus, void *priv);
+
+    void (*drained_begin)(SCSIBus *bus);
+    void (*drained_end)(SCSIBus *bus);
 };

 #define TYPE_SCSI_BUS "SCSI"
@@ -257,6 +260,8 @@ void scsi_req_unref(SCSIRequest *req);

 int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
                        void *hba_private);
+void scsi_bus_drained_begin(SCSIBus *bus);
+void scsi_bus_drained_end(SCSIBus *bus);
 int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
 void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
 void scsi_req_print(SCSIRequest *req);
--
2.9.3


From b150bf792721b6bbd652aa9017ffde083a075a0d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 18:31:41 +0100
Subject: [PATCH 2/3] block: move aio_disable_external/aio_enable_external to
 virtio devices

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c            |  4 ----
 hw/block/virtio-blk.c | 16 ++++++++++++++++
 hw/scsi/virtio-scsi.c | 19 +++++++++++++++++++
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2709a70..d6c19f9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
     }

     if (!bs->quiesce_counter++) {
-        aio_disable_external(bdrv_get_aio_context(bs));
         bdrv_parent_drained_begin(bs);
     }

@@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
     }

     bdrv_parent_drained_end(bs);
-    aio_enable_external(bdrv_get_aio_context(bs));
 }

 /*
@@ -300,7 +298,6 @@ void bdrv_drain_all_begin(void)

         aio_context_acquire(aio_context);
         bdrv_parent_drained_begin(bs);
-        aio_disable_external(aio_context);
         aio_context_release(aio_context);

         if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -343,7 +340,6 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);

         aio_context_acquire(aio_context);
-        aio_enable_external(aio_context);
         bdrv_parent_drained_end(bs);
         aio_context_release(aio_context);
     }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 98c16a7..de061c0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -902,7 +902,23 @@ static void virtio_blk_resize(void *opaque)
     virtio_notify_config(vdev);
 }

+static void virtio_blk_drained_begin(void *opaque)
+{
+    VirtIOBlock *s = VIRTIO_BLK(opaque);
+
+    aio_disable_external(blk_get_aio_context(s->conf.conf.blk));
+}
+
+static void virtio_blk_drained_end(void *opaque)
+{
+    VirtIOBlock *s = VIRTIO_BLK(opaque);
+
+    aio_enable_external(blk_get_aio_context(s->conf.conf.blk));
+}
+
 static const BlockDevOps virtio_block_ops = {
+    .drained_begin = virtio_blk_drained_begin,
+    .drained_end = virtio_blk_drained_end,
     .resize_cb = virtio_blk_resize,
 };

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index bd62d08..788d36a 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -826,6 +826,22 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
 }

+static void virtio_scsi_drained_begin(SCSIBus *bus)
+{
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    if (s->ctx) {
+        aio_disable_external(s->ctx);
+    }
+}
+
+static void virtio_scsi_drained_end(SCSIBus *bus)
+{
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    if (s->ctx) {
+        aio_enable_external(s->ctx);
+    }
+}
+
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .tcq = true,
     .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -839,6 +855,9 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .get_sg_list = virtio_scsi_get_sg_list,
     .save_request = virtio_scsi_save_request,
     .load_request = virtio_scsi_load_request,
+
+    .drained_begin = virtio_scsi_drained_begin,
+    .drained_end = virtio_scsi_drained_end,
 };

 void virtio_scsi_common_realize(DeviceState *dev, Error **errp,
--
2.9.3


From ba29c0d2665f88f9ba158da424f3d9bdca56062b Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 18:31:15 +0100
Subject: [PATCH 3/3] fix

---
 block.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6e906ec..b3d42ef 100644
--- a/block.c
+++ b/block.c
@@ -1736,9 +1736,10 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
                                       BlockDriverState *new_bs)
 {
     BlockDriverState *old_bs = child->bs;
+    int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);

     if (old_bs) {
-        if (old_bs->quiesce_counter && child->role->drained_end) {
+        if (drain < 0 && child->role->drained_end) {
             child->role->drained_end(child);
         }
         if (child->role->detach) {
@@ -1751,7 +1752,7 @@ static void bdrv_replace_child_noperm(BdrvChild *child,

     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-        if (new_bs->quiesce_counter && child->role->drained_begin) {
+        if (drain > 0 && child->role->drained_begin) {
             child->role->drained_begin(child);
         }

@@ -3026,6 +3027,10 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
 {
     Error *local_err = NULL;

+    assert(bs_new->quiesce_counter == 0);
+    assert(bs_top->quiesce_counter == 1);
+    bdrv_drained_begin(bs_new);
+
     bdrv_set_backing_hd(bs_new, bs_top, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
@@ -3036,9 +3041,13 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     if (local_err) {
         error_propagate(errp, local_err);
         bdrv_set_backing_hd(bs_new, NULL, &error_abort);
+        bdrv_drained_end(bs_new);
         goto out;
     }

+    assert(bs_new->quiesce_counter == 1);
+    assert(bs_top->quiesce_counter == 1);
+
     /* bs_new is now referenced by its new parents, we don't need the
      * additional reference any more. */
 out:
--
2.9.3

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 19:27 ` Paolo Bonzini
@ 2017-03-20 21:54 ` Ed Swierk
2017-03-21 1:48 ` Ed Swierk
1 sibling, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-20 21:54 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel

On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> And this is a fix, but I have no idea why/how it works and what else it
> may break.
>
> Patches 1 and 2 are pretty obvious and would be the first step towards
> eliminating aio_disable/enable_external altogether.
>
> However I got patch 3 more or less by trial and error, and when I
> thought I had the reasoning right I noticed this:
>
> bdrv_drained_end(state->old_bs);
>
> in external_snapshot_clean which makes no sense given the
>
> bdrv_drained_begin(bs_new);
>
> that I added to bdrv_append. So take this with a ton of salt.
>
> The basic idea is that calling child->role->drained_begin and
> child->role->drained_end is not necessary and in fact actively wrong
> when both the old and the new child should be in a drained section.
> But maybe instead it should be asserted that they are, except for the
> special case of adding or removing a child. i.e. after
>
> int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
>
> add
>
> assert(!(drain && old_bs && new_bs));
>
> Throwing this out because it's Friday evening... Maybe Fam can pick
> it up on Monday.

OK, thanks. It would be good to figure this out for 2.9, since the
workaround of disabling iothreads will affect performance. Let me know
if there's anything I can do to help.

Meanwhile I'm also looking into an intermittent crash running
block-commit on an external snapshot. This is with an earlier QEMU
snapshot (from around 20 Jan):

/x/qemu/deb/debuild/block.c:2433: bdrv_append: Assertion
`!bdrv_requests_pending(bs_top)' failed.

That assertion no longer exists in the current master, but I'm trying
to reproduce it reliably and see whether the bug itself has
disappeared.

--Ed

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
@ 2017-03-21 1:48 ` Ed Swierk
1 sibling, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 1:48 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel

On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> And this is a fix, but I have no idea why/how it works and what else it
> may break.
>
> Patches 1 and 2 are pretty obvious and would be the first step towards
> eliminating aio_disable/enable_external altogether.
>
> However I got patch 3 more or less by trial and error, and when I
> thought I had the reasoning right I noticed this:
>
> bdrv_drained_end(state->old_bs);
>
> in external_snapshot_clean which makes no sense given the
>
> bdrv_drained_begin(bs_new);
>
> that I added to bdrv_append. So take this with a ton of salt.
>
> The basic idea is that calling child->role->drained_begin and
> child->role->drained_end is not necessary and in fact actively wrong
> when both the old and the new child should be in a drained section.
> But maybe instead it should be asserted that they are, except for the
> special case of adding or removing a child. i.e. after
>
> int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
>
> add
>
> assert(!(drain && old_bs && new_bs));
>
> Throwing this out because it's Friday evening... Maybe Fam can pick
> it up on Monday.

I just tested this patch on top of today's master. It does make the
ctx->external_disable_cnt > 0 assertion failure on snapshot_blkdev go
away. But it seems to cause a different assertion failure when running
without an iothread, e.g.

qemu-system-x86_64 -nographic -enable-kvm -monitor
telnet:0.0.0.0:1234,server,nowait -m 1024 -object
iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
-device virtio-blk-pci,drive=drive0

and with the guest constantly writing to the disk with something like

while true; do echo 12345 >blah; done

Running snapshot_blkdev in the monitor repeatedly (with a new backing
file each time) triggers the following after a few tries:

qemu-system-x86_64: /x/qemu/block.c:2965: bdrv_replace_node: Assertion
`!({ typedef struct { int:(sizeof(*&from->in_flight) > sizeof(void *))
? -1 : 1; } qemu_build_bug_on__4 __attribute__((unused));
__atomic_load_n(&from->in_flight, 0); })' failed.

This does not occur on today's master without this patch.

--Ed

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
  2017-03-17 17:11 ` Paolo Bonzini
@ 2017-03-21  5:26 ` Fam Zheng
  2017-03-21 12:20   ` Ed Swierk
  1 sibling, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-21 5:26 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Fri, 03/17 09:55, Ed Swierk wrote:
> I'm running into the same problem taking an external snapshot with a
> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> Run a Linux guest on qemu master
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>
> Then in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> whereas without the iothread the assertion failure does not occur.

Can you test this one?

---

diff --git a/blockdev.c b/blockdev.c
index c5b2c2c..4c217d5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
         return;
     }

+    bdrv_set_aio_context(state->new_bs, state->aio_context);
+
     /* This removes our old bs and adds the new bs. This is an operation that
      * can fail, so we need to do it in .prepare; undoing it for abort is
      * always possible. */
@@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
     ExternalSnapshotState *state =
                              DO_UPCAST(ExternalSnapshotState, common, common);

-    bdrv_set_aio_context(state->new_bs, state->aio_context);
-
     /* We don't need (or want) to use the transactional
      * bdrv_reopen_multiple() across all the entries at once, because we
      * don't want to abort all of them if one of them fails the reopen */
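The patch moves bdrv_set_aio_context() from .commit to .prepare, where a failure can still be undone. A minimal model of that prepare/commit/abort transaction pattern (hypothetical names; not qemu code, just the shape of the mechanism):

```python
# Sketch of the transaction discipline used by qemu's blockdev actions:
# every action's .prepare may fail and must be undoable, while .commit
# must not fail. Steps that can fail therefore belong in prepare, not
# commit.

class Action:
    def prepare(self): pass   # may raise; must be undoable
    def commit(self): pass    # must never fail
    def abort(self): pass     # undo whatever prepare did

def run_transaction(actions):
    prepared = []
    try:
        for a in actions:
            a.prepare()
            prepared.append(a)
    except Exception:
        # Undo the already-prepared actions in reverse order.
        for a in reversed(prepared):
            a.abort()
        raise
    for a in actions:
        a.commit()

class Snapshot(Action):
    def __init__(self, name, fail=False):
        self.name, self.fail, self.log = name, fail, []
    def prepare(self):
        if self.fail:
            raise RuntimeError(self.name)
        self.log.append("prepared")
    def commit(self):
        self.log.append("committed")
    def abort(self):
        self.log.append("aborted")

good, bad = Snapshot("a"), Snapshot("b", fail=True)
try:
    run_transaction([good, bad])
except RuntimeError:
    pass
print(good.log)
```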
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-21  5:26 ` Fam Zheng
@ 2017-03-21 12:20 ` Ed Swierk
  2017-03-21 12:50   ` Fam Zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 12:20 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> On Fri, 03/17 09:55, Ed Swierk wrote:
>> I'm running into the same problem taking an external snapshot with a
>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> Run a Linux guest on qemu master
>>
>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>
>> Then in the monitor
>>
>> snapshot_blkdev drive0 /x/snap1.qcow2
>>
>> qemu bombs with
>>
>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>
>> whereas without the iothread the assertion failure does not occur.
>
> Can you test this one?
>
> ---
>
> diff --git a/blockdev.c b/blockdev.c
> index c5b2c2c..4c217d5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
>          return;
>      }
>
> +    bdrv_set_aio_context(state->new_bs, state->aio_context);
> +
>      /* This removes our old bs and adds the new bs. This is an operation that
>       * can fail, so we need to do it in .prepare; undoing it for abort is
>       * always possible. */
> @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
>      ExternalSnapshotState *state =
>                               DO_UPCAST(ExternalSnapshotState, common, common);
>
> -    bdrv_set_aio_context(state->new_bs, state->aio_context);
> -
>      /* We don't need (or want) to use the transactional
>       * bdrv_reopen_multiple() across all the entries at once, because we
>       * don't want to abort all of them if one of them fails the reopen */

With this change, a different assertion fails on running snapshot_blkdev:

qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
failed.

--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-21 12:20 ` Ed Swierk
@ 2017-03-21 12:50 ` Fam Zheng
  2017-03-21 13:05   ` Ed Swierk
  0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-21 12:50 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Tue, 03/21 05:20, Ed Swierk wrote:
> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Fri, 03/17 09:55, Ed Swierk wrote:
> >> I'm running into the same problem taking an external snapshot with a
> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> >> Run a Linux guest on qemu master
> >>
> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
> >>
> >> Then in the monitor
> >>
> >> snapshot_blkdev drive0 /x/snap1.qcow2
> >>
> >> qemu bombs with
> >>
> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
> >>
> >> whereas without the iothread the assertion failure does not occur.
> >
> > Can you test this one?
> >
> > ---
> >
> > diff --git a/blockdev.c b/blockdev.c
> > index c5b2c2c..4c217d5 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
> >          return;
> >      }
> >
> > +    bdrv_set_aio_context(state->new_bs, state->aio_context);
> > +
> >      /* This removes our old bs and adds the new bs. This is an operation that
> >       * can fail, so we need to do it in .prepare; undoing it for abort is
> >       * always possible. */
> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
> >      ExternalSnapshotState *state =
> >                               DO_UPCAST(ExternalSnapshotState, common, common);
> >
> > -    bdrv_set_aio_context(state->new_bs, state->aio_context);
> > -
> >      /* We don't need (or want) to use the transactional
> >       * bdrv_reopen_multiple() across all the entries at once, because we
> >       * don't want to abort all of them if one of them fails the reopen */
>
> With this change, a different assertion fails on running snapshot_blkdev:
>
> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
> failed.

Is there a backtrace?
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-21 12:50 ` Fam Zheng
@ 2017-03-21 13:05 ` Ed Swierk
  2017-03-22  9:19   ` Fam Zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 13:05 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Tue, Mar 21, 2017 at 5:50 AM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 03/21 05:20, Ed Swierk wrote:
>> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
>> > On Fri, 03/17 09:55, Ed Swierk wrote:
>> >> I'm running into the same problem taking an external snapshot with a
>> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> >> Run a Linux guest on qemu master
>> >>
>> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>> >>
>> >> Then in the monitor
>> >>
>> >> snapshot_blkdev drive0 /x/snap1.qcow2
>> >>
>> >> qemu bombs with
>> >>
>> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>> >>
>> >> whereas without the iothread the assertion failure does not occur.
>> >
>> > Can you test this one?
>> >
>> > ---
>> >
>> > diff --git a/blockdev.c b/blockdev.c
>> > index c5b2c2c..4c217d5 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
>> >          return;
>> >      }
>> >
>> > +    bdrv_set_aio_context(state->new_bs, state->aio_context);
>> > +
>> >      /* This removes our old bs and adds the new bs. This is an operation that
>> >       * can fail, so we need to do it in .prepare; undoing it for abort is
>> >       * always possible. */
>> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
>> >      ExternalSnapshotState *state =
>> >                               DO_UPCAST(ExternalSnapshotState, common, common);
>> >
>> > -    bdrv_set_aio_context(state->new_bs, state->aio_context);
>> > -
>> >      /* We don't need (or want) to use the transactional
>> >       * bdrv_reopen_multiple() across all the entries at once, because we
>> >       * don't want to abort all of them if one of them fails the reopen */
>>
>> With this change, a different assertion fails on running snapshot_blkdev:
>>
>> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
>> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
>> failed.

Actually, running the snapshot_blkdev command in the text monitor
doesn't trigger this assertion (I mixed up my notes). Instead it's
triggered by the following sequence in qmp-shell:

(QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
snapshot-file=/x/snap1.qcow2
{"return": {}}
(QEMU) block-commit device=drive0
{"return": {}}
(QEMU) block-job-complete device=drive0
{"return": {}}

> Is there a backtrace?

#0  0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000555555b4b0bb in bdrv_drain_recurse
    (bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
#5  0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
    /x/qemu/block/io.c:231
#6  0x0000555555b4b802 in bdrv_parent_drained_begin
    (bs=0x5555568c1a00) at /x/qemu/block/io.c:53
#7  bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
#8  0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
    at /x/qemu/block/io.c:190
#9  0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
    /x/qemu/util/async.c:90
#10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
#11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
    blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
#12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
    /x/qemu/iothread.c:59
#13 0x00007ffff3ad50a4 in start_thread () from
    /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6

--Ed
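For scripting this reproduction outside qmp-shell, the same three commands can be sent as raw QMP JSON objects, one per line, over the monitor socket (after the usual qmp_capabilities handshake). A small sketch — the payloads are taken from the session above; the helper name is invented:

```python
import json

# The three commands from the reproduction, as raw QMP payloads.
# Device name and snapshot-file path are the ones used in the report.
commands = [
    {"execute": "blockdev-snapshot-sync",
     "arguments": {"device": "drive0", "format": "qcow2",
                   "snapshot-file": "/x/snap1.qcow2"}},
    {"execute": "block-commit",
     "arguments": {"device": "drive0"}},
    {"execute": "block-job-complete",
     "arguments": {"device": "drive0"}},
]

def qmp_lines(cmds):
    """Serialize each command as the single-line JSON that QMP expects."""
    return [json.dumps(c, sort_keys=True) for c in cmds]

for line in qmp_lines(commands):
    print(line)
```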
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-21 13:05 ` Ed Swierk
@ 2017-03-22  9:19 ` Fam Zheng
  2017-03-22 17:36   ` Ed Swierk
  0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-22 9:19 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Tue, 03/21 06:05, Ed Swierk wrote:
> On Tue, Mar 21, 2017 at 5:50 AM, Fam Zheng <famz@redhat.com> wrote:
> > On Tue, 03/21 05:20, Ed Swierk wrote:
> >> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Fri, 03/17 09:55, Ed Swierk wrote:
> >> >> I'm running into the same problem taking an external snapshot with a
> >> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> >> >> Run a Linux guest on qemu master
> >> >>
> >> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
> >> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> >> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> >> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
> >> >>
> >> >> Then in the monitor
> >> >>
> >> >> snapshot_blkdev drive0 /x/snap1.qcow2
> >> >>
> >> >> qemu bombs with
> >> >>
> >> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> >> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
> >> >>
> >> >> whereas without the iothread the assertion failure does not occur.
> >> >
> >> > Can you test this one?
> >> >
> >> > ---
> >> >
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index c5b2c2c..4c217d5 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
> >> >          return;
> >> >      }
> >> >
> >> > +    bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > +
> >> >      /* This removes our old bs and adds the new bs. This is an operation that
> >> >       * can fail, so we need to do it in .prepare; undoing it for abort is
> >> >       * always possible. */
> >> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
> >> >      ExternalSnapshotState *state =
> >> >                               DO_UPCAST(ExternalSnapshotState, common, common);
> >> >
> >> > -    bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > -
> >> >      /* We don't need (or want) to use the transactional
> >> >       * bdrv_reopen_multiple() across all the entries at once, because we
> >> >       * don't want to abort all of them if one of them fails the reopen */
> >>
> >> With this change, a different assertion fails on running snapshot_blkdev:
> >>
> >> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
> >> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
> >> failed.
>
> Actually running snapshot_blkdev command in the text monitor doesn't
> trigger this assertion (I mixed up my notes). Instead it's triggered
> by the following sequence in qmp-shell:
>
> (QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
> snapshot-file=/x/snap1.qcow2
> {"return": {}}
> (QEMU) block-commit device=drive0
> {"return": {}}
> (QEMU) block-job-complete device=drive0
> {"return": {}}
>
> > Is there a backtrace?
>
> #0  0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000555555b4b0bb in bdrv_drain_recurse
>     (bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
> #5  0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
>     /x/qemu/block/io.c:231
> #6  0x0000555555b4b802 in bdrv_parent_drained_begin
>     (bs=0x5555568c1a00) at /x/qemu/block/io.c:53
> #7  bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
> #8  0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
>     at /x/qemu/block/io.c:190
> #9  0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
>     /x/qemu/util/async.c:90
> #10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
> #11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
>     blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
> #12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
>     /x/qemu/iothread.c:59
> #13 0x00007ffff3ad50a4 in start_thread () from
>     /lib/x86_64-linux-gnu/libpthread.so.0
> #14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Hmm, looks like a separate bug to me. In addition, please apply this
(I think the assertion here is correct, but not all callers have been
audited yet):

diff --git a/block.c b/block.c
index 6e906ec..447d908 100644
--- a/block.c
+++ b/block.c
@@ -1737,6 +1737,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 {
     BlockDriverState *old_bs = child->bs;

+    if (old_bs && new_bs) {
+        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
+    }
     if (old_bs) {
         if (old_bs->quiesce_counter && child->role->drained_end) {
             child->role->drained_end(child);
diff --git a/block/mirror.c b/block/mirror.c
index ca4baa5..a23ca9e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1147,6 +1147,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         return;
     }
     mirror_top_bs->total_sectors = bs->total_sectors;
+    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));

     /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
      * it alive until block_job_create() even if bs has no parent. */
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
  2017-03-22  9:19 ` Fam Zheng
@ 2017-03-22 17:36 ` Ed Swierk
  0 siblings, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-22 17:36 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Wed, Mar 22, 2017 at 2:19 AM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 03/21 06:05, Ed Swierk wrote:
>> Actually running snapshot_blkdev command in the text monitor doesn't
>> trigger this assertion (I mixed up my notes). Instead it's triggered
>> by the following sequence in qmp-shell:
>>
>> (QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
>> snapshot-file=/x/snap1.qcow2
>> {"return": {}}
>> (QEMU) block-commit device=drive0
>> {"return": {}}
>> (QEMU) block-job-complete device=drive0
>> {"return": {}}
>>
>> > Is there a backtrace?
>>
>> #0  0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>> #2  0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #3  0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000555555b4b0bb in bdrv_drain_recurse
>>     (bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
>> #5  0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
>>     /x/qemu/block/io.c:231
>> #6  0x0000555555b4b802 in bdrv_parent_drained_begin
>>     (bs=0x5555568c1a00) at /x/qemu/block/io.c:53
>> #7  bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
>> #8  0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
>>     at /x/qemu/block/io.c:190
>> #9  0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
>>     /x/qemu/util/async.c:90
>> #10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
>> #11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
>>     blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
>> #12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
>>     /x/qemu/iothread.c:59
>> #13 0x00007ffff3ad50a4 in start_thread () from
>>     /lib/x86_64-linux-gnu/libpthread.so.0
>> #14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> Hmm, looks like a separate bug to me. In addition please apply this (the
> assertion here is correct I think, but all callers are not audited yet):
>
> diff --git a/block.c b/block.c
> index 6e906ec..447d908 100644
> --- a/block.c
> +++ b/block.c
> @@ -1737,6 +1737,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
>  {
>      BlockDriverState *old_bs = child->bs;
>
> +    if (old_bs && new_bs) {
> +        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
> +    }
>      if (old_bs) {
>          if (old_bs->quiesce_counter && child->role->drained_end) {
>              child->role->drained_end(child);
> diff --git a/block/mirror.c b/block/mirror.c
> index ca4baa5..a23ca9e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1147,6 +1147,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>          return;
>      }
>      mirror_top_bs->total_sectors = bs->total_sectors;
> +    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
>
>  /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
>   * it alive until block_job_create() even if bs has no parent. */

With this patch, I'm seeing either assertions or hangs when I run
blockdev-snapshot-sync, block-commit and block-job-complete
repeatedly. The exact assertion seems to depend on timing and/or what
combination of your other patches I apply. They include:

/x/qemu/hw/virtio/virtio.c:212: vring_get_region_caches: Assertion
`caches != ((void *)0)' failed.

/x/qemu/block/mirror.c:350: mirror_iteration: Assertion `sector_num >=
0' failed.

/x/qemu/block/mirror.c:865: mirror_run: Assertion
`((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

We don't appear to be converging on a solution here. Perhaps I should
instead focus on implementing automated tests so that you or anyone
else can easily reproduce these problems. The only tricky part is
extending qemu-iotests to include booting a guest to generate block IO
and trigger race conditions, but I have some ideas about how to do
this with a minimal (< 5 MB) Linux kernel+rootfs.

--Ed
end of thread, other threads:[~2017-03-22 17:36 UTC | newest]

Thread overview: 14+ messages
2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
2017-03-17 17:11 ` Paolo Bonzini
2017-03-17 17:15 ` Paolo Bonzini
2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
2017-03-21  1:48 ` Ed Swierk
2017-03-21  5:26 ` Fam Zheng
2017-03-21 12:20 ` Ed Swierk
2017-03-21 12:50 ` Fam Zheng
2017-03-21 13:05 ` Ed Swierk
2017-03-22  9:19 ` Fam Zheng
2017-03-22 17:36 ` Ed Swierk