* [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
@ 2017-03-17 16:55 Ed Swierk
2017-03-17 17:11 ` Paolo Bonzini
2017-03-21 5:26 ` Fam Zheng
0 siblings, 2 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-17 16:55 UTC (permalink / raw)
To: Fam Zheng, Kevin Wolf; +Cc: qemu-devel, Paolo Bonzini
I'm running into the same problem taking an external snapshot with a
virtio-blk drive with iothread, so it's not specific to virtio-scsi.
Run a Linux guest on qemu master
qemu-system-x86_64 -nographic -enable-kvm -monitor
telnet:0.0.0.0:1234,server,nowait -m 1024 -object
iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
-device virtio-blk-pci,iothread=iothread1,drive=drive0
Then in the monitor
snapshot_blkdev drive0 /x/snap1.qcow2
qemu bombs with
qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
whereas without the iothread the assertion failure does not occur.
--Ed
On Thu, Mar 16, 2017 at 5:26 PM, Ed Swierk <eswierk@skyportsystems.com> wrote:
> With this change on top of 2.9.0-rc0, I am able to boot a Linux guest
> from a virtio-scsi drive with an iothread, e.g.
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -device
> virtio-scsi-pci,iothread=iothread1,id=scsi0 -drive
> file=/x/drive.qcow2,format=qcow2,if=none,id=drive0,cache=directsync,aio=native
> -device scsi-hd,drive=drive0,bootindex=1
>
> But when I try to take a snapshot by running this in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> This does not occur if I don't use the iothread.
>
> I instrumented the code a bit, printing the value of bs,
> bdrv_get_aio_context(bs), and
> bdrv_get_aio_context(bs)->external_disable_cnt before and after
> aio_{disable,enable}_external() in bdrv_drained_{begin,end}().
>
> Without the iothread, nested calls to these functions cause the
> counter to increase and decrease as you'd expect, and the context is
> the same in each call.
>
> bdrv_drained_begin 0 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_begin 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 0 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=3
> bdrv_drained_end 1 bs=0x7fe9f67cfde0 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe9f5d12a00 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe9f5ad65a0 ctx=0x7fe9f5abc7b0 cnt=0
>
> But with the iothread, there are at least two different context
> pointers, and there is one extra call to bdrv_drained_end() without a
> matching bdrv_drained_begin(). That last call comes from
> external_snapshot_clean().
>
> bdrv_drained_begin 0 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=0
> bdrv_drained_begin 1 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=1
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_begin 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 0 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=2
> bdrv_drained_end 1 bs=0x7fe44444de20 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 0 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=0
> bdrv_drained_begin 1 bs=0x7fe443990a00 ctx=0x7fe44373a7b0 cnt=1
> bdrv_drained_end 0 bs=0x7fe443990a00 ctx=0x7fe443749a00 cnt=1
> bdrv_drained_end 1 bs=0x7fe443990a00 ctx=0x7fe443749a00 cnt=0
> bdrv_drained_end 0 bs=0x7fe4437545c0 ctx=0x7fe443749a00 cnt=0
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> I didn't have much luck bisecting the bug, since about 200 commits
> prior to 2.9.0-rc0 qemu bombs immediately on boot, and after that I
> get the assertion addressed by your patch. I have to go farther back
> to find a working version.
>
> Any help would be appreciated.
>
> --Ed
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
@ 2017-03-17 17:11 ` Paolo Bonzini
2017-03-17 17:15 ` Paolo Bonzini
2017-03-21 5:26 ` Fam Zheng
1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 17:11 UTC (permalink / raw)
To: Ed Swierk, Fam Zheng, Kevin Wolf; +Cc: qemu-devel
On 17/03/2017 17:55, Ed Swierk wrote:
> I'm running into the same problem taking an external snapshot with a
> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> Run a Linux guest on qemu master
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>
> Then in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> whereas without the iothread the assertion failure does not occur.
Please try this patch:
diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..1d95879 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1876,6 +1876,7 @@ static void blk_root_drained_begin(BdrvChild *child)
if (blk->public.io_limits_disabled++ == 0) {
throttle_group_restart_blk(blk);
}
+ aio_disable_external(bdrv_get_aio_context(bs));
}
static void blk_root_drained_end(BdrvChild *child)
@@ -1883,5 +1884,6 @@ static void blk_root_drained_end(BdrvChild *child)
BlockBackend *blk = child->opaque;
assert(blk->public.io_limits_disabled);
+ aio_enable_external(bdrv_get_aio_context(bs));
--blk->public.io_limits_disabled;
}
diff --git a/block/io.c b/block/io.c
index 2709a70..a6dcef5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
}
if (!bs->quiesce_counter++) {
- aio_disable_external(bdrv_get_aio_context(bs));
bdrv_parent_drained_begin(bs);
}
@@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
}
bdrv_parent_drained_end(bs);
- aio_enable_external(bdrv_get_aio_context(bs));
}
/*
This is not a proper fix; the right one would move the calls into
virtio-blk and virtio-scsi, but it might be a start. I think the
issue is that you have one call to aio_disable_external for drive.qcow2
before the snapshot, and two calls to aio_enable_external (one for
drive.qcow2 and one for snap1.qcow2) after.
Paolo
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:11 ` Paolo Bonzini
@ 2017-03-17 17:15 ` Paolo Bonzini
2017-03-17 17:32 ` Ed Swierk
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 17:15 UTC (permalink / raw)
To: Ed Swierk, Fam Zheng, Kevin Wolf; +Cc: qemu-devel
On 17/03/2017 18:11, Paolo Bonzini wrote:
>
>
> On 17/03/2017 17:55, Ed Swierk wrote:
>> I'm running into the same problem taking an external snapshot with a
>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> Run a Linux guest on qemu master
>>
>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>
>> Then in the monitor
>>
>> snapshot_blkdev drive0 /x/snap1.qcow2
>>
>> qemu bombs with
>>
>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>
>> whereas without the iothread the assertion failure does not occur.
>
> Please try this patch:
Hmm, no. I'll post the full fix on top of John Snow's patches.
Paolo
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..1d95879 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1876,6 +1876,7 @@ static void blk_root_drained_begin(BdrvChild *child)
> if (blk->public.io_limits_disabled++ == 0) {
> throttle_group_restart_blk(blk);
> }
> + aio_disable_external(bdrv_get_aio_context(bs));
> }
>
> static void blk_root_drained_end(BdrvChild *child)
> @@ -1883,5 +1884,6 @@ static void blk_root_drained_end(BdrvChild *child)
> BlockBackend *blk = child->opaque;
>
> assert(blk->public.io_limits_disabled);
> + aio_enable_external(bdrv_get_aio_context(bs));
> --blk->public.io_limits_disabled;
> }
> diff --git a/block/io.c b/block/io.c
> index 2709a70..a6dcef5 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
> }
>
> if (!bs->quiesce_counter++) {
> - aio_disable_external(bdrv_get_aio_context(bs));
> bdrv_parent_drained_begin(bs);
> }
>
> @@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
> }
>
> bdrv_parent_drained_end(bs);
> - aio_enable_external(bdrv_get_aio_context(bs));
> }
>
> /*
>
> This is not a proper fix; the right one would move the calls into
> virtio-blk and virtio-scsi, but it might be a start. I think the
> issue is that you have one call to aio_disable_external for drive.qcow2
> before the snapshot, and two calls to aio_enable_external (one for
> drive.qcow2 and one for snap1.qcow2) after.
>
> Paolo
>
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:15 ` Paolo Bonzini
@ 2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
0 siblings, 2 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-17 17:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel
On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>
>>
>> On 17/03/2017 17:55, Ed Swierk wrote:
>>> I'm running into the same problem taking an external snapshot with a
>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>> Run a Linux guest on qemu master
>>>
>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>
>>> Then in the monitor
>>>
>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>
>>> qemu bombs with
>>>
>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>
>>> whereas without the iothread the assertion failure does not occur.
>>
>> Please try this patch:
>
> Hmm, no. I'll post the full fix on top of John Snow's patches.
OK. Incidentally, testing with virtio-blk I bisected the assertion
failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
parameter to bdrv_append()").
--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:32 ` Ed Swierk
@ 2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 18:10 UTC (permalink / raw)
To: Ed Swierk; +Cc: Fam Zheng, Kevin Wolf, qemu-devel
On 17/03/2017 18:32, Ed Swierk wrote:
> On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>>
>>>
>>> On 17/03/2017 17:55, Ed Swierk wrote:
>>>> I'm running into the same problem taking an external snapshot with a
>>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>>> Run a Linux guest on qemu master
>>>>
>>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>>
>>>> Then in the monitor
>>>>
>>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>>
>>>> qemu bombs with
>>>>
>>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>>
>>>> whereas without the iothread the assertion failure does not occur.
>>>
>>> Please try this patch:
>>
>> Hmm, no. I'll post the full fix on top of John Snow's patches.
>
> OK. Incidentally, testing with virtio-blk I bisected the assertion
> failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
> parameter to bdrv_append()").
And in particular to this:
+ bdrv_set_backing_hd(bs_new, bs_top, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto out;
+ }
change_parent_backing_link(bs_top, bs_new);
- /* FIXME Error handling */
- bdrv_set_backing_hd(bs_new, bs_top, &error_abort);
Paolo
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
@ 2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
2017-03-21 1:48 ` Ed Swierk
1 sibling, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2017-03-17 19:27 UTC (permalink / raw)
To: Ed Swierk; +Cc: Fam Zheng, Kevin Wolf, qemu-devel
On 17/03/2017 18:32, Ed Swierk wrote:
> On Fri, Mar 17, 2017 at 10:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 17/03/2017 18:11, Paolo Bonzini wrote:
>>>
>>>
>>> On 17/03/2017 17:55, Ed Swierk wrote:
>>>> I'm running into the same problem taking an external snapshot with a
>>>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>>>> Run a Linux guest on qemu master
>>>>
>>>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>>>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>>>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>>>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>>>
>>>> Then in the monitor
>>>>
>>>> snapshot_blkdev drive0 /x/snap1.qcow2
>>>>
>>>> qemu bombs with
>>>>
>>>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>>>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>>>
>>>> whereas without the iothread the assertion failure does not occur.
>>>
>>> Please try this patch:
>>
>> Hmm, no. I'll post the full fix on top of John Snow's patches.
>
> OK. Incidentally, testing with virtio-blk I bisected the assertion
> failure to b2c2832c6140cfe3ddc0de2d77eeb0b77dea8fd3 ("block: Add Error
> parameter to bdrv_append()").
And this is a fix, but I have no idea why/how it works and what else it
may break.
Patches 1 and 2 are pretty obvious and would be the first step towards
eliminating aio_disable/enable_external altogether.
However I got patch 3 more or less by trial and error, and when I
thought I had the reasoning right I noticed this:
bdrv_drained_end(state->old_bs);
in external_snapshot_clean which makes no sense given the
bdrv_drained_begin(bs_new);
that I added to bdrv_append. So take this with a ton of salt.
The basic idea is that calling child->role->drained_begin and
child->role->drained_end is not necessary and in fact actively wrong
when both the old and the new child should be in a drained section.
But maybe instead it should be asserted that they are, except for the
special case of adding or removing a child. i.e. after
int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
add
assert(!(drain && old_bs && new_bs));
Throwing this out because it's Friday evening... Maybe Fam can pick
it up on Monday.
Paolo
[-- Attachment #2: ff.patch --]
[-- Type: text/x-patch; name="ff.patch", Size: 9282 bytes --]
From f399388896c49fae4fd3f4837520d58b704c024a Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 19:05:44 +0100
Subject: [PATCH 1/3] scsi: add drained_begin/drained_end callbacks to bus
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/scsi/scsi-bus.c | 14 ++++++++++++++
hw/scsi/scsi-disk.c | 24 ++++++++++++++++++++++++
include/hw/scsi/scsi.h | 5 +++++
3 files changed, 43 insertions(+)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f557446..fcad82b 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -97,6 +97,20 @@ void scsi_bus_new(SCSIBus *bus, size_t bus_size, DeviceState *host,
qbus_set_bus_hotplug_handler(BUS(bus), &error_abort);
}
+void scsi_bus_drained_begin(SCSIBus *bus)
+{
+ if (bus->info->drained_begin) {
+ bus->info->drained_begin(bus);
+ }
+}
+
+void scsi_bus_drained_end(SCSIBus *bus)
+{
+ if (bus->info->drained_end) {
+ bus->info->drained_end(bus);
+ }
+}
+
static void scsi_dma_restart_bh(void *opaque)
{
SCSIDevice *s = opaque;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index a53f058..faca77c 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2281,6 +2281,24 @@ static bool scsi_cd_is_medium_locked(void *opaque)
return ((SCSIDiskState *)opaque)->tray_locked;
}
+static void scsi_disk_drained_begin(void *opaque)
+{
+ SCSIDiskState *s = opaque;
+ SCSIDevice *sdev = SCSI_DEVICE(s);
+ SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+
+ scsi_bus_drained_begin(bus);
+}
+
+static void scsi_disk_drained_end(void *opaque)
+{
+ SCSIDiskState *s = opaque;
+ SCSIDevice *sdev = SCSI_DEVICE(s);
+ SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+
+ scsi_bus_drained_end(bus);
+}
+
static const BlockDevOps scsi_disk_removable_block_ops = {
.change_media_cb = scsi_cd_change_media_cb,
.eject_request_cb = scsi_cd_eject_request_cb,
@@ -2288,10 +2306,16 @@ static const BlockDevOps scsi_disk_removable_block_ops = {
.is_medium_locked = scsi_cd_is_medium_locked,
.resize_cb = scsi_disk_resize_cb,
+
+ .drained_begin = scsi_disk_drained_begin,
+ .drained_end = scsi_disk_drained_end,
};
static const BlockDevOps scsi_disk_block_ops = {
.resize_cb = scsi_disk_resize_cb,
+
+ .drained_begin = scsi_disk_drained_begin,
+ .drained_end = scsi_disk_drained_end,
};
static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 6b85786..915c1bb 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -153,6 +153,9 @@ struct SCSIBusInfo {
void (*save_request)(QEMUFile *f, SCSIRequest *req);
void *(*load_request)(QEMUFile *f, SCSIRequest *req);
void (*free_request)(SCSIBus *bus, void *priv);
+
+ void (*drained_begin)(SCSIBus *bus);
+ void (*drained_end)(SCSIBus *bus);
};
#define TYPE_SCSI_BUS "SCSI"
@@ -257,6 +260,8 @@ void scsi_req_unref(SCSIRequest *req);
int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
void *hba_private);
+void scsi_bus_drained_begin(SCSIBus *bus);
+void scsi_bus_drained_end(SCSIBus *bus);
int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf);
void scsi_req_build_sense(SCSIRequest *req, SCSISense sense);
void scsi_req_print(SCSIRequest *req);
--
2.9.3
From b150bf792721b6bbd652aa9017ffde083a075a0d Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 18:31:41 +0100
Subject: [PATCH 2/3] block: move aio_disable_external/aio_enable_external to
virtio devices
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/io.c | 4 ----
hw/block/virtio-blk.c | 16 ++++++++++++++++
hw/scsi/virtio-scsi.c | 19 +++++++++++++++++++
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/block/io.c b/block/io.c
index 2709a70..d6c19f9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -224,7 +224,6 @@ void bdrv_drained_begin(BlockDriverState *bs)
}
if (!bs->quiesce_counter++) {
- aio_disable_external(bdrv_get_aio_context(bs));
bdrv_parent_drained_begin(bs);
}
@@ -239,7 +238,6 @@ void bdrv_drained_end(BlockDriverState *bs)
}
bdrv_parent_drained_end(bs);
- aio_enable_external(bdrv_get_aio_context(bs));
}
/*
@@ -300,7 +298,6 @@ void bdrv_drain_all_begin(void)
aio_context_acquire(aio_context);
bdrv_parent_drained_begin(bs);
- aio_disable_external(aio_context);
aio_context_release(aio_context);
if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -343,7 +340,6 @@ void bdrv_drain_all_end(void)
AioContext *aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
- aio_enable_external(aio_context);
bdrv_parent_drained_end(bs);
aio_context_release(aio_context);
}
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 98c16a7..de061c0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -902,7 +902,23 @@ static void virtio_blk_resize(void *opaque)
virtio_notify_config(vdev);
}
+static void virtio_blk_drained_begin(void *opaque)
+{
+ VirtIOBlock *s = VIRTIO_BLK(opaque);
+
+ aio_disable_external(blk_get_aio_context(s->conf.conf.blk));
+}
+
+static void virtio_blk_drained_end(void *opaque)
+{
+ VirtIOBlock *s = VIRTIO_BLK(opaque);
+
+ aio_enable_external(blk_get_aio_context(s->conf.conf.blk));
+}
+
static const BlockDevOps virtio_block_ops = {
+ .drained_begin = virtio_blk_drained_begin,
+ .drained_end = virtio_blk_drained_end,
.resize_cb = virtio_blk_resize,
};
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index bd62d08..788d36a 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -826,6 +826,22 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
}
+static void virtio_scsi_drained_begin(SCSIBus *bus)
+{
+ VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+ if (s->ctx) {
+ aio_disable_external(s->ctx);
+ }
+}
+
+static void virtio_scsi_drained_end(SCSIBus *bus)
+{
+ VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+ if (s->ctx) {
+ aio_enable_external(s->ctx);
+ }
+}
+
static struct SCSIBusInfo virtio_scsi_scsi_info = {
.tcq = true,
.max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -839,6 +855,9 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
.get_sg_list = virtio_scsi_get_sg_list,
.save_request = virtio_scsi_save_request,
.load_request = virtio_scsi_load_request,
+
+ .drained_begin = virtio_scsi_drained_begin,
+ .drained_end = virtio_scsi_drained_end,
};
void virtio_scsi_common_realize(DeviceState *dev, Error **errp,
--
2.9.3
From ba29c0d2665f88f9ba158da424f3d9bdca56062b Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Mar 2017 18:31:15 +0100
Subject: [PATCH 3/3] fix
---
block.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/block.c b/block.c
index 6e906ec..b3d42ef 100644
--- a/block.c
+++ b/block.c
@@ -1736,9 +1736,10 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
BlockDriverState *new_bs)
{
BlockDriverState *old_bs = child->bs;
+ int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
if (old_bs) {
- if (old_bs->quiesce_counter && child->role->drained_end) {
+ if (drain < 0 && child->role->drained_end) {
child->role->drained_end(child);
}
if (child->role->detach) {
@@ -1751,7 +1752,7 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
if (new_bs) {
QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
- if (new_bs->quiesce_counter && child->role->drained_begin) {
+ if (drain > 0 && child->role->drained_begin) {
child->role->drained_begin(child);
}
@@ -3026,6 +3027,10 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
{
Error *local_err = NULL;
+ assert(bs_new->quiesce_counter == 0);
+ assert(bs_top->quiesce_counter == 1);
+ bdrv_drained_begin(bs_new);
+
bdrv_set_backing_hd(bs_new, bs_top, &local_err);
if (local_err) {
error_propagate(errp, local_err);
@@ -3036,9 +3041,13 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
if (local_err) {
error_propagate(errp, local_err);
bdrv_set_backing_hd(bs_new, NULL, &error_abort);
+ bdrv_drained_end(bs_new);
goto out;
}
+ assert(bs_new->quiesce_counter == 1);
+ assert(bs_top->quiesce_counter == 1);
+
/* bs_new is now referenced by its new parents, we don't need the
* additional reference any more. */
out:
--
2.9.3
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 19:27 ` Paolo Bonzini
@ 2017-03-20 21:54 ` Ed Swierk
2017-03-21 1:48 ` Ed Swierk
1 sibling, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-20 21:54 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel
On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> And this is a fix, but I have no idea why/how it works and what else it
> may break.
>
> Patches 1 and 2 are pretty obvious and would be the first step towards
> eliminating aio_disable/enable_external altogether.
>
> However I got patch 3 more or less by trial and error, and when I
> thought I had the reasoning right I noticed this:
>
> bdrv_drained_end(state->old_bs);
>
> in external_snapshot_clean which makes no sense given the
>
> bdrv_drained_begin(bs_new);
>
> that I added to bdrv_append. So take this with a ton of salt.
>
> The basic idea is that calling child->role->drained_begin and
> child->role->drained_end is not necessary and in fact actively wrong
> when both the old and the new child should be in a drained section.
> But maybe instead it should be asserted that they are, except for the
> special case of adding or removing a child. i.e. after
>
> int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
>
> add
>
> assert(!(drain && old_bs && new_bs));
>
> Throwing this out because it's Friday evening... Maybe Fam can pick
> it up on Monday.
OK, thanks. It would be good to figure this out for 2.9, since the
workaround of disabling iothreads will affect performance. Let me know
if there's anything I can do to help.
Meanwhile I'm also looking into an intermittent crash running
block-commit on an external snapshot. This is with an earlier QEMU
snapshot (from around 20 Jan):
/x/qemu/deb/debuild/block.c:2433: bdrv_append: Assertion
`!bdrv_requests_pending(bs_top)' failed.
That assertion no longer exists in the current master, but I'm trying
to reproduce it reliably and see whether the bug itself has
disappeared.
--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
@ 2017-03-21 1:48 ` Ed Swierk
1 sibling, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 1:48 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel
On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> And this is a fix, but I have no idea why/how it works and what else it
> may break.
>
> Patches 1 and 2 are pretty obvious and would be the first step towards
> eliminating aio_disable/enable_external altogether.
>
> However I got patch 3 more or less by trial and error, and when I
> thought I had the reasoning right I noticed this:
>
> bdrv_drained_end(state->old_bs);
>
> in external_snapshot_clean which makes no sense given the
>
> bdrv_drained_begin(bs_new);
>
> that I added to bdrv_append. So take this with a ton of salt.
>
> The basic idea is that calling child->role->drained_begin and
> child->role->drained_end is not necessary and in fact actively wrong
> when both the old and the new child should be in a drained section.
> But maybe instead it should be asserted that they are, except for the
> special case of adding or removing a child. i.e. after
>
> int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter);
>
> add
>
> assert(!(drain && old_bs && new_bs));
>
> Throwing this out because it's Friday evening... Maybe Fam can pick
> it up on Monday.
I just tested this patch on top of today's master. It does make the
ctx->external_disable_cnt > 0 assertion failure on snapshot_blkdev go
away. But it seems to cause a different assertion failure when running
without an iothread, e.g.
qemu-system-x86_64 -nographic -enable-kvm -monitor
telnet:0.0.0.0:1234,server,nowait -m 1024 -object
iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
-device virtio-blk-pci,drive=drive0
and with the guest constantly writing to the disk with something like
while true; do echo 12345 >blah; done
Running snapshot_blkdev in the monitor repeatedly (with a new backing
file each time) triggers the following after a few tries:
qemu-system-x86_64: /x/qemu/block.c:2965: bdrv_replace_node:
Assertion `!({ typedef struct { int:(sizeof(*&from->in_flight) >
sizeof(void *)) ? -1 : 1; } qemu_build_bug_on__4
__attribute__((unused)); __atomic_load_n(&from->in_flight, 0); })'
failed.
This does not occur on today's master without this patch.
--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
2017-03-17 17:11 ` Paolo Bonzini
@ 2017-03-21 5:26 ` Fam Zheng
2017-03-21 12:20 ` Ed Swierk
1 sibling, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-21 5:26 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Fri, 03/17 09:55, Ed Swierk wrote:
> I'm running into the same problem taking an external snapshot with a
> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> Run a Linux guest on qemu master
>
> qemu-system-x86_64 -nographic -enable-kvm -monitor
> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>
> Then in the monitor
>
> snapshot_blkdev drive0 /x/snap1.qcow2
>
> qemu bombs with
>
> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>
> whereas without the iothread the assertion failure does not occur.
Can you test this one?
---
diff --git a/blockdev.c b/blockdev.c
index c5b2c2c..4c217d5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
return;
}
+ bdrv_set_aio_context(state->new_bs, state->aio_context);
+
/* This removes our old bs and adds the new bs. This is an operation that
* can fail, so we need to do it in .prepare; undoing it for abort is
* always possible. */
@@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
ExternalSnapshotState *state =
DO_UPCAST(ExternalSnapshotState, common, common);
- bdrv_set_aio_context(state->new_bs, state->aio_context);
-
/* We don't need (or want) to use the transactional
* bdrv_reopen_multiple() across all the entries at once, because we
* don't want to abort all of them if one of them fails the reopen */
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-21 5:26 ` Fam Zheng
@ 2017-03-21 12:20 ` Ed Swierk
2017-03-21 12:50 ` Fam Zheng
0 siblings, 1 reply; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 12:20 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> On Fri, 03/17 09:55, Ed Swierk wrote:
>> I'm running into the same problem taking an external snapshot with a
>> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> Run a Linux guest on qemu master
>>
>> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>>
>> Then in the monitor
>>
>> snapshot_blkdev drive0 /x/snap1.qcow2
>>
>> qemu bombs with
>>
>> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>>
>> whereas without the iothread the assertion failure does not occur.
>
>
> Can you test this one?
>
> ---
>
>
> diff --git a/blockdev.c b/blockdev.c
> index c5b2c2c..4c217d5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
> return;
> }
>
> + bdrv_set_aio_context(state->new_bs, state->aio_context);
> +
> /* This removes our old bs and adds the new bs. This is an operation that
> * can fail, so we need to do it in .prepare; undoing it for abort is
> * always possible. */
> @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
> ExternalSnapshotState *state =
> DO_UPCAST(ExternalSnapshotState, common, common);
>
> - bdrv_set_aio_context(state->new_bs, state->aio_context);
> -
> /* We don't need (or want) to use the transactional
> * bdrv_reopen_multiple() across all the entries at once, because we
> * don't want to abort all of them if one of them fails the reopen */
With this change, a different assertion fails on running snapshot_blkdev:
qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
failed.
--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-21 12:20 ` Ed Swierk
@ 2017-03-21 12:50 ` Fam Zheng
2017-03-21 13:05 ` Ed Swierk
0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-21 12:50 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Tue, 03/21 05:20, Ed Swierk wrote:
> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Fri, 03/17 09:55, Ed Swierk wrote:
> >> I'm running into the same problem taking an external snapshot with a
> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> >> Run a Linux guest on qemu master
> >>
> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
> >>
> >> Then in the monitor
> >>
> >> snapshot_blkdev drive0 /x/snap1.qcow2
> >>
> >> qemu bombs with
> >>
> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
> >>
> >> whereas without the iothread the assertion failure does not occur.
> >
> >
> > Can you test this one?
> >
> > ---
> >
> >
> > diff --git a/blockdev.c b/blockdev.c
> > index c5b2c2c..4c217d5 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
> > return;
> > }
> >
> > + bdrv_set_aio_context(state->new_bs, state->aio_context);
> > +
> > /* This removes our old bs and adds the new bs. This is an operation that
> > * can fail, so we need to do it in .prepare; undoing it for abort is
> > * always possible. */
> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
> > ExternalSnapshotState *state =
> > DO_UPCAST(ExternalSnapshotState, common, common);
> >
> > - bdrv_set_aio_context(state->new_bs, state->aio_context);
> > -
> > /* We don't need (or want) to use the transactional
> > * bdrv_reopen_multiple() across all the entries at once, because we
> > * don't want to abort all of them if one of them fails the reopen */
>
> With this change, a different assertion fails on running snapshot_blkdev:
>
> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
> failed.
Is there a backtrace?
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-21 12:50 ` Fam Zheng
@ 2017-03-21 13:05 ` Ed Swierk
2017-03-22 9:19 ` Fam Zheng
0 siblings, 1 reply; 14+ messages in thread
From: Ed Swierk @ 2017-03-21 13:05 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Tue, Mar 21, 2017 at 5:50 AM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 03/21 05:20, Ed Swierk wrote:
>> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
>> > On Fri, 03/17 09:55, Ed Swierk wrote:
>> >> I'm running into the same problem taking an external snapshot with a
>> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
>> >> Run a Linux guest on qemu master
>> >>
>> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
>> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
>> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
>> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
>> >>
>> >> Then in the monitor
>> >>
>> >> snapshot_blkdev drive0 /x/snap1.qcow2
>> >>
>> >> qemu bombs with
>> >>
>> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
>> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
>> >>
>> >> whereas without the iothread the assertion failure does not occur.
>> >
>> >
>> > Can you test this one?
>> >
>> > ---
>> >
>> >
>> > diff --git a/blockdev.c b/blockdev.c
>> > index c5b2c2c..4c217d5 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
>> > return;
>> > }
>> >
>> > + bdrv_set_aio_context(state->new_bs, state->aio_context);
>> > +
>> > /* This removes our old bs and adds the new bs. This is an operation that
>> > * can fail, so we need to do it in .prepare; undoing it for abort is
>> > * always possible. */
>> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
>> > ExternalSnapshotState *state =
>> > DO_UPCAST(ExternalSnapshotState, common, common);
>> >
>> > - bdrv_set_aio_context(state->new_bs, state->aio_context);
>> > -
>> > /* We don't need (or want) to use the transactional
>> > * bdrv_reopen_multiple() across all the entries at once, because we
>> > * don't want to abort all of them if one of them fails the reopen */
>>
>> With this change, a different assertion fails on running snapshot_blkdev:
>>
>> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
>> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
>> failed.
Actually, running the snapshot_blkdev command in the text monitor doesn't
trigger this assertion (I mixed up my notes). Instead it's triggered
by the following sequence in qmp-shell:
(QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
snapshot-file=/x/snap1.qcow2
{"return": {}}
(QEMU) block-commit device=drive0
{"return": {}}
(QEMU) block-job-complete device=drive0
{"return": {}}
> Is there a backtrace?
#0 0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000555555b4b0bb in bdrv_drain_recurse
(bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
#5 0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
/x/qemu/block/io.c:231
#6 0x0000555555b4b802 in bdrv_parent_drained_begin
(bs=0x5555568c1a00) at /x/qemu/block/io.c:53
#7 bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
#8 0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
at /x/qemu/block/io.c:190
#9 0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
/x/qemu/util/async.c:90
#10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
#11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
#12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
/x/qemu/iothread.c:59
#13 0x00007ffff3ad50a4 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6
--Ed
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-21 13:05 ` Ed Swierk
@ 2017-03-22 9:19 ` Fam Zheng
2017-03-22 17:36 ` Ed Swierk
0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2017-03-22 9:19 UTC (permalink / raw)
To: Ed Swierk; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Tue, 03/21 06:05, Ed Swierk wrote:
> On Tue, Mar 21, 2017 at 5:50 AM, Fam Zheng <famz@redhat.com> wrote:
> > On Tue, 03/21 05:20, Ed Swierk wrote:
> >> On Mon, Mar 20, 2017 at 10:26 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Fri, 03/17 09:55, Ed Swierk wrote:
> >> >> I'm running into the same problem taking an external snapshot with a
> >> >> virtio-blk drive with iothread, so it's not specific to virtio-scsi.
> >> >> Run a Linux guest on qemu master
> >> >>
> >> >> qemu-system-x86_64 -nographic -enable-kvm -monitor
> >> >> telnet:0.0.0.0:1234,server,nowait -m 1024 -object
> >> >> iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0
> >> >> -device virtio-blk-pci,iothread=iothread1,drive=drive0
> >> >>
> >> >> Then in the monitor
> >> >>
> >> >> snapshot_blkdev drive0 /x/snap1.qcow2
> >> >>
> >> >> qemu bombs with
> >> >>
> >> >> qemu-system-x86_64: /x/qemu/include/block/aio.h:457:
> >> >> aio_enable_external: Assertion `ctx->external_disable_cnt > 0' failed.
> >> >>
> >> >> whereas without the iothread the assertion failure does not occur.
> >> >
> >> >
> >> > Can you test this one?
> >> >
> >> > ---
> >> >
> >> >
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index c5b2c2c..4c217d5 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
> >> > return;
> >> > }
> >> >
> >> > + bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > +
> >> > /* This removes our old bs and adds the new bs. This is an operation that
> >> > * can fail, so we need to do it in .prepare; undoing it for abort is
> >> > * always possible. */
> >> > @@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
> >> > ExternalSnapshotState *state =
> >> > DO_UPCAST(ExternalSnapshotState, common, common);
> >> >
> >> > - bdrv_set_aio_context(state->new_bs, state->aio_context);
> >> > -
> >> > /* We don't need (or want) to use the transactional
> >> > * bdrv_reopen_multiple() across all the entries at once, because we
> >> > * don't want to abort all of them if one of them fails the reopen */
> >>
> >> With this change, a different assertion fails on running snapshot_blkdev:
> >>
> >> qemu-system-x86_64: /x/qemu/block/io.c:164: bdrv_drain_recurse:
> >> Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()'
> >> failed.
>
> Actually running snapshot_blkdev command in the text monitor doesn't
> trigger this assertion (I mixed up my notes). Instead it's triggered
> by the following sequence in qmp-shell:
>
> (QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
> snapshot-file=/x/snap1.qcow2
> {"return": {}}
> (QEMU) block-commit device=drive0
> {"return": {}}
> (QEMU) block-job-complete device=drive0
> {"return": {}}
>
> > Is there a backtrace?
>
> #0 0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3 0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4 0x0000555555b4b0bb in bdrv_drain_recurse
> (bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
> #5 0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
> /x/qemu/block/io.c:231
> #6 0x0000555555b4b802 in bdrv_parent_drained_begin
> (bs=0x5555568c1a00) at /x/qemu/block/io.c:53
> #7 bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
> #8 0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
> at /x/qemu/block/io.c:190
> #9 0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
> /x/qemu/util/async.c:90
> #10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
> #11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
> blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
> #12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
> /x/qemu/iothread.c:59
> #13 0x00007ffff3ad50a4 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Hmm, looks like a separate bug to me. In addition, please apply this (I
think the assertion here is correct, but not all callers have been audited yet):
diff --git a/block.c b/block.c
index 6e906ec..447d908 100644
--- a/block.c
+++ b/block.c
@@ -1737,6 +1737,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
{
BlockDriverState *old_bs = child->bs;
+ if (old_bs && new_bs) {
+ assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
+ }
if (old_bs) {
if (old_bs->quiesce_counter && child->role->drained_end) {
child->role->drained_end(child);
diff --git a/block/mirror.c b/block/mirror.c
index ca4baa5..a23ca9e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1147,6 +1147,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
return;
}
mirror_top_bs->total_sectors = bs->total_sectors;
+ bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
/* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
* it alive until block_job_create() even if bs has no parent. */
* Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread
2017-03-22 9:19 ` Fam Zheng
@ 2017-03-22 17:36 ` Ed Swierk
0 siblings, 0 replies; 14+ messages in thread
From: Ed Swierk @ 2017-03-22 17:36 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel
On Wed, Mar 22, 2017 at 2:19 AM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 03/21 06:05, Ed Swierk wrote:
>> Actually running snapshot_blkdev command in the text monitor doesn't
>> trigger this assertion (I mixed up my notes). Instead it's triggered
>> by the following sequence in qmp-shell:
>>
>> (QEMU) blockdev-snapshot-sync device=drive0 format=qcow2
>> snapshot-file=/x/snap1.qcow2
>> {"return": {}}
>> (QEMU) block-commit device=drive0
>> {"return": {}}
>> (QEMU) block-job-complete device=drive0
>> {"return": {}}
>>
>> > Is there a backtrace?
>>
>> #0 0x00007ffff3757067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>> #1 0x00007ffff3758448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>> #2 0x00007ffff3750266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #3 0x00007ffff3750312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
>> #4 0x0000555555b4b0bb in bdrv_drain_recurse
>> (bs=bs@entry=0x555557bd6010) at /x/qemu/block/io.c:164
>> #5 0x0000555555b4b7ad in bdrv_drained_begin (bs=0x555557bd6010) at
>> /x/qemu/block/io.c:231
>> #6 0x0000555555b4b802 in bdrv_parent_drained_begin
>> (bs=0x5555568c1a00) at /x/qemu/block/io.c:53
>> #7 bdrv_drained_begin (bs=bs@entry=0x5555568c1a00) at /x/qemu/block/io.c:228
>> #8 0x0000555555b4be1e in bdrv_co_drain_bh_cb (opaque=0x7fff9aaece40)
>> at /x/qemu/block/io.c:190
>> #9 0x0000555555bb431e in aio_bh_call (bh=0x55555750e5f0) at
>> /x/qemu/util/async.c:90
>> #10 aio_bh_poll (ctx=ctx@entry=0x555556718090) at /x/qemu/util/async.c:118
>> #11 0x0000555555bb72eb in aio_poll (ctx=0x555556718090,
>> blocking=blocking@entry=true) at /x/qemu/util/aio-posix.c:682
>> #12 0x00005555559443ce in iothread_run (opaque=0x555556717b80) at
>> /x/qemu/iothread.c:59
>> #13 0x00007ffff3ad50a4 in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #14 0x00007ffff380a87d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> Hmm, looks like a separate bug to me. In addition please apply this (the
> assertion here is correct I think, but all callers are not audited yet):
>
> diff --git a/block.c b/block.c
> index 6e906ec..447d908 100644
> --- a/block.c
> +++ b/block.c
> @@ -1737,6 +1737,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
> {
> BlockDriverState *old_bs = child->bs;
>
> + if (old_bs && new_bs) {
> + assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
> + }
> if (old_bs) {
> if (old_bs->quiesce_counter && child->role->drained_end) {
> child->role->drained_end(child);
> diff --git a/block/mirror.c b/block/mirror.c
> index ca4baa5..a23ca9e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1147,6 +1147,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
> return;
> }
> mirror_top_bs->total_sectors = bs->total_sectors;
> + bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
>
> /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
> * it alive until block_job_create() even if bs has no parent. */
With this patch, I'm seeing either assertion failures or hangs when I run
blockdev-snapshot-sync, block-commit and block-job-complete
repeatedly. The exact assertion failure seems to depend on timing and/or
which combination of your other patches I apply. They include:
/x/qemu/hw/virtio/virtio.c:212: vring_get_region_caches: Assertion
`caches != ((void *)0)' failed.
/x/qemu/block/mirror.c:350: mirror_iteration: Assertion `sector_num >=
0' failed.
/x/qemu/block/mirror.c:865: mirror_run: Assertion
`((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
We don't appear to be converging on a solution here. Perhaps I should
instead focus on implementing automated tests so that you or anyone
else can easily reproduce these problems. The only tricky part is
extending qemu-iotests to include booting a guest to generate block IO
and trigger race conditions, but I have some ideas about how to do
this with a minimal (< 5 MB) Linux kernel+rootfs.
--Ed
Thread overview: 14+ messages
2017-03-17 16:55 [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread Ed Swierk
2017-03-17 17:11 ` Paolo Bonzini
2017-03-17 17:15 ` Paolo Bonzini
2017-03-17 17:32 ` Ed Swierk
2017-03-17 18:10 ` Paolo Bonzini
2017-03-17 19:27 ` Paolo Bonzini
2017-03-20 21:54 ` Ed Swierk
2017-03-21 1:48 ` Ed Swierk
2017-03-21 5:26 ` Fam Zheng
2017-03-21 12:20 ` Ed Swierk
2017-03-21 12:50 ` Fam Zheng
2017-03-21 13:05 ` Ed Swierk
2017-03-22 9:19 ` Fam Zheng
2017-03-22 17:36 ` Ed Swierk