* [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

v2: - Drop patch 4 of v1. On second thought, neither it nor Kevin's
      suggestion to move the BH processing into
      bdrv_drain_recurse/BDRV_POLL_WHILE is a complete fix, so leave that
      for a separate patch.
    - Add Reviewed-by tags to patches 1, 3, 4.
    - Split patch 2 out of v1's patch 1, for the new assertions. [Kevin]
    - Rewrite patch 5: fix the block job's coroutine when a BDS is moved
      to a different AioContext. [Kevin]
    - Add patch 6.

Crashes are reported on dataplane devices when doing snapshot and commit
under guest I/O. With this series, Ed's test case '176' now passes (reran
10+ times):

    https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9

The biggest fix is patch 5, which resolves a race condition between the
main thread and the iothread.

Fam Zheng (6):
  block: Fix unpaired aio_disable_external in external snapshot
  block: Assert attached child node has right aio context
  mirror: Fix aio context of mirror_top_bs
  block: Quiesce old aio context during bdrv_set_aio_context
  coroutine: Explicitly specify AioContext when entering coroutine
  tests/block-job-txn: Don't start block job before adding to txn

 block.c                    | 30 ++++++++++++++++++++++++++----
 block/blkdebug.c           |  4 ++--
 block/blkverify.c          |  8 ++++----
 block/block-backend.c      |  4 ++--
 block/io.c                 | 14 +++++++-------
 block/mirror.c             |  3 ++-
 block/quorum.c             | 16 ++++++++--------
 block/sheepdog.c           |  4 ++--
 blockdev.c                 |  4 ++--
 blockjob.c                 |  4 ++--
 hw/9pfs/9p.c               |  4 ++--
 hw/9pfs/coth.c             |  4 ++--
 include/block/block.h      | 11 +++++++++++
 include/qemu/coroutine.h   | 11 ++++++-----
 include/qemu/main-loop.h   |  2 +-
 migration/colo.c           |  3 ++-
 migration/migration.c      |  2 +-
 nbd/server.c               |  5 +++--
 qemu-img.c                 |  4 ++--
 qemu-io-cmds.c             |  2 +-
 tests/test-blockjob-txn.c  |  6 +++++-
 tests/test-coroutine.c     | 41 +++++++++++++++++++++--------------------
 tests/test-thread-pool.c   |  2 +-
 util/async.c               |  4 ++--
 util/qemu-coroutine-io.c   |  3 ++-
 util/qemu-coroutine-lock.c |  6 ++++--
 util/qemu-coroutine.c      | 10 +++++-----
 util/trace-events          |  2 +-
 28 files changed, 129 insertions(+), 84 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v2 1/6] block: Fix unpaired aio_disable_external in external snapshot

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

bdrv_replace_child_noperm tries to hand over the quiesce_counter state
from the old bs to the new one, but if the two are not in the same
AioContext the counters become unbalanced.

Fix this by setting the correct aio context before calling
bdrv_append().

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 040c152..4927914 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
         return;
     }
 
+    bdrv_set_aio_context(state->new_bs, state->aio_context);
+
     /* This removes our old bs and adds the new bs. This is an operation that
      * can fail, so we need to do it in .prepare; undoing it for abort is
      * always possible. */
@@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
     ExternalSnapshotState *state =
                              DO_UPCAST(ExternalSnapshotState, common, common);
 
-    bdrv_set_aio_context(state->new_bs, state->aio_context);
-
     /* We don't need (or want) to use the transactional
      * bdrv_reopen_multiple() across all the entries at once, because we
      * don't want to abort all of them if one of them fails the reopen */
-- 
2.9.3
* Re: [Qemu-devel] [PATCH v2 1/6] block: Fix unpaired aio_disable_external in external snapshot

From: Stefan Hajnoczi @ 2017-04-07 13:23 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz,
    Eric Blake

On Fri, Apr 07, 2017 at 02:54:09PM +0800, Fam Zheng wrote:
> bdrv_replace_child_noperm tries to hand over the quiesce_counter state
> from the old bs to the new one, but if the two are not in the same
> AioContext the counters become unbalanced.
> 
> Fix this by setting the correct aio context before calling
> bdrv_append().
> 
> Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockdev.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [Qemu-devel] [PATCH v2 2/6] block: Assert attached child node has right aio context

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block.c b/block.c
index 927ba89..b8a3011 100644
--- a/block.c
+++ b/block.c
@@ -1752,6 +1752,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 {
     BlockDriverState *old_bs = child->bs;
 
+    if (old_bs && new_bs) {
+        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
+    }
     if (old_bs) {
         if (old_bs->quiesce_counter && child->role->drained_end) {
             child->role->drained_end(child);
@@ -1852,6 +1855,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
 
     assert(parent_bs->drv);
+    assert(bdrv_get_aio_context(parent_bs) == bdrv_get_aio_context(child_bs));
     parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
                                     perm, shared_perm, &perm, &shared_perm);
-- 
2.9.3
* Re: [Qemu-devel] [PATCH v2 2/6] block: Assert attached child node has right aio context

From: Stefan Hajnoczi @ 2017-04-07 13:24 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz,
    Eric Blake

On Fri, Apr 07, 2017 at 02:54:10PM +0800, Fam Zheng wrote:
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [Qemu-devel] [PATCH v2 3/6] mirror: Fix aio context of mirror_top_bs

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

mirror_top_bs should be moved to the same context as the source before
it is inserted into the graph.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/mirror.c b/block/mirror.c
index 9e2fecc..e904fef 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1148,6 +1148,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         return;
     }
     mirror_top_bs->total_sectors = bs->total_sectors;
+    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
 
     /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
      * it alive until block_job_create() even if bs has no parent. */
-- 
2.9.3
* Re: [Qemu-devel] [PATCH v2 3/6] mirror: Fix aio context of mirror_top_bs

From: Stefan Hajnoczi @ 2017-04-07 13:24 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz,
    Eric Blake

On Fri, Apr 07, 2017 at 02:54:11PM +0800, Fam Zheng wrote:
> mirror_top_bs should be moved to the same context as the source before
> it is inserted into the graph.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/mirror.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

The fact that bs->aio_context is changing can confuse the dataplane
iothread, because of the now fine-grained AioContext locking. The
bdrv_drain call should really be a bdrv_drained_begin/end pair, but
since bs->aio_context is changing, we can just use aio_disable_external
and block_job_pause.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index b8a3011..512515a 100644
--- a/block.c
+++ b/block.c
@@ -4396,11 +4396,14 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
 {
-    AioContext *ctx;
+    AioContext *ctx = bdrv_get_aio_context(bs);
 
+    aio_disable_external(ctx);
+    if (bs->job) {
+        block_job_pause(bs->job);
+    }
     bdrv_drain(bs); /* ensure there are no in-flight requests */
 
-    ctx = bdrv_get_aio_context(bs);
     while (aio_poll(ctx, false)) {
         /* wait for all bottom halves to execute */
     }
@@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
     aio_context_acquire(new_context);
     bdrv_attach_aio_context(bs, new_context);
     aio_context_release(new_context);
+    if (bs->job) {
+        block_job_resume(bs->job);
+    }
+    aio_enable_external(ctx);
 }
 
 void bdrv_add_aio_context_notifier(BlockDriverState *bs,
-- 
2.9.3
* Re: [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Stefan Hajnoczi @ 2017-04-07 12:50 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz,
    Eric Blake

On Fri, Apr 07, 2017 at 02:54:12PM +0800, Fam Zheng wrote:
> @@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>      aio_context_acquire(new_context);
>      bdrv_attach_aio_context(bs, new_context);
>      aio_context_release(new_context);
> +    if (bs->job) {
> +        block_job_resume(bs->job);
> +    }

Should this be called before aio_context_release(new_context)?
* Re: [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Fam Zheng @ 2017-04-08 3:43 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz,
    Eric Blake

On Fri, 04/07 13:50, Stefan Hajnoczi wrote:
> On Fri, Apr 07, 2017 at 02:54:12PM +0800, Fam Zheng wrote:
> > @@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> >      aio_context_acquire(new_context);
> >      bdrv_attach_aio_context(bs, new_context);
> >      aio_context_release(new_context);
> > +    if (bs->job) {
> > +        block_job_resume(bs->job);
> > +    }
> 
> Should this be called before aio_context_release(new_context)?

Yes, and I'm going to replace it with bdrv_parent_drained_begin() as
Kevin suggested.

Fam
* Re: [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Kevin Wolf @ 2017-04-10 8:06 UTC (permalink / raw)
To: Fam Zheng
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk,
    Max Reitz, Eric Blake

Am 08.04.2017 um 05:43 hat Fam Zheng geschrieben:
> On Fri, 04/07 13:50, Stefan Hajnoczi wrote:
> > On Fri, Apr 07, 2017 at 02:54:12PM +0800, Fam Zheng wrote:
> > > @@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> > >      aio_context_acquire(new_context);
> > >      bdrv_attach_aio_context(bs, new_context);
> > >      aio_context_release(new_context);
> > > +    if (bs->job) {
> > > +        block_job_resume(bs->job);
> > > +    }
> > 
> > Should this be called before aio_context_release(new_context)?
> 
> Yes, and I'm going to replace it with bdrv_parent_drained_begin() as Kevin
> suggested.

I think at the moment bdrv_parent_drained_begin() can't replace it yet,
but you need both.

Kevin
* Re: [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Fam Zheng @ 2017-04-10 8:45 UTC (permalink / raw)
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk,
    Max Reitz, Eric Blake

On Mon, 04/10 10:06, Kevin Wolf wrote:
> Am 08.04.2017 um 05:43 hat Fam Zheng geschrieben:
> > On Fri, 04/07 13:50, Stefan Hajnoczi wrote:
> > > On Fri, Apr 07, 2017 at 02:54:12PM +0800, Fam Zheng wrote:
> > > > @@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> > > >      aio_context_acquire(new_context);
> > > >      bdrv_attach_aio_context(bs, new_context);
> > > >      aio_context_release(new_context);
> > > > +    if (bs->job) {
> > > > +        block_job_resume(bs->job);
> > > > +    }
> > > 
> > > Should this be called before aio_context_release(new_context)?
> > 
> > Yes, and I'm going to replace it with bdrv_parent_drained_begin() as Kevin
> > suggested.
> 
> I think at the moment bdrv_parent_drained_begin() can't replace it yet,
> but you need both.

I think we have it already, see 600ac6a0e (blockjob: add devops to
blockjob backends):

    bdrv_parent_drained_begin
      -> blk_root_drained_begin
        -> block_job_drained_begin
          -> block_job_pause

Fam
* Re: [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context

From: Kevin Wolf @ 2017-04-10 9:11 UTC (permalink / raw)
To: Fam Zheng
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk,
    Max Reitz, Eric Blake

Am 10.04.2017 um 10:45 hat Fam Zheng geschrieben:
> On Mon, 04/10 10:06, Kevin Wolf wrote:
> > Am 08.04.2017 um 05:43 hat Fam Zheng geschrieben:
> > > On Fri, 04/07 13:50, Stefan Hajnoczi wrote:
> > > > On Fri, Apr 07, 2017 at 02:54:12PM +0800, Fam Zheng wrote:
> > > > > @@ -4413,6 +4416,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> > > > >      aio_context_acquire(new_context);
> > > > >      bdrv_attach_aio_context(bs, new_context);
> > > > >      aio_context_release(new_context);
> > > > > +    if (bs->job) {
> > > > > +        block_job_resume(bs->job);
> > > > > +    }
> > > > 
> > > > Should this be called before aio_context_release(new_context)?
> > > 
> > > Yes, and I'm going to replace it with bdrv_parent_drained_begin() as Kevin
> > > suggested.
> > 
> > I think at the moment bdrv_parent_drained_begin() can't replace it yet,
> > but you need both.
> 
> I think we have it already, see 600ac6a0e (blockjob: add devops to blockjob
> backends):
> 
>     bdrv_parent_drained_begin
>       -> blk_root_drained_begin
>         -> block_job_drained_begin
>           -> block_job_pause

Ah, yes, you're right. Somehow I thought this was only for 2.10.

Kevin
* [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine

From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz,
    Eric Blake, Stefan Hajnoczi

Coroutines in the block layer should always be woken up in
bs->aio_context rather than the "current" context in which they are
entered. The two differ when the main loop is doing QMP tasks.

Race conditions happen without this patch, because the wrong context is
acquired in co_schedule_bh_cb while the entered coroutine works on a
different one:

     main loop                                iothread
 -----------------------------------------------------------------------
     blockdev_snapshot
       aio_context_acquire(bs->ctx)
       bdrv_flush(bs)
         bdrv_co_flush(bs)
           ...
           qemu_coroutine_yield(co)
         BDRV_POLL_WHILE()
           aio_context_release(bs->ctx)
                                              aio_context_acquire(bs->ctx)
                                                ...
                                                  aio_co_wake(co)
           aio_poll(qemu_aio_context)             ...
             co_schedule_bh_cb()                  ...
               qemu_coroutine_enter(co)           ...
                 /* (A) bdrv_co_flush(bs)         /* (B) I/O on bs */
                        continues... */
                                              aio_context_release(bs->ctx)

Both (A) and (B) can access resources protected by bs->ctx, but (A) is
not thread-safe.

Make the block layer explicitly specify a desired context for the
entered coroutine. The remaining callers stick to the old behavior:
qemu_get_aio_context() or qemu_get_current_aio_context().
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c                    | 15 +++++++++++++--
 block/blkdebug.c           |  4 ++--
 block/blkverify.c          |  8 ++++----
 block/block-backend.c      |  4 ++--
 block/io.c                 | 14 +++++++-------
 block/mirror.c             |  2 +-
 block/quorum.c             | 16 ++++++++--------
 block/sheepdog.c           |  4 ++--
 blockjob.c                 |  4 ++--
 hw/9pfs/9p.c               |  4 ++--
 hw/9pfs/coth.c             |  4 ++--
 include/block/block.h      | 11 +++++++++++
 include/qemu/coroutine.h   | 11 ++++++-----
 include/qemu/main-loop.h   |  2 +-
 migration/colo.c           |  3 ++-
 migration/migration.c      |  2 +-
 nbd/server.c               |  5 +++--
 qemu-img.c                 |  4 ++--
 qemu-io-cmds.c             |  2 +-
 tests/test-coroutine.c     | 41 +++++++++++++++++++++--------------------
 tests/test-thread-pool.c   |  2 +-
 util/async.c               |  4 ++--
 util/qemu-coroutine-io.c   |  3 ++-
 util/qemu-coroutine-lock.c |  6 ++++--
 util/qemu-coroutine.c      | 10 +++++-----
 util/trace-events          |  2 +-
 26 files changed, 108 insertions(+), 79 deletions(-)

diff --git a/block.c b/block.c
index 512515a..6513484 100644
--- a/block.c
+++ b/block.c
@@ -357,10 +357,11 @@ int bdrv_create(BlockDriver *drv, const char* filename,
         /* Fast-path if already in coroutine context */
         bdrv_create_co_entry(&cco);
     } else {
+        AioContext *ctx = qemu_get_aio_context();
         co = qemu_coroutine_create(bdrv_create_co_entry, &cco);
-        qemu_coroutine_enter(co);
+        qemu_coroutine_enter(ctx, co);
         while (cco.ret == NOT_DONE) {
-            aio_poll(qemu_get_aio_context(), true);
+            aio_poll(ctx, true);
         }
     }
 
@@ -4324,6 +4325,16 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
     return bs->aio_context;
 }
 
+void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *coroutine)
+{
+    qemu_coroutine_enter(bdrv_get_aio_context(bs), coroutine);
+}
+
+void bdrv_coroutine_enter_if_inactive(BlockDriverState *bs, Coroutine *co)
+{
+    qemu_coroutine_enter_if_inactive(bdrv_get_aio_context(bs), co);
+}
+
 static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
 {
     QLIST_REMOVE(ban, list);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 67e8024..2370f73 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -610,7 +610,7 @@ static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
 
     QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, next) {
         if (!strcmp(r->tag, tag)) {
-            qemu_coroutine_enter(r->co);
+            bdrv_coroutine_enter(bs, r->co);
             return 0;
         }
     }
@@ -636,7 +636,7 @@ static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
     }
     QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, r_next) {
         if (!strcmp(r->tag, tag)) {
-            qemu_coroutine_enter(r->co);
+            bdrv_coroutine_enter(bs, r->co);
             ret = 0;
         }
     }
diff --git a/block/blkverify.c b/block/blkverify.c
index 9a1e21c..c11a636 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -172,7 +172,7 @@ static void coroutine_fn blkverify_do_test_req(void *opaque)
     r->ret = r->request_fn(s->test_file, r->offset, r->bytes, r->qiov,
                            r->flags);
     r->done++;
-    qemu_coroutine_enter_if_inactive(r->co);
+    bdrv_coroutine_enter_if_inactive(r->bs, r->co);
 }
 
 static void coroutine_fn blkverify_do_raw_req(void *opaque)
@@ -182,7 +182,7 @@ static void coroutine_fn blkverify_do_raw_req(void *opaque)
     r->raw_ret = r->request_fn(r->bs->file, r->offset, r->bytes, r->raw_qiov,
                                r->flags);
     r->done++;
-    qemu_coroutine_enter_if_inactive(r->co);
+    bdrv_coroutine_enter_if_inactive(r->bs, r->co);
 }
 
 static int coroutine_fn
@@ -207,8 +207,8 @@ blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest *r, uint64_t offset,
     co_a = qemu_coroutine_create(blkverify_do_test_req, r);
     co_b = qemu_coroutine_create(blkverify_do_raw_req, r);
 
-    qemu_coroutine_enter(co_a);
-    qemu_coroutine_enter(co_b);
+    bdrv_coroutine_enter(bs, co_a);
+    bdrv_coroutine_enter(bs, co_b);
 
     while (r->done < 2) {
         qemu_coroutine_yield();
diff --git a/block/block-backend.c b/block/block-backend.c
index 0b63773..5eeaad1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1007,7 +1007,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         co_entry(&rwco);
     } else {
         Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(blk_bs(blk), co);
         BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
     }
 
@@ -1114,7 +1114,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     acb->has_returned = false;
 
     co = qemu_coroutine_create(co_entry, acb);
-    qemu_coroutine_enter(co);
+    bdrv_coroutine_enter(blk_bs(blk), co);
 
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
diff --git a/block/io.c b/block/io.c
index 2709a70..30083a7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -616,7 +616,7 @@ static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
         bdrv_rw_co_entry(&rwco);
     } else {
         co = qemu_coroutine_create(bdrv_rw_co_entry, &rwco);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(child->bs, co);
         BDRV_POLL_WHILE(child->bs, rwco.ret == NOT_DONE);
     }
     return rwco.ret;
@@ -1873,7 +1873,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
     } else {
         co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
                                    &data);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, !data.done);
     }
     return data.ret;
@@ -1999,7 +1999,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         };
         Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
 
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(bs, co);
         while (data.ret == -EINPROGRESS) {
             aio_poll(bdrv_get_aio_context(bs), true);
         }
@@ -2216,7 +2216,7 @@ static BlockAIOCB *bdrv_co_aio_prw_vector(BdrvChild *child,
     acb->is_write = is_write;
 
     co = qemu_coroutine_create(bdrv_co_do_rw, acb);
-    qemu_coroutine_enter(co);
+    bdrv_coroutine_enter(child->bs, co);
 
     bdrv_co_maybe_schedule_bh(acb);
     return &acb->common;
@@ -2247,7 +2247,7 @@ BlockAIOCB *bdrv_aio_flush(BlockDriverState *bs,
     acb->req.error = -EINPROGRESS;
 
     co = qemu_coroutine_create(bdrv_aio_flush_co_entry, acb);
-    qemu_coroutine_enter(co);
+    bdrv_coroutine_enter(bs, co);
 
     bdrv_co_maybe_schedule_bh(acb);
     return &acb->common;
@@ -2380,7 +2380,7 @@ int bdrv_flush(BlockDriverState *bs)
         bdrv_flush_co_entry(&flush_co);
     } else {
         co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);
     }
 
@@ -2527,7 +2527,7 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int count)
         bdrv_pdiscard_co_entry(&rwco);
     } else {
         co = qemu_coroutine_create(bdrv_pdiscard_co_entry, &rwco);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(bs, co);
         BDRV_POLL_WHILE(bs, rwco.ret == NOT_DONE);
     }
diff --git a/block/mirror.c b/block/mirror.c
index e904fef..a68855c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -130,7 +130,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     g_free(op);
 
     if (s->waiting_for_io) {
-        qemu_coroutine_enter(s->common.co);
+        bdrv_coroutine_enter(s->source, s->common.co);
     }
 }
diff --git a/block/quorum.c b/block/quorum.c
index 40205fb..4d42e4a 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -270,16 +270,16 @@ static void quorum_rewrite_entry(void *opaque)
     QuorumCo *co = opaque;
     QuorumAIOCB *acb = co->acb;
     BDRVQuorumState *s = acb->bs->opaque;
+    BdrvChild *child = s->children[co->idx];
 
     /* Ignore any errors, it's just a correction attempt for already
      * corrupted data.
      */
-    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
-                    acb->qiov, 0);
+    bdrv_co_pwritev(child, acb->offset, acb->bytes, acb->qiov, 0);
 
     /* Wake up the caller after the last rewrite */
     acb->rewrite_count--;
     if (!acb->rewrite_count) {
-        qemu_coroutine_enter_if_inactive(acb->co);
+        bdrv_coroutine_enter_if_inactive(child->bs, acb->co);
     }
 }
 
@@ -318,7 +318,7 @@ static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
         };
 
         co = qemu_coroutine_create(quorum_rewrite_entry, &data);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(acb->bs, co);
     }
 }
 
@@ -602,7 +602,7 @@ static void read_quorum_children_entry(void *opaque)
 
     /* Wake up the caller after the last read */
     if (acb->count == s->num_children) {
-        qemu_coroutine_enter_if_inactive(acb->co);
+        bdrv_coroutine_enter_if_inactive(sacb->bs, acb->co);
     }
 }
 
@@ -626,7 +626,7 @@ static int read_quorum_children(QuorumAIOCB *acb)
         };
 
         co = qemu_coroutine_create(read_quorum_children_entry, &data);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(acb->bs, co);
     }
 
     while (acb->count < s->num_children) {
@@ -712,7 +712,7 @@ static void write_quorum_entry(void *opaque)
 
     /* Wake up the caller after the last write */
     if (acb->count == s->num_children) {
-        qemu_coroutine_enter_if_inactive(acb->co);
+        bdrv_coroutine_enter_if_inactive(sacb->bs, acb->co);
     }
 }
 
@@ -731,7 +731,7 @@ static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
         };
 
         co = qemu_coroutine_create(write_quorum_entry, &data);
-        qemu_coroutine_enter(co);
+        bdrv_coroutine_enter(bs, co);
     }
 
     while (acb->count < s->num_children) {
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1b71fc8..ac6d32a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -736,10 +736,10 @@ static int do_req(int sockfd, BlockDriverState *bs, SheepdogReq *hdr,
     } else {
         co = qemu_coroutine_create(do_co_req, &srco);
         if (bs) {
-            qemu_coroutine_enter(co);
+            bdrv_coroutine_enter(bs, co);
             BDRV_POLL_WHILE(bs, !srco.finished);
         } else {
-            qemu_coroutine_enter(co);
+            bdrv_coroutine_enter(bs, co);
             while (!srco.finished) {
                 aio_poll(qemu_get_aio_context(), true);
             }
diff --git a/blockjob.c b/blockjob.c
index 9b619f385..6e48932 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -290,7 +290,7 @@ void block_job_start(BlockJob *job)
     job->pause_count--;
     job->busy = true;
     job->paused = false;
-    qemu_coroutine_enter(job->co);
+    bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
 void block_job_ref(BlockJob *job)
@@ -532,7 +532,7 @@ void block_job_user_resume(BlockJob *job)
 
 void block_job_enter(BlockJob *job)
 {
     if (job->co && !job->busy) {
-        qemu_coroutine_enter(job->co);
+        bdrv_coroutine_enter(blk_bs(job->blk), job->co);
     }
 }
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index c80ba67..af4acb4 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3463,7 +3463,7 @@ void pdu_submit(V9fsPDU *pdu)
         handler = v9fs_fs_ro;
     }
     co = qemu_coroutine_create(handler, pdu);
-    qemu_coroutine_enter(co);
+    qemu_coroutine_enter(qemu_get_aio_context(), co);
 }
 
 /* Returns 0 on success, 1 on failure. */
@@ -3596,7 +3596,7 @@ void v9fs_reset(V9fsState *s)
     }
 
     co = qemu_coroutine_create(virtfs_co_reset, &data);
-    qemu_coroutine_enter(co);
+    qemu_coroutine_enter(qemu_get_aio_context(), co);
 
     while (!data.done) {
         aio_poll(qemu_get_aio_context(), true);
diff --git a/hw/9pfs/coth.c b/hw/9pfs/coth.c
index 89018de..70e2667 100644
--- a/hw/9pfs/coth.c
+++ b/hw/9pfs/coth.c
@@ -22,14 +22,14 @@
 static void coroutine_enter_cb(void *opaque, int ret)
 {
     Coroutine *co = opaque;
-    qemu_coroutine_enter(co);
+    qemu_coroutine_enter(qemu_get_aio_context(), co);
 }
 
 /* Called from worker thread.  */
 static int coroutine_enter_func(void *arg)
 {
     Coroutine *co = arg;
-    qemu_coroutine_enter(co);
+    qemu_coroutine_enter(qemu_get_aio_context(), co);
     return 0;
 }
diff --git a/include/block/block.h b/include/block/block.h
index 5149260..00db53f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -556,6 +556,17 @@ bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
 AioContext *bdrv_get_aio_context(BlockDriverState *bs);
 
 /**
+ * Transfer control to @co in the aio context of @bs
+ */
+void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co);
+
+/**
+ * Transfer control to @co in the aio context of @bs if it's not active (i.e.
+ * part of the call stack of the running coroutine). Otherwise, do nothing.
+ */
+void bdrv_coroutine_enter_if_inactive(BlockDriverState *bs, Coroutine *co);
+
+/**
  * bdrv_set_aio_context:
  *
  * Changes the #AioContext used for fd handlers, timers, and BHs by this
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index e60beaf..77f2de1 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -66,15 +66,16 @@ typedef void coroutine_fn CoroutineEntry(void *opaque);
 Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
 
 /**
- * Transfer control to a coroutine
+ * Transfer control to a coroutine and associate it to @ctx
 */
-void qemu_coroutine_enter(Coroutine *coroutine);
+void qemu_coroutine_enter(AioContext *ctx, Coroutine *coroutine);
 
 /**
- * Transfer control to a coroutine if it's not active (i.e. part of the call
- * stack of the running coroutine). Otherwise, do nothing.
+ * Transfer control to a coroutine and associate it to @ctx, if it's not active
+ * (i.e. part of the call stack of the running coroutine). Otherwise, do
+ * nothing.
 */
-void qemu_coroutine_enter_if_inactive(Coroutine *co);
+void qemu_coroutine_enter_if_inactive(AioContext *ctx, Coroutine *co);
 
 /**
  * Transfer control back to a coroutine's caller
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d7e24af..8a19e10 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -64,7 +64,7 @@ int qemu_init_main_loop(Error **errp);
  *
  *     void enter_co_bh(void *opaque) {
  *         QEMUCoroutine *co = opaque;
- *         qemu_coroutine_enter(co);
+ *         qemu_coroutine_enter(ctx, co);
  *     }
  *
  *     ...
diff --git a/migration/colo.c b/migration/colo.c
index c19eb3f..7167ce7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -100,7 +100,8 @@ static void secondary_vm_do_failover(void)
     qemu_sem_post(&mis->colo_incoming_sem);
     /* For Secondary VM, jump to incoming co */
     if (mis->migration_incoming_co) {
-        qemu_coroutine_enter(mis->migration_incoming_co);
+        qemu_coroutine_enter(qemu_get_aio_context(),
+                             mis->migration_incoming_co);
     }
 }
diff --git a/migration/migration.c b/migration/migration.c
index 54060f7..82ed7e4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -449,7 +449,7 @@ void migration_fd_process_incoming(QEMUFile *f)
     migrate_decompress_threads_create();
     qemu_file_set_blocking(f, false);
 
-    qemu_coroutine_enter(co);
+    qemu_coroutine_enter(qemu_get_aio_context(), co);
 }
diff --git a/nbd/server.c b/nbd/server.c
index 924a1fe..a2635d2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -108,7 +108,7 @@ static gboolean nbd_negotiate_continue(QIOChannel *ioc,
                                        GIOCondition condition,
                                        void *opaque)
 {
-    qemu_coroutine_enter(opaque);
+    qemu_coroutine_enter(ioc->ctx, opaque);
     return TRUE;
 }
 
@@ -1418,5 +1418,6 @@ void nbd_client_new(NBDExport *exp,
     data->client = client;
     data->co = qemu_coroutine_create(nbd_co_client_start, data);
-    qemu_coroutine_enter(data->co);
+    qemu_coroutine_enter(exp ?
exp->ctx : qemu_get_aio_context(), + data->co); } diff --git a/qemu-img.c b/qemu-img.c index b220cf7..ffb09f7 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1830,7 +1830,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque) * s->wait_sector_num[i] == -1 during A -> B. Therefore * B will never enter A during this time window. */ - qemu_coroutine_enter(s->co[i]); + qemu_coroutine_enter(qemu_get_aio_context(), s->co[i]); break; } } @@ -1896,7 +1896,7 @@ static int convert_do_copy(ImgConvertState *s) for (i = 0; i < s->num_coroutines; i++) { s->co[i] = qemu_coroutine_create(convert_co_do_copy, s); s->wait_sector_num[i] = -1; - qemu_coroutine_enter(s->co[i]); + qemu_coroutine_enter(qemu_get_aio_context(), s->co[i]); } while (s->ret == -EINPROGRESS) { diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index 883f53b..312fc6d 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -521,7 +521,7 @@ static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset, } co = qemu_coroutine_create(co_pwrite_zeroes_entry, &data); - qemu_coroutine_enter(co); + bdrv_coroutine_enter(blk_bs(blk), co); while (!data.done) { aio_poll(blk_get_aio_context(blk), true); } diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c index abd97c2..64f2563 100644 --- a/tests/test-coroutine.c +++ b/tests/test-coroutine.c @@ -14,6 +14,7 @@ #include "qemu/osdep.h" #include "qemu/coroutine.h" #include "qemu/coroutine_int.h" +#include "qemu/main-loop.h" /* * Check that qemu_in_coroutine() works @@ -31,7 +32,7 @@ static void test_in_coroutine(void) g_assert(!qemu_in_coroutine()); coroutine = qemu_coroutine_create(verify_in_coroutine, NULL); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } /* @@ -49,7 +50,7 @@ static void test_self(void) Coroutine *coroutine; coroutine = qemu_coroutine_create(verify_self, &coroutine); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } /* @@ -79,9 +80,9 @@ static void coroutine_fn 
verify_entered_step_1(void *opaque) coroutine = qemu_coroutine_create(verify_entered_step_2, self); g_assert(!qemu_coroutine_entered(coroutine)); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); g_assert(!qemu_coroutine_entered(coroutine)); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } static void test_entered(void) @@ -90,7 +91,7 @@ static void test_entered(void) coroutine = qemu_coroutine_create(verify_entered_step_1, NULL); g_assert(!qemu_coroutine_entered(coroutine)); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } /* @@ -113,7 +114,7 @@ static void coroutine_fn nest(void *opaque) Coroutine *child; child = qemu_coroutine_create(nest, nd); - qemu_coroutine_enter(child); + qemu_coroutine_enter(qemu_get_aio_context(), child); } nd->n_return++; @@ -129,7 +130,7 @@ static void test_nesting(void) }; root = qemu_coroutine_create(nest, &nd); - qemu_coroutine_enter(root); + qemu_coroutine_enter(qemu_get_aio_context(), root); /* Must enter and return from max nesting level */ g_assert_cmpint(nd.n_enter, ==, nd.max); @@ -159,7 +160,7 @@ static void test_yield(void) coroutine = qemu_coroutine_create(yield_5_times, &done); while (!done) { - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); i++; } g_assert_cmpint(i, ==, 5); /* coroutine must yield 5 times */ @@ -173,7 +174,7 @@ static void coroutine_fn c2_fn(void *opaque) static void coroutine_fn c1_fn(void *opaque) { Coroutine *c2 = opaque; - qemu_coroutine_enter(c2); + qemu_coroutine_enter(qemu_get_aio_context(), c2); } static void test_co_queue(void) @@ -185,12 +186,12 @@ static void test_co_queue(void) c2 = qemu_coroutine_create(c2_fn, NULL); c1 = qemu_coroutine_create(c1_fn, c2); - qemu_coroutine_enter(c1); + qemu_coroutine_enter(qemu_get_aio_context(), c1); /* c1 shouldn't be used any more now; make sure we segfault if it is */ tmp = 
*c1; memset(c1, 0xff, sizeof(Coroutine)); - qemu_coroutine_enter(c2); + qemu_coroutine_enter(qemu_get_aio_context(), c2); /* Must restore the coroutine now to avoid corrupted pool */ *c1 = tmp; @@ -214,13 +215,13 @@ static void test_lifecycle(void) /* Create, enter, and return from coroutine */ coroutine = qemu_coroutine_create(set_and_exit, &done); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); g_assert(done); /* expect done to be true (first time) */ /* Repeat to check that no state affects this test */ done = false; coroutine = qemu_coroutine_create(set_and_exit, &done); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); g_assert(done); /* expect done to be true (second time) */ } @@ -256,10 +257,10 @@ static void do_order_test(void) co = qemu_coroutine_create(co_order_test, NULL); record_push(1, 1); - qemu_coroutine_enter(co); + qemu_coroutine_enter(qemu_get_aio_context(), co); record_push(1, 2); g_assert(!qemu_in_coroutine()); - qemu_coroutine_enter(co); + qemu_coroutine_enter(qemu_get_aio_context(), co); record_push(1, 3); g_assert(!qemu_in_coroutine()); } @@ -297,7 +298,7 @@ static void perf_lifecycle(void) g_test_timer_start(); for (i = 0; i < max; i++) { coroutine = qemu_coroutine_create(empty_coroutine, NULL); - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } duration = g_test_timer_elapsed(); @@ -321,7 +322,7 @@ static void perf_nesting(void) .max = maxnesting, }; root = qemu_coroutine_create(nest, &nd); - qemu_coroutine_enter(root); + qemu_coroutine_enter(qemu_get_aio_context(), root); } duration = g_test_timer_elapsed(); @@ -354,7 +355,7 @@ static void perf_yield(void) g_test_timer_start(); while (i > 0) { - qemu_coroutine_enter(coroutine); + qemu_coroutine_enter(qemu_get_aio_context(), coroutine); } duration = g_test_timer_elapsed(); @@ -401,8 +402,8 @@ static void perf_cost(void) g_test_timer_start(); while (i++ < 
maxcycles) { co = qemu_coroutine_create(perf_cost_func, &i); - qemu_coroutine_enter(co); - qemu_coroutine_enter(co); + qemu_coroutine_enter(qemu_get_aio_context(), co); + qemu_coroutine_enter(qemu_get_aio_context(), co); } duration = g_test_timer_elapsed(); ops = (long)(maxcycles / (duration * 1000)); diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c index 91b4ec5..ba4fee4 100644 --- a/tests/test-thread-pool.c +++ b/tests/test-thread-pool.c @@ -94,7 +94,7 @@ static void test_submit_co(void) WorkerTestData data; Coroutine *co = qemu_coroutine_create(co_test_cb, &data); - qemu_coroutine_enter(co); + qemu_coroutine_enter(qemu_get_aio_context(), co); /* Back here once the worker has started. */ diff --git a/util/async.c b/util/async.c index 663e297..3289a36 100644 --- a/util/async.c +++ b/util/async.c @@ -388,7 +388,7 @@ static void co_schedule_bh_cb(void *opaque) QSLIST_REMOVE_HEAD(&straight, co_scheduled_next); trace_aio_co_schedule_bh_cb(ctx, co); aio_context_acquire(ctx); - qemu_coroutine_enter(co); + qemu_coroutine_enter(ctx, co); aio_context_release(ctx); } } @@ -464,7 +464,7 @@ void aio_co_wake(struct Coroutine *co) QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); } else { aio_context_acquire(ctx); - qemu_coroutine_enter(co); + qemu_coroutine_enter(ctx, co); aio_context_release(ctx); } } diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c index 44a8969..26818af 100644 --- a/util/qemu-coroutine-io.c +++ b/util/qemu-coroutine-io.c @@ -75,7 +75,8 @@ static void fd_coroutine_enter(void *opaque) { FDYieldUntilData *data = opaque; qemu_set_fd_handler(data->fd, NULL, NULL, NULL); - qemu_coroutine_enter(data->co); + /* XXX: do we need an explicit ctx? 
*/ + qemu_coroutine_enter(qemu_get_current_aio_context(), data->co); } void coroutine_fn yield_until_fd_readable(int fd) diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c index 6328eed..42c18d8 100644 --- a/util/qemu-coroutine-lock.c +++ b/util/qemu-coroutine-lock.c @@ -81,7 +81,8 @@ void qemu_co_queue_run_restart(Coroutine *co) trace_qemu_co_queue_run_restart(co); while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) { QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); - qemu_coroutine_enter(next); + assert(next->ctx); + qemu_coroutine_enter(next->ctx, next); } } @@ -125,7 +126,8 @@ bool qemu_co_enter_next(CoQueue *queue) } QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next); - qemu_coroutine_enter(next); + assert(next->ctx); + qemu_coroutine_enter(next->ctx, next); return true; } diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c index 72412e5..47329a0 100644 --- a/util/qemu-coroutine.c +++ b/util/qemu-coroutine.c @@ -102,12 +102,12 @@ static void coroutine_delete(Coroutine *co) qemu_coroutine_delete(co); } -void qemu_coroutine_enter(Coroutine *co) +void qemu_coroutine_enter(AioContext *ctx, Coroutine *co) { Coroutine *self = qemu_coroutine_self(); CoroutineAction ret; - trace_qemu_coroutine_enter(self, co, co->entry_arg); + trace_qemu_coroutine_enter(self, ctx, co, co->entry_arg); if (co->caller) { fprintf(stderr, "Co-routine re-entered recursively\n"); @@ -115,7 +115,7 @@ void qemu_coroutine_enter(Coroutine *co) } co->caller = self; - co->ctx = qemu_get_current_aio_context(); + co->ctx = ctx; /* Store co->ctx before anything that stores co. Matches * barrier in aio_co_wake and qemu_co_mutex_wake. 
@@ -139,10 +139,10 @@ void qemu_coroutine_enter(Coroutine *co) } } -void qemu_coroutine_enter_if_inactive(Coroutine *co) +void qemu_coroutine_enter_if_inactive(AioContext *ctx, Coroutine *co) { if (!qemu_coroutine_entered(co)) { - qemu_coroutine_enter(co); + qemu_coroutine_enter(ctx, co); } } diff --git a/util/trace-events b/util/trace-events index ac27d94..d3c39a6 100644 --- a/util/trace-events +++ b/util/trace-events @@ -22,7 +22,7 @@ buffer_move(const char *buf, size_t len, const char *from) "%s: %zd bytes from % buffer_free(const char *buf, size_t len) "%s: capacity %zd" # util/qemu-coroutine.c -qemu_coroutine_enter(void *from, void *to, void *opaque) "from %p to %p opaque %p" +qemu_coroutine_enter(void *ctx, void *from, void *to, void *opaque) "ctx %p from %p to %p opaque %p" qemu_coroutine_yield(void *from, void *to) "from %p to %p" qemu_coroutine_terminate(void *co) "self %p" -- 2.9.3 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-07 6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng @ 2017-04-07 9:57 ` Fam Zheng 2017-04-07 13:26 ` Stefan Hajnoczi ` (2 subsequent siblings) 3 siblings, 0 replies; 26+ messages in thread From: Fam Zheng @ 2017-04-07 9:57 UTC (permalink / raw) To: qemu-devel Cc: Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi On Fri, 04/07 14:54, Fam Zheng wrote: > > main loop iothread > ----------------------------------------------------------------------- > blockdev_snapshot > aio_context_acquire(bs->ctx) > bdrv_flush(bs) > bdrv_co_flush(bs) > ... > qemu_coroutine_yield(co) > BDRV_POLL_WHILE() > aio_context_release(bs->ctx) > aio_context_acquire(bs->ctx) > ... > aio_co_wake(co) > aio_poll(qemu_aio_context) ... > co_schedule_bh_cb() ... > qemu_coroutine_enter(co) ... > /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ > continues... */ > aio_context_release(bs->ctx) After talking to Kevin on IRC, this aio_context_acquire() in iothread could be the one in vq handler: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ continues... */ aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in this special case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. This might be the root cause of the race? 
(Ed's test case showed that apart from vq handlers, block jobs in the iothread can also trigger the same pattern of race. For this part we need John's patches to pause block jobs in bdrv_drained_begin.)

Fam
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-07 6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng 2017-04-07 9:57 ` Fam Zheng @ 2017-04-07 13:26 ` Stefan Hajnoczi 2017-04-08 3:27 ` Fam Zheng 2017-04-07 14:07 ` Eric Blake 2017-04-07 15:14 ` Kevin Wolf 3 siblings, 1 reply; 26+ messages in thread From: Stefan Hajnoczi @ 2017-04-07 13:26 UTC (permalink / raw) To: Fam Zheng Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz, Eric Blake [-- Attachment #1: Type: text/plain, Size: 1987 bytes --] On Fri, Apr 07, 2017 at 02:54:13PM +0800, Fam Zheng wrote: > Coroutine in block layer should always be waken up in bs->aio_context > rather than the "current" context where it is entered. They differ when > the main loop is doing QMP tasks. > > Race conditions happen without this patch, because the wrong context is > acquired in co_schedule_bh_cb, while the entered coroutine works on a > different one: > > main loop iothread > ----------------------------------------------------------------------- > blockdev_snapshot > aio_context_acquire(bs->ctx) > bdrv_flush(bs) > bdrv_co_flush(bs) > ... > qemu_coroutine_yield(co) > BDRV_POLL_WHILE() > aio_context_release(bs->ctx) > aio_context_acquire(bs->ctx) > ... > aio_co_wake(co) > aio_poll(qemu_aio_context) ... > co_schedule_bh_cb() ... > qemu_coroutine_enter(co) ... > /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ > continues... */ > aio_context_release(bs->ctx) > > Both (A) and (B) can access resources protected by bs->ctx, but (A) is > not thread-safe. > > Make the block layer explicitly specify a desired context for the > entered coroutine. For the rest callers, stick to the old behavior, > qemu_get_aio_context() or qemu_get_current_aio_context(). I wasn't expecting the patch to touch so many places. 
If a coroutine can be permanently associated with an AioContext then there's no need to keep assigning co->ctx on each qemu_coroutine_enter().

I don't know about this one. Paolo is the best person to review it since he's most familiar with the recent Coroutine/AioContext changes.

Stefan
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine
  2017-04-07 13:26 ` Stefan Hajnoczi
@ 2017-04-08 3:27   ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2017-04-08 3:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz, Eric Blake

On Fri, 04/07 14:26, Stefan Hajnoczi wrote:
> I wasn't expecting the patch to touch so many places. If a coroutine
> can be permanently associated with an AioContext then there's no need to
> keep assigning co->ctx on each qemu_coroutine_enter().

Maybe, but bs->job->co->ctx changes when bdrv_set_aio_context(bs->ctx) is
called. In v1 the co->ctx was associated at qemu_coroutine_create() time,
but that failed to handle bdrv_set_aio_context.

Fam
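The staleness problem described here can be sketched with toy stand-in types (none of these are QEMU's real types; this is only an illustration of why a create-time binding breaks once the BDS moves to another context):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not QEMU's real types. The BDS's context can change at
 * runtime via bdrv_set_aio_context(), so any binding taken at coroutine
 * creation time can go stale. */
typedef struct AioContext { int id; } AioContext;

typedef struct Coroutine {
    AioContext *ctx;
} Coroutine;

typedef struct BDS {
    AioContext *ctx;
    Coroutine *job_co;
} BDS;

/* v1 approach: bind the coroutine to the BDS context once, at creation. */
void coroutine_create_bound(Coroutine *co, BDS *bs)
{
    co->ctx = bs->ctx;
}

/* This series' approach: the context is supplied on every enter, so it
 * is re-read from the BDS after any move. */
void coroutine_enter_in(Coroutine *co, AioContext *ctx)
{
    co->ctx = ctx;
}

/* Moving the BDS does not retroactively fix a create-time binding. */
void toy_set_aio_context(BDS *bs, AioContext *new_ctx)
{
    bs->ctx = new_ctx;
}
```

After toy_set_aio_context(), the create-time binding still points at the old context until the next enter re-reads bs->ctx; that is the gap v1 left open.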
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-07 6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng 2017-04-07 9:57 ` Fam Zheng 2017-04-07 13:26 ` Stefan Hajnoczi @ 2017-04-07 14:07 ` Eric Blake 2017-04-07 15:14 ` Kevin Wolf 3 siblings, 0 replies; 26+ messages in thread From: Eric Blake @ 2017-04-07 14:07 UTC (permalink / raw) To: Fam Zheng, qemu-devel Cc: Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf, Max Reitz, Stefan Hajnoczi [-- Attachment #1: Type: text/plain, Size: 1924 bytes --] On 04/07/2017 01:54 AM, Fam Zheng wrote: > Coroutine in block layer should always be waken up in bs->aio_context s/waken up/awakened/ > rather than the "current" context where it is entered. They differ when > the main loop is doing QMP tasks. > > Race conditions happen without this patch, because the wrong context is > acquired in co_schedule_bh_cb, while the entered coroutine works on a > different one: > > main loop iothread > ----------------------------------------------------------------------- > blockdev_snapshot > aio_context_acquire(bs->ctx) > bdrv_flush(bs) > bdrv_co_flush(bs) > ... > qemu_coroutine_yield(co) > BDRV_POLL_WHILE() > aio_context_release(bs->ctx) > aio_context_acquire(bs->ctx) > ... > aio_co_wake(co) > aio_poll(qemu_aio_context) ... > co_schedule_bh_cb() ... > qemu_coroutine_enter(co) ... > /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ > continues... */ > aio_context_release(bs->ctx) > > Both (A) and (B) can access resources protected by bs->ctx, but (A) is > not thread-safe. > > Make the block layer explicitly specify a desired context for the > entered coroutine. For the rest callers, stick to the old behavior, s/rest/remaining/ > qemu_get_aio_context() or qemu_get_current_aio_context(). > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- At this point, I'm still more comfortable waiting for Paolo's review. 
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-07 6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng ` (2 preceding siblings ...) 2017-04-07 14:07 ` Eric Blake @ 2017-04-07 15:14 ` Kevin Wolf 2017-04-10 0:56 ` Paolo Bonzini 3 siblings, 1 reply; 26+ messages in thread From: Kevin Wolf @ 2017-04-07 15:14 UTC (permalink / raw) To: Fam Zheng Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz, Eric Blake, Stefan Hajnoczi Am 07.04.2017 um 08:54 hat Fam Zheng geschrieben: > Coroutine in block layer should always be waken up in bs->aio_context > rather than the "current" context where it is entered. They differ when > the main loop is doing QMP tasks. > > Race conditions happen without this patch, because the wrong context is > acquired in co_schedule_bh_cb, while the entered coroutine works on a > different one: > > main loop iothread > ----------------------------------------------------------------------- > blockdev_snapshot > aio_context_acquire(bs->ctx) > bdrv_flush(bs) > bdrv_co_flush(bs) > ... > qemu_coroutine_yield(co) > BDRV_POLL_WHILE() > aio_context_release(bs->ctx) > aio_context_acquire(bs->ctx) > ... > aio_co_wake(co) > aio_poll(qemu_aio_context) ... > co_schedule_bh_cb() ... > qemu_coroutine_enter(co) ... > /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ > continues... */ > aio_context_release(bs->ctx) As I discussed with Fam on IRC this morning, what this patch tries to fix (acquiring the wrong context) is not the real problem, but just a symptom. If you look at the example above (the same is true for the other example that Fam posted in a reply), you see that the monitor called aio_context_acquire() specifically in order to avoid that some other user interferes with its activities. The real bug is that the iothread even had a chance to run. 
One part of the reason is that BDRV_POLL_WHILE() drops the lock since
commit c9d1a561, so just calling aio_context_acquire() doesn't protect
you any more if there is any chance that a nested function calls
BDRV_POLL_WHILE(). I haven't checked what was done when merging said
commit, but I kind of expect that we didn't carefully audit all callers
of aio_context_acquire() as to whether they can cope with this.

Specifically, monitor commands tend to rely on the fact that they keep
the lock and therefore nobody else can interfere. When I scrolled
through blockdev.c this morning, I saw a few suspicious ones that could
be broken now.

Now, of course, some callers additionally call bdrv_drained_begin(), and
the snapshot code is one of them. This should in theory be safe, but in
practice even both aio_context_acquire() and bdrv_drained_begin()
together don't give us the exclusive access that these callers want.
This is the real bug to address.

I don't know enough about this code to say whether aio_context_acquire()
alone should give the same guarantees again (I suspect it became
impractical?) or whether we need to fix only bdrv_drained_begin() and
add it to more places.

So I'll join the others in this email thread: Paolo, do you have an
opinion on this?

Kevin

> Both (A) and (B) can access resources protected by bs->ctx, but (A) is
> not thread-safe.
>
> Make the block layer explicitly specify a desired context for the
> entered coroutine. For the remaining callers, stick to the old behavior,
> qemu_get_aio_context() or qemu_get_current_aio_context().
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
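The lock-dropping pattern Kevin describes can be shown with a minimal single-threaded mock. All names below are invented; this only mimics the shape of BDRV_POLL_WHILE() after commit c9d1a561, where the AioContext lock is released around each poll iteration:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented names throughout; this only mimics the locking shape of
 * BDRV_POLL_WHILE() after commit c9d1a561. */
typedef struct MockCtx {
    int lock_depth;      /* depth of mock aio_context_acquire() */
    int polls_unlocked;  /* poll iterations seen with the lock dropped */
} MockCtx;

void mock_acquire(MockCtx *c) { c->lock_depth++; }
void mock_release(MockCtx *c) { c->lock_depth--; }

/* Stand-in for one aio_poll() iteration: in real QEMU this is where an
 * iothread (or another main-loop user) gets a chance to run. */
bool mock_poll(MockCtx *c, int *remaining)
{
    if (c->lock_depth == 0) {
        c->polls_unlocked++;
    }
    return --(*remaining) > 0;
}

/* The BDRV_POLL_WHILE() pattern: release, poll until done, reacquire. */
void mock_poll_while(MockCtx *c, int iterations)
{
    bool waited = true;
    mock_release(c);
    while (waited) {
        waited = mock_poll(c, &iterations);
    }
    mock_acquire(c);
}

/* A monitor-style caller: takes the lock expecting exclusive access,
 * then waits for sync I/O. Every poll iteration runs unlocked. */
int mock_run(void)
{
    MockCtx c = { 0, 0 };
    mock_acquire(&c);
    mock_poll_while(&c, 3);
    mock_release(&c);
    return c.polls_unlocked;
}
```

Even though the caller never calls release itself, the nested wait opens a window on every iteration; that is the window the iothread's vq handler slips into in the traces above.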
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-07 15:14 ` Kevin Wolf @ 2017-04-10 0:56 ` Paolo Bonzini 2017-04-10 1:43 ` Fam Zheng 0 siblings, 1 reply; 26+ messages in thread From: Paolo Bonzini @ 2017-04-10 0:56 UTC (permalink / raw) To: Kevin Wolf, Fam Zheng; +Cc: qemu-block, qemu-devel, Max Reitz, Stefan Hajnoczi On 07/04/2017 23:14, Kevin Wolf wrote: > One part of the reason is that > BDRV_POLL_WHILE() drops the lock since commit c9d1a561, so just calling > aio_context_acquire() doesn't protect you any more if there is any > chance that a nested function calls BDRV_POLL_WHILE(). Hi guys, sorry for the late reply. There wasn't actually much that changed in commit c9d1a561. Everything that could break was also broken before. With commit c9d1a561 the logic is aio_context_acquire prepare I/O aio_context_release poll main AioContext aio_context_acquire aio_poll secondary AioContext complete I/O <-- bdrv_wakeup aio_context_acquire aio_context_release Sync I/O is complete while before it was aio_context_acquire prepare I/O aio_poll secondary AioContext complete I/O Sync I/O is complete The code that can run in "aio_poll secondary AioContext" is the same in both cases. The difference is that, after c9d1a561, it always runs in one thread which eliminates the need for RFifoLock's contention callbacks (and a bunch of bdrv_drain_all deadlocks that arose from the combination of contention callbacks and recursive locking). The patch that introduced the bug is the one that introduced the "home AioContext" co->ctx for coroutines, because now you have aio_context_acquire prepare I/O aio_context_release poll main AioContext aio_context_acquire aio_poll secondary AioContext aio_co_wake complete I/O bdrv_wakeup aio_context_acquire aio_context_release Sync I/O is complete and if "complete I/O" causes anything to happen in the iothread, bad things happen. I think Fam's analysis is right. 
This patch will hopefully be reverted in 2.10, but right now it's the
right thing to do. However, I don't like very much adding an argument to
qemu_coroutine_enter because:

1) coroutines can be used without AioContexts (CoMutex has a dependency
on aio_co_wake, but it is hidden behind the API and if you have a single
AioContext you could just define aio_co_wake to be qemu_coroutine_enter).

2) the value that is assigned to co->ctx is not the AioContext in which
the coroutine is running.

It would be better to split aio_co_wake so that aio_co_wake is

    /* Read coroutine before co->ctx. Matches smp_wmb in
     * qemu_coroutine_enter.
     */
    smp_read_barrier_depends();
    ctx = atomic_read(&co->ctx);
    aio_co_enter(ctx, co);

and aio_co_enter is the rest of the logic.

Then the coroutine code always runs in the right thread, which obviously
is thread-safe.

Paolo
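A rough compilable sketch of the split Paolo suggests, with simplified stand-ins for QEMU's types (aio_co_enter here just records the context instead of actually scheduling the coroutine, and the barrier/atomic details are elided as noted in the comments):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's types; "entered_in" exists only so
 * the sketch can be checked, the real code actually enters the
 * coroutine in that context. */
typedef struct AioContext { int id; } AioContext;

typedef struct Coroutine {
    AioContext *ctx;        /* home context, as stored by the real code */
    AioContext *entered_in; /* where we last "ran" (demo bookkeeping) */
} Coroutine;

/* The rest of the old aio_co_wake() logic: enter @co in @ctx. */
void aio_co_enter(AioContext *ctx, Coroutine *co)
{
    co->ctx = ctx;
    co->entered_in = ctx;   /* real code would schedule or enter here */
}

/* aio_co_wake() becomes a thin wrapper that loads the stored context.
 * The real version needs smp_read_barrier_depends() and
 * atomic_read(&co->ctx), as quoted above. */
void aio_co_wake(Coroutine *co)
{
    AioContext *ctx = co->ctx;
    aio_co_enter(ctx, co);
}
```

With this shape, callers that know the right context (the block layer) call aio_co_enter directly, while context-less users keep the aio_co_wake entry point.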
* Re: [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine 2017-04-10 0:56 ` Paolo Bonzini @ 2017-04-10 1:43 ` Fam Zheng 0 siblings, 0 replies; 26+ messages in thread From: Fam Zheng @ 2017-04-10 1:43 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block, Max Reitz On Mon, 04/10 08:56, Paolo Bonzini wrote: > > > On 07/04/2017 23:14, Kevin Wolf wrote: > > One part of the reason is that > > BDRV_POLL_WHILE() drops the lock since commit c9d1a561, so just calling > > aio_context_acquire() doesn't protect you any more if there is any > > chance that a nested function calls BDRV_POLL_WHILE(). > > Hi guys, sorry for the late reply. > > There wasn't actually much that changed in commit c9d1a561. Everything > that could break was also broken before. > > With commit c9d1a561 the logic is > > aio_context_acquire > prepare I/O > aio_context_release > poll main AioContext > aio_context_acquire > aio_poll secondary AioContext > complete I/O > <-- bdrv_wakeup > aio_context_acquire > aio_context_release > Sync I/O is complete > > while before it was > > aio_context_acquire > prepare I/O > aio_poll secondary AioContext > complete I/O > Sync I/O is complete > > The code that can run in "aio_poll secondary AioContext" is the same in > both cases. This is not completely true. Before this fact, a blocked aio_context_acquire() in virtio_scsi_data_plane_handle_cmd() wouldn't be woken up during the sync I/O. Of course the aio context pushdown (9d4566544) happened after c9d1a561, so this is a compound consequence of the two. Also, not polling for at least once in bdrv_drain_poll if !bdrv_requests_pending(bs), since 997235485, made the situation a bit worse - virtio_scsi_data_plane_handle_cmd could have returned before bdrv_drained_begin() returns if we do the BDRV_POLL_WHILE body at least once even if cond is false. 
> The difference is that, after c9d1a561, it always runs in > one thread which eliminates the need for RFifoLock's contention > callbacks (and a bunch of bdrv_drain_all deadlocks that arose from the > combination of contention callbacks and recursive locking). > > The patch that introduced the bug is the one that introduced the "home > AioContext" co->ctx for coroutines, because now you have > > aio_context_acquire > prepare I/O > aio_context_release > poll main AioContext > aio_context_acquire > aio_poll secondary AioContext > aio_co_wake > complete I/O > bdrv_wakeup > aio_context_acquire > aio_context_release > Sync I/O is complete > > and if "complete I/O" causes anything to happen in the iothread, bad > things happen. > > I think Fam's analysis is right. This patch will hopefully be reverted > in 2.10, but right now it's the right thing to do. However, I don't > like very much adding an argument to qemu_coroutine_enter because: > > 1) coroutines can be used without AioContexts (CoMutex has a dependency > on aio_co_wake, but it is hidden behind the API and if you have a single > AioContext you could just define aio_co_wake to be qemu_coroutine_enter). > > 2) the value that is assigned to co->ctx is not the AioContext in which > the coroutine is running. > > It would be better to split aio_co_wake so that aio_co_wake is > > /* Read coroutine before co->ctx. Matches smp_wmb in > * qemu_coroutine_enter. > */ > smp_read_barrier_depends(); > ctx = atomic_read(&co->ctx); > aio_co_enter(ctx, co); > > and aio_co_enter is the rest of the logic. Is aio_co_enter what solves 1) for you too? If so I can add it in v3. Fam > > Then the coroutine code always runs in the right thread, which obviously > is thread-safe. > > Paolo > ^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn 2017-04-07 6:54 [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations Fam Zheng ` (4 preceding siblings ...) 2017-04-07 6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng @ 2017-04-07 6:54 ` Fam Zheng 2017-04-07 13:28 ` Stefan Hajnoczi 2017-04-07 12:45 ` [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations Kevin Wolf 6 siblings, 1 reply; 26+ messages in thread From: Fam Zheng @ 2017-04-07 6:54 UTC (permalink / raw) To: qemu-devel Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi Previously, before test_block_job_start returns, the job can already complete, as a result, the transactional state of other jobs added to the same txn later cannot be handled correctly. Move the block_job_start() calls to callers after block_job_txn_add_job() calls. 
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/test-blockjob-txn.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 4ccbda1..0f80194 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -110,7 +110,6 @@ static BlockJob *test_block_job_start(unsigned int iterations,
     s->result = result;
     data->job = s;
     data->result = result;
-    block_job_start(&s->common);
     return &s->common;
 }
 
@@ -123,6 +122,7 @@ static void test_single_job(int expected)
     txn = block_job_txn_new();
     job = test_block_job_start(1, true, expected, &result);
     block_job_txn_add_job(txn, job);
+    block_job_start(job);
 
     if (expected == -ECANCELED) {
         block_job_cancel(job);
@@ -164,6 +164,8 @@ static void test_pair_jobs(int expected1, int expected2)
     block_job_txn_add_job(txn, job1);
     job2 = test_block_job_start(2, true, expected2, &result2);
     block_job_txn_add_job(txn, job2);
+    block_job_start(job1);
+    block_job_start(job2);
 
     if (expected1 == -ECANCELED) {
         block_job_cancel(job1);
@@ -223,6 +225,8 @@ static void test_pair_jobs_fail_cancel_race(void)
     block_job_txn_add_job(txn, job1);
    job2 = test_block_job_start(2, false, 0, &result2);
     block_job_txn_add_job(txn, job2);
+    block_job_start(job1);
+    block_job_start(job2);
 
     block_job_cancel(job1);
-- 
2.9.3
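[Editor's note: the ordering requirement this patch enforces can be shown with a toy model. The `MiniJob`/`MiniTxn` types below are hypothetical, not the real QEMU block-job API; the point is only that a job may run to completion the instant it is started, so every job must already be in the transaction by then.]

```c
#include <stddef.h>

/* Toy model of a job/transaction pair.  A job that is started may
 * "complete" immediately -- this is the hazard the patch avoids. */
typedef struct MiniJob {
    int in_txn;      /* has the txn been told about this job? */
    int completed;   /* has the job already finished? */
} MiniJob;

typedef struct MiniTxn {
    MiniJob *jobs[4];
    int n;
} MiniTxn;

static void txn_add_job(MiniTxn *txn, MiniJob *job)
{
    txn->jobs[txn->n++] = job;
    job->in_txn = 1;
}

/* Models the race: the job runs to completion the moment it starts.
 * If this is called before txn_add_job() for a sibling job, the txn
 * can no longer apply its transactional completion policy to both. */
static void job_start(MiniJob *job)
{
    job->completed = 1;
}
```

The fixed call order mirrors the patch: call `txn_add_job()` for every job first, then `job_start()` on each, so no job can complete before the whole group is assembled.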
* Re: [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn
  2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn Fam Zheng
@ 2017-04-07 13:28 ` Stefan Hajnoczi
  2017-04-07 18:05   ` John Snow
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2017-04-07 13:28 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf,
    Max Reitz, Eric Blake, jsnow

On Fri, Apr 07, 2017 at 02:54:14PM +0800, Fam Zheng wrote:
> Previously, before test_block_job_start returns, the job can already
> complete, as a result, the transactional state of other jobs added to
> the same txn later cannot be handled correctly.
>
> Move the block_job_start() calls to callers after
> block_job_txn_add_job() calls.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  tests/test-blockjob-txn.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

CCing John Snow because he looked at block jobs completing during txn
setup recently.

Stefan
* Re: [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn
  2017-04-07 13:28 ` Stefan Hajnoczi
@ 2017-04-07 18:05 ` John Snow
  2017-04-08  3:39   ` Fam Zheng
  0 siblings, 1 reply; 26+ messages in thread
From: John Snow @ 2017-04-07 18:05 UTC (permalink / raw)
To: Stefan Hajnoczi, Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Kevin Wolf,
    Max Reitz, Eric Blake

On 04/07/2017 09:28 AM, Stefan Hajnoczi wrote:
> On Fri, Apr 07, 2017 at 02:54:14PM +0800, Fam Zheng wrote:
>> Previously, before test_block_job_start returns, the job can already
>> complete, as a result, the transactional state of other jobs added to
>> the same txn later cannot be handled correctly.
>>
>> Move the block_job_start() calls to callers after
>> block_job_txn_add_job() calls.
>>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>> ---
>>  tests/test-blockjob-txn.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> CCing John Snow because he looked at block jobs completing during txn
> setup recently.
>
> Stefan

This matches the changes we made to qmp_transaction, but I forgot to (or
didn't take care to) change the qtest, as it didn't cause a regression
at the time.

I wonder if I should make it a runtime error to add a job to a
transaction which has already "started", to make sure that this interface
is not misused, as this test highlights that there were still some
remaining "bad" uses of the interface.

Regardless...

Thanks for the CC. ACK
* Re: [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn
  2017-04-07 18:05 ` John Snow
@ 2017-04-08  3:39 ` Fam Zheng
  0 siblings, 0 replies; 26+ messages in thread
From: Fam Zheng @ 2017-04-08 3:39 UTC (permalink / raw)
To: John Snow
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk,
    Kevin Wolf, Max Reitz, Eric Blake

On Fri, 04/07 14:05, John Snow wrote:
> On 04/07/2017 09:28 AM, Stefan Hajnoczi wrote:
> > On Fri, Apr 07, 2017 at 02:54:14PM +0800, Fam Zheng wrote:
> >> Previously, before test_block_job_start returns, the job can already
> >> complete, as a result, the transactional state of other jobs added to
> >> the same txn later cannot be handled correctly.
> >>
> >> Move the block_job_start() calls to callers after
> >> block_job_txn_add_job() calls.
> >>
> >> Signed-off-by: Fam Zheng <famz@redhat.com>
> >> ---
> >>  tests/test-blockjob-txn.c | 6 +++++-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > CCing John Snow because he looked at block jobs completing during txn
> > setup recently.
> >
> > Stefan
>
> This matches the changes we made to qmp_transaction, but I forgot to (or
> didn't take care to) change the qtest as it didn't cause a regression
> at the time.
>
> I wonder if I should make it a runtime error to add a job to a
> transaction which has already "started" to make sure that this interface
> is not misused, as this test highlights that there were still some
> remaining "bad" uses of the interface.
>
> Regardless...
>
> Thanks for the CC. ACK

So, shall we merge this for 2.9?

Fam
* Re: [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations
  2017-04-07  6:54 [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations Fam Zheng
                   ` (5 preceding siblings ...)
  2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn Fam Zheng
@ 2017-04-07 12:45 ` Kevin Wolf
  6 siblings, 0 replies; 26+ messages in thread
From: Kevin Wolf @ 2017-04-07 12:45 UTC (permalink / raw)
To: Fam Zheng
Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
    Eric Blake, Stefan Hajnoczi

Am 07.04.2017 um 08:54 hat Fam Zheng geschrieben:
> v2: - Drop patch 4 in v1. A second thought made me feel neither it nor Kevin's
>       suggestion to move the BH process to bdrv_drain_recurse/BDRV_POLL_WHILE
>       is a complete fix. So leave it for a separate patch.
>     - Add rev-by to patches 1, 3, 4.
>     - Split from patch 1 in v1 and add patch 2, for the new assertions. [Kevin]
>     - Rewrite patch 5. Fix block job's co when a BDS is moved to a different
>       aio context. [Kevin]
>     - Add patch 6.
>
> Crashes are reported on dataplane devices when doing snapshot and commit under
> guest I/O.
>
> With this series, Ed's test case '176' now passes (reran 10+ times):
>
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9
>
> The biggest fix for this is patch 5, which fixes a race condition between main
> thread and iothread.

Thanks, applied patches 1-3 for now. I feel the other patches still need
closer review and potentially discussion.

Kevin
Thread overview: 26+ messages

2017-04-07  6:54 [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations Fam Zheng
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 1/6] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
2017-04-07 13:23 ` Stefan Hajnoczi
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 2/6] block: Assert attached child node has right aio context Fam Zheng
2017-04-07 13:24 ` Stefan Hajnoczi
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 3/6] mirror: Fix aio context of mirror_top_bs Fam Zheng
2017-04-07 13:24 ` Stefan Hajnoczi
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 4/6] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
2017-04-07 12:50 ` Stefan Hajnoczi
2017-04-08  3:43 ` Fam Zheng
2017-04-10  8:06 ` Kevin Wolf
2017-04-10  8:45 ` Fam Zheng
2017-04-10  9:11 ` Kevin Wolf
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 5/6] coroutine: Explicitly specify AioContext when entering coroutine Fam Zheng
2017-04-07  9:57 ` Fam Zheng
2017-04-07 13:26 ` Stefan Hajnoczi
2017-04-08  3:27 ` Fam Zheng
2017-04-07 14:07 ` Eric Blake
2017-04-07 15:14 ` Kevin Wolf
2017-04-10  0:56 ` Paolo Bonzini
2017-04-10  1:43 ` Fam Zheng
2017-04-07  6:54 ` [Qemu-devel] [PATCH v2 6/6] tests/block-job-txn: Don't start block job before adding to txn Fam Zheng
2017-04-07 13:28 ` Stefan Hajnoczi
2017-04-07 18:05 ` John Snow
2017-04-08  3:39 ` Fam Zheng
2017-04-07 12:45 ` [Qemu-devel] [PATCH v2 0/6] block: Fixes regarding dataplane and management operations Kevin Wolf