* [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations
@ 2017-04-06 14:25 Fam Zheng
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

These are accumulated fixes that showed up while working on the "change"
issue and the "repeated snapshot + commit" crash with dataplane:

    [Qemu-devel] [PATCH] blk: fix aio context loss on media change
        - Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

    [Qemu-devel] New iotest repros failures on virtio external snapshot with
    iothread
        - Ed Swierk <eswierk@skyportsystems.com>

Specifically, patch 4 came out of the former, and the rest came out of the
latter. They seem closely related, hence a single series.

With this series applied, Ed's test case '176' passes (reran 10+ times):

    https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9

NOTE: The "change" crash still needs Vladimir's fix. Patch 4 was noticed
together with a few not-so-related WIP patches, but my understanding is that
this is a latent bug, and IMO the change is rather harmless to have.

Fam Zheng (5):
  block: Fix unpaired aio_disable_external in external snapshot
  mirror: Fix aio context of mirror_top_bs
  block: Quiesce old aio context during bdrv_set_aio_context
  block: Drain BH in bdrv_drained_begin
  coroutine: Explicitly specify AioContext when creating coroutine

 block.c                      | 23 +++++++++++++++++---
 block/blkverify.c            |  4 ++--
 block/block-backend.c        |  4 ++--
 block/io.c                   | 21 ++++++++++++-------
 block/mirror.c               |  1 +
 block/nbd-client.c           |  2 +-
 block/quorum.c               |  6 +++---
 block/sheepdog.c             |  4 ++--
 blockdev.c                   |  4 ++--
 blockjob.c                   |  2 +-
 hw/9pfs/9p.c                 |  4 ++--
 include/block/block.h        |  3 +++
 include/qemu/coroutine.h     |  3 ++-
 include/qemu/main-loop.h     |  2 +-
 migration/migration.c        |  3 ++-
 nbd/server.c                 |  6 ++++--
 qemu-img.c                   |  3 ++-
 qemu-io-cmds.c               |  3 ++-
 tests/test-aio-multithread.c | 12 +++++++----
 tests/test-coroutine.c       | 50 ++++++++++++++++++++++++++++++--------------
 tests/test-thread-pool.c     |  3 ++-
 util/qemu-coroutine.c        |  6 ++++--
 22 files changed, 113 insertions(+), 56 deletions(-)

-- 
2.9.3


* [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot
  2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
@ 2017-04-06 14:25 ` Fam Zheng
  2017-04-06 14:36   ` Eric Blake
  2017-04-06 14:36   ` Kevin Wolf
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs Fam Zheng
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

bdrv_replace_child_noperm tries to hand over the quiesce_counter state
from the old bs to the new one, but if they are not in the same AioContext
this causes an imbalance.

Fix this by forcing callers to move the new bs to the same AioContext
first, and assert that the contexts match.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c    | 3 +++
 blockdev.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 927ba89..8893ac1 100644
--- a/block.c
+++ b/block.c
@@ -1752,6 +1752,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 {
     BlockDriverState *old_bs = child->bs;
 
+    if (old_bs && new_bs) {
+        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
+    }
     if (old_bs) {
         if (old_bs->quiesce_counter && child->role->drained_end) {
             child->role->drained_end(child);
diff --git a/blockdev.c b/blockdev.c
index 040c152..4927914 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1772,6 +1772,8 @@ static void external_snapshot_prepare(BlkActionState *common,
         return;
     }
 
+    bdrv_set_aio_context(state->new_bs, state->aio_context);
+
     /* This removes our old bs and adds the new bs. This is an operation that
      * can fail, so we need to do it in .prepare; undoing it for abort is
      * always possible. */
@@ -1789,8 +1791,6 @@ static void external_snapshot_commit(BlkActionState *common)
     ExternalSnapshotState *state =
                              DO_UPCAST(ExternalSnapshotState, common, common);
 
-    bdrv_set_aio_context(state->new_bs, state->aio_context);
-
     /* We don't need (or want) to use the transactional
      * bdrv_reopen_multiple() across all the entries at once, because we
      * don't want to abort all of them if one of them fails the reopen */
-- 
2.9.3


* [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs
  2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
@ 2017-04-06 14:25 ` Fam Zheng
  2017-04-06 14:36   ` Eric Blake
  2017-04-06 14:37   ` Kevin Wolf
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

mirror_top_bs should be moved to the same AioContext as the source before
it is inserted into the graph.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/mirror.c b/block/mirror.c
index 9e2fecc..e904fef 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1148,6 +1148,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         return;
     }
     mirror_top_bs->total_sectors = bs->total_sectors;
+    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
 
     /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
      * it alive until block_job_create() even if bs has no parent. */
-- 
2.9.3


* [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context
  2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs Fam Zheng
@ 2017-04-06 14:25 ` Fam Zheng
  2017-04-06 15:00   ` Eric Blake
  2017-04-06 15:17   ` Kevin Wolf
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin Fam Zheng
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine Fam Zheng
  4 siblings, 2 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

The fact that bs->aio_context is changing can confuse the dataplane
iothread because of the now fine-grained AioContext lock.
bdrv_drain should really be a bdrv_drained_begin/end pair, but since
bs->aio_context is changing, we can just use aio_disable_external and
block_job_pause instead.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 8893ac1..e70684a 100644
--- a/block.c
+++ b/block.c
@@ -4395,11 +4395,14 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
 {
-    AioContext *ctx;
+    AioContext *ctx = bdrv_get_aio_context(bs);
 
+    aio_disable_external(ctx);
+    if (bs->job) {
+        block_job_pause(bs->job);
+    }
     bdrv_drain(bs); /* ensure there are no in-flight requests */
 
-    ctx = bdrv_get_aio_context(bs);
     while (aio_poll(ctx, false)) {
         /* wait for all bottom halves to execute */
     }
@@ -4412,6 +4415,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
     aio_context_acquire(new_context);
     bdrv_attach_aio_context(bs, new_context);
     aio_context_release(new_context);
+    if (bs->job) {
+        block_job_resume(bs->job);
+    }
+    aio_enable_external(ctx);
 }
 
 void bdrv_add_aio_context_notifier(BlockDriverState *bs,
-- 
2.9.3


* [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
  2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
                   ` (2 preceding siblings ...)
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
@ 2017-04-06 14:25 ` Fam Zheng
  2017-04-06 15:10   ` Eric Blake
  2017-04-06 15:37   ` Kevin Wolf
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine Fam Zheng
  4 siblings, 2 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

During block job completion, nothing prevents
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which causes trouble, for example in this code path:

    qmp_block_commit
      commit_active_start
        bdrv_reopen
          bdrv_reopen_multiple
            bdrv_reopen_prepare
              bdrv_flush
                aio_poll
                  aio_bh_poll
                    aio_bh_call
                      block_job_defer_to_main_loop_bh
                        stream_complete
                          bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.

Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well.  To achieve
this, this patch forces draining the BH before leaving bdrv_drained_begin().

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 2709a70..b9cfd18 100644
--- a/block/io.c
+++ b/block/io.c
@@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs)
         bdrv_parent_drained_begin(bs);
     }
 
-    bdrv_drain_recurse(bs);
+    while (true) {
+        if (!bdrv_drain_recurse(bs) &&
+            !aio_poll(bdrv_get_aio_context(bs), false)) {
+                break;
+            }
+    }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
-- 
2.9.3


* [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine
  2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
                   ` (3 preceding siblings ...)
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin Fam Zheng
@ 2017-04-06 14:25 ` Fam Zheng
  2017-04-06 15:34   ` Eric Blake
  2017-04-06 16:20   ` Kevin Wolf
  4 siblings, 2 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, qemu-block, Ed Swierk, Fam Zheng, Kevin Wolf,
	Max Reitz, Stefan Hajnoczi

Coroutine in block layer should always be waken up in bs->aio_context
rather than the "current" context where it is entered. They differ when
the main loop is doing QMP tasks.

Race conditions happen without this patch, because the wrong context is
acquired in co_schedule_bh_cb, while the entered coroutine works on a
different one.

Make the block layer explicitly specify a desired context for each created
coroutine. For the rest, always use qemu_get_aio_context().

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c                      |  9 +++++++-
 block/blkverify.c            |  4 ++--
 block/block-backend.c        |  4 ++--
 block/io.c                   | 14 ++++++-------
 block/nbd-client.c           |  2 +-
 block/quorum.c               |  6 +++---
 block/sheepdog.c             |  4 ++--
 blockjob.c                   |  2 +-
 hw/9pfs/9p.c                 |  4 ++--
 include/block/block.h        |  3 +++
 include/qemu/coroutine.h     |  3 ++-
 include/qemu/main-loop.h     |  2 +-
 migration/migration.c        |  3 ++-
 nbd/server.c                 |  6 ++++--
 qemu-img.c                   |  3 ++-
 qemu-io-cmds.c               |  3 ++-
 tests/test-aio-multithread.c | 12 +++++++----
 tests/test-coroutine.c       | 50 ++++++++++++++++++++++++++++++--------------
 tests/test-thread-pool.c     |  3 ++-
 util/qemu-coroutine.c        |  6 ++++--
 20 files changed, 92 insertions(+), 51 deletions(-)

diff --git a/block.c b/block.c
index e70684a..d3598d5 100644
--- a/block.c
+++ b/block.c
@@ -357,7 +357,8 @@ int bdrv_create(BlockDriver *drv, const char* filename,
         /* Fast-path if already in coroutine context */
         bdrv_create_co_entry(&cco);
     } else {
-        co = qemu_coroutine_create(bdrv_create_co_entry, &cco);
+        co = qemu_coroutine_create(qemu_get_aio_context(),
+                                   bdrv_create_co_entry, &cco);
         qemu_coroutine_enter(co);
         while (cco.ret == NOT_DONE) {
             aio_poll(qemu_get_aio_context(), true);
@@ -4323,6 +4324,12 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
     return bs->aio_context;
 }
 
+Coroutine *bdrv_coroutine_create(BlockDriverState *bs,
+                                 CoroutineEntry *entry, void *opaque)
+{
+    return qemu_coroutine_create(bdrv_get_aio_context(bs), entry, opaque);
+}
+
 static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
 {
     QLIST_REMOVE(ban, list);
diff --git a/block/blkverify.c b/block/blkverify.c
index 9a1e21c..df761cf 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -204,8 +204,8 @@ blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest *r, uint64_t offset,
         .request_fn = is_write ? bdrv_co_pwritev : bdrv_co_preadv,
     };
 
-    co_a = qemu_coroutine_create(blkverify_do_test_req, r);
-    co_b = qemu_coroutine_create(blkverify_do_raw_req, r);
+    co_a = bdrv_coroutine_create(bs, blkverify_do_test_req, r);
+    co_b = bdrv_coroutine_create(bs, blkverify_do_raw_req, r);
 
     qemu_coroutine_enter(co_a);
     qemu_coroutine_enter(co_b);
diff --git a/block/block-backend.c b/block/block-backend.c
index 0b63773..6a91514 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1006,7 +1006,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         /* Fast-path if already in coroutine context */
         co_entry(&rwco);
     } else {
-        Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
+        Coroutine *co = bdrv_coroutine_create(blk_bs(blk), co_entry, &rwco);
         qemu_coroutine_enter(co);
         BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
     }
@@ -1113,7 +1113,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     acb->bytes = bytes;
     acb->has_returned = false;
 
-    co = qemu_coroutine_create(co_entry, acb);
+    co = bdrv_coroutine_create(blk_bs(blk), co_entry, acb);
     qemu_coroutine_enter(co);
 
     acb->has_returned = true;
diff --git a/block/io.c b/block/io.c
index b9cfd18..41d7d7b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -620,7 +620,7 @@ static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
         /* Fast-path if already in coroutine context */
         bdrv_rw_co_entry(&rwco);
     } else {
-        co = qemu_coroutine_create(bdrv_rw_co_entry, &rwco);
+        co = bdrv_coroutine_create(child->bs, bdrv_rw_co_entry, &rwco);
         qemu_coroutine_enter(co);
         BDRV_POLL_WHILE(child->bs, rwco.ret == NOT_DONE);
     }
@@ -1876,7 +1876,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         /* Fast-path if already in coroutine context */
         bdrv_get_block_status_above_co_entry(&data);
     } else {
-        co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
+        co = bdrv_coroutine_create(bs, bdrv_get_block_status_above_co_entry,
                                    &data);
         qemu_coroutine_enter(co);
         BDRV_POLL_WHILE(bs, !data.done);
@@ -2002,7 +2002,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
             .is_read    = is_read,
             .ret        = -EINPROGRESS,
         };
-        Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
+        Coroutine *co = bdrv_coroutine_create(bs, bdrv_co_rw_vmstate_entry, &data);
 
         qemu_coroutine_enter(co);
         while (data.ret == -EINPROGRESS) {
@@ -2220,7 +2220,7 @@ static BlockAIOCB *bdrv_co_aio_prw_vector(BdrvChild *child,
     acb->req.flags = flags;
     acb->is_write = is_write;
 
-    co = qemu_coroutine_create(bdrv_co_do_rw, acb);
+    co = bdrv_coroutine_create(child->bs, bdrv_co_do_rw, acb);
     qemu_coroutine_enter(co);
 
     bdrv_co_maybe_schedule_bh(acb);
@@ -2251,7 +2251,7 @@ BlockAIOCB *bdrv_aio_flush(BlockDriverState *bs,
     acb->need_bh = true;
     acb->req.error = -EINPROGRESS;
 
-    co = qemu_coroutine_create(bdrv_aio_flush_co_entry, acb);
+    co = bdrv_coroutine_create(bs, bdrv_aio_flush_co_entry, acb);
     qemu_coroutine_enter(co);
 
     bdrv_co_maybe_schedule_bh(acb);
@@ -2384,7 +2384,7 @@ int bdrv_flush(BlockDriverState *bs)
         /* Fast-path if already in coroutine context */
         bdrv_flush_co_entry(&flush_co);
     } else {
-        co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
+        co = bdrv_coroutine_create(bs, bdrv_flush_co_entry, &flush_co);
         qemu_coroutine_enter(co);
         BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);
     }
@@ -2531,7 +2531,7 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int count)
         /* Fast-path if already in coroutine context */
         bdrv_pdiscard_co_entry(&rwco);
     } else {
-        co = qemu_coroutine_create(bdrv_pdiscard_co_entry, &rwco);
+        co = bdrv_coroutine_create(bs, bdrv_pdiscard_co_entry, &rwco);
         qemu_coroutine_enter(co);
         BDRV_POLL_WHILE(bs, rwco.ret == NOT_DONE);
     }
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 1e2952f..526e56b 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -421,7 +421,7 @@ int nbd_client_init(BlockDriverState *bs,
     /* Now that we're connected, set the socket to be non-blocking and
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, client);
+    client->read_reply_co = bdrv_coroutine_create(bs, nbd_read_reply_entry, client);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
diff --git a/block/quorum.c b/block/quorum.c
index 40205fb..b34e7eb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -317,7 +317,7 @@ static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
                 .idx = item->index,
             };
 
-            co = qemu_coroutine_create(quorum_rewrite_entry, &data);
+            co = bdrv_coroutine_create(acb->bs, quorum_rewrite_entry, &data);
             qemu_coroutine_enter(co);
         }
     }
@@ -625,7 +625,7 @@ static int read_quorum_children(QuorumAIOCB *acb)
             .idx = i,
         };
 
-        co = qemu_coroutine_create(read_quorum_children_entry, &data);
+        co = bdrv_coroutine_create(acb->bs, read_quorum_children_entry, &data);
         qemu_coroutine_enter(co);
     }
 
@@ -730,7 +730,7 @@ static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
             .idx = i,
         };
 
-        co = qemu_coroutine_create(write_quorum_entry, &data);
+        co = bdrv_coroutine_create(bs, write_quorum_entry, &data);
         qemu_coroutine_enter(co);
     }
 
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1b71fc8..bb2feca 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -734,7 +734,7 @@ static int do_req(int sockfd, BlockDriverState *bs, SheepdogReq *hdr,
     if (qemu_in_coroutine()) {
         do_co_req(&srco);
     } else {
-        co = qemu_coroutine_create(do_co_req, &srco);
+        co = bdrv_coroutine_create(bs, do_co_req, &srco);
         if (bs) {
             qemu_coroutine_enter(co);
             BDRV_POLL_WHILE(bs, !srco.finished);
@@ -939,7 +939,7 @@ static void co_read_response(void *opaque)
     BDRVSheepdogState *s = opaque;
 
     if (!s->co_recv) {
-        s->co_recv = qemu_coroutine_create(aio_read_response, opaque);
+        s->co_recv = bdrv_coroutine_create(s->bs, aio_read_response, opaque);
     }
 
     aio_co_wake(s->co_recv);
diff --git a/blockjob.c b/blockjob.c
index 9b619f385..487920f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -286,7 +286,7 @@ void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
            job->driver && job->driver->start);
-    job->co = qemu_coroutine_create(block_job_co_entry, job);
+    job->co = bdrv_coroutine_create(blk_bs(job->blk), block_job_co_entry, job);
     job->pause_count--;
     job->busy = true;
     job->paused = false;
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index c80ba67..5ad4bc7 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3462,7 +3462,7 @@ void pdu_submit(V9fsPDU *pdu)
     if (is_ro_export(&s->ctx) && !is_read_only_op(pdu)) {
         handler = v9fs_fs_ro;
     }
-    co = qemu_coroutine_create(handler, pdu);
+    co = qemu_coroutine_create(qemu_get_aio_context(), handler, pdu);
     qemu_coroutine_enter(co);
 }
 
@@ -3595,7 +3595,7 @@ void v9fs_reset(V9fsState *s)
         aio_poll(qemu_get_aio_context(), true);
     }
 
-    co = qemu_coroutine_create(virtfs_co_reset, &data);
+    co = qemu_coroutine_create(qemu_get_aio_context(), virtfs_co_reset, &data);
     qemu_coroutine_enter(co);
 
     while (!data.done) {
diff --git a/include/block/block.h b/include/block/block.h
index 5149260..5dc06bf 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -555,6 +555,9 @@ bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
  */
 AioContext *bdrv_get_aio_context(BlockDriverState *bs);
 
+Coroutine *bdrv_coroutine_create(BlockDriverState *bs,
+                                 CoroutineEntry *entry, void *opaque);
+
 /**
  * bdrv_set_aio_context:
  *
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index e60beaf..57ee53d 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -63,7 +63,8 @@ typedef void coroutine_fn CoroutineEntry(void *opaque);
  * Use qemu_coroutine_enter() to actually transfer control to the coroutine.
  * The opaque argument is passed as the argument to the entry point.
  */
-Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
+Coroutine *qemu_coroutine_create(AioContext *ctx,
+                                 CoroutineEntry *entry, void *opaque);
 
 /**
  * Transfer control to a coroutine
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d7e24af..6680823 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -68,7 +68,7 @@ int qemu_init_main_loop(Error **errp);
  *     }
  *
  *     ...
- *     QEMUCoroutine *co = qemu_coroutine_create(coroutine_entry, NULL);
+ *     QEMUCoroutine *co = qemu_coroutine_create(ctx, coroutine_entry, NULL);
  *     QEMUBH *start_bh = qemu_bh_new(enter_co_bh, co);
  *     qemu_bh_schedule(start_bh);
  *     while (...) {
diff --git a/migration/migration.c b/migration/migration.c
index 54060f7..7a9a1c6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -445,7 +445,8 @@ static void process_incoming_migration_co(void *opaque)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
+    Coroutine *co = qemu_coroutine_create(qemu_get_aio_context(),
+                                          process_incoming_migration_co, f);
 
     migrate_decompress_threads_create();
     qemu_file_set_blocking(f, false);
diff --git a/nbd/server.c b/nbd/server.c
index 924a1fe..ee639ab 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1363,7 +1363,8 @@ static void nbd_client_receive_next_request(NBDClient *client)
 {
     if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS) {
         nbd_client_get(client);
-        client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+        client->recv_coroutine = qemu_coroutine_create(client->exp->ctx,
+                                                       nbd_trip, client);
         aio_co_schedule(client->exp->ctx, client->recv_coroutine);
     }
 }
@@ -1417,6 +1418,7 @@ void nbd_client_new(NBDExport *exp,
     client->close = close_fn;
 
     data->client = client;
-    data->co = qemu_coroutine_create(nbd_co_client_start, data);
+    data->co = qemu_coroutine_create(exp ? exp->ctx : qemu_get_aio_context(),
+                                     nbd_co_client_start, data);
     qemu_coroutine_enter(data->co);
 }
diff --git a/qemu-img.c b/qemu-img.c
index b220cf7..9b3dd98 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1894,7 +1894,8 @@ static int convert_do_copy(ImgConvertState *s)
 
     qemu_co_mutex_init(&s->lock);
     for (i = 0; i < s->num_coroutines; i++) {
-        s->co[i] = qemu_coroutine_create(convert_co_do_copy, s);
+        s->co[i] = qemu_coroutine_create(qemu_get_aio_context(),
+                                         convert_co_do_copy, s);
         s->wait_sector_num[i] = -1;
         qemu_coroutine_enter(s->co[i]);
     }
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 883f53b..31944bb 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -520,7 +520,8 @@ static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
         return -ERANGE;
     }
 
-    co = qemu_coroutine_create(co_pwrite_zeroes_entry, &data);
+    co = qemu_coroutine_create(qemu_get_aio_context(),
+                               co_pwrite_zeroes_entry, &data);
     qemu_coroutine_enter(co);
     while (!data.done) {
         aio_poll(blk_get_aio_context(blk), true);
diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
index 549d784..325dded 100644
--- a/tests/test-aio-multithread.c
+++ b/tests/test-aio-multithread.c
@@ -168,7 +168,8 @@ static void test_multi_co_schedule(int seconds)
 
     create_aio_contexts();
     for (i = 0; i < NUM_CONTEXTS; i++) {
-        Coroutine *co1 = qemu_coroutine_create(test_multi_co_schedule_entry, NULL);
+        Coroutine *co1 = qemu_coroutine_create(ctx[i],
+                                               test_multi_co_schedule_entry, NULL);
         aio_co_schedule(ctx[i], co1);
     }
 
@@ -233,7 +234,8 @@ static void test_multi_co_mutex(int threads, int seconds)
     assert(threads <= NUM_CONTEXTS);
     running = threads;
     for (i = 0; i < threads; i++) {
-        Coroutine *co1 = qemu_coroutine_create(test_multi_co_mutex_entry, NULL);
+        Coroutine *co1 = qemu_coroutine_create(ctx[i],
+                                               test_multi_co_mutex_entry, NULL);
         aio_co_schedule(ctx[i], co1);
     }
 
@@ -352,7 +354,8 @@ static void test_multi_fair_mutex(int threads, int seconds)
     assert(threads <= NUM_CONTEXTS);
     running = threads;
     for (i = 0; i < threads; i++) {
-        Coroutine *co1 = qemu_coroutine_create(test_multi_fair_mutex_entry, NULL);
+        Coroutine *co1 = qemu_coroutine_create(ctx[i],
+                                               test_multi_fair_mutex_entry, NULL);
         aio_co_schedule(ctx[i], co1);
     }
 
@@ -408,7 +411,8 @@ static void test_multi_mutex(int threads, int seconds)
     assert(threads <= NUM_CONTEXTS);
     running = threads;
     for (i = 0; i < threads; i++) {
-        Coroutine *co1 = qemu_coroutine_create(test_multi_mutex_entry, NULL);
+        Coroutine *co1 = qemu_coroutine_create(ctx[i],
+                                               test_multi_mutex_entry, NULL);
         aio_co_schedule(ctx[i], co1);
     }
 
diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index abd97c2..12a6575 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/main-loop.h"
 
 /*
  * Check that qemu_in_coroutine() works
@@ -30,7 +31,8 @@ static void test_in_coroutine(void)
 
     g_assert(!qemu_in_coroutine());
 
-    coroutine = qemu_coroutine_create(verify_in_coroutine, NULL);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      verify_in_coroutine, NULL);
     qemu_coroutine_enter(coroutine);
 }
 
@@ -48,7 +50,8 @@ static void test_self(void)
 {
     Coroutine *coroutine;
 
-    coroutine = qemu_coroutine_create(verify_self, &coroutine);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      verify_self, &coroutine);
     qemu_coroutine_enter(coroutine);
 }
 
@@ -77,7 +80,8 @@ static void coroutine_fn verify_entered_step_1(void *opaque)
 
     g_assert(qemu_coroutine_entered(self));
 
-    coroutine = qemu_coroutine_create(verify_entered_step_2, self);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      verify_entered_step_2, self);
     g_assert(!qemu_coroutine_entered(coroutine));
     qemu_coroutine_enter(coroutine);
     g_assert(!qemu_coroutine_entered(coroutine));
@@ -88,7 +92,8 @@ static void test_entered(void)
 {
     Coroutine *coroutine;
 
-    coroutine = qemu_coroutine_create(verify_entered_step_1, NULL);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      verify_entered_step_1, NULL);
     g_assert(!qemu_coroutine_entered(coroutine));
     qemu_coroutine_enter(coroutine);
 }
@@ -112,7 +117,8 @@ static void coroutine_fn nest(void *opaque)
     if (nd->n_enter < nd->max) {
         Coroutine *child;
 
-        child = qemu_coroutine_create(nest, nd);
+        child = qemu_coroutine_create(qemu_get_aio_context(),
+                                      nest, nd);
         qemu_coroutine_enter(child);
     }
 
@@ -128,7 +134,8 @@ static void test_nesting(void)
         .max      = 128,
     };
 
-    root = qemu_coroutine_create(nest, &nd);
+    root = qemu_coroutine_create(qemu_get_aio_context(),
+                                 nest, &nd);
     qemu_coroutine_enter(root);
 
     /* Must enter and return from max nesting level */
@@ -157,7 +164,8 @@ static void test_yield(void)
     bool done = false;
     int i = -1; /* one extra time to return from coroutine */
 
-    coroutine = qemu_coroutine_create(yield_5_times, &done);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      yield_5_times, &done);
     while (!done) {
         qemu_coroutine_enter(coroutine);
         i++;
@@ -182,8 +190,11 @@ static void test_co_queue(void)
     Coroutine *c2;
     Coroutine tmp;
 
-    c2 = qemu_coroutine_create(c2_fn, NULL);
-    c1 = qemu_coroutine_create(c1_fn, c2);
+    qemu_init_main_loop(NULL);
+    c2 = qemu_coroutine_create(qemu_get_aio_context(),
+                               c2_fn, NULL);
+    c1 = qemu_coroutine_create(qemu_get_aio_context(),
+                               c1_fn, c2);
 
     qemu_coroutine_enter(c1);
 
@@ -213,13 +224,15 @@ static void test_lifecycle(void)
     bool done = false;
 
     /* Create, enter, and return from coroutine */
-    coroutine = qemu_coroutine_create(set_and_exit, &done);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      set_and_exit, &done);
     qemu_coroutine_enter(coroutine);
     g_assert(done); /* expect done to be true (first time) */
 
     /* Repeat to check that no state affects this test */
     done = false;
-    coroutine = qemu_coroutine_create(set_and_exit, &done);
+    coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                      set_and_exit, &done);
     qemu_coroutine_enter(coroutine);
     g_assert(done); /* expect done to be true (second time) */
 }
@@ -254,7 +267,8 @@ static void do_order_test(void)
 {
     Coroutine *co;
 
-    co = qemu_coroutine_create(co_order_test, NULL);
+    co = qemu_coroutine_create(qemu_get_aio_context(),
+                               co_order_test, NULL);
     record_push(1, 1);
     qemu_coroutine_enter(co);
     record_push(1, 2);
@@ -296,7 +310,8 @@ static void perf_lifecycle(void)
 
     g_test_timer_start();
     for (i = 0; i < max; i++) {
-        coroutine = qemu_coroutine_create(empty_coroutine, NULL);
+        coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                          empty_coroutine, NULL);
         qemu_coroutine_enter(coroutine);
     }
     duration = g_test_timer_elapsed();
@@ -320,7 +335,8 @@ static void perf_nesting(void)
             .n_return = 0,
             .max      = maxnesting,
         };
-        root = qemu_coroutine_create(nest, &nd);
+        root = qemu_coroutine_create(qemu_get_aio_context(),
+                                     nest, &nd);
         qemu_coroutine_enter(root);
     }
     duration = g_test_timer_elapsed();
@@ -350,7 +366,8 @@ static void perf_yield(void)
 
     maxcycles = 100000000;
     i = maxcycles;
-    Coroutine *coroutine = qemu_coroutine_create(yield_loop, &i);
+    Coroutine *coroutine = qemu_coroutine_create(qemu_get_aio_context(),
+                                                 yield_loop, &i);
 
     g_test_timer_start();
     while (i > 0) {
@@ -400,7 +417,8 @@ static void perf_cost(void)
 
     g_test_timer_start();
     while (i++ < maxcycles) {
-        co = qemu_coroutine_create(perf_cost_func, &i);
+        co = qemu_coroutine_create(qemu_get_aio_context(),
+                                   perf_cost_func, &i);
         qemu_coroutine_enter(co);
         qemu_coroutine_enter(co);
     }
diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index 91b4ec5..86d993a 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -92,7 +92,8 @@ static void co_test_cb(void *opaque)
 static void test_submit_co(void)
 {
     WorkerTestData data;
-    Coroutine *co = qemu_coroutine_create(co_test_cb, &data);
+    Coroutine *co = qemu_coroutine_create(qemu_get_aio_context(),
+                                          co_test_cb, &data);
 
     qemu_coroutine_enter(co);
 
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 72412e5..690d5a4 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -43,7 +43,8 @@ static void coroutine_pool_cleanup(Notifier *n, void *value)
     }
 }
 
-Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
+Coroutine *qemu_coroutine_create(AioContext *ctx,
+                                 CoroutineEntry *entry, void *opaque)
 {
     Coroutine *co = NULL;
 
@@ -78,6 +79,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
 
     co->entry = entry;
     co->entry_arg = opaque;
+    co->ctx = ctx;
     QSIMPLEQ_INIT(&co->co_queue_wakeup);
     return co;
 }
@@ -107,6 +109,7 @@ void qemu_coroutine_enter(Coroutine *co)
     Coroutine *self = qemu_coroutine_self();
     CoroutineAction ret;
 
+    assert(co->ctx);
     trace_qemu_coroutine_enter(self, co, co->entry_arg);
 
     if (co->caller) {
@@ -115,7 +118,6 @@ void qemu_coroutine_enter(Coroutine *co)
     }
 
     co->caller = self;
-    co->ctx = qemu_get_current_aio_context();
 
     /* Store co->ctx before anything that stores co.  Matches
      * barrier in aio_co_wake and qemu_co_mutex_wake.
-- 
2.9.3


* Re: [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
@ 2017-04-06 14:36   ` Eric Blake
  2017-04-06 14:36   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-04-06 14:36 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini


On 04/06/2017 09:25 AM, Fam Zheng wrote:
> bdrv_replace_child_noperm tries to hand over the quiesce_counter state
> from the old bs to the new one, but if they are not in the same AioContext
> this causes an imbalance.
> 
> Fix this by forcing callers to move the new bs to the same AioContext
> first, and assert that the contexts match.
> 
> Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c    | 3 +++
>  blockdev.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
  2017-04-06 14:36   ` Eric Blake
@ 2017-04-06 14:36   ` Kevin Wolf
  2017-04-06 23:38     ` Fam Zheng
  1 sibling, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2017-04-06 14:36 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On 06.04.2017 at 16:25, Fam Zheng wrote:
> bdrv_replace_child_noperm tries to hand over the quiesce_counter state
> from the old bs to the new one, but if they are not in the same AioContext
> this causes an imbalance.
> 
> Fix this by forcing callers to move the new bs to the same AioContext
> first, and assert that the contexts match.
> 
> Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c    | 3 +++
>  blockdev.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 927ba89..8893ac1 100644
> --- a/block.c
> +++ b/block.c
> @@ -1752,6 +1752,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
>  {
>      BlockDriverState *old_bs = child->bs;
>  
> +    if (old_bs && new_bs) {
> +        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
> +    }
>      if (old_bs) {
>          if (old_bs->quiesce_counter && child->role->drained_end) {
>              child->role->drained_end(child);

Can we move this to a separate patch and also add this hunk for
asserting the same thing for newly attached child nodes:

@@ -1852,6 +1855,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
 
     assert(parent_bs->drv);
+    assert(bdrv_get_aio_context(parent_bs) == bdrv_get_aio_context(child_bs));
     parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
                                     perm, shared_perm, &perm, &shared_perm);

Kevin


* Re: [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs Fam Zheng
@ 2017-04-06 14:36   ` Eric Blake
  2017-04-06 14:37   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-04-06 14:36 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini


On 04/06/2017 09:25 AM, Fam Zheng wrote:
> mirror_top_bs should be moved to the same AioContext as the source before
> it is inserted into the graph.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/mirror.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 9e2fecc..e904fef 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1148,6 +1148,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>          return;
>      }
>      mirror_top_bs->total_sectors = bs->total_sectors;
> +    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
>  
>      /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
>       * it alive until block_job_create() even if bs has no parent. */
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs Fam Zheng
  2017-04-06 14:36   ` Eric Blake
@ 2017-04-06 14:37   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2017-04-06 14:37 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On 06.04.2017 at 16:25, Fam Zheng wrote:
> mirror_top_bs should be moved to the same AioContext as the source before
> it is inserted into the graph.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>


* Re: [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
@ 2017-04-06 15:00   ` Eric Blake
  2017-04-06 15:17   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-04-06 15:00 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini


On 04/06/2017 09:25 AM, Fam Zheng wrote:
> The fact that bs->aio_context is changing can confuse the dataplane
> iothread because of the now fine-grained AioContext lock.
> bdrv_drain should really be a bdrv_drained_begin/end pair, but since
> bs->aio_context is changing, we can just use aio_disable_external and
> block_job_pause instead.
> 
> Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin Fam Zheng
@ 2017-04-06 15:10   ` Eric Blake
  2017-04-06 15:37   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-04-06 15:10 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini


On 04/06/2017 09:25 AM, Fam Zheng wrote:
> During block job completion, nothing prevents
> block_job_defer_to_main_loop_bh from being called in a nested
> aio_poll(), which causes trouble, for example in this code path:
> 
>     qmp_block_commit
>       commit_active_start
>         bdrv_reopen
>           bdrv_reopen_multiple
>             bdrv_reopen_prepare
>               bdrv_flush
>                 aio_poll
>                   aio_bh_poll
>                     aio_bh_call
>                       block_job_defer_to_main_loop_bh
>                         stream_complete
>                           bdrv_reopen
> 
> block_job_defer_to_main_loop_bh is the last step of the stream job,
> which should have been "paused" by the bdrv_drained_begin/end in
> bdrv_reopen_multiple, but it is not done because it's in the form of a
> main loop BH.
> 
> Similar to why block jobs should be paused between drained_begin and
> drained_end, BHs they schedule must be excluded as well.  To achieve
> this, this patch forces draining the BH before leaving bdrv_drained_begin().
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

Nice writeup.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
  2017-04-06 15:00   ` Eric Blake
@ 2017-04-06 15:17   ` Kevin Wolf
  2017-04-06 23:44     ` Fam Zheng
  1 sibling, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2017-04-06 15:17 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On 06.04.2017 at 16:25, Fam Zheng wrote:
> The fact that bs->aio_context is changing can confuse the dataplane
> iothread because of the now fine-grained AioContext lock.
> bdrv_drain should really be a bdrv_drained_begin/end pair, but since
> bs->aio_context is changing, we can just use aio_disable_external and
> block_job_pause instead.
> 
> Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 8893ac1..e70684a 100644
> --- a/block.c
> +++ b/block.c
> @@ -4395,11 +4395,14 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
>  
>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>  {
> -    AioContext *ctx;
> +    AioContext *ctx = bdrv_get_aio_context(bs);
>  
> +    aio_disable_external(ctx);
> +    if (bs->job) {
> +        block_job_pause(bs->job);
> +    }

Even more bs->job users... :-(

But is this one actually necessary? We drain all pending BHs below, so
the job shouldn't have any requests in flight or be able to submit new
ones while we switch.

>      bdrv_drain(bs); /* ensure there are no in-flight requests */
>  
> -    ctx = bdrv_get_aio_context(bs);
>      while (aio_poll(ctx, false)) {
>          /* wait for all bottom halves to execute */
>      }
> @@ -4412,6 +4415,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>      aio_context_acquire(new_context);
>      bdrv_attach_aio_context(bs, new_context);
>      aio_context_release(new_context);
> +    if (bs->job) {
> +        block_job_resume(bs->job);
> +    }
> +    aio_enable_external(ctx);
>  }

The aio_disable/enable_external() pair seems to make sense anyway.
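
For reference, the variant being asked about here (keeping the
aio_disable/enable_external() pair but dropping the job pause/resume) would
look roughly like this; a sketch assembled from the hunks quoted above, not
a tested patch:

void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
{
    AioContext *ctx = bdrv_get_aio_context(bs);

    /* Keep external sources (e.g. ioeventfd) quiesced while
     * bs->aio_context is in flux; bs->job is left alone. */
    aio_disable_external(ctx);

    bdrv_drain(bs); /* ensure there are no in-flight requests */
    while (aio_poll(ctx, false)) {
        /* wait for all bottom halves to execute */
    }

    bdrv_detach_aio_context(bs);
    aio_context_acquire(new_context);
    bdrv_attach_aio_context(bs, new_context);
    aio_context_release(new_context);

    aio_enable_external(ctx);
}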

Kevin


* Re: [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine Fam Zheng
@ 2017-04-06 15:34   ` Eric Blake
  2017-04-06 16:20   ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-04-06 15:34 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini


On 04/06/2017 09:25 AM, Fam Zheng wrote:
> Coroutine in block layer should always be waken up in bs->aio_context

s/waken up/awakened/

> rather than the "current" context where it is entered. They differ when
> the main loop is doing QMP tasks.
> 
> Race conditions happen without this patch, because the wrong context is
> acquired in co_schedule_bh_cb, while the entered coroutine works on a
> different one.
> 
> Make the block layer explicitly specify a desired context for each created
> coroutine. For the rest, always use qemu_get_aio_context().
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---

The meat of the change is here (using an order file to present your diff
with the interesting changes first can aid review)...

> +++ b/util/qemu-coroutine.c
> @@ -43,7 +43,8 @@ static void coroutine_pool_cleanup(Notifier *n, void *value)
>      }
>  }
>  
> -Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
> +Coroutine *qemu_coroutine_create(AioContext *ctx,
> +                                 CoroutineEntry *entry, void *opaque)
>  {
>      Coroutine *co = NULL;
>  
> @@ -78,6 +79,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
>  
>      co->entry = entry;
>      co->entry_arg = opaque;
> +    co->ctx = ctx;
>      QSIMPLEQ_INIT(&co->co_queue_wakeup);
>      return co;
>  }
> @@ -107,6 +109,7 @@ void qemu_coroutine_enter(Coroutine *co)
>      Coroutine *self = qemu_coroutine_self();
>      CoroutineAction ret;
>  
> +    assert(co->ctx);
>      trace_qemu_coroutine_enter(self, co, co->entry_arg);
>  
>      if (co->caller) {
> @@ -115,7 +118,6 @@ void qemu_coroutine_enter(Coroutine *co)
>      }
>  
>      co->caller = self;
> -    co->ctx = qemu_get_current_aio_context();

Basically, you close the race by assigning co->ctx sooner (during
creation, rather than entering the coroutine), where non-block callers
still end up with the same context, and block callers now have a chance
to provide their desired context up front.

Makes for a big patch due to the fallout, but the result seems sane to me.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin Fam Zheng
  2017-04-06 15:10   ` Eric Blake
@ 2017-04-06 15:37   ` Kevin Wolf
  2017-04-14  9:15     ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2017-04-06 15:37 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On 06.04.2017 at 16:25, Fam Zheng wrote:
> During block job completion, nothing prevents
> block_job_defer_to_main_loop_bh from being called in a nested
> aio_poll(), which causes trouble, for example in this code path:
> 
>     qmp_block_commit
>       commit_active_start
>         bdrv_reopen
>           bdrv_reopen_multiple
>             bdrv_reopen_prepare
>               bdrv_flush
>                 aio_poll
>                   aio_bh_poll
>                     aio_bh_call
>                       block_job_defer_to_main_loop_bh
>                         stream_complete
>                           bdrv_reopen
> 
> block_job_defer_to_main_loop_bh is the last step of the stream job,
> which should have been "paused" by the bdrv_drained_begin/end in
> bdrv_reopen_multiple, but it is not done because it's in the form of a
> main loop BH.
> 
> Similar to why block jobs should be paused between drained_begin and
> drained_end, BHs they schedule must be excluded as well.  To achieve
> this, this patch forces draining the BH before leaving bdrv_drained_begin().
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

We used to poll BHs in earlier times. Commit 99723548 ('block: add BDS
field to count in-flight requests') changed this, without an explanation
in the commit message.

Paolo, is this simply a bug in that commit, or do you rely on it
somewhere? I remember that you wanted to get rid of some aio_poll()
calls a while ago.
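
For context, after that commit requests are accounted per BDS and draining
waits on that counter rather than polling for BHs; a rough sketch of the
mechanism (not the exact code):

/* Each request brackets itself with the per-BDS in-flight counter... */
static void example_request(BlockDriverState *bs)
{
    bdrv_inc_in_flight(bs);
    /* ... submit and complete the actual I/O ... */
    bdrv_dec_in_flight(bs);
}

/* ...and the drain code waits for the counter instead of polling BHs: */
static void example_wait(BlockDriverState *bs)
{
    BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
}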

>  block/io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 2709a70..b9cfd18 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs)
>          bdrv_parent_drained_begin(bs);
>      }
>  
> -    bdrv_drain_recurse(bs);
> +    while (true) {
> +        if (!bdrv_drain_recurse(bs) &&
> +            !aio_poll(bdrv_get_aio_context(bs), false)) {
> +                break;
> +            }

The indentation is off here.

> +    }
>  }

The old code had this in what is now the BDRV_POLL_WHILE() call in
bdrv_drain_recurse(). I think it makes more sense there, saves you a
loop here and fixes bdrv_drain_all_begin() at the same time.

Kevin


* Re: [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine
  2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine Fam Zheng
  2017-04-06 15:34   ` Eric Blake
@ 2017-04-06 16:20   ` Kevin Wolf
  2017-04-07  8:34     ` Fam Zheng
  1 sibling, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2017-04-06 16:20 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On 06.04.2017 at 16:25, Fam Zheng wrote:
> Coroutine in block layer should always be waken up in bs->aio_context
> rather than the "current" context where it is entered. They differ when
> the main loop is doing QMP tasks.

This whole mechanism is complex stuff that I haven't quite caught up on
yet, but this change means that we probably have some code that runs
under one AioContext or the other depending on whether a CoMutex had to
yield. Waking it up in its original AioContext definitely seemed more
straightforward.

> Race conditions happen without this patch, because the wrong context is
> acquired in co_schedule_bh_cb, while the entered coroutine works on a
> different one.

If the code in co_schedule_bh_cb() assumes that coroutines can never
move between AioContexts, then probably this assumption is what really
needs to be fixed.

For example, another case where this happens is that block jobs follow
their nodes if the AioContext changes and even implement
.attached_aio_context callbacks when they need to drag additional nodes
into the new context. With your change, the job coroutine would remember
the old AioContext and move back to it in some cases! I don't want to be
the one to debug this kind of problem...

If we really went this way, we'd need at least a way to change the
AioContext of a coroutine after the fact, and be sure that we call it
everywhere where it's needed (it's this last part that I'm highly
doubtful about for 2.9).
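
Concretely, such a helper would be trivial; the hard part is calling it
from every place that moves a node (hypothetical sketch, no such function
exists today):

/* Hypothetical; only safe while the coroutine is neither running nor
 * scheduled, e.g. from an attached_aio_context notifier. */
static inline void qemu_coroutine_set_aio_context(Coroutine *co,
                                                  AioContext *new_ctx)
{
    co->ctx = new_ctx;
}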

In fact, I think even attempting this is insanity and we need to teach
the infrastructure to cope with coroutines that move between
AioContexts. If it's really just about acquiring the wrong context,
shouldn't using co->ctx instead of ctx then solve the problem?
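
To illustrate what I mean, purely as a sketch (the helper name is made up,
this is not actual code from util/async.c):

/* When re-entering a scheduled coroutine, take the lock of the AioContext
 * the coroutine belongs to *now*, not of the context whose BH fired. */
static void enter_scheduled_co(Coroutine *co)
{
    AioContext *co_ctx = co->ctx;   /* may differ from the BH's context */

    aio_context_acquire(co_ctx);
    qemu_coroutine_enter(co);
    aio_context_release(co_ctx);
}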

> Make the block layer explicitly specify a desired context for each created
> coroutine. For the rest, always use qemu_get_aio_context().
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -63,7 +63,8 @@ typedef void coroutine_fn CoroutineEntry(void *opaque);
>   * Use qemu_coroutine_enter() to actually transfer control to the coroutine.
>   * The opaque argument is passed as the argument to the entry point.
>   */
> -Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
> +Coroutine *qemu_coroutine_create(AioContext *ctx,
> +                                 CoroutineEntry *entry, void *opaque);

The new parameter could use some documentation; it's not even obvious
why a coroutine should have an AioContext.
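
Maybe something along these lines (just my suggestion for the wording):

/**
 * Create a new coroutine
 *
 * @ctx: the AioContext in which the coroutine is expected to run and to be
 *       re-entered after it yields; block layer callers would pass
 *       bs->aio_context, everyone else qemu_get_aio_context().
 *
 * Use qemu_coroutine_enter() to actually transfer control to the coroutine.
 * The opaque argument is passed as the argument to the entry point.
 */
Coroutine *qemu_coroutine_create(AioContext *ctx,
                                 CoroutineEntry *entry, void *opaque);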

> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -115,7 +118,6 @@ void qemu_coroutine_enter(Coroutine *co)
>      }
>  
>      co->caller = self;
> -    co->ctx = qemu_get_current_aio_context();
>  
>      /* Store co->ctx before anything that stores co.  Matches
>       * barrier in aio_co_wake and qemu_co_mutex_wake.
>       */
>      smp_wmb();

The comment suggests that the barrier can go away if you don't set
co->ctx any more.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot
  2017-04-06 14:36   ` Kevin Wolf
@ 2017-04-06 23:38     ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 23:38 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On Thu, 04/06 16:36, Kevin Wolf wrote:
> Am 06.04.2017 um 16:25 hat Fam Zheng geschrieben:
> > bdrv_replace_child_noperm tries to hand over the quiesce_counter state
> > from the old bs to the new one, but if they are not in the same aio context
> > this causes an imbalance.
> > 
> > Fix this by forcing callers to move the aio context first, and assert
> > it.
> > 
> > Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block.c    | 3 +++
> >  blockdev.c | 4 ++--
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 927ba89..8893ac1 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1752,6 +1752,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
> >  {
> >      BlockDriverState *old_bs = child->bs;
> >  
> > +    if (old_bs && new_bs) {
> > +        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
> > +    }
> >      if (old_bs) {
> >          if (old_bs->quiesce_counter && child->role->drained_end) {
> >              child->role->drained_end(child);
> 
> Can we move this to a separate patch and also add this hunk for
> asserting the same thing for newly attached child nodes:
> 
> @@ -1852,6 +1855,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>      bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
>  
>      assert(parent_bs->drv);
> +    assert(bdrv_get_aio_context(parent_bs) == bdrv_get_aio_context(child_bs));
>      parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
>                                      perm, shared_perm, &perm, &shared_perm);

OK, will do.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context
  2017-04-06 15:17   ` Kevin Wolf
@ 2017-04-06 23:44     ` Fam Zheng
  2017-04-07  8:15       ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2017-04-06 23:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

On Thu, 04/06 17:17, Kevin Wolf wrote:
> Am 06.04.2017 um 16:25 hat Fam Zheng geschrieben:
> > The fact that the bs->aio_context is changing can confuse the dataplane
> > iothread, because of the now fine-grained aio context lock.
> > bdrv_drain should rather be a bdrv_drained_begin/end pair, but since
> > bs->aio_context is changing, we can just use aio_disable_external and
> > block_job_pause.
> > 
> > Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 8893ac1..e70684a 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -4395,11 +4395,14 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
> >  
> >  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> >  {
> > -    AioContext *ctx;
> > +    AioContext *ctx = bdrv_get_aio_context(bs);
> >  
> > +    aio_disable_external(ctx);
> > +    if (bs->job) {
> > +        block_job_pause(bs->job);
> > +    }
> 
> Even more bs->job users... :-(
> 
> But is this one actually necessary? We drain all pending BHs below, so
> the job shouldn't have any requests in flight or be able to submit new
> ones while we switch.

I'm not 100% sure, but I think the aio_poll() loop below can still fire
co_sleep_cb if we don't do block_job_pause()?
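
To illustrate what I'm worried about, a made-up snippet (the names are mine,
this is not the actual blockjob.c code):

static void wake_cb(void *opaque)
{
    aio_co_wake(opaque);              /* re-enters the sleeping coroutine */
}

static void coroutine_fn sleep_in_old_ctx(void *opaque)
{
    AioContext *old_ctx = opaque;
    QEMUTimer *t = aio_timer_new(old_ctx, QEMU_CLOCK_REALTIME, SCALE_NS,
                                 wake_cb, qemu_coroutine_self());

    timer_mod(t, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 1000000);
    /* aio_poll(old_ctx, false) runs expired timers, so the loop in
     * bdrv_set_aio_context() can wake us up unless the job is paused. */
    qemu_coroutine_yield();
    timer_free(t);
}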

> 
> >      bdrv_drain(bs); /* ensure there are no in-flight requests */
> >  
> > -    ctx = bdrv_get_aio_context(bs);
> >      while (aio_poll(ctx, false)) {
> >          /* wait for all bottom halves to execute */
> >      }
> > @@ -4412,6 +4415,10 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> >      aio_context_acquire(new_context);
> >      bdrv_attach_aio_context(bs, new_context);
> >      aio_context_release(new_context);
> > +    if (bs->job) {
> > +        block_job_resume(bs->job);
> > +    }
> > +    aio_enable_external(ctx);
> >  }
> 
> The aio_disable/enable_external() pair seems to make sense anyway.
> 
> Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context
  2017-04-06 23:44     ` Fam Zheng
@ 2017-04-07  8:15       ` Kevin Wolf
  0 siblings, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2017-04-07  8:15 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Paolo Bonzini, qemu-block, Ed Swierk, Max Reitz,
	Stefan Hajnoczi

Am 07.04.2017 um 01:44 hat Fam Zheng geschrieben:
> On Thu, 04/06 17:17, Kevin Wolf wrote:
> > Am 06.04.2017 um 16:25 hat Fam Zheng geschrieben:
> > > The fact that the bs->aio_context is changing can confuse the dataplane
> > > iothread, because of the now fine-grained aio context lock.
> > > bdrv_drain should rather be a bdrv_drained_begin/end pair, but since
> > > bs->aio_context is changing, we can just use aio_disable_external and
> > > block_job_pause.
> > > 
> > > Reported-by: Ed Swierk <eswierk@skyportsystems.com>
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > ---
> > >  block.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block.c b/block.c
> > > index 8893ac1..e70684a 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -4395,11 +4395,14 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
> > >  
> > >  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> > >  {
> > > -    AioContext *ctx;
> > > +    AioContext *ctx = bdrv_get_aio_context(bs);
> > >  
> > > +    aio_disable_external(ctx);
> > > +    if (bs->job) {
> > > +        block_job_pause(bs->job);
> > > +    }
> > 
> > Even more bs->job users... :-(
> > 
> > But is this one actually necessary? We drain all pending BHs below, so
> > the job shouldn't have any requests in flight or be able to submit new
> > ones while we switch.
> 
> I'm not 100% sure, but I think the aio_poll() loop below can still fire
> co_sleep_cb if we don't do block_job_pause()?

Somehow I thought that aio_poll() doesn't run timers, but that seems to
be wrong. I think in the pre-AioContext world it used to be this way.

Okay, let's leave this patch as it is. We can then clean things up in
the 2.10 timeframe and propagate things cleanly upwards through the
BlockBackend to its users. We want to do this for drain anyway.

Hm, maybe we should already call bdrv_parent_drained_begin/end? I don't
think that image throttling can submit new requests while we're changing
the AioContext (because it probably doesn't have any at that point), but
I'm not completely sure.
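
I'm thinking of something as simple as this on top of your patch, untested
and just to show the idea (bdrv_parent_drained_begin/end would have to be
exported from block/io.c):

--- a/block.c
+++ b/block.c
@@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
     aio_disable_external(ctx);
+    bdrv_parent_drained_begin(bs);
     if (bs->job) {
         block_job_pause(bs->job);
     }
@@
     aio_context_acquire(new_context);
     bdrv_attach_aio_context(bs, new_context);
+    bdrv_parent_drained_end(bs);
     aio_context_release(new_context);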

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine
  2017-04-06 16:20   ` Kevin Wolf
@ 2017-04-07  8:34     ` Fam Zheng
  0 siblings, 0 replies; 21+ messages in thread
From: Fam Zheng @ 2017-04-07  8:34 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, qemu-devel, Max Reitz, Ed Swierk, Stefan Hajnoczi,
	Paolo Bonzini

On Thu, 04/06 18:20, Kevin Wolf wrote:
> For example, another case where this happens is that block jobs follow
> their nodes if the AioContext changes and even implement
> .attached_aio_context callbacks when they need to drag additional nodes
> into the new context. With your change, the job coroutine would remember
> the old AioContext and move back to the old context in some cases!

You are right, in v2 I'll store the co->ctx at enter time explicitly. This way a
context change on a BDS will also move the block job's co.

Fam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
  2017-04-06 15:37   ` Kevin Wolf
@ 2017-04-14  9:15     ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2017-04-14  9:15 UTC (permalink / raw)
  To: Kevin Wolf, Fam Zheng; +Cc: qemu-block, qemu-devel, Max Reitz, Stefan Hajnoczi



On 06/04/2017 23:37, Kevin Wolf wrote:
> We used to poll BHs in earlier times. Commit 99723548 ('block: add BDS
> field to count in-flight requests') changed this, without an explanation
> in the commit message.
> 
> Paolo, is this simply a bug in that commit, or do you rely on it
> somewhere? I remember that you wanted to get rid of some aio_poll()
> calls a while ago.

It was replaced by the bdrv_inc_in_flight/bdrv_dec_in_flight calls in
bdrv_aio_prw and bdrv_co_complete.

These let bdrv_drain complete the invocation of the callbacks.  But
block_job_defer_to_main_loop is different because it schedules the
bottom half in another QEMU AioContext (the main thread's).
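
For illustration, a made-up snippet (the function names are mine, not the
actual blockjob.c code):

static void finish_in_main_loop(void *opaque)
{
    /* Runs in qemu_get_aio_context(), i.e. the main thread.  An aio_poll()
     * on the iothread's AioContext never dispatches this BH, so counting
     * bs->in_flight alone cannot make bdrv_drain() wait for it. */
}

static void job_completed_cb(void)
{
    aio_bh_schedule_oneshot(qemu_get_aio_context(),
                            finish_in_main_loop, NULL);
}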

Since he's here with me, I'll ask Fam exactly what's happening.  Stefan
also found something not too convincing in his review.

Paolo

>>  block/io.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 2709a70..b9cfd18 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs)
>>          bdrv_parent_drained_begin(bs);
>>      }
>>  
>> -    bdrv_drain_recurse(bs);
>> +    while (true) {
>> +        if (!bdrv_drain_recurse(bs) &&
>> +            !aio_poll(bdrv_get_aio_context(bs), false)) {
>> +                break;
>> +            }
> 
> The indentation is off here.
> 
>> +    }
>>  }
> 
> The old code had this in what is now the BDRV_POLL_WHILE() call in
> bdrv_drain_recurse(). I think it makes more sense there, saves you a
> loop here and fixes bdrv_drain_all_begin() at the same time.
> 
> Kevin
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-04-14  9:15 UTC | newest]

Thread overview: 21+ messages
2017-04-06 14:25 [Qemu-devel] [PATCH for-2.9 0/5] block: Fixes regarding dataplane and management operations Fam Zheng
2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 1/5] block: Fix unpaired aio_disable_external in external snapshot Fam Zheng
2017-04-06 14:36   ` Eric Blake
2017-04-06 14:36   ` Kevin Wolf
2017-04-06 23:38     ` Fam Zheng
2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 2/5] mirror: Fix aio context of mirror_top_bs Fam Zheng
2017-04-06 14:36   ` Eric Blake
2017-04-06 14:37   ` Kevin Wolf
2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 3/5] block: Quiesce old aio context during bdrv_set_aio_context Fam Zheng
2017-04-06 15:00   ` Eric Blake
2017-04-06 15:17   ` Kevin Wolf
2017-04-06 23:44     ` Fam Zheng
2017-04-07  8:15       ` Kevin Wolf
2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin Fam Zheng
2017-04-06 15:10   ` Eric Blake
2017-04-06 15:37   ` Kevin Wolf
2017-04-14  9:15     ` Paolo Bonzini
2017-04-06 14:25 ` [Qemu-devel] [PATCH for-2.9 5/5] coroutine: Explicitly specify AioContext when creating coroutine Fam Zheng
2017-04-06 15:34   ` Eric Blake
2017-04-06 16:20   ` Kevin Wolf
2017-04-07  8:34     ` Fam Zheng
