* [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3
@ 2018-04-11 16:39 Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
                   ` (20 more replies)
  0 siblings, 21 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

This is the third and, hopefully for now, last part of my work to fix
drain. The main goal of this series is to make drain robust against
graph changes that happen in any callback of an in-flight request while
we drain a block node.

The individual patches describe the details, but the rough plan is to
change all three drain types (single node, subtree and all) to work like
this:

1. First call all the necessary callbacks to quiesce external sources
   for new requests. This includes the block driver callbacks, the child
   node callbacks and disabling external AioContext events. This is done
   recursively.

   Much of the trouble we had with drain resulted from the fact that the
   graph changed while we were traversing the graph recursively. None of
   the callbacks called in this phase may change the graph.

2. Then do a single AIO_WAIT_WHILE() to drain the requests of all
   affected nodes. The aio_poll() called by it is where graph changes
   can happen, and this is where we need to be careful.

   However, while evaluating the loop condition, the graph can't change,
   so we can safely call all necessary callbacks, if needed recursively,
   to determine whether there are still pending requests in any affected
   nodes. We just need to make sure that we don't rely on the set of
   nodes being the same between any two evaluations of the condition. A
   rough sketch of this structure follows below.
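
To make the intended end state concrete, here is a rough sketch of what
the begin side looks like once the series is complete (simplified: the
quiesce counters, the parent/recursive parameters and drain_all handling
are omitted; the function names follow the patches below):

    static void drained_begin_sketch(BlockDriverState *bs)
    {
        /* Phase 1: quiesce external sources of new requests,
         * recursively. None of these callbacks may change the graph. */
        aio_disable_external(bdrv_get_aio_context(bs));
        bdrv_parent_drained_begin(bs, NULL);
        bdrv_drain_invoke(bs, true);

        /* Phase 2: a single poll loop for the whole affected subtree.
         * The aio_poll() inside BDRV_POLL_WHILE() is the only place
         * where graph changes can happen; the condition re-evaluates
         * the set of affected nodes on every iteration. */
        BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true, true));
    }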

A few more small, mostly self-contained changes are needed before we're
actually safe, but this is the main mechanism, and it should help you
understand what we're working towards during the series.

Kevin Wolf (18):
  test-bdrv-drain: bdrv_drain() works with cross-AioContext events
  block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
  block: Remove 'recursive' parameter from bdrv_drain_invoke()
  block: Don't manually poll in bdrv_drain_all()
  tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
  block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
  block: Really pause block jobs on drain
  block: Remove bdrv_drain_recurse()
  block: Drain recursively with a single BDRV_POLL_WHILE()
  test-bdrv-drain: Test node deletion in subtree recursion
  block: Don't poll in parent drain callbacks
  test-bdrv-drain: Graph change through parent callback
  block: Defer .bdrv_drain_begin callback to polling phase
  test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
  block: Allow AIO_WAIT_WHILE with NULL ctx
  block: Move bdrv_drain_all_begin() out of coroutine context
  block: Allow graph changes in bdrv_drain_all_begin/end sections
  test-bdrv-drain: Test graph changes in drain_all section

Max Reitz (1):
  test-bdrv-drain: Add test for node deletion

 include/block/aio-wait.h     |  25 +-
 include/block/block.h        |  17 +
 include/block/block_int.h    |   8 +
 include/block/blockjob_int.h |   8 +
 block.c                      |  31 +-
 block/io.c                   | 280 +++++++++-------
 block/mirror.c               |   8 +
 blockjob.c                   |  21 ++
 tests/test-bdrv-drain.c      | 738 ++++++++++++++++++++++++++++++++++++++++---
 9 files changed, 974 insertions(+), 162 deletions(-)

-- 
2.13.6


* [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.
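
Condensed from the test added below, the pattern that makes the request
depend on an event in a second AioContext looks roughly like this (a
sketch, reusing names from the test code; ctx_b is the AioContext of
iothread 2):

    /* In the test driver's .bdrv_co_preadv, running in iothread 1: */
    aio_bh_schedule_oneshot(ctx_b, co_reenter_bh, qemu_coroutine_self());
    qemu_coroutine_yield();

    /* co_reenter_bh() then runs in iothread 2 and wakes the request
     * coroutine again: */
    static void co_reenter_bh(void *opaque)
    {
        aio_co_wake(opaque);
    }

The request thus only completes after a BH in another AioContext has
run, while bdrv_drain() polls the AioContext of the drained node and is
still woken up once the in-flight request completes.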

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c              |   4 --
 tests/test-bdrv-drain.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 186 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index bd9a19a9c4..28e71221b0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -369,10 +369,6 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
- *
- * Only this BlockDriverState's AioContext is run, so in-flight requests must
- * not depend on events in other AioContexts.  In that case, use
- * bdrv_drain_all() instead.
  */
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 7673de1062..29634102d8 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -27,9 +27,13 @@
 #include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
+#include "iothread.h"
+
+static QemuEvent done_event;
 
 typedef struct BDRVTestState {
     int drain_count;
+    AioContext *bh_indirection_ctx;
 } BDRVTestState;
 
 static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
@@ -50,16 +54,29 @@ static void bdrv_test_close(BlockDriverState *bs)
     g_assert_cmpint(s->drain_count, >, 0);
 }
 
+static void co_reenter_bh(void *opaque)
+{
+    aio_co_wake(opaque);
+}
+
 static int coroutine_fn bdrv_test_co_preadv(BlockDriverState *bs,
                                             uint64_t offset, uint64_t bytes,
                                             QEMUIOVector *qiov, int flags)
 {
+    BDRVTestState *s = bs->opaque;
+
     /* We want this request to stay until the polling loop in drain waits for
      * it to complete. We need to sleep a while as bdrv_drain_invoke() comes
      * first and polls its result, too, but it shouldn't accidentally complete
      * this request yet. */
     qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
 
+    if (s->bh_indirection_ctx) {
+        aio_bh_schedule_oneshot(s->bh_indirection_ctx, co_reenter_bh,
+                                qemu_coroutine_self());
+        qemu_coroutine_yield();
+    }
+
     return 0;
 }
 
@@ -490,6 +507,164 @@ static void test_graph_change(void)
     blk_unref(blk_b);
 }
 
+struct test_iothread_data {
+    BlockDriverState *bs;
+    enum drain_type drain_type;
+    int *aio_ret;
+};
+
+static void test_iothread_drain_entry(void *opaque)
+{
+    struct test_iothread_data *data = opaque;
+
+    aio_context_acquire(bdrv_get_aio_context(data->bs));
+    do_drain_begin(data->drain_type, data->bs);
+    g_assert_cmpint(*data->aio_ret, ==, 0);
+    do_drain_end(data->drain_type, data->bs);
+    aio_context_release(bdrv_get_aio_context(data->bs));
+
+    qemu_event_set(&done_event);
+}
+
+static void test_iothread_aio_cb(void *opaque, int ret)
+{
+    int *aio_ret = opaque;
+    *aio_ret = ret;
+    qemu_event_set(&done_event);
+}
+
+/*
+ * Starts an AIO request on a BDS that runs in the AioContext of iothread 1.
+ * The request involves a BH on iothread 2 before it can complete.
+ *
+ * @drain_thread = 0 means that do_drain_begin/end are called from the main
+ * thread, @drain_thread = 1 means that they are called from iothread 1. Drain
+ * for this BDS cannot be called from iothread 2 because only the main thread
+ * may do cross-AioContext polling.
+ */
+static void test_iothread_common(enum drain_type drain_type, int drain_thread)
+{
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    BDRVTestState *s;
+    BlockAIOCB *acb;
+    int aio_ret;
+    struct test_iothread_data data;
+
+    IOThread *a = iothread_new();
+    IOThread *b = iothread_new();
+    AioContext *ctx_a = iothread_get_aio_context(a);
+    AioContext *ctx_b = iothread_get_aio_context(b);
+
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = NULL,
+        .iov_len = 0,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* bdrv_drain_all() may only be called from the main loop thread */
+    if (drain_type == BDRV_DRAIN_ALL && drain_thread != 0) {
+        goto out;
+    }
+
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+    s = bs->opaque;
+    blk_insert_bs(blk, bs, &error_abort);
+
+    blk_set_aio_context(blk, ctx_a);
+    aio_context_acquire(ctx_a);
+
+    s->bh_indirection_ctx = ctx_b;
+
+    aio_ret = -EINPROGRESS;
+    if (drain_thread == 0) {
+        acb = blk_aio_preadv(blk, 0, &qiov, 0, test_iothread_aio_cb, &aio_ret);
+    } else {
+        acb = blk_aio_preadv(blk, 0, &qiov, 0, aio_ret_cb, &aio_ret);
+    }
+    g_assert(acb != NULL);
+    g_assert_cmpint(aio_ret, ==, -EINPROGRESS);
+
+    aio_context_release(ctx_a);
+
+    data = (struct test_iothread_data) {
+        .bs         = bs,
+        .drain_type = drain_type,
+        .aio_ret    = &aio_ret,
+    };
+
+    switch (drain_thread) {
+    case 0:
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_acquire(ctx_a);
+        }
+
+        /* The request is running on IOThread a. Draining its block device
+         * will make sure that it has completed as far as the BDS is concerned,
+         * but the drain in this thread can continue immediately after
+         * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
+         * later. */
+        qemu_event_reset(&done_event);
+        do_drain_begin(drain_type, bs);
+        g_assert_cmpint(bs->in_flight, ==, 0);
+
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_release(ctx_a);
+        }
+        qemu_event_wait(&done_event);
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_acquire(ctx_a);
+        }
+
+        g_assert_cmpint(aio_ret, ==, 0);
+        do_drain_end(drain_type, bs);
+
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_release(ctx_a);
+        }
+        break;
+    case 1:
+        qemu_event_reset(&done_event);
+        aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data);
+        qemu_event_wait(&done_event);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    aio_context_acquire(ctx_a);
+    blk_set_aio_context(blk, qemu_get_aio_context());
+    aio_context_release(ctx_a);
+
+    bdrv_unref(bs);
+    blk_unref(blk);
+
+out:
+    iothread_join(a);
+    iothread_join(b);
+}
+
+static void test_iothread_drain_all(void)
+{
+    test_iothread_common(BDRV_DRAIN_ALL, 0);
+    test_iothread_common(BDRV_DRAIN_ALL, 1);
+}
+
+static void test_iothread_drain(void)
+{
+    test_iothread_common(BDRV_DRAIN, 0);
+    test_iothread_common(BDRV_DRAIN, 1);
+}
+
+static void test_iothread_drain_subtree(void)
+{
+    test_iothread_common(BDRV_SUBTREE_DRAIN, 0);
+    test_iothread_common(BDRV_SUBTREE_DRAIN, 1);
+}
+
 
 typedef struct TestBlockJob {
     BlockJob common;
@@ -613,10 +788,13 @@ static void test_blockjob_drain_subtree(void)
 
 int main(int argc, char **argv)
 {
+    int ret;
+
     bdrv_init();
     qemu_init_main_loop(&error_abort);
 
     g_test_init(&argc, &argv, NULL);
+    qemu_event_init(&done_event, false);
 
     g_test_add_func("/bdrv-drain/driver-cb/drain_all", test_drv_cb_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/drain", test_drv_cb_drain);
@@ -643,10 +821,17 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/multiparent", test_multiparent);
     g_test_add_func("/bdrv-drain/graph-change", test_graph_change);
 
+    g_test_add_func("/bdrv-drain/iothread/drain_all", test_iothread_drain_all);
+    g_test_add_func("/bdrv-drain/iothread/drain", test_iothread_drain);
+    g_test_add_func("/bdrv-drain/iothread/drain_subtree",
+                    test_iothread_drain_subtree);
+
     g_test_add_func("/bdrv-drain/blockjob/drain_all", test_blockjob_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/drain", test_blockjob_drain);
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
-    return g_test_run();
+    ret = g_test_run();
+    qemu_event_destroy(&done_event);
+    return ret;
 }
-- 
2.13.6


* [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-20  7:07   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

bdrv_do_drain_begin/end() already implement everything that
bdrv_drain_all_begin/end() need and currently still do manually:
disabling external events, calling parent drain callbacks and calling
block driver callbacks.

They also do two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving differently from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so doing an early first iteration doesn't
hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c              | 10 ++--------
 tests/test-bdrv-drain.c | 14 ++++----------
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 28e71221b0..cad59db2f4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -412,11 +412,8 @@ void bdrv_drain_all_begin(void)
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
-        /* Stop things in parent-to-child order */
         aio_context_acquire(aio_context);
-        aio_disable_external(aio_context);
-        bdrv_parent_drained_begin(bs, NULL);
-        bdrv_drain_invoke(bs, true, true);
+        bdrv_do_drained_begin(bs, true, NULL);
         aio_context_release(aio_context);
 
         if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -457,11 +454,8 @@ void bdrv_drain_all_end(void)
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
-        /* Re-enable things in child-to-parent order */
         aio_context_acquire(aio_context);
-        bdrv_drain_invoke(bs, false, true);
-        bdrv_parent_drained_end(bs, NULL);
-        aio_enable_external(aio_context);
+        bdrv_do_drained_end(bs, true, NULL);
         aio_context_release(aio_context);
     }
 }
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 29634102d8..cd870609c5 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -276,8 +276,7 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
 static void test_quiesce_drain_all(void)
 {
-    // XXX drain_all doesn't quiesce
-    //test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce_common(BDRV_DRAIN_ALL, true);
 }
 
 static void test_quiesce_drain(void)
@@ -319,12 +318,7 @@ static void test_nested(void)
 
     for (outer = 0; outer < DRAIN_TYPE_MAX; outer++) {
         for (inner = 0; inner < DRAIN_TYPE_MAX; inner++) {
-            /* XXX bdrv_drain_all() doesn't increase the quiesce_counter */
-            int bs_quiesce      = (outer != BDRV_DRAIN_ALL) +
-                                  (inner != BDRV_DRAIN_ALL);
-            int backing_quiesce = (outer == BDRV_SUBTREE_DRAIN) +
-                                  (inner == BDRV_SUBTREE_DRAIN);
-            int backing_cb_cnt  = (outer != BDRV_DRAIN) +
+            int backing_quiesce = (outer != BDRV_DRAIN) +
                                   (inner != BDRV_DRAIN);
 
             g_assert_cmpint(bs->quiesce_counter, ==, 0);
@@ -335,10 +329,10 @@ static void test_nested(void)
             do_drain_begin(outer, bs);
             do_drain_begin(inner, bs);
 
-            g_assert_cmpint(bs->quiesce_counter, ==, bs_quiesce);
+            g_assert_cmpint(bs->quiesce_counter, ==, 2);
             g_assert_cmpint(backing->quiesce_counter, ==, backing_quiesce);
             g_assert_cmpint(s->drain_count, ==, 2);
-            g_assert_cmpint(backing_s->drain_count, ==, backing_cb_cnt);
+            g_assert_cmpint(backing_s->drain_count, ==, backing_quiesce);
 
             do_drain_end(inner, bs);
             do_drain_end(outer, bs);
-- 
2.13.6


* [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-20  7:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

All callers pass false for the 'recursive' parameter now. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index cad59db2f4..d2bd89c3bb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -167,9 +167,8 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin, bool recursive)
+static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvChild *child, *tmp;
     BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
@@ -180,12 +179,6 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin, bool recursive)
     data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
     bdrv_coroutine_enter(bs, data.co);
     BDRV_POLL_WHILE(bs, !data.done);
-
-    if (recursive) {
-        QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-            bdrv_drain_invoke(child->bs, begin, true);
-        }
-    }
 }
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
@@ -286,7 +279,7 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
     }
 
     bdrv_parent_drained_begin(bs, parent);
-    bdrv_drain_invoke(bs, true, false);
+    bdrv_drain_invoke(bs, true);
     bdrv_drain_recurse(bs);
 
     if (recursive) {
@@ -321,7 +314,7 @@ void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false, false);
+    bdrv_drain_invoke(bs, false);
     bdrv_parent_drained_end(bs, parent);
     if (old_quiesce_counter == 1) {
         aio_enable_external(bdrv_get_aio_context(bs));
-- 
2.13.6


* [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (2 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 18:32   ` Eric Blake
  2018-04-20  7:11   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

All involved nodes are already idle; we called bdrv_do_drained_begin()
on them.

The comment in the code suggested that this was not correct because the
completion of a request on one node could spawn a new request on a
different node (which might have been drained before, so we wouldn't
drain the new request). In reality, new requests to different nodes
aren't spawned out of nothing, but only in the context of a parent
request, and they aren't submitted to random nodes, but only to child
nodes. As long as we still poll for the completion of the parent request
(which we do), draining each root node separately is good enough.

Remove the additional polling code from bdrv_drain_all_begin() and
replace it with an assertion that all nodes are already idle after we
drained them separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/block/io.c b/block/io.c
index d2bd89c3bb..ea6f9f023a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -376,6 +376,16 @@ void bdrv_drain(BlockDriverState *bs)
     bdrv_drained_end(bs);
 }
 
+static void bdrv_drain_assert_idle(BlockDriverState *bs)
+{
+    BdrvChild *child, *next;
+
+    assert(atomic_read(&bs->in_flight) == 0);
+    QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+        bdrv_drain_assert_idle(child->bs);
+    }
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -390,11 +400,8 @@ void bdrv_drain(BlockDriverState *bs)
  */
 void bdrv_drain_all_begin(void)
 {
-    /* Always run first iteration so any pending completion BHs run */
-    bool waited = true;
     BlockDriverState *bs;
     BdrvNextIterator it;
-    GSList *aio_ctxs = NULL, *ctx;
 
     /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
      * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
@@ -408,35 +415,11 @@ void bdrv_drain_all_begin(void)
         aio_context_acquire(aio_context);
         bdrv_do_drained_begin(bs, true, NULL);
         aio_context_release(aio_context);
-
-        if (!g_slist_find(aio_ctxs, aio_context)) {
-            aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
-        }
     }
 
-    /* Note that completion of an asynchronous I/O operation can trigger any
-     * number of other I/O operations on other devices---for example a
-     * coroutine can submit an I/O request to another device in response to
-     * request completion.  Therefore we must keep looping until there was no
-     * more activity rather than simply draining each device independently.
-     */
-    while (waited) {
-        waited = false;
-
-        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
-            AioContext *aio_context = ctx->data;
-
-            aio_context_acquire(aio_context);
-            for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-                if (aio_context == bdrv_get_aio_context(bs)) {
-                    waited |= bdrv_drain_recurse(bs);
-                }
-            }
-            aio_context_release(aio_context);
-        }
+    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+        bdrv_drain_assert_idle(bs);
     }
-
-    g_slist_free(aio_ctxs);
 }
 
 void bdrv_drain_all_end(void)
-- 
2.13.6


* [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (3 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 18:33   ` Eric Blake
  2018-04-20  7:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
coroutine context is automatically left with a BH, preventing the
deadlocks that made bdrv_drain_all*() unsafe in coroutine context. At
the latest now that the old polling code has been removed as dead code,
we can consider it compatible.

Enable the coroutine test cases for bdrv_drain_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index cd870609c5..6de525b509 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -233,6 +233,11 @@ static void test_drv_cb_drain_subtree(void)
     test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
 }
 
+static void test_drv_cb_co_drain_all(void)
+{
+    call_in_coroutine(test_drv_cb_drain_all);
+}
+
 static void test_drv_cb_co_drain(void)
 {
     call_in_coroutine(test_drv_cb_drain);
@@ -289,6 +294,11 @@ static void test_quiesce_drain_subtree(void)
     test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
 }
 
+static void test_quiesce_co_drain_all(void)
+{
+    call_in_coroutine(test_quiesce_drain_all);
+}
+
 static void test_quiesce_co_drain(void)
 {
     call_in_coroutine(test_quiesce_drain);
@@ -795,7 +805,8 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/driver-cb/drain_subtree",
                     test_drv_cb_drain_subtree);
 
-    // XXX bdrv_drain_all() doesn't work in coroutine context
+    g_test_add_func("/bdrv-drain/driver-cb/co/drain_all",
+                    test_drv_cb_co_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/co/drain", test_drv_cb_co_drain);
     g_test_add_func("/bdrv-drain/driver-cb/co/drain_subtree",
                     test_drv_cb_co_drain_subtree);
@@ -806,7 +817,8 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/quiesce/drain_subtree",
                     test_quiesce_drain_subtree);
 
-    // XXX bdrv_drain_all() doesn't work in coroutine context
+    g_test_add_func("/bdrv-drain/quiesce/co/drain_all",
+                    test_quiesce_co_drain_all);
     g_test_add_func("/bdrv-drain/quiesce/co/drain", test_quiesce_co_drain);
     g_test_add_func("/bdrv-drain/quiesce/co/drain_subtree",
                     test_quiesce_co_drain_subtree);
-- 
2.13.6


* [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (4 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 17:33   ` Su Hang
  2018-04-20  7:17   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE()
in order to make sure that all pending BHs are executed on drain. This
was the wrong place to make the fix, as it is useless overhead for all
other users of the macro and unnecessarily complicates the mechanism.

This patch effectively reverts said commit (the context has changed a
bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
loop condition for drain.

The effect is probably hard to measure in any real-world use case
because actual I/O will dominate, but if I run only the initialisation
part of 'qemu-img convert' where it calls bdrv_block_status() for the
whole image to find out how much data there is to copy, this phase actually
needs only roughly half the time after this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/aio-wait.h | 22 ++++++++--------------
 block/io.c               | 11 ++++++++++-
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 8c90a2e66e..783d3678dd 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -73,29 +73,23 @@ typedef struct {
  */
 #define AIO_WAIT_WHILE(wait, ctx, cond) ({                         \
     bool waited_ = false;                                          \
-    bool busy_ = true;                                             \
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
     if (in_aio_context_home_thread(ctx_)) {                        \
-        while ((cond) || busy_) {                                  \
-            busy_ = aio_poll(ctx_, (cond));                        \
-            waited_ |= !!(cond) | busy_;                           \
+        while ((cond)) {                                           \
+            aio_poll(ctx_, true);                                  \
+            waited_ = true;                                        \
         }                                                          \
     } else {                                                       \
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
         /* Increment wait_->num_waiters before evaluating cond. */ \
         atomic_inc(&wait_->num_waiters);                           \
-        while (busy_) {                                            \
-            if ((cond)) {                                          \
-                waited_ = busy_ = true;                            \
-                aio_context_release(ctx_);                         \
-                aio_poll(qemu_get_aio_context(), true);            \
-                aio_context_acquire(ctx_);                         \
-            } else {                                               \
-                busy_ = aio_poll(ctx_, false);                     \
-                waited_ |= busy_;                                  \
-            }                                                      \
+        while ((cond)) {                                           \
+            aio_context_release(ctx_);                             \
+            aio_poll(qemu_get_aio_context(), true);                \
+            aio_context_acquire(ctx_);                             \
+            waited_ = true;                                        \
         }                                                          \
         atomic_dec(&wait_->num_waiters);                           \
     }                                                              \
diff --git a/block/io.c b/block/io.c
index ea6f9f023a..6f580f49ff 100644
--- a/block/io.c
+++ b/block/io.c
@@ -181,13 +181,22 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
     BDRV_POLL_WHILE(bs, !data.done);
 }
 
+/* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
+static bool bdrv_drain_poll(BlockDriverState *bs)
+{
+    /* Execute pending BHs first and check everything else only after the BHs
+     * have executed. */
+    while (aio_poll(bs->aio_context, false));
+    return atomic_read(&bs->in_flight);
+}
+
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
     BdrvChild *child, *tmp;
     bool waited;
 
     /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
+    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
-- 
2.13.6


* [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (5 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-12  8:37   ` Paolo Bonzini
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

We already requested that block jobs be paused in .drained_begin, but
no guarantee was made that the job was actually inactive at the point
where bdrv_drained_begin() returned.

This introduces a new BdrvChildRole callback .drained_poll() and uses it
to make bdrv_drain_poll() consider block jobs using the node to be
drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.
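
For illustration, a job driver that tracks its own in-flight requests
would implement the new callback roughly like this (hypothetical MyJob
type; the real mirror implementation in the diff below does the same
with MirrorBlockJob):

    static bool my_job_drained_poll(BlockJob *job)
    {
        MyJob *s = container_of(job, MyJob, common);

        /* Keep drain polling as long as requests are still in flight;
         * returning false tells drain that the job is quiescent. */
        return s->in_flight > 0;
    }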

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h        |  7 +++++++
 include/block/block_int.h    |  7 +++++++
 include/block/blockjob_int.h |  8 ++++++++
 block.c                      |  9 +++++++++
 block/io.c                   | 27 ++++++++++++++++++++++++---
 block/mirror.c               |  8 ++++++++
 blockjob.c                   | 21 +++++++++++++++++++++
 tests/test-bdrv-drain.c      | 18 ++++++++++--------
 8 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..23dee3c114 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -559,6 +559,13 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore);
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
 
 /**
+ * bdrv_drain_poll:
+ *
+ * Poll for pending requests in @bs. This is part of bdrv_drained_begin.
+ */
+bool bdrv_drain_poll(BlockDriverState *bs, bool top_level);
+
+/**
  * bdrv_drained_begin:
  *
  * Begin a quiesced section for exclusive access to the BDS, by disabling
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c4dd1d4bb8..dc6985e3ae 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -575,6 +575,13 @@ struct BdrvChildRole {
     void (*drained_begin)(BdrvChild *child);
     void (*drained_end)(BdrvChild *child);
 
+    /*
+     * Returns whether the parent has pending requests for the child. This
+     * callback is polled after .drained_begin() has been called until all
+     * activity on the child has stopped.
+     */
+    bool (*drained_poll)(BdrvChild *child);
+
     /* Notifies the parent that the child has been activated/inactivated (e.g.
      * when migration is completing) and it can start/stop requesting
      * permissions and doing I/O on it. */
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 642adce68b..3a2f851d3f 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -106,6 +106,14 @@ struct BlockJobDriver {
     void coroutine_fn (*resume)(BlockJob *job);
 
     /*
+     * Returns whether the job has pending requests for the child or will
+     * submit new requests before the next pause point. This callback is polled
+     * in the context of draining a job node after requesting that the job be
+     * paused, until all activity on the child has stopped.
+     */
+    bool (*drained_poll)(BlockJob *job);
+
+    /*
      * If the callback is not NULL, it will be invoked before the job is
      * resumed in a new AioContext.  This is the place to move any resources
      * besides job->blk to the new AioContext.
diff --git a/block.c b/block.c
index a2caadf0a0..462287bdfb 100644
--- a/block.c
+++ b/block.c
@@ -820,6 +820,12 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
     bdrv_drained_begin(bs);
 }
 
+static bool bdrv_child_cb_drained_poll(BdrvChild *child)
+{
+    BlockDriverState *bs = child->opaque;
+    return bdrv_drain_poll(bs, false);
+}
+
 static void bdrv_child_cb_drained_end(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
@@ -904,6 +910,7 @@ const BdrvChildRole child_file = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
@@ -928,6 +935,7 @@ const BdrvChildRole child_format = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_fmt_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
@@ -1047,6 +1055,7 @@ const BdrvChildRole child_backing = {
     .detach          = bdrv_backing_detach,
     .inherit_options = bdrv_backing_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .inactivate      = bdrv_child_cb_inactivate,
     .update_filename = bdrv_backing_update_filename,
diff --git a/block/io.c b/block/io.c
index 6f580f49ff..0a778eeff4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -69,6 +69,20 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
+static bool bdrv_parent_drained_poll(BlockDriverState *bs)
+{
+    BdrvChild *c, *next;
+    bool busy = false;
+
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
+        if (c->role->drained_poll) {
+            busy |= c->role->drained_poll(c);
+        }
+    }
+
+    return busy;
+}
+
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
     dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
@@ -182,11 +196,18 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-static bool bdrv_drain_poll(BlockDriverState *bs)
+bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
 {
     /* Execute pending BHs first and check everything else only after the BHs
      * have executed. */
-    while (aio_poll(bs->aio_context, false));
+    if (top_level) {
+        while (aio_poll(bs->aio_context, false));
+    }
+
+    if (bdrv_parent_drained_poll(bs)) {
+        return true;
+    }
+
     return atomic_read(&bs->in_flight);
 }
 
@@ -196,7 +217,7 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
     bool waited;
 
     /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
+    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
diff --git a/block/mirror.c b/block/mirror.c
index 820f512c7b..939504bd80 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -975,6 +975,12 @@ static void mirror_pause(BlockJob *job)
     mirror_wait_for_all_io(s);
 }
 
+static bool mirror_drained_poll(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    return !!s->in_flight;
+}
+
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -1004,6 +1010,7 @@ static const BlockJobDriver mirror_job_driver = {
     .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
+    .drained_poll           = mirror_drained_poll,
     .attached_aio_context   = mirror_attached_aio_context,
     .drain                  = mirror_drain,
 };
@@ -1015,6 +1022,7 @@ static const BlockJobDriver commit_active_job_driver = {
     .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
+    .drained_poll           = mirror_drained_poll,
     .attached_aio_context   = mirror_attached_aio_context,
     .drain                  = mirror_drain,
 };
diff --git a/blockjob.c b/blockjob.c
index 27f957e571..3d65196309 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -306,6 +306,26 @@ static void child_job_drained_begin(BdrvChild *c)
     block_job_pause(job);
 }
 
+static bool child_job_drained_poll(BdrvChild *c)
+{
+    BlockJob *job = c->opaque;
+
+    /* An inactive or completed job doesn't have any pending requests. Jobs
+     * with !job->busy are either already paused or have a pause point after
+     * being reentered, so no job driver code will run before they pause. */
+    if (!job->busy || job->completed || job->deferred_to_main_loop) {
+        return false;
+    }
+
+    /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
+     * override this assumption. */
+    if (job->driver->drained_poll) {
+        return job->driver->drained_poll(job);
+    } else {
+        return true;
+    }
+}
+
 static void child_job_drained_end(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
@@ -315,6 +335,7 @@ static void child_job_drained_end(BdrvChild *c)
 static const BdrvChildRole child_job = {
     .get_parent_desc    = child_job_get_parent_desc,
     .drained_begin      = child_job_drained_begin,
+    .drained_poll       = child_job_drained_poll,
     .drained_end        = child_job_drained_end,
     .stay_at_node       = true,
 };
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 6de525b509..37f2d6ac8c 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -686,7 +686,11 @@ static void coroutine_fn test_job_start(void *opaque)
 
     block_job_event_ready(&s->common);
     while (!s->should_complete) {
-        block_job_sleep_ns(&s->common, 100000);
+        /* Avoid block_job_sleep_ns() because it marks the job as !busy. We
+         * want to emulate some actual activity (probably some I/O) here so
+         * that drain has to wait for this activity to stop. */
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+        block_job_pause_point(&s->common);
     }
 
     block_job_defer_to_main_loop(&s->common, test_job_completed, NULL);
@@ -728,7 +732,7 @@ static void test_blockjob_common(enum drain_type drain_type)
 
     g_assert_cmpint(job->pause_count, ==, 0);
     g_assert_false(job->paused);
-    g_assert_false(job->busy); /* We're in block_job_sleep_ns() */
+    g_assert_true(job->busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, src);
 
@@ -738,15 +742,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->paused); */
+    g_assert_true(job->paused);
     g_assert_false(job->busy); /* The job is paused */
 
     do_drain_end(drain_type, src);
 
     g_assert_cmpint(job->pause_count, ==, 0);
     g_assert_false(job->paused);
-    g_assert_false(job->busy); /* We're in block_job_sleep_ns() */
+    g_assert_true(job->busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, target);
 
@@ -756,15 +759,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->paused); */
+    g_assert_true(job->paused);
     g_assert_false(job->busy); /* The job is paused */
 
     do_drain_end(drain_type, target);
 
     g_assert_cmpint(job->pause_count, ==, 0);
     g_assert_false(job->paused);
-    g_assert_false(job->busy); /* We're in block_job_sleep_ns() */
+    g_assert_true(job->busy); /* We're in qemu_co_sleep_ns() */
 
     ret = block_job_complete_sync(job, &error_abort);
     g_assert_cmpint(ret, ==, 0);
-- 
2.13.6


* [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (6 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-12  8:39   ` Paolo Bonzini
  2018-04-20  7:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

For bdrv_drain(), recursively waiting for child node requests is
pointless because we didn't quiesce their parents, so new requests could
come in anyway. Letting the function work only on a single node makes it
more consistent.

For subtree drains and drain_all, we already have the recursion in
bdrv_do_drained_begin(), so the extra recursion doesn't add anything
either.

Remove the useless code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 36 +++---------------------------------
 1 file changed, 3 insertions(+), 33 deletions(-)

diff --git a/block/io.c b/block/io.c
index 0a778eeff4..f24f39c278 100644
--- a/block/io.c
+++ b/block/io.c
@@ -211,38 +211,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
     return atomic_read(&bs->in_flight);
 }
 
-static bool bdrv_drain_recurse(BlockDriverState *bs)
-{
-    BdrvChild *child, *tmp;
-    bool waited;
-
-    /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
-
-    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-        BlockDriverState *bs = child->bs;
-        bool in_main_loop =
-            qemu_get_current_aio_context() == qemu_get_aio_context();
-        assert(bs->refcnt > 0);
-        if (in_main_loop) {
-            /* In case the recursive bdrv_drain_recurse processes a
-             * block_job_defer_to_main_loop BH and modifies the graph,
-             * let's hold a reference to bs until we are done.
-             *
-             * IOThread doesn't have such a BH, and it is not safe to call
-             * bdrv_unref without BQL, so skip doing it there.
-             */
-            bdrv_ref(bs);
-        }
-        waited |= bdrv_drain_recurse(bs);
-        if (in_main_loop) {
-            bdrv_unref(bs);
-        }
-    }
-
-    return waited;
-}
-
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
@@ -310,7 +278,9 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
-    bdrv_drain_recurse(bs);
+
+    /* Wait for drained requests to finish */
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
-- 
2.13.6


* [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (7 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-20  7:32   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

From: Max Reitz <mreitz@redhat.com>

This patch adds two bdrv-drain tests for what happens if some BDS goes
away during the drainage.

The basic idea is that you have a parent BDS with some child nodes.
Then, you drain one of the children.  Because of that, the party who
actually owns the parent decides to (A) delete it, or (B) detach all its
children from it -- both while the child is still being drained.

A real-world case where this can happen is the mirror block job, which
may exit if you drain one of its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 37f2d6ac8c..05768213be 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -792,6 +792,172 @@ static void test_blockjob_drain_subtree(void)
     test_blockjob_common(BDRV_SUBTREE_DRAIN);
 }
 
+
+typedef struct BDRVTestTopState {
+    BdrvChild *wait_child;
+} BDRVTestTopState;
+
+static void bdrv_test_top_close(BlockDriverState *bs)
+{
+    BdrvChild *c, *next_c;
+    QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+        bdrv_unref_child(bs, c);
+    }
+}
+
+static int coroutine_fn bdrv_test_top_co_preadv(BlockDriverState *bs,
+                                                uint64_t offset, uint64_t bytes,
+                                                QEMUIOVector *qiov, int flags)
+{
+    BDRVTestTopState *tts = bs->opaque;
+    return bdrv_co_preadv(tts->wait_child, offset, bytes, qiov, flags);
+}
+
+static BlockDriver bdrv_test_top_driver = {
+    .format_name            = "test_top_driver",
+    .instance_size          = sizeof(BDRVTestTopState),
+
+    .bdrv_close             = bdrv_test_top_close,
+    .bdrv_co_preadv         = bdrv_test_top_co_preadv,
+
+    .bdrv_child_perm        = bdrv_format_default_perms,
+};
+
+typedef struct TestCoDeleteByDrainData {
+    BlockBackend *blk;
+    bool detach_instead_of_delete;
+    bool done;
+} TestCoDeleteByDrainData;
+
+static void coroutine_fn test_co_delete_by_drain(void *opaque)
+{
+    TestCoDeleteByDrainData *dbdd = opaque;
+    BlockBackend *blk = dbdd->blk;
+    BlockDriverState *bs = blk_bs(blk);
+    BDRVTestTopState *tts = bs->opaque;
+    void *buffer = g_malloc(65536);
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = buffer,
+        .iov_len  = 65536,
+    };
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Pretend some internal write operation from parent to child.
+     * Important: We have to read from the child, not from the parent!
+     * Draining works by first propagating it all up the tree to the
+     * root and then waiting for drainage from root to the leaves
+     * (protocol nodes).  If we have a request waiting on the root,
+     * everything will be drained before we go back down the tree, but
+     * we do not want that.  We want to be in the middle of draining
+     * when the following request returns. */
+    bdrv_co_preadv(tts->wait_child, 0, 65536, &qiov, 0);
+
+    g_assert_cmpint(bs->refcnt, ==, 1);
+
+    if (!dbdd->detach_instead_of_delete) {
+        blk_unref(blk);
+    } else {
+        BdrvChild *c, *next_c;
+        QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+            bdrv_unref_child(bs, c);
+        }
+    }
+
+    dbdd->done = true;
+}
+
+/**
+ * Test what happens when some BDS has some children, you drain one of
+ * them and this results in the BDS being deleted.
+ *
+ * If @detach_instead_of_delete is set, the BDS is not going to be
+ * deleted but will only detach all of its children.
+ */
+static void do_test_delete_by_drain(bool detach_instead_of_delete)
+{
+    BlockBackend *blk;
+    BlockDriverState *bs, *child_bs, *null_bs;
+    BDRVTestTopState *tts;
+    TestCoDeleteByDrainData dbdd;
+    Coroutine *co;
+
+    bs = bdrv_new_open_driver(&bdrv_test_top_driver, "top", BDRV_O_RDWR,
+                              &error_abort);
+    bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    tts = bs->opaque;
+
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    /* This child will be the one to pass requests through to, and
+     * it will stall until a drain occurs */
+    child_bs = bdrv_new_open_driver(&bdrv_test, "child", BDRV_O_RDWR,
+                                    &error_abort);
+    child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    /* Takes our reference to child_bs */
+    tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child", &child_file,
+                                        &error_abort);
+
+    /* This child is just there to be deleted
+     * (for detach_instead_of_delete == true) */
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    blk_insert_bs(blk, bs, &error_abort);
+
+    /* Referenced by blk now */
+    bdrv_unref(bs);
+
+    g_assert_cmpint(bs->refcnt, ==, 1);
+    g_assert_cmpint(child_bs->refcnt, ==, 1);
+    g_assert_cmpint(null_bs->refcnt, ==, 1);
+
+
+    dbdd = (TestCoDeleteByDrainData){
+        .blk = blk,
+        .detach_instead_of_delete = detach_instead_of_delete,
+        .done = false,
+    };
+    co = qemu_coroutine_create(test_co_delete_by_drain, &dbdd);
+    qemu_coroutine_enter(co);
+
+    /* Drain the child while the read operation is still pending.
+     * This should result in the operation finishing and
+     * test_co_delete_by_drain() resuming.  Thus, @bs will be deleted
+     * and the coroutine will exit while this drain operation is still
+     * in progress. */
+    bdrv_ref(child_bs);
+    bdrv_drain(child_bs);
+    bdrv_unref(child_bs);
+
+    while (!dbdd.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
+    if (detach_instead_of_delete) {
+        /* Here, the reference has not passed over to the coroutine,
+         * so we have to delete the BB ourselves */
+        blk_unref(blk);
+    }
+}
+
+
+static void test_delete_by_drain(void)
+{
+    do_test_delete_by_drain(false);
+}
+
+static void test_detach_by_drain(void)
+{
+    do_test_delete_by_drain(true);
+}
+
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -839,6 +1005,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/deletion", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach", test_detach_by_drain);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.13.6


* [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE()
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (8 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-12  8:41   ` Paolo Bonzini
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

Anything can happen inside BDRV_POLL_WHILE(), including graph
changes that may interfere with its callers (e.g. child list iteration
in recursive callers of bdrv_do_drained_begin).

Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
end of bdrv_do_drained_begin() to avoid such effects. The recursion now
happens inside the loop condition. As the graph can only change between
bdrv_drain_poll() calls, but not within a single call, doing the
recursion there is safe.
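
The shape of the begin path after this patch, as a condensed sketch
(simplified from the diff below; the coroutine dispatch and the quiesce
counters are omitted):

    void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                               BdrvChild *parent, bool poll)
    {
        BdrvChild *child, *next;

        /* Quiesce this node first, then its children; no polling yet */
        bdrv_parent_drained_begin(bs, parent);
        bdrv_drain_invoke(bs, true);

        if (recursive) {
            QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
                bdrv_do_drained_begin(child->bs, true, child, false);
            }
        }

        /* A single poll covers the whole subtree; bdrv_drain_poll()
         * recurses over the children while evaluating the condition,
         * where the graph cannot change. */
        if (poll) {
            BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true, recursive));
        }
    }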

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h |  2 +-
 block.c               |  2 +-
 block/io.c            | 58 +++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 23dee3c114..91bf3b4e36 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -563,7 +563,7 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
  *
  * Poll for pending requests in @bs. This is part of bdrv_drained_begin.
  */
-bool bdrv_drain_poll(BlockDriverState *bs, bool top_level);
+bool bdrv_drain_poll(BlockDriverState *bs, bool top_level, bool recursive);
 
 /**
  * bdrv_drained_begin:
diff --git a/block.c b/block.c
index 462287bdfb..9fe39ac8c1 100644
--- a/block.c
+++ b/block.c
@@ -823,7 +823,7 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    return bdrv_drain_poll(bs, false);
+    return bdrv_drain_poll(bs, false, false);
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index f24f39c278..1287630c58 100644
--- a/block/io.c
+++ b/block/io.c
@@ -161,6 +161,7 @@ typedef struct {
     bool done;
     bool begin;
     bool recursive;
+    bool poll;
     BdrvChild *parent;
 } BdrvCoDrainData;
 
@@ -196,8 +197,10 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
+bool bdrv_drain_poll(BlockDriverState *bs, bool top_level, bool recursive)
 {
+    BdrvChild *child, *next;
+
     /* Execute pending BHs first and check everything else only after the BHs
      * have executed. */
     if (top_level) {
@@ -208,11 +211,23 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
         return true;
     }
 
-    return atomic_read(&bs->in_flight);
+    if (atomic_read(&bs->in_flight)) {
+        return true;
+    }
+
+    if (recursive) {
+        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+            if (bdrv_drain_poll(child->bs, false, recursive)) {
+                return true;
+            }
+        }
+    }
+
+    return false;
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent);
+                                  BdrvChild *parent, bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent);
 
@@ -224,7 +239,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
     bdrv_dec_in_flight(bs);
     if (data->begin) {
-        bdrv_do_drained_begin(bs, data->recursive, data->parent);
+        bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
     } else {
         bdrv_do_drained_end(bs, data->recursive, data->parent);
     }
@@ -235,7 +250,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
-                                                BdrvChild *parent)
+                                                BdrvChild *parent, bool poll)
 {
     BdrvCoDrainData data;
 
@@ -250,6 +265,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .begin = begin,
         .recursive = recursive,
         .parent = parent,
+        .poll = poll,
     };
     bdrv_inc_in_flight(bs);
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
@@ -262,12 +278,12 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
 }
 
 void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                           BdrvChild *parent)
+                           BdrvChild *parent, bool poll)
 {
     BdrvChild *child, *next;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent);
+        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
         return;
     }
 
@@ -279,25 +295,35 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
 
-    /* Wait for drained requests to finish */
-    BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
-
     if (recursive) {
         bs->recursive_quiesce_counter++;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_begin(child->bs, true, child);
+            bdrv_do_drained_begin(child->bs, true, child, false);
         }
     }
+
+    /*
+     * Wait for drained requests to finish.
+     *
+     * Calling BDRV_POLL_WHILE() only once for the top-level node is okay: The
+     * call is needed so things in this AioContext can make progress even
+     * though we don't return to the main AioContext loop - this automatically
+     * includes other nodes in the same AioContext and therefore all child
+     * nodes.
+     */
+    if (poll) {
+        BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true, recursive));
+    }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, false, NULL);
+    bdrv_do_drained_begin(bs, false, NULL, true);
 }
 
 void bdrv_subtree_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, true, NULL);
+    bdrv_do_drained_begin(bs, true, NULL, true);
 }
 
 void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
@@ -307,7 +333,7 @@ void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, recursive, parent);
+        bdrv_co_yield_to_drain(bs, false, recursive, parent, false);
         return;
     }
     assert(bs->quiesce_counter > 0);
@@ -343,7 +369,7 @@ void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
     int i;
 
     for (i = 0; i < new_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_begin(child->bs, true, child);
+        bdrv_do_drained_begin(child->bs, true, child, true);
     }
 }
 
@@ -413,7 +439,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, true, NULL);
+        bdrv_do_drained_begin(bs, true, NULL, true);
         aio_context_release(aio_context);
     }
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (9 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks Kevin Wolf
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

If bdrv_do_drained_begin() polls during its subtree recursion, the graph
can change and mess up the bs->children iteration. Test that this
doesn't happen.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 05768213be..4af6571ca3 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -875,7 +875,8 @@ static void coroutine_fn test_co_delete_by_drain(void *opaque)
  * If @detach_instead_of_delete is set, the BDS is not going to be
  * deleted but will only detach all of its children.
  */
-static void do_test_delete_by_drain(bool detach_instead_of_delete)
+static void do_test_delete_by_drain(bool detach_instead_of_delete,
+                                    enum drain_type drain_type)
 {
     BlockBackend *blk;
     BlockDriverState *bs, *child_bs, *null_bs;
@@ -931,9 +932,23 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete)
      * test_co_delete_by_drain() resuming.  Thus, @bs will be deleted
      * and the coroutine will exit while this drain operation is still
      * in progress. */
-    bdrv_ref(child_bs);
-    bdrv_drain(child_bs);
-    bdrv_unref(child_bs);
+    switch (drain_type) {
+    case BDRV_DRAIN:
+        bdrv_ref(child_bs);
+        bdrv_drain(child_bs);
+        bdrv_unref(child_bs);
+        break;
+    case BDRV_SUBTREE_DRAIN:
+        /* Would have to ref/unref bs here for !detach_instead_of_delete, but
+         * then the whole test becomes pointless because the graph changes
+         * don't occur during the drain any more. */
+        assert(detach_instead_of_delete);
+        bdrv_subtree_drained_begin(bs);
+        bdrv_subtree_drained_end(bs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
 
     while (!dbdd.done) {
         aio_poll(qemu_get_aio_context(), true);
@@ -946,15 +961,19 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete)
     }
 }
 
-
 static void test_delete_by_drain(void)
 {
-    do_test_delete_by_drain(false);
+    do_test_delete_by_drain(false, BDRV_DRAIN);
 }
 
 static void test_detach_by_drain(void)
 {
-    do_test_delete_by_drain(true);
+    do_test_delete_by_drain(true, BDRV_DRAIN);
+}
+
+static void test_detach_by_drain_subtree(void)
+{
+    do_test_delete_by_drain(true, BDRV_SUBTREE_DRAIN);
 }
 
 
@@ -1005,8 +1024,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
-    g_test_add_func("/bdrv-drain/deletion", test_delete_by_drain);
-    g_test_add_func("/bdrv-drain/detach", test_detach_by_drain);
+    g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (10 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback Kevin Wolf
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

bdrv_do_drained_begin() is only safe if we have a single
BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
parent callbacks to introduce a nested polling loop that could cause
graph changes while we're traversing the graph.

Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
node without waiting for its requests to complete. These requests will
be waited for in the BDRV_POLL_WHILE() call down the call chain.
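
A condensed sketch of the split (taken from the diffs below, slightly
shortened):

    /* Quiesces a single node; never polls, so no nested aio_poll() */
    void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                       BdrvChild *parent)
    {
        assert(!qemu_in_coroutine());

        /* Stop things in parent-to-child order */
        if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
            aio_disable_external(bdrv_get_aio_context(bs));
        }

        bdrv_parent_drained_begin(bs, parent);
        bdrv_drain_invoke(bs, true);
    }

    /* The parent callback now only quiesces; the waiting happens in the
     * single BDRV_POLL_WHILE() further up the call chain. */
    static void bdrv_child_cb_drained_begin(BdrvChild *child)
    {
        BlockDriverState *bs = child->opaque;
        bdrv_do_drained_begin_quiesce(bs, NULL);
    }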

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h |  9 +++++++++
 block.c               |  2 +-
 block/io.c            | 24 ++++++++++++++++--------
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 91bf3b4e36..de2cba2c74 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -578,6 +578,15 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool top_level, bool recursive);
 void bdrv_drained_begin(BlockDriverState *bs);
 
 /**
+ * bdrv_do_drained_begin_quiesce:
+ *
+ * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
+ * running requests to complete.
+ */
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+                                   BdrvChild *parent);
+
+/**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for
  * exclusive access to all child nodes as well.
  */
diff --git a/block.c b/block.c
index 9fe39ac8c1..330238de19 100644
--- a/block.c
+++ b/block.c
@@ -817,7 +817,7 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_drained_begin(bs);
+    bdrv_do_drained_begin_quiesce(bs, NULL);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 1287630c58..f372b9ffb0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -277,15 +277,10 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     assert(data.done);
 }
 
-void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                           BdrvChild *parent, bool poll)
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+                                   BdrvChild *parent)
 {
-    BdrvChild *child, *next;
-
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
-        return;
-    }
+    assert(!qemu_in_coroutine());
 
     /* Stop things in parent-to-child order */
     if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
@@ -294,6 +289,19 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
+}
+
+static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
+                                  BdrvChild *parent, bool poll)
+{
+    BdrvChild *child, *next;
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
+        return;
+    }
+
+    bdrv_do_drained_begin_quiesce(bs, parent);
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (11 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 4af6571ca3..fdf3ce19ea 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -977,6 +977,135 @@ static void test_detach_by_drain_subtree(void)
 }
 
 
+struct detach_by_parent_data {
+    BlockDriverState *parent_b;
+    BdrvChild *child_b;
+    BlockDriverState *c;
+    BdrvChild *child_c;
+};
+
+static void detach_by_parent_aio_cb(void *opaque, int ret)
+{
+    struct detach_by_parent_data *data = opaque;
+
+    g_assert_cmpint(ret, ==, 0);
+    bdrv_unref_child(data->parent_b, data->child_b);
+
+    bdrv_ref(data->c);
+    data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
+                                      &child_file, &error_abort);
+}
+
+/*
+ * Initial graph:
+ *
+ * PA     PB
+ *    \ /   \
+ *     A     B     C
+ *
+ * PA has a pending write request whose callback changes the child nodes of PB:
+ * It removes B and adds C instead. The subtree of PB is drained, which will
+ * indirectly drain the write request, too.
+ */
+static void test_detach_by_parent_cb(void)
+{
+    BlockBackend *blk;
+    BlockDriverState *parent_a, *parent_b, *a, *b, *c;
+    BdrvChild *child_a, *child_b;
+    BlockAIOCB *acb;
+    struct detach_by_parent_data data;
+
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = NULL,
+        .iov_len = 0,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Create all involved nodes */
+    parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
+                                    &error_abort);
+    parent_b = bdrv_new_open_driver(&bdrv_test, "parent-b", 0,
+                                    &error_abort);
+
+    a = bdrv_new_open_driver(&bdrv_test, "a", BDRV_O_RDWR, &error_abort);
+    b = bdrv_new_open_driver(&bdrv_test, "b", BDRV_O_RDWR, &error_abort);
+    c = bdrv_new_open_driver(&bdrv_test, "c", BDRV_O_RDWR, &error_abort);
+
+    /* blk is a BB for parent-a */
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    blk_insert_bs(blk, parent_a, &error_abort);
+    bdrv_unref(parent_a);
+
+    /* Set child relationships */
+    bdrv_ref(b);
+    bdrv_ref(a);
+    child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_file, &error_abort);
+    child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);
+
+    bdrv_ref(a);
+    bdrv_attach_child(parent_a, a, "PA-A", &child_file, &error_abort);
+
+    g_assert_cmpint(parent_a->refcnt, ==, 1);
+    g_assert_cmpint(parent_b->refcnt, ==, 1);
+    g_assert_cmpint(a->refcnt, ==, 3);
+    g_assert_cmpint(b->refcnt, ==, 2);
+    g_assert_cmpint(c->refcnt, ==, 1);
+
+    g_assert(QLIST_FIRST(&parent_b->children) == child_a);
+    g_assert(QLIST_NEXT(child_a, next) == child_b);
+    g_assert(QLIST_NEXT(child_b, next) == NULL);
+
+    /* Start the evil write request */
+    data = (struct detach_by_parent_data) {
+        .parent_b = parent_b,
+        .child_b = child_b,
+        .c = c,
+    };
+    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, &data);
+    g_assert(acb != NULL);
+
+    /* Drain and check the expected result */
+    bdrv_subtree_drained_begin(parent_b);
+
+    g_assert(data.child_c != NULL);
+
+    g_assert_cmpint(parent_a->refcnt, ==, 1);
+    g_assert_cmpint(parent_b->refcnt, ==, 1);
+    g_assert_cmpint(a->refcnt, ==, 3);
+    g_assert_cmpint(b->refcnt, ==, 1);
+    g_assert_cmpint(c->refcnt, ==, 2);
+
+    g_assert(QLIST_FIRST(&parent_b->children) == data.child_c);
+    g_assert(QLIST_NEXT(data.child_c, next) == child_a);
+    g_assert(QLIST_NEXT(child_a, next) == NULL);
+
+    g_assert_cmpint(parent_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(parent_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a->quiesce_counter, ==, 1);
+    g_assert_cmpint(b->quiesce_counter, ==, 0);
+    g_assert_cmpint(c->quiesce_counter, ==, 1);
+
+    bdrv_subtree_drained_end(parent_b);
+
+    bdrv_unref(parent_b);
+    blk_unref(blk);
+
+    /* XXX Once bdrv_close() unref's children instead of just detaching them,
+     * this won't be necessary any more. */
+    bdrv_unref(a);
+    bdrv_unref(a);
+    bdrv_unref(c);
+
+    g_assert_cmpint(a->refcnt, ==, 1);
+    g_assert_cmpint(b->refcnt, ==, 1);
+    g_assert_cmpint(c->refcnt, ==, 1);
+    bdrv_unref(a);
+    bdrv_unref(b);
+    bdrv_unref(c);
+}
+
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1027,6 +1156,7 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
+    g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (12 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-06-27 14:30   ` Max Reitz
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.
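
Condensed, the ownership of the BdrvCoDrainData now works like this
(simplified from the two hunks below):

    /* bdrv_drain_invoke(), for both begin and end: */
    bdrv_inc_in_flight(bs);
    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
    if (!begin) {
        /* drain_end may still poll here and frees data itself */
        BDRV_POLL_WHILE(bs, !data->done);
        g_free(data);
    }

    /* bdrv_drain_invoke_entry(), after the driver callback returned: */
    atomic_mb_set(&data->done, true);
    bdrv_dec_in_flight(bs);   /* lets the final BDRV_POLL_WHILE() finish */
    if (data->begin) {
        g_free(data);         /* begin: nobody polls for us, free here */
    }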

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index f372b9ffb0..fc26c1a2f8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -178,22 +178,40 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 
     /* Set data->done before reading bs->wakeup.  */
     atomic_mb_set(&data->done, true);
-    bdrv_wakeup(bs);
+    bdrv_dec_in_flight(bs);
+
+    if (data->begin) {
+        g_free(data);
+    }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
 static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
+    BdrvCoDrainData *data;
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
             (!begin && !bs->drv->bdrv_co_drain_end)) {
         return;
     }
 
-    data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
-    bdrv_coroutine_enter(bs, data.co);
-    BDRV_POLL_WHILE(bs, !data.done);
+    data = g_new(BdrvCoDrainData, 1);
+    *data = (BdrvCoDrainData) {
+        .bs = bs,
+        .done = false,
+        .begin = begin
+    };
+
+    /* Make sure the driver callback completes during the polling phase for
+     * drain_begin. */
+    bdrv_inc_in_flight(bs);
+    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
+    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
+
+    if (!begin) {
+        BDRV_POLL_WHILE(bs, !data->done);
+        g_free(data);
+    }
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (13 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

This adds a test case that fails if bdrv_drain_invoke() calls
aio_poll().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 102 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 88 insertions(+), 14 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index fdf3ce19ea..79d845ecbb 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -34,12 +34,16 @@ static QemuEvent done_event;
 typedef struct BDRVTestState {
     int drain_count;
     AioContext *bh_indirection_ctx;
+    bool sleep_in_drain_begin;
 } BDRVTestState;
 
 static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
 {
     BDRVTestState *s = bs->opaque;
     s->drain_count++;
+    if (s->sleep_in_drain_begin) {
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+    }
 }
 
 static void coroutine_fn bdrv_test_co_drain_end(BlockDriverState *bs)
@@ -80,6 +84,22 @@ static int coroutine_fn bdrv_test_co_preadv(BlockDriverState *bs,
     return 0;
 }
 
+static void bdrv_test_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                 const BdrvChildRole *role,
+                                 BlockReopenQueue *reopen_queue,
+                                 uint64_t perm, uint64_t shared,
+                                 uint64_t *nperm, uint64_t *nshared)
+{
+    /* bdrv_format_default_perms() accepts only these two, so disguise
+     * detach_by_driver_cb_role as one of them. */
+    if (role != &child_file && role != &child_backing) {
+        role = &child_file;
+    }
+
+    bdrv_format_default_perms(bs, c, role, reopen_queue, perm, shared,
+                              nperm, nshared);
+}
+
 static BlockDriver bdrv_test = {
     .format_name            = "test",
     .instance_size          = sizeof(BDRVTestState),
@@ -90,7 +110,7 @@ static BlockDriver bdrv_test = {
     .bdrv_co_drain_begin    = bdrv_test_co_drain_begin,
     .bdrv_co_drain_end      = bdrv_test_co_drain_end,
 
-    .bdrv_child_perm        = bdrv_format_default_perms,
+    .bdrv_child_perm        = bdrv_test_child_perm,
 };
 
 static void aio_ret_cb(void *opaque, int ret)
@@ -982,13 +1002,14 @@ struct detach_by_parent_data {
     BdrvChild *child_b;
     BlockDriverState *c;
     BdrvChild *child_c;
+    bool by_parent_cb;
 };
+static struct detach_by_parent_data detach_by_parent_data;
 
-static void detach_by_parent_aio_cb(void *opaque, int ret)
+static void detach_indirect_bh(void *opaque)
 {
     struct detach_by_parent_data *data = opaque;
 
-    g_assert_cmpint(ret, ==, 0);
     bdrv_unref_child(data->parent_b, data->child_b);
 
     bdrv_ref(data->c);
@@ -996,6 +1017,25 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
                                       &child_file, &error_abort);
 }
 
+static void detach_by_parent_aio_cb(void *opaque, int ret)
+{
+    struct detach_by_parent_data *data = &detach_by_parent_data;
+
+    g_assert_cmpint(ret, ==, 0);
+    if (data->by_parent_cb) {
+        detach_indirect_bh(data);
+    }
+}
+
+static void detach_by_driver_cb_drained_begin(BdrvChild *child)
+{
+    aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
+                            detach_indirect_bh, &detach_by_parent_data);
+    child_file.drained_begin(child);
+}
+
+static BdrvChildRole detach_by_driver_cb_role;
+
 /*
  * Initial graph:
  *
@@ -1003,17 +1043,25 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
  *    \ /   \
  *     A     B     C
  *
- * PA has a pending write request whose callback changes the child nodes of PB:
- * It removes B and adds C instead. The subtree of PB is drained, which will
- * indirectly drain the write request, too.
+ * by_parent_cb == true:  Test that parent callbacks don't poll
+ *
+ *     PA has a pending write request whose callback changes the child nodes of
+ *     PB: It removes B and adds C instead. The subtree of PB is drained, which
+ *     will indirectly drain the write request, too.
+ *
+ * by_parent_cb == false: Test that bdrv_drain_invoke() doesn't poll
+ *
+ *     PA's BdrvChildRole has a .drained_begin callback that schedules a BH
+ *     that does the same graph change. If bdrv_drain_invoke() calls it, the
+ *     state is messed up, but if it is only polled in the single
+ *     BDRV_POLL_WHILE() at the end of the drain, this should work fine.
  */
-static void test_detach_by_parent_cb(void)
+static void test_detach_indirect(bool by_parent_cb)
 {
     BlockBackend *blk;
     BlockDriverState *parent_a, *parent_b, *a, *b, *c;
     BdrvChild *child_a, *child_b;
     BlockAIOCB *acb;
-    struct detach_by_parent_data data;
 
     QEMUIOVector qiov;
     struct iovec iov = {
@@ -1022,6 +1070,12 @@ static void test_detach_by_parent_cb(void)
     };
     qemu_iovec_init_external(&qiov, &iov, 1);
 
+    if (!by_parent_cb) {
+        detach_by_driver_cb_role = child_file;
+        detach_by_driver_cb_role.drained_begin =
+            detach_by_driver_cb_drained_begin;
+    }
+
     /* Create all involved nodes */
     parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
                                     &error_abort);
@@ -1037,6 +1091,13 @@ static void test_detach_by_parent_cb(void)
     blk_insert_bs(blk, parent_a, &error_abort);
     bdrv_unref(parent_a);
 
+    /* If we want to get bdrv_drain_invoke() to call aio_poll(), the driver
+     * callback must not return immediately. */
+    if (!by_parent_cb) {
+        BDRVTestState *s = parent_a->opaque;
+        s->sleep_in_drain_begin = true;
+    }
+
     /* Set child relationships */
     bdrv_ref(b);
     bdrv_ref(a);
@@ -1044,7 +1105,9 @@ static void test_detach_by_parent_cb(void)
     child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);
 
     bdrv_ref(a);
-    bdrv_attach_child(parent_a, a, "PA-A", &child_file, &error_abort);
+    bdrv_attach_child(parent_a, a, "PA-A",
+                      by_parent_cb ? &child_file : &detach_by_driver_cb_role,
+                      &error_abort);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1057,18 +1120,19 @@ static void test_detach_by_parent_cb(void)
     g_assert(QLIST_NEXT(child_b, next) == NULL);
 
     /* Start the evil write request */
-    data = (struct detach_by_parent_data) {
+    detach_by_parent_data = (struct detach_by_parent_data) {
         .parent_b = parent_b,
         .child_b = child_b,
         .c = c,
+        .by_parent_cb = by_parent_cb,
     };
-    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, &data);
+    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
 
-    g_assert(data.child_c != NULL);
+    g_assert(detach_by_parent_data.child_c != NULL);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1076,8 +1140,8 @@ static void test_detach_by_parent_cb(void)
     g_assert_cmpint(b->refcnt, ==, 1);
     g_assert_cmpint(c->refcnt, ==, 2);
 
-    g_assert(QLIST_FIRST(&parent_b->children) == data.child_c);
-    g_assert(QLIST_NEXT(data.child_c, next) == child_a);
+    g_assert(QLIST_FIRST(&parent_b->children) == detach_by_parent_data.child_c);
+    g_assert(QLIST_NEXT(detach_by_parent_data.child_c, next) == child_a);
     g_assert(QLIST_NEXT(child_a, next) == NULL);
 
     g_assert_cmpint(parent_a->quiesce_counter, ==, 1);
@@ -1105,6 +1169,15 @@ static void test_detach_by_parent_cb(void)
     bdrv_unref(c);
 }
 
+static void test_detach_by_parent_cb(void)
+{
+    test_detach_indirect(true);
+}
+
+static void test_detach_by_driver_cb(void)
+{
+    test_detach_indirect(false);
+}
 
 int main(int argc, char **argv)
 {
@@ -1157,6 +1230,7 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
     g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
+    g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (14 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-12  8:43   ` Paolo Bonzini
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

bdrv_drain_all() wants to have a single polling loop for draining the
in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
condition relies on activity in multiple AioContexts, which is polled
from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
the mainloop thread and use the AioWait notification mechanism.

Just randomly picking the AioContext of any non-mainloop thread would
work, but instead of bothering to find such a context in the caller, we
can just as well accept NULL for ctx.
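
For illustration, this is how patch 18 of this series ends up using it
(a minimal sketch):

    static AioWait drain_all_aio_wait;

    /* Runs in the main loop thread. bdrv_drain_all_poll() acquires and
     * releases each node's AioContext itself, so no AioContext lock is
     * held across the macro. */
    AIO_WAIT_WHILE(&drain_all_aio_wait, NULL, bdrv_drain_all_poll());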

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/aio-wait.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 783d3678dd..c85a62f798 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -57,7 +57,8 @@ typedef struct {
 /**
  * AIO_WAIT_WHILE:
  * @wait: the aio wait object
- * @ctx: the aio context
+ * @ctx: the aio context, or NULL if multiple aio contexts (for which the
+ *       caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
  *
  * Wait while a condition is true.  Use this to implement synchronous
@@ -75,7 +76,7 @@ typedef struct {
     bool waited_ = false;                                          \
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
-    if (in_aio_context_home_thread(ctx_)) {                        \
+    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
             waited_ = true;                                        \
@@ -86,9 +87,13 @@ typedef struct {
         /* Increment wait_->num_waiters before evaluating cond. */ \
         atomic_inc(&wait_->num_waiters);                           \
         while ((cond)) {                                           \
-            aio_context_release(ctx_);                             \
+            if (ctx_) {                                            \
+                aio_context_release(ctx_);                         \
+            }                                                      \
             aio_poll(qemu_get_aio_context(), true);                \
-            aio_context_acquire(ctx_);                             \
+            if (ctx_) {                                            \
+                aio_context_acquire(ctx_);                         \
+            }                                                      \
             waited_ = true;                                        \
         }                                                          \
         atomic_dec(&wait_->num_waiters);                           \
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (15 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

Before we can introduce a single polling loop for all nodes in
bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
context like we already do for bdrv_do_drained_begin().
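
A minimal sketch of the dispatch (bs == NULL is what tells
bdrv_co_drain_bh_cb() to call bdrv_drain_all_begin(), see the diff
below):

    void bdrv_drain_all_begin(void)
    {
        if (qemu_in_coroutine()) {
            bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
            return;
        }
        /* ... the rest runs outside of coroutine context ... */
    }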

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index fc26c1a2f8..db6810b35f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -255,11 +255,16 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     Coroutine *co = data->co;
     BlockDriverState *bs = data->bs;
 
-    bdrv_dec_in_flight(bs);
-    if (data->begin) {
-        bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
+    if (bs) {
+        bdrv_dec_in_flight(bs);
+        if (data->begin) {
+            bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
+        } else {
+            bdrv_do_drained_end(bs, data->recursive, data->parent);
+        }
     } else {
-        bdrv_do_drained_end(bs, data->recursive, data->parent);
+        assert(data->begin);
+        bdrv_drain_all_begin();
     }
 
     data->done = true;
@@ -285,7 +290,9 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .parent = parent,
         .poll = poll,
     };
-    bdrv_inc_in_flight(bs);
+    if (bs) {
+        bdrv_inc_in_flight(bs);
+    }
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                             bdrv_co_drain_bh_cb, &data);
 
@@ -455,6 +462,11 @@ void bdrv_drain_all_begin(void)
     BlockDriverState *bs;
     BdrvNextIterator it;
 
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
+        return;
+    }
+
     /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
      * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
      * nodes in several different AioContexts, so make sure we're in the main
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (16 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-12  8:47   ` Paolo Bonzini
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
did a subtree drain for each of them. This works fine as long as the
graph is static, but sadly, reality looks different.

If the graph changes so that root nodes are added or removed, we would
have to compensate for this. bdrv_next() returns each root node only
once even if it's the root node for multiple BlockBackends or for a
monitor-owned block driver tree, which would only complicate things.

The much easier and more obviously correct way is to fundamentally
change the way the functions work: Iterate over all BlockDriverStates,
no matter who owns them, and drain them individually. Compensation is
only necessary when a new BDS is created inside a drain_all section.
Removal of a BDS doesn't require any action because it's gone afterwards
anyway.

This change means that some nodes get a higher bs->quiesce_counter now
because each node propagates its individual drain to all of its parents.
In the old subtree drain, propagation back to the parent that made the
recursive drain request was avoided. While this isn't perfectly
beautiful, accepting the higher counts seems preferable to adding drain
code to multiple other places that modify the graph.

The test case is changed to account for the higher counts where
necessary.
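
The resulting begin side, as a condensed sketch (simplified from the
diff below; assertions and the coroutine dispatch are omitted):

    void bdrv_drain_all_begin(void)
    {
        BlockDriverState *bs = NULL;

        bdrv_drain_all_count++;

        /* Quiesce every node individually; nothing polls in this loop,
         * so the graph cannot change while it runs. */
        while ((bs = bdrv_next_all_states(bs))) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            aio_context_acquire(aio_context);
            bdrv_do_drained_begin(bs, false, NULL, false);
            aio_context_release(aio_context);
        }

        /* One polling loop then waits for the in-flight requests of all
         * nodes. */
        AIO_WAIT_WHILE(&drain_all_aio_wait, NULL, bdrv_drain_all_poll());
    }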

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h     |  1 +
 include/block/block_int.h |  1 +
 block.c                   | 20 +++++++++++++++-
 block/io.c                | 58 ++++++++++++++++++++++++++++++++++++-----------
 tests/test-bdrv-drain.c   | 37 +++++++++++++++++-------------
 5 files changed, 87 insertions(+), 30 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index de2cba2c74..0b59d519c5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -412,6 +412,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
                                  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
 BlockDriverState *bdrv_next_node(BlockDriverState *bs);
+BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
 
 typedef struct BdrvNextIterator {
     enum {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dc6985e3ae..2c86a7b53f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -804,6 +804,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
     int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
 
+extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent);
 
diff --git a/block.c b/block.c
index 330238de19..c1b339c4a4 100644
--- a/block.c
+++ b/block.c
@@ -332,6 +332,10 @@ BlockDriverState *bdrv_new(void)
 
     qemu_co_queue_init(&bs->flush_queue);
 
+    for (i = 0; i < bdrv_drain_all_count; i++) {
+        bdrv_drained_begin(bs);
+    }
+
     QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
 
     return bs;
@@ -1160,7 +1164,7 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
                             int open_flags, Error **errp)
 {
     Error *local_err = NULL;
-    int ret;
+    int i, ret;
 
     bdrv_assign_node_name(bs, node_name, &local_err);
     if (local_err) {
@@ -1208,6 +1212,12 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
     assert(bdrv_min_mem_align(bs) != 0);
     assert(is_power_of_2(bs->bl.request_alignment));
 
+    for (i = 0; i < bs->quiesce_counter; i++) {
+        if (drv->bdrv_co_drain_begin) {
+            drv->bdrv_co_drain_begin(bs);
+        }
+    }
+
     return 0;
 open_failed:
     bs->drv = NULL;
@@ -4034,6 +4044,14 @@ BlockDriverState *bdrv_next_node(BlockDriverState *bs)
     return QTAILQ_NEXT(bs, node_list);
 }
 
+BlockDriverState *bdrv_next_all_states(BlockDriverState *bs)
+{
+    if (!bs) {
+        return QTAILQ_FIRST(&all_bdrv_states);
+    }
+    return QTAILQ_NEXT(bs, bs_list);
+}
+
 const char *bdrv_get_node_name(const BlockDriverState *bs)
 {
     return bs->node_name;
diff --git a/block/io.c b/block/io.c
index db6810b35f..af65c3ec2f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -38,6 +38,8 @@
 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
 #define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
 
+static AioWait drain_all_aio_wait;
+
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags);
 
@@ -445,6 +447,29 @@ static void bdrv_drain_assert_idle(BlockDriverState *bs)
     }
 }
 
+unsigned int bdrv_drain_all_count = 0;
+
+static bool bdrv_drain_all_poll(void)
+{
+    BlockDriverState *bs = NULL;
+    bool result = false;
+
+    /* Execute pending BHs first (may modify the graph) and check everything
+     * else only after the BHs have executed. */
+    while (aio_poll(qemu_get_aio_context(), false));
+
+    /* bdrv_drain_poll() with top_level = false can't make changes to the
+     * graph, so iterating bdrv_next_all_states() is safe. */
+    while ((bs = bdrv_next_all_states(bs))) {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        result |= bdrv_drain_poll(bs, false, false);
+        aio_context_release(aio_context);
+    }
+
+    return result;
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -459,45 +484,51 @@ static void bdrv_drain_assert_idle(BlockDriverState *bs)
  */
 void bdrv_drain_all_begin(void)
 {
-    BlockDriverState *bs;
-    BdrvNextIterator it;
+    BlockDriverState *bs = NULL;
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
         return;
     }
 
-    /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
-     * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
-     * nodes in several different AioContexts, so make sure we're in the main
-     * context. */
+    /* AIO_WAIT_WHILE() with a NULL context can only be called from the main
+     * loop AioContext, so make sure we're in the main context. */
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+    assert(bdrv_drain_all_count < INT_MAX);
+    bdrv_drain_all_count++;
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    /* Quiesce all nodes, without polling in-flight requests yet. The graph
+     * cannot change during this loop. */
+    while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, true, NULL, true);
+        bdrv_do_drained_begin(bs, false, NULL, false);
         aio_context_release(aio_context);
     }
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    /* Now poll the in-flight requests */
+    AIO_WAIT_WHILE(&drain_all_aio_wait, NULL, bdrv_drain_all_poll());
+
+    while ((bs = bdrv_next_all_states(bs))) {
         bdrv_drain_assert_idle(bs);
     }
 }
 
 void bdrv_drain_all_end(void)
 {
-    BlockDriverState *bs;
-    BdrvNextIterator it;
+    BlockDriverState *bs = NULL;
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, true, NULL);
+        bdrv_do_drained_end(bs, false, NULL);
         aio_context_release(aio_context);
     }
+
+    assert(bdrv_drain_all_count > 0);
+    bdrv_drain_all_count--;
 }
 
 void bdrv_drain_all(void)
@@ -620,6 +651,7 @@ void bdrv_inc_in_flight(BlockDriverState *bs)
 void bdrv_wakeup(BlockDriverState *bs)
 {
     aio_wait_kick(bdrv_get_aio_wait(bs));
+    aio_wait_kick(&drain_all_aio_wait);
 }
 
 void bdrv_dec_in_flight(BlockDriverState *bs)
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 79d845ecbb..97ca0743c6 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -174,7 +174,8 @@ static void do_drain_end(enum drain_type drain_type, BlockDriverState *bs)
     }
 }
 
-static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
+static void test_drv_cb_common(enum drain_type drain_type, int top_drain_count,
+                               int backing_drain_count)
 {
     BlockBackend *blk;
     BlockDriverState *bs, *backing;
@@ -205,8 +206,8 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 
     do_drain_begin(drain_type, bs);
 
-    g_assert_cmpint(s->drain_count, ==, 1);
-    g_assert_cmpint(backing_s->drain_count, ==, !!recursive);
+    g_assert_cmpint(s->drain_count, ==, top_drain_count);
+    g_assert_cmpint(backing_s->drain_count, ==, backing_drain_count);
 
     do_drain_end(drain_type, bs);
 
@@ -225,8 +226,8 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
     do_drain_begin(drain_type, bs);
 
     g_assert_cmpint(aio_ret, ==, 0);
-    g_assert_cmpint(s->drain_count, ==, 1);
-    g_assert_cmpint(backing_s->drain_count, ==, !!recursive);
+    g_assert_cmpint(s->drain_count, ==, top_drain_count);
+    g_assert_cmpint(backing_s->drain_count, ==, backing_drain_count);
 
     do_drain_end(drain_type, bs);
 
@@ -240,17 +241,17 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 
 static void test_drv_cb_drain_all(void)
 {
-    test_drv_cb_common(BDRV_DRAIN_ALL, true);
+    test_drv_cb_common(BDRV_DRAIN_ALL, 2, 1);
 }
 
 static void test_drv_cb_drain(void)
 {
-    test_drv_cb_common(BDRV_DRAIN, false);
+    test_drv_cb_common(BDRV_DRAIN, 1, 0);
 }
 
 static void test_drv_cb_drain_subtree(void)
 {
-    test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
+    test_drv_cb_common(BDRV_SUBTREE_DRAIN, 1, 1);
 }
 
 static void test_drv_cb_co_drain_all(void)
@@ -268,7 +269,9 @@ static void test_drv_cb_co_drain_subtree(void)
     call_in_coroutine(test_drv_cb_drain_subtree);
 }
 
-static void test_quiesce_common(enum drain_type drain_type, bool recursive)
+static void test_quiesce_common(enum drain_type drain_type,
+                                int top_quiesce_count,
+                                int backing_quiesce_count)
 {
     BlockBackend *blk;
     BlockDriverState *bs, *backing;
@@ -286,8 +289,8 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
     do_drain_begin(drain_type, bs);
 
-    g_assert_cmpint(bs->quiesce_counter, ==, 1);
-    g_assert_cmpint(backing->quiesce_counter, ==, !!recursive);
+    g_assert_cmpint(bs->quiesce_counter, ==, top_quiesce_count);
+    g_assert_cmpint(backing->quiesce_counter, ==, backing_quiesce_count);
 
     do_drain_end(drain_type, bs);
 
@@ -301,17 +304,17 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
 static void test_quiesce_drain_all(void)
 {
-    test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce_common(BDRV_DRAIN_ALL, 2, 1);
 }
 
 static void test_quiesce_drain(void)
 {
-    test_quiesce_common(BDRV_DRAIN, false);
+    test_quiesce_common(BDRV_DRAIN, 1, 0);
 }
 
 static void test_quiesce_drain_subtree(void)
 {
-    test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
+    test_quiesce_common(BDRV_SUBTREE_DRAIN, 1, 1);
 }
 
 static void test_quiesce_co_drain_all(void)
@@ -348,6 +351,8 @@ static void test_nested(void)
 
     for (outer = 0; outer < DRAIN_TYPE_MAX; outer++) {
         for (inner = 0; inner < DRAIN_TYPE_MAX; inner++) {
+            int top_quiesce = 2 + (outer == BDRV_DRAIN_ALL) +
+                                  (inner == BDRV_DRAIN_ALL);
             int backing_quiesce = (outer != BDRV_DRAIN) +
                                   (inner != BDRV_DRAIN);
 
@@ -359,9 +364,9 @@ static void test_nested(void)
             do_drain_begin(outer, bs);
             do_drain_begin(inner, bs);
 
-            g_assert_cmpint(bs->quiesce_counter, ==, 2);
+            g_assert_cmpint(bs->quiesce_counter, ==, top_quiesce);
             g_assert_cmpint(backing->quiesce_counter, ==, backing_quiesce);
-            g_assert_cmpint(s->drain_count, ==, 2);
+            g_assert_cmpint(s->drain_count, ==, top_quiesce);
             g_assert_cmpint(backing_s->drain_count, ==, backing_quiesce);
 
             do_drain_end(inner, bs);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (17 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
@ 2018-04-11 16:39 ` Kevin Wolf
  2018-04-11 17:05 ` [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 no-reply
  2018-04-20  7:35 ` Stefan Hajnoczi
  20 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-11 16:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, qemu-devel

This tests both adding and removing a node between bdrv_drain_all_begin()
and bdrv_drain_all_end(), and enables the existing detach test for
drain_all.
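
In a minimal sketch, the new drain_all case in the test exercises this
pattern (see the diff below):

    bdrv_drain_all_begin();

    /* A node created inside the section starts out drained (see the
     * compensation added to bdrv_new() in the previous patch) ... */
    bs_b = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
                                &error_abort);

    /* ... and deleting a node inside the section needs no compensation
     * because it is simply gone afterwards. */
    bdrv_unref(bs_a);

    bdrv_drain_all_end();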

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 97ca0743c6..462842a761 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -457,7 +457,7 @@ static void test_multiparent(void)
     blk_unref(blk_b);
 }
 
-static void test_graph_change(void)
+static void test_graph_change_drain_subtree(void)
 {
     BlockBackend *blk_a, *blk_b;
     BlockDriverState *bs_a, *bs_b, *backing;
@@ -536,6 +536,63 @@ static void test_graph_change(void)
     blk_unref(blk_b);
 }
 
+static void test_graph_change_drain_all(void)
+{
+    BlockBackend *blk_a, *blk_b;
+    BlockDriverState *bs_a, *bs_b;
+    BDRVTestState *a_s, *b_s;
+
+    /* Create node A with a BlockBackend */
+    blk_a = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs_a = bdrv_new_open_driver(&bdrv_test, "test-node-a", BDRV_O_RDWR,
+                                &error_abort);
+    a_s = bs_a->opaque;
+    blk_insert_bs(blk_a, bs_a, &error_abort);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
+    g_assert_cmpint(a_s->drain_count, ==, 0);
+
+    /* Call bdrv_drain_all_begin() */
+    bdrv_drain_all_begin();
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+
+    /* Create node B with a BlockBackend */
+    blk_b = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs_b = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
+                                &error_abort);
+    b_s = bs_b->opaque;
+    blk_insert_bs(blk_b, bs_b, &error_abort);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    /* Unref and finally delete node A */
+    blk_unref(blk_a);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    bdrv_unref(bs_a);
+
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    /* End the drained section */
+    bdrv_drain_all_end();
+
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
+    g_assert_cmpint(b_s->drain_count, ==, 0);
+
+    bdrv_unref(bs_b);
+    blk_unref(blk_b);
+}
+
 struct test_iothread_data {
     BlockDriverState *bs;
     enum drain_type drain_type;
@@ -971,6 +1028,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
         bdrv_subtree_drained_begin(bs);
         bdrv_subtree_drained_end(bs);
         break;
+    case BDRV_DRAIN_ALL:
+        bdrv_drain_all_begin();
+        bdrv_drain_all_end();
+        break;
     default:
         g_assert_not_reached();
     }
@@ -991,6 +1052,11 @@ static void test_delete_by_drain(void)
     do_test_delete_by_drain(false, BDRV_DRAIN);
 }
 
+static void test_detach_by_drain_all(void)
+{
+    do_test_delete_by_drain(true, BDRV_DRAIN_ALL);
+}
+
 static void test_detach_by_drain(void)
 {
     do_test_delete_by_drain(true, BDRV_DRAIN);
@@ -1219,7 +1285,11 @@ int main(int argc, char **argv)
 
     g_test_add_func("/bdrv-drain/nested", test_nested);
     g_test_add_func("/bdrv-drain/multiparent", test_multiparent);
-    g_test_add_func("/bdrv-drain/graph-change", test_graph_change);
+
+    g_test_add_func("/bdrv-drain/graph-change/drain_subtree",
+                    test_graph_change_drain_subtree);
+    g_test_add_func("/bdrv-drain/graph-change/drain_all",
+                    test_graph_change_drain_all);
 
     g_test_add_func("/bdrv-drain/iothread/drain_all", test_iothread_drain_all);
     g_test_add_func("/bdrv-drain/iothread/drain", test_iothread_drain);
@@ -1232,6 +1302,7 @@ int main(int argc, char **argv)
                     test_blockjob_drain_subtree);
 
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
     g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (18 preceding siblings ...)
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
@ 2018-04-11 17:05 ` no-reply
  2018-04-20  7:35 ` Stefan Hajnoczi
  20 siblings, 0 replies; 52+ messages in thread
From: no-reply @ 2018-04-11 17:05 UTC (permalink / raw)
  To: kwolf; +Cc: famz, qemu-block

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180411163940.2523-1-kwolf@redhat.com
Subject: [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
   675608cb84..6523eaca37  master     -> master
 t [tag update]            patchew/1523089594-1422-1-git-send-email-lidongchen@tencent.com -> patchew/1523089594-1422-1-git-send-email-lidongchen@tencent.com
 t [tag update]            patchew/1523377186-32578-1-git-send-email-cota@braap.org -> patchew/1523377186-32578-1-git-send-email-cota@braap.org
 t [tag update]            patchew/20180410134203.17552-1-peter.maydell@linaro.org -> patchew/20180410134203.17552-1-peter.maydell@linaro.org
 t [tag update]            patchew/20180410193919.28026-1-alex.bennee@linaro.org -> patchew/20180410193919.28026-1-alex.bennee@linaro.org
 t [tag update]            patchew/20180411122606.367301-1-vsementsov@virtuozzo.com -> patchew/20180411122606.367301-1-vsementsov@virtuozzo.com
 * [new tag]               patchew/20180411163940.2523-1-kwolf@redhat.com -> patchew/20180411163940.2523-1-kwolf@redhat.com
Switched to a new branch 'test'
4b7690f2d7 test-bdrv-drain: Test graph changes in drain_all section
c3b712e854 block: Allow graph changes in bdrv_drain_all_begin/end sections
b42a7e0a7d block: Move bdrv_drain_all_begin() out of coroutine context
74bb69f37c block: Allow AIO_WAIT_WHILE with NULL ctx
a6e790e0bc test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
26fc9f7a2f block: Defer .bdrv_drain_begin callback to polling phase
5de06df1ac test-bdrv-drain: Graph change through parent callback
13fb2f568b block: Don't poll in parent drain callbacks
48cfd9a68a test-bdrv-drain: Test node deletion in subtree recursion
81174751a0 block: Drain recursively with a single BDRV_POLL_WHILE()
1f4daf1742 test-bdrv-drain: Add test for node deletion
df4213f29a block: Remove bdrv_drain_recurse()
5bddf60629 block: Really pause block jobs on drain
aed8d29900 block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
6479633b40 tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
b02f2e5912 block: Don't manually poll in bdrv_drain_all()
c3fc61add1 block: Remove 'recursive' parameter from bdrv_drain_invoke()
f33873949d block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
9edb04df89 test-bdrv-drain: bdrv_drain() works with cross-AioContext events

=== OUTPUT BEGIN ===
Checking PATCH 1/19: test-bdrv-drain: bdrv_drain() works with cross-AioContext events...
Checking PATCH 2/19: block: Use bdrv_do_drain_begin/end in bdrv_drain_all()...
Checking PATCH 3/19: block: Remove 'recursive' parameter from bdrv_drain_invoke()...
Checking PATCH 4/19: block: Don't manually poll in bdrv_drain_all()...
Checking PATCH 5/19: tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now...
Checking PATCH 6/19: block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()...
ERROR: trailing statements should be on next line
#37: FILE: block/io.c:189:
+    while (aio_poll(bs->aio_context, false));

ERROR: braces {} are necessary for all arms of this statement
#37: FILE: block/io.c:189:
+    while (aio_poll(bs->aio_context, false));
[...]

total: 2 errors, 0 warnings, 60 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 7/19: block: Really pause block jobs on drain...
ERROR: trailing statements should be on next line
#98: FILE: block/io.c:204:
+        while (aio_poll(bs->aio_context, false));

ERROR: braces {} are necessary for all arms of this statement
#98: FILE: block/io.c:204:
+        while (aio_poll(bs->aio_context, false));
[...]

total: 2 errors, 0 warnings, 234 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 8/19: block: Remove bdrv_drain_recurse()...
Checking PATCH 9/19: test-bdrv-drain: Add test for node deletion...
Checking PATCH 10/19: block: Drain recursively with a single BDRV_POLL_WHILE()...
Checking PATCH 11/19: test-bdrv-drain: Test node deletion in subtree recursion...
WARNING: line over 80 characters
#85: FILE: tests/test-bdrv-drain.c:1029:
+    g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);

total: 0 errors, 1 warnings, 68 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 12/19: block: Don't poll in parent drain callbacks...
Checking PATCH 13/19: test-bdrv-drain: Graph change through parent callback...
WARNING: line over 80 characters
#81: FILE: tests/test-bdrv-drain.c:1044:
+    child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);

total: 0 errors, 1 warnings, 142 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 14/19: block: Defer .bdrv_drain_begin callback to polling phase...
Checking PATCH 15/19: test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll...
Checking PATCH 16/19: block: Allow AIO_WAIT_WHILE with NULL ctx...
Checking PATCH 17/19: block: Move bdrv_drain_all_begin() out of coroutine context...
WARNING: line over 80 characters
#27: FILE: block/io.c:261:
+            bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);

total: 0 errors, 1 warnings, 41 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 18/19: block: Allow graph changes in bdrv_drain_all_begin/end sections...
ERROR: do not initialise globals to 0 or NULL
#105: FILE: block/io.c:450:
+unsigned int bdrv_drain_all_count = 0;

ERROR: trailing statements should be on next line
#114: FILE: block/io.c:459:
+    while (aio_poll(qemu_get_aio_context(), false));

ERROR: braces {} are necessary for all arms of this statement
#114: FILE: block/io.c:459:
+    while (aio_poll(qemu_get_aio_context(), false));
[...]

total: 3 errors, 0 warnings, 274 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 19/19: test-bdrv-drain: Test graph changes in drain_all section...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
@ 2018-04-11 17:33   ` Su Hang
  2018-04-20  7:17   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Su Hang @ 2018-04-11 17:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

Dear Kevin:
I notice checkpatch.pl complained about code style problems,
you may want to replace `while (aio_poll(bs->aio_context, false));`
with
```
while (aio_poll(bs->aio_context, false)) {
    /* No further work */
}
```
to suppress the complaints.

Best,
Su Hang

"Kevin Wolf" <kwolf@redhat.com>wrote:
> Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE()
> in order to make sure that all pending BHs are executed on drain. This
> was the wrong place to make the fix, as it is useless overhead for all
> other users of the macro and unnecessarily complicates the mechanism.
> 
> This patch effectively reverts said commit (the context has changed a
> bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
> loop condition for drain.
> 
> The effect is probably hard to measure in any real-world use case
> because actual I/O will dominate, but if I run only the initialisation
> part of 'qemu-img convert' where it calls bdrv_block_status() for the
> whole image to find out how much data there is to copy, this phase actually
> needs only roughly half the time after this patch.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/aio-wait.h | 22 ++++++++--------------
>  block/io.c               | 11 ++++++++++-
>  2 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index 8c90a2e66e..783d3678dd 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -73,29 +73,23 @@ typedef struct {
>   */
>  #define AIO_WAIT_WHILE(wait, ctx, cond) ({                         \
>      bool waited_ = false;                                          \
> -    bool busy_ = true;                                             \
>      AioWait *wait_ = (wait);                                       \
>      AioContext *ctx_ = (ctx);                                      \
>      if (in_aio_context_home_thread(ctx_)) {                        \
> -        while ((cond) || busy_) {                                  \
> -            busy_ = aio_poll(ctx_, (cond));                        \
> -            waited_ |= !!(cond) | busy_;                           \
> +        while ((cond)) {                                           \
> +            aio_poll(ctx_, true);                                  \
> +            waited_ = true;                                        \
>          }                                                          \
>      } else {                                                       \
>          assert(qemu_get_current_aio_context() ==                   \
>                 qemu_get_aio_context());                            \
>          /* Increment wait_->num_waiters before evaluating cond. */ \
>          atomic_inc(&wait_->num_waiters);                           \
> -        while (busy_) {                                            \
> -            if ((cond)) {                                          \
> -                waited_ = busy_ = true;                            \
> -                aio_context_release(ctx_);                         \
> -                aio_poll(qemu_get_aio_context(), true);            \
> -                aio_context_acquire(ctx_);                         \
> -            } else {                                               \
> -                busy_ = aio_poll(ctx_, false);                     \
> -                waited_ |= busy_;                                  \
> -            }                                                      \
> +        while ((cond)) {                                           \
> +            aio_context_release(ctx_);                             \
> +            aio_poll(qemu_get_aio_context(), true);                \
> +            aio_context_acquire(ctx_);                             \
> +            waited_ = true;                                        \
>          }                                                          \
>          atomic_dec(&wait_->num_waiters);                           \
>      }                                                              \
> diff --git a/block/io.c b/block/io.c
> index ea6f9f023a..6f580f49ff 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -181,13 +181,22 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
>      BDRV_POLL_WHILE(bs, !data.done);
>  }
>  
> +/* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
> +static bool bdrv_drain_poll(BlockDriverState *bs)
> +{
> +    /* Execute pending BHs first and check everything else only after the BHs
> +     * have executed. */
> +    while (aio_poll(bs->aio_context, false));
> +    return atomic_read(&bs->in_flight);
> +}
> +
>  static bool bdrv_drain_recurse(BlockDriverState *bs)
>  {
>      BdrvChild *child, *tmp;
>      bool waited;
>  
>      /* Wait for drained requests to finish */
> -    waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
> +    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
>  
>      QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
>          BlockDriverState *bs = child->bs;
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
@ 2018-04-11 18:32   ` Eric Blake
  2018-04-20  7:11   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Eric Blake @ 2018-04-11 18:32 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, qemu-devel, mreitz, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On 04/11/2018 11:39 AM, Kevin Wolf wrote:
> All involved nodes are already idle, we called bdrv_do_draine_begin() on

s/draine/drain/

> them.
> 
> The comment in the code suggested that this were not correct because the

s/were/was/

> completion of a request on one node could spawn a new request on a
> different node (which might have been drained before, so we wouldn't
> drain the new request). In reality, new requests to different nodes
> aren't spawned out of nothing, but only in the context of a parent
> request, and they aren't submitted to random nodes, but only to child
> nodes. As long as we still poll for the completion of the parent request
> (which we do), draining each root node separately is good enough.
> 
> Remove the additional polling code from bdrv_drain_all_begin() and
> replace it with an assertion that all nodes are already idle after we
> drained them separately.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 41 ++++++++++++-----------------------------
>  1 file changed, 12 insertions(+), 29 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
@ 2018-04-11 18:33   ` Eric Blake
  2018-04-20  7:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Eric Blake @ 2018-04-11 18:33 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, qemu-devel, mreitz, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

On 04/11/2018 11:39 AM, Kevin Wolf wrote:
> Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
> coroutine context is automatically left with a BH, preventing the
> deadlocks that made bdrv_drain_all*() unsafe in coroutine context. We
> can consider it compatible now the latest, after having removed the old

s/the latest/at last/ ?

> polling code as dead code.
> 
> Enable the coroutine test cases for bdrv_drain_all().
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/test-bdrv-drain.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
@ 2018-04-12  8:37   ` Paolo Bonzini
  2018-04-12  9:51     ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12  8:37 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, stefanha, qemu-devel

On 11/04/2018 18:39, Kevin Wolf wrote:
> +bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
>  {
>      /* Execute pending BHs first and check everything else only after the BHs
>       * have executed. */
> -    while (aio_poll(bs->aio_context, false));
> +    if (top_level) {
> +        while (aio_poll(bs->aio_context, false));
> +    }
> +
> +    if (bdrv_parent_drained_poll(bs)) {
> +        return true;
> +    }
> +
>      return atomic_read(&bs->in_flight);
>  }
>  

Up until now I liked very much this series, but I like this patch a bit
less for two reasons.

1) I think I would prefer to have the !top_level case in a separate
function---making the API easier to use in the BdrvChildRole callback
because there is no need to pass false.  In addition, the callback is
not really polling anything, but rather returning whether the children
are quiescent.  So a better name would be bdrv_children_drained and
c->role->drained.

2) Worse, the main idea behind the first drain restructuring was that
draining could proceed in topological order: first drain the roots' I/O,
then call bdrv_drain to send the last requests to their children, then
recurse.  It is not clear to me why you need to introduce this recursive
step, which is also O(n^2) in the worst case.
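
For illustration, the split suggested in point 1 could look roughly like this
(a minimal sketch with invented names, not code from the posted series):

```
/* Status-only check: reports whether bs and its users are quiescent,
 * without touching the event loop. Name invented for illustration. */
static bool bdrv_node_quiescent(BlockDriverState *bs)
{
    return !bdrv_parent_drained_poll(bs) && !atomic_read(&bs->in_flight);
}

/* Top-level variant: flush pending BHs first, then report whether the
 * caller still has to keep waiting. */
static bool bdrv_drain_poll_top_level(BlockDriverState *bs)
{
    while (aio_poll(bs->aio_context, false)) {
        /* No further work */
    }
    return !bdrv_node_quiescent(bs);
}
```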

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
@ 2018-04-12  8:39   ` Paolo Bonzini
  2018-04-20  7:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12  8:39 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, stefanha, qemu-devel

On 11/04/2018 18:39, Kevin Wolf wrote:
> For bdrv_drain(), recursively waiting for child node requests is
> pointless because we didn't quiesce their parents, so new requests could
> come in anyway. Letting the function work only on a single node makes it
> more consistent.
> 
> For subtree drains and drain_all, we already have the recursion in
> bdrv_do_drained_begin(), so the extra recursion doesn't add anything
> either.
> 
> Remove the useless code.

This solves one of my questions, but leaves me with the doubt about
quadratic complexity.

Paolo

> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 36 +++---------------------------------
>  1 file changed, 3 insertions(+), 33 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 0a778eeff4..f24f39c278 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -211,38 +211,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
>      return atomic_read(&bs->in_flight);
>  }
>  
> -static bool bdrv_drain_recurse(BlockDriverState *bs)
> -{
> -    BdrvChild *child, *tmp;
> -    bool waited;
> -
> -    /* Wait for drained requests to finish */
> -    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
> -
> -    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
> -        BlockDriverState *bs = child->bs;
> -        bool in_main_loop =
> -            qemu_get_current_aio_context() == qemu_get_aio_context();
> -        assert(bs->refcnt > 0);
> -        if (in_main_loop) {
> -            /* In case the recursive bdrv_drain_recurse processes a
> -             * block_job_defer_to_main_loop BH and modifies the graph,
> -             * let's hold a reference to bs until we are done.
> -             *
> -             * IOThread doesn't have such a BH, and it is not safe to call
> -             * bdrv_unref without BQL, so skip doing it there.
> -             */
> -            bdrv_ref(bs);
> -        }
> -        waited |= bdrv_drain_recurse(bs);
> -        if (in_main_loop) {
> -            bdrv_unref(bs);
> -        }
> -    }
> -
> -    return waited;
> -}
> -
>  static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
>                                    BdrvChild *parent);
>  static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
> @@ -310,7 +278,9 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
>  
>      bdrv_parent_drained_begin(bs, parent);
>      bdrv_drain_invoke(bs, true);
> -    bdrv_drain_recurse(bs);
> +
> +    /* Wait for drained requests to finish */
> +    BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, true));
>  
>      if (recursive) {
>          bs->recursive_quiesce_counter++;
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
@ 2018-04-12  8:41   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12  8:41 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, stefanha, qemu-devel

On 11/04/2018 18:39, Kevin Wolf wrote:
> +    if (atomic_read(&bs->in_flight)) {
> +        return true;
> +    }
> +
> +    if (recursive) {
> +        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {

QLIST_FOREACH_SAFE is only safe if child disappears, but not if e.g.
next disappears.  So this loop is only safe because top_level is false
below.  Sounds like another good reason to split the top_level == false
case into a separate function.

Paolo

> +            if (bdrv_drain_poll(child->bs, false, recursive)) {
> +                return true;
> +            }
> +        }
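
For reference, a generic, self-contained illustration of the hazard (plain C
with a made-up element type, not QEMU code): the "safe" iteration only caches
the next pointer before running the loop body, so it tolerates removal of the
current element but not of the cached next one.

```
#include <stddef.h>

struct elem {
    struct elem *next;
};

static void visit(struct elem *e)
{
    /* If this callback were to free e->next (e.g. a graph change that
     * deletes a sibling), the caller's cached 'next' would dangle. */
    (void)e;
}

static void walk_safe(struct elem *head)
{
    struct elem *cur, *next;

    for (cur = head; cur != NULL; cur = next) {
        next = cur->next;   /* cached before the body, like FOREACH_SAFE */
        visit(cur);         /* freeing 'cur' itself here would be fine */
    }
}
```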

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
@ 2018-04-12  8:43   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12  8:43 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, stefanha, qemu-devel

On 11/04/2018 18:39, Kevin Wolf wrote:
> bdrv_drain_all() wants to have a single polling loop for draining the
> in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
> condition relies on activity in multiple AioContexts, which is polled
> from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
> the mainloop thread and use the AioWait notification mechanism.
> 
> Just randomly picking the AioContext of any non-mainloop thread would
> work, but instead of bothering to find such a context in the caller, we
> can just as well accept NULL for ctx.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/aio-wait.h | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index 783d3678dd..c85a62f798 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -57,7 +57,8 @@ typedef struct {
>  /**
>   * AIO_WAIT_WHILE:
>   * @wait: the aio wait object
> - * @ctx: the aio context
> + * @ctx: the aio context, or NULL if multiple aio contexts (for which the
> + *       caller does not hold a lock) are involved in the polling condition.
>   * @cond: wait while this conditional expression is true
>   *
>   * Wait while a condition is true.  Use this to implement synchronous
> @@ -75,7 +76,7 @@ typedef struct {
>      bool waited_ = false;                                          \
>      AioWait *wait_ = (wait);                                       \
>      AioContext *ctx_ = (ctx);                                      \
> -    if (in_aio_context_home_thread(ctx_)) {                        \
> +    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
>          while ((cond)) {                                           \
>              aio_poll(ctx_, true);                                  \
>              waited_ = true;                                        \
> @@ -86,9 +87,13 @@ typedef struct {
>          /* Increment wait_->num_waiters before evaluating cond. */ \
>          atomic_inc(&wait_->num_waiters);                           \
>          while ((cond)) {                                           \
> -            aio_context_release(ctx_);                             \
> +            if (ctx_) {                                            \
> +                aio_context_release(ctx_);                         \
> +            }                                                      \
>              aio_poll(qemu_get_aio_context(), true);                \
> -            aio_context_acquire(ctx_);                             \
> +            if (ctx_) {                                            \
> +                aio_context_acquire(ctx_);                         \
> +            }                                                      \
>              waited_ = true;                                        \
>          }                                                          \
>          atomic_dec(&wait_->num_waiters);                           \
> 

Looks good.

Paolo
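
For context, a rough caller sketch of the NULL-ctx case (all names here are
invented for illustration, they are not taken from the series): the caller
runs in the main loop and waits on a condition that spans several
AioContexts, so there is no single ctx lock to drop and NULL is passed.

```
#include "qemu/osdep.h"
#include "block/aio-wait.h"

static AioWait example_wait;

/* Stand-in for a condition that inspects nodes in multiple AioContexts,
 * e.g. "does any node still have requests in flight?" */
static bool example_any_node_busy(void)
{
    return false;
}

static void example_wait_until_all_idle(void)
{
    AIO_WAIT_WHILE(&example_wait, NULL, example_any_node_busy());
}
```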

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
@ 2018-04-12  8:47   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12  8:47 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, stefanha, qemu-devel

On 11/04/2018 18:39, Kevin Wolf wrote:
> The much easier and more obviously correct way is to fundamentally
> change the way the functions work: Iterate over all BlockDriverStates,
> no matter who owns them, and drain them individually. Compensation is
> only necessary when a new BDS is created inside a drain_all section.
> Removal of a BDS doesn't require any action because it's gone afterwards
> anyway.

Ok, now I see (I think) why you chose the recursive check for in-flight
requests.  The higher quiesce_count is not a problem, but I am still not
convinced about the recursion.

Paolo

> This change means that some nodes get a higher bs->quiesce_count now
> because each node propagates its individual drain to all of its parents.
> In the old subtree drain, propagation back to the parent that made the
> recursive drain request was avoided. While this isn't perfectly
> beautiful, accepting the higher counts seems preferable to adding drain
> code to multiple other places that modify the graph.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12  8:37   ` Paolo Bonzini
@ 2018-04-12  9:51     ` Kevin Wolf
  2018-04-12 10:12       ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-12  9:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 10:37 hat Paolo Bonzini geschrieben:
> On 11/04/2018 18:39, Kevin Wolf wrote:
> > +bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
> >  {
> >      /* Execute pending BHs first and check everything else only after the BHs
> >       * have executed. */
> > -    while (aio_poll(bs->aio_context, false));
> > +    if (top_level) {
> > +        while (aio_poll(bs->aio_context, false));
> > +    }
> > +
> > +    if (bdrv_parent_drained_poll(bs)) {
> > +        return true;
> > +    }
> > +
> >      return atomic_read(&bs->in_flight);
> >  }
> >  
> 
> Up until now I liked very much this series, but I like this patch a bit
> less for two reasons.
> 
> 1) I think I would prefer to have the !top_level case in a separate
> function---making the API easier to use in the BdrvChildRole callback
> because there is no need to pass false.

Basically just move the aio_poll() out to a different function that
calls bdrv_drain_poll afterwards? Maybe it's a bit cleaner, yes.
However, see below.

> In addition, the callback is not really polling anything, but rather
> returning whether the children are quiescent.  So a better name would
> be bdrv_children_drained and c->role->drained.

Why isn't it polling? We're actively checking the state of the node, and
we keep calling the callback until it has the expected state. Would it
only become polling for you if the loop were in this function rather
than its caller?

> 2) Worse, the main idea behind the first drain restructuring was that
> draining could proceed in topological order: first drain the roots' I/O,
> then call bdrv_drain to send the last requests to their children, then
> recurse.  It is not clear to me why you need to introduce this recursive
> step, which is also O(n^2) in the worst case.

I need to introduce it because it fixes the bug that we don't wait until
the parents are actually quiesced and don't send new requests any more.
I don't see how this could be fixed without going to the parents.

Is the O(n²) that you mean that we recursively iterate all children in
bdrv_do_drained_begin() (or later in the series in bdrv_drain_poll()),
and then come back from the children when they poll their parents?

We could do the same thing as for bdrv_parent_drained_begin(), i.e. pass
the parent that we came from (BdrvChild *parent instead of bool
top_level, NULL instead of top_level=true) and then skip that parent
while calling the BdrvChildRole .drain_poll callbacks. Would that
address your concerns?

In that solution, splitting the function by moving aio_poll() out
wouldn't get rid of a parameter and simplify the API any more. It might
still be cleaner, though?
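
A rough sketch of the BdrvChild-based variant described above (assumed shape
only, not the posted code; the .drained_poll callback name is a guess),
mirroring how bdrv_parent_drained_begin() takes the parent to skip:

```
static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore)
{
    BdrvChild *c;
    bool busy = false;

    QLIST_FOREACH(c, &bs->parents, next_parent) {
        if (c == ignore) {
            continue;   /* don't ask the parent the drain came from */
        }
        if (c->role->drained_poll) {
            busy |= c->role->drained_poll(c);
        }
    }
    return busy;
}
```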

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12  9:51     ` Kevin Wolf
@ 2018-04-12 10:12       ` Paolo Bonzini
  2018-04-12 11:11         ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12 10:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

On 12/04/2018 11:51, Kevin Wolf wrote:
> Am 12.04.2018 um 10:37 hat Paolo Bonzini geschrieben:
>> On 11/04/2018 18:39, Kevin Wolf wrote:
>>> +bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
>>>  {
>>>      /* Execute pending BHs first and check everything else only after the BHs
>>>       * have executed. */
>>> -    while (aio_poll(bs->aio_context, false));
>>> +    if (top_level) {
>>> +        while (aio_poll(bs->aio_context, false));
>>> +    }
>>> +
>>> +    if (bdrv_parent_drained_poll(bs)) {
>>> +        return true;
>>> +    }
>>> +
>>>      return atomic_read(&bs->in_flight);
>>>  }
>>>  
>>
>> Up until now I liked very much this series, but I like this patch a bit
>> less for two reasons.
>>
>> 1) I think I would prefer to have the !top_level case in a separate
>> function---making the API easier to use in the BdrvChildRole callback
>> because there is no need to pass false.
> 
> Basically just move the aio_poll() out to a different function that
> calls bdrv_drain_poll afterwards? Maybe it's a bit cleaner, yes.
> However, see below.

Yes.

>> In addition, the callback is not really polling anything, but rather
>> returning whether the children are quiescent.  So a better name would
>> be bdrv_children_drained and c->role->drained.
> 
> Why isn't it polling? We're actively checking the state of the node, and
> we keep calling the callback until it has the expected state. Would it
> only become polling for you if the loop were in this function rather
> than its caller?

It's just checking the status, it's not invoking the event loop.  The
polling (in the sense of aio_poll or AIO_POLL_WHILE) is done elsewhere,
this function is just the condition.  It's just nomenclature I guess.

>> 2) Worse, the main idea behind the first drain restructuring was that
>> draining could proceed in topological order: first drain the roots' I/O,
>> then call bdrv_drain to send the last requests to their children, then
>> recurse.  It is not clear to me why you need to introduce this recursive
>> step, which is also O(n^2) in the worst case.
> 
> I need to introduce it because it fixes the bug that we don't wait until
> the parents are actually quiesced and don't send new requests any more.
> I don't see how this could be fixed without going to the parents.

Yes, you do need to go to the parents.  I don't understand however why
you need more than a walk of the graph in parent-before-child order
(which is a topological order, so it is reverse depth-first order and
it's easy to do the walk in a recursive function).  If you're draining X
below:

     A
     |
     B   C
      \ /
       X

then you start by draining A/B/C in topological order (so A before B).
If B happens to be already quiescent, you can skip not only B but A too.
 If the nodes disappear or move elsewhere in the graph it's okay, you've
just done useless work.  When you're done you ensure that every parent
is quiescent, and if so you're done.  If it's not, a new parent
appeared---drain that too, using the same parent-before-child order, and
loop.

Well, there is one gotcha: bdrv_ref protects against disappearance, but
bdrv_ref/bdrv_unref are not thread-safe.  Am I missing something else?
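
As a generic sketch of that parent-before-child walk (plain C with a made-up
node type, not QEMU code): a node is drained only after all of its ancestors,
and an already-quiescent node lets us skip everything above it, as argued
above.

```
#include <stdbool.h>

struct gnode {
    struct gnode **parents;
    int n_parents;
    bool quiescent;
};

/* Stand-in for stopping and waiting for this node's own requests */
static void stop_own_io(struct gnode *n)
{
    n->quiescent = true;
}

static void drain_parents_first(struct gnode *n)
{
    int i;

    if (n->quiescent) {
        /* Already quiet, so per the argument above its ancestors don't
         * need to be visited for this node's sake either. */
        return;
    }
    for (i = 0; i < n->n_parents; i++) {
        drain_parents_first(n->parents[i]);
    }
    stop_own_io(n);
}
```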

> Is the O(n²) that you mean that we recursively iterate all children in
> bdrv_do_drained_begin() (or later in the series in bdrv_drain_poll()),
> and then come back from the children when they poll their parents?

Yes.

Paolo

> We could do the same thing as for bdrv_parent_drained_begin(), i.e. pass
> the parent that we came from (BdrvChild *parent instead of bool
> top_level, NULL instead of top_level=true) and then skip that parent
> while calling the BdrvChildRole .drain_poll callbacks. Would that
> address your concerns?
> 
> In that solution, splitting the function by moving aio_poll() out
> wouldn't get rid of a parameter and simplify the API any more. It might
> still be cleaner, though?
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 10:12       ` Paolo Bonzini
@ 2018-04-12 11:11         ` Kevin Wolf
  2018-04-12 11:30           ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-12 11:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 12:12 hat Paolo Bonzini geschrieben:
> On 12/04/2018 11:51, Kevin Wolf wrote:
> > Am 12.04.2018 um 10:37 hat Paolo Bonzini geschrieben:
> >> On 11/04/2018 18:39, Kevin Wolf wrote:
> >>> +bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
> >>>  {
> >>>      /* Execute pending BHs first and check everything else only after the BHs
> >>>       * have executed. */
> >>> -    while (aio_poll(bs->aio_context, false));
> >>> +    if (top_level) {
> >>> +        while (aio_poll(bs->aio_context, false));
> >>> +    }
> >>> +
> >>> +    if (bdrv_parent_drained_poll(bs)) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>>      return atomic_read(&bs->in_flight);
> >>>  }
> >>>  
> >>
> >> Up until now I liked very much this series, but I like this patch a bit
> >> less for two reasons.
> >>
> >> 1) I think I would prefer to have the !top_level case in a separate
> >> function---making the API easier to use in the BdrvChildRole callback
> >> because there is no need to pass false.
> > 
> > Basically just move the aio_poll() out to a different function that
> > calls bdrv_drain_poll afterwards? Maybe it's a bit cleaner, yes.
> > However, see below.
> 
> Yes.
> 
> >> In addition, the callback is not really polling anything, but rather
> >> returning whether the children are quiescent.  So a better name would
> >> be bdrv_children_drained and c->role->drained.
> > 
> > Why isn't it polling? We're actively checking the state of the node, and
> > we keep calling the callback until it has the expected state. Would it
> > only become polling for you if the loop were in this function rather
> > than its caller?
> 
> It's just checking the status, it's not invoking the event loop.  The
> polling (in the sense of aio_poll or AIO_POLL_WHILE) is done elsewhere,
> this function is just the condition.  It's just nomenclature I guess.

Yes, it doesn't poll the AioContext events. I don't think that the name
implies that, though. It does poll the state of the block node and its
users.

> >> 2) Worse, the main idea behind the first drain restructuring was that
> >> draining could proceed in topological order: first drain the roots' I/O,
> >> then call bdrv_drain to send the last requests to their children, then
> >> recurse.  It is not clear to me why you need to introduce this recursive
> >> step, which is also O(n^2) in the worst case.
> > 
> > I need to introduce it because it fixes the bug that we don't wait until
> > the parents are actually quiesced and don't send new requests any more.
> > I don't see how this could be fixed without going to the parents.
> 
> Yes, you do need to go to the parents.  I don't understand however why
> you need more than a walk of the graph in parent-before-child order
> (which is a topological order, so it is reverse depth-first order and
> it's easy to do the walk in a recursive function).  If you're draining X
> below:
> 
>      A
>      |
>      B   C
>       \ /
>        X
> 
> then you start by draining A/B/C in topological order (so A before B).
> If B happens to be already quiescent, you can skip not only B but A too.

Are we talking only about bdrv_drain_poll() here or the complete
bdrv_do_drained_begin() call?

If you don't want to recurse separately for .drain_begin and the polling
phase, but keep everything in one function like it's before this series,
you can't skip anything just because it's already quiescent. You don't
have control over that other drain and it might end too early; also,
.drain_begin/end must come in pairs.

If we're only talking about .drain_poll, then in theory you could skip A
if B knows that it's fully quiesced (i.e. not only .drain_begin was
called, but also .drain_poll has returned false at least once). We don't
store this information anywhere, though, and I'm not sure if the saved
recursion would be worth additional state.

>  If the nodes disappear or move elsewhere in the graph it's okay, you've
> just done useless work.

Not really. The node is drained now and bdrv_do_drained_end() won't end
the drained section because the moved node isn't reachable any more from
this original node.

> When you're done you ensure that every parent
> is quiescent, and if so you're done.  If it's not, a new parent
> appeared---drain that too, using the same parent-before-child order, and
> loop.

That might work if we only ever were in a single drained section and
BlockDriverState had a bool quiesced. What we do have is a
quiesce_count, and when you look at the quiesce_count of a node, you
can't tell whether it came from you or someone else. So there is no way
to tell whether a parent was already drained by you or not.

And to be honest, even if it worked, it would still seem kind of
complicated to me. Just calling .drain_begin everywhere first and then
polling until nobody has anything in flight any more feels much simpler
conceptually than keeping track of which nodes we drained and repeating
that until no new nodes appear, and somehow undraining the nodes that
disappeared, but are potentially still referenced from somewhere else.

> Well, there is one gotcha: bdrv_ref protects against disappearance, but
> bdrv_ref/bdrv_unref are not thread-safe.  Am I missing something else?

Apart from the above, if we do an extra bdrv_ref/unref we'd also have
to keep track of all the nodes that we've referenced so that we unref
the same nodes again, even if the graph has changed.

So essentially you'd be introducing a new list of BDSes that we have to
manage and then check for every reachable node whether it's already in
that list or not, and for every node in the list whether it's still
reachable. I don't really see how that's going to reduce the complexity.

> > Is the O(n²) that you mean that we recursively iterate all children in
> > bdrv_do_drained_begin() (or later in the series in bdrv_drain_poll()),
> > and then come back from the children when they poll their parents?
> 
> Yes.
> 
> Paolo
> 
> > We could do the same thing as for bdrv_parent_drained_begin(), i.e. pass
> > the parent that we came from (BdrvChild *parent instead of bool
> > top_level, NULL instead of top_level=true) and then skip that parent
> > while calling the BdrvChildRole .drain_poll callbacks. Would that
> > address your concerns?

Ok, then this should be the solution. I can implement it for v2.

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 11:11         ` Kevin Wolf
@ 2018-04-12 11:30           ` Paolo Bonzini
  2018-04-12 11:53             ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12 11:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

On 12/04/2018 13:11, Kevin Wolf wrote:
>> Well, there is one gotcha: bdrv_ref protects against disappearance, but
>> bdrv_ref/bdrv_unref are not thread-safe.  Am I missing something else?
>
> Apart from the above, if we do an extra bdrv_ref/unref we'd also have
> to keep track of all the nodes that we've referenced so that we unref
> the same nodes again, even if the graph has changes.
>
> So essentially you'd be introducing a new list of BDSes that we have to
> manage and then check for every reachable node whether it's already in
> that list or not, and for every node in the list whether it's still
> reachable.

That would be a hash table (a set), not a list, so easy to check.  But
the thread-safety is a bigger issue.

The problem I have is that there is a direction through which I/O flows
(parent-to-child), so why can't draining follow that natural direction.
Having to check for the parents' I/O, while draining the child, seems
wrong.  Perhaps we can't help it, but I cannot understand the reason.

Paolo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 11:30           ` Paolo Bonzini
@ 2018-04-12 11:53             ` Kevin Wolf
  2018-04-12 12:02               ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-12 11:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 13:30 hat Paolo Bonzini geschrieben:
> On 12/04/2018 13:11, Kevin Wolf wrote:
> >> Well, there is one gotcha: bdrv_ref protects against disappearance, but
> >> bdrv_ref/bdrv_unref are not thread-safe.  Am I missing something else?
> >
> > Apart from the above, if we do an extra bdrv_ref/unref we'd also have
> > to keep track of all the nodes that we've referenced so that we unref
> > the same nodes again, even if the graph has changed.
> >
> > So essentially you'd be introducing a new list of BDSes that we have to
> > manage and then check for every reachable node whether it's already in
> > that list or not, and for every node in the list whether it's still
> > reachable.
> 
> That would be a hash table (a set), not a list, so easy to check.  But
> the thread-safety is a bigger issue.
> 
> The problem I have is that there is a direction through which I/O flows
> (parent-to-child), so why can't draining follow that natural direction.
> Having to check for the parents' I/O, while draining the child, seems
> wrong.  Perhaps we can't help it, but I cannot understand the reason.

I'm not sure what's there that could be not understood. You already
confirmed that we need to drain the parents, too, when we drain a node.
Drain really must propagate in the opposite direction of I/O, because
part of its job is to quiesce the origin of any I/O to the node that
should be drained. Opposite of I/O _is_ the natural direction for drain.

We also have subtree drains, but that's not because that's the natural
direction for drain, but just as a convenience function because some
operations (e.g. reopen) affect a whole subtree, so they need everything
in that subtree drained rather than just a single node.

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 11:53             ` Kevin Wolf
@ 2018-04-12 12:02               ` Paolo Bonzini
  2018-04-12 13:27                 ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12 12:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

On 12/04/2018 13:53, Kevin Wolf wrote:
>> The problem I have is that there is a direction through which I/O flows
>> (parent-to-child), so why can't draining follow that natural direction.
>> Having to check for the parents' I/O, while draining the child, seems
>> wrong.  Perhaps we can't help it, but I cannot understand the reason.
> I'm not sure what's there that could be not understood. You already
> confirmed that we need to drain the parents, too, when we drain a node.
> Drain really must propagate in the opposite direction of I/O, because
> part of its job is to quiesce the origin of any I/O to the node that
> should be drained. Opposite of I/O _is_ the natural direction for drain.

Opposite of I/O is the natural direction for drain to propagate, yes.

However, I/O direction is the natural direction for requests to stop.
After quiescing X and calling X->drv->bdrv_drain(X), there can be
pending requests only in X's children.  So I don't understand why you
need to keep checking in_flight over the whole subgraph, when there are
roots that will conclude their request first, and then their children,
and so on and so forth.

Thanks,

Paolo

> We also have subtree drains, but that's not because that's the natural
> direction for drain, but just as a convenience function because some
> operations (e.g. reopen) affect a whole subtree, so they need everything
> in that subtree drained rather than just a single node.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 12:02               ` Paolo Bonzini
@ 2018-04-12 13:27                 ` Kevin Wolf
  2018-04-12 13:42                   ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-12 13:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 14:02 hat Paolo Bonzini geschrieben:
> On 12/04/2018 13:53, Kevin Wolf wrote:
> >> The problem I have is that there is a direction through which I/O flows
> >> (parent-to-child), so why can't draining follow that natural direction.
> >> Having to check for the parents' I/O, while draining the child, seems
> >> wrong.  Perhaps we can't help it, but I cannot understand the reason.
> > I'm not sure what's there that could be not understood. You already
> > confirmed that we need to drain the parents, too, when we drain a node.
> > Drain really must propagate in the opposite direction of I/O, because
> > part of its job is to quiesce the origin of any I/O to the node that
> > should be drained. Opposite of I/O _is_ the natural direction for drain.
> 
> Opposite of I/O is the natural direction for drain to propagate, yes.
> 
> However, I/O direction is the natural direction for requests to stop.
> After quiescing X and calling X->drv->bdrv_drain(X), there can be
> pending requests only in X's children.  So I don't understand why you
> need to keep checking in_flight over the whole subgraph, when there are
> roots that will conclude their request first, and then their children,
> and so on and so forth.

Not sure I follow. Let's look at an example. Say, we have a block job
BlockBackend as the root (because that uses proper layering, unlike
devices which use aio_disable_external()), connected to a qcow2 node
over file.

1. The block job issues a request to the qcow2 node

2. In order to process that request, the qcow2 node internally issues
   one or more requests to the file node

3. Someone calls bdrv_drained_begin(qcow2_node)

4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
   file_node, and BdrvChildRole.drained_begin() for the block job.

   Now the block nodes don't create any new original requests any more
   (qcow2 and file don't do that anyway; qcow2 only creates requests to
   its children in the context of parent requests). The job potentially
   continues sending requests until it reaches the next pause point. The
   previously issued requests are still in flight.

   Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
   pending requests can only be in X's children. Why can't the request
   be pending in X itself, say waiting for a thread pool worker
   decrypting a buffer?

   Also, note that after this series, the driver callbacks are called
   asynchronously, but I don't think it makes a big difference here.

5. The file_node request completes. file_node doesn't have any requests
   in flight any more, but in theory it could still get new requests
   from qcow2_node. Anyway, let's say this is the last request, so I
   think we'd call its requests concluded?

6. qcow2 still has a request in flight, but doesn't need to access
   file_node for it. It finishes the work and therefore concludes its
   requests as well. Note that qcow2_node (the parent) concludes after
   file_node (the child).

7. We'll keep the example simple, so after completion of its request,
   the job reaches a pause point without sending a new request. Again,
   this happens after qcow2_node has concluded.

8. Only when neither file_node nor qcow2_node have a request in flight
   and the job has reached a pause point, bdrv_drained_begin() can
   return.

So completing the last request and reaching an actually quiescent state
looks very much like a process in child-to-parent order to me?

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 13:27                 ` Kevin Wolf
@ 2018-04-12 13:42                   ` Paolo Bonzini
  2018-04-12 14:25                     ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12 13:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

On 12/04/2018 15:27, Kevin Wolf wrote:
> Not sure I follow. Let's look at an example. Say, we have a block job
> BlockBackend as the root (because that uses proper layering, unlike
> devices which use aio_disable_external()), connected to a qcow2 node
> over file.
> 
> 1. The block job issues a request to the qcow2 node
> 
> 2. In order to process that request, the qcow2 node internally issues
>    one or more requests to the file node
> 
> 3. Someone calls bdrv_drained_begin(qcow2_node)
> 
> 4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
>    file_node, and BdrvChildRole.drained_begin() for the block job.
> 
>    Now the block nodes don't create any new original requests any more
>    (qcow2 and file don't do that anyway; qcow2 only creates requests to
>    its children in the context of parent requests). The job potentially
>    continues sending requests until it reaches the next pause point. The
>    previously issued requests are still in flight.
>
>    Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
>    pending requests can only be in X's children. Why can't the request
>    be pending in X itself, say waiting for a thread pool worker
>    decrypting a buffer?

No, that step is ->bdrv_co_drain_begin in BlockDriver.  It's where the
"last" requests are sent to file_node after we know that qcow2_node
won't get any more requests.

>    Also, note that after this series, the driver callbacks are called
>    asynchronously, but I don't think it makes a big difference here.
> 
> 5. The file_node request completes. file_node doesn't have any requests
>    in flight any more, but in theory it could still get new requests
>    from qcow2_node. Anyway, let's say this is the last request, so I
>    think we'd call its requests concluded?

No, if it can still get more requests they're not concluded.  That's why
we need to first ensure qcow2_node is quiescent, and before then we need
to ensure that the BlockBackends are quiescent (in this case meaning the
job has reached its pause point).  Only then we can look at file_node.
In this case we'll see that we have nothing to do---file_node is already
quiescent---and bdrv_drained_begin() can return.

> 6. qcow2 still has a request in flight, but doesn't need to access
>    file_node for it. It finishes the work and therefore concludes its
>    requests as well. Note that qcow2_node (the parent) concludes after
>    file_node (the child).
> 
> 7. We'll keep the example simple, so after completion of its request,
>    the job reaches a pause point without sending a new request. Again,
>    this happens after qcow2_node has concluded.
> 
> 8. Only when neither file_node nor qcow2_node have a request in flight
>    and the job has reached a pause point, bdrv_drained_begin() can
>    return.
> 
> So completing the last request and reaching an actually quiescent state
> looks very much like a process in child-to-parent order to me?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 13:42                   ` Paolo Bonzini
@ 2018-04-12 14:25                     ` Kevin Wolf
  2018-04-12 20:44                       ` Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-12 14:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 15:42 hat Paolo Bonzini geschrieben:
> On 12/04/2018 15:27, Kevin Wolf wrote:
> > Not sure I follow. Let's look at an example. Say, we have a block job
> > BlockBackend as the root (because that uses proper layering, unlike
> > devices which use aio_disable_external()), connected to a qcow2 node
> > over file.
> > 
> > 1. The block job issues a request to the qcow2 node
> > 
> > 2. In order to process that request, the qcow2 node internally issues
> >    one or more requests to the file node
> > 
> > 3. Someone calls bdrv_drained_begin(qcow2_node)
> > 
> > 4. We call BlockDriver.bdrv_drained_begin() for qcow2_node and
> >    file_node, and BdrvChildRole.drained_begin() for the block job.
> > 
> >    Now the block nodes don't create any new original requests any more
> >    (qcow2 and file don't do that anyway; qcow2 only creates requests to
> >    its children in the context of parent requests). The job potentially
> >    continues sending requests until it reaches the next pause point. The
> >    previously issued requests are still in flight.
> >
> >    Is this step what you meant by X->drv->bdrv_drain(X)? I don't see why
> >    pending requests can only be in X's children. Why can't the request
> >    be pending in X itself, say waiting for a thread pool worker
> >    decrypting a buffer?
> 
> No, that step is ->bdrv_co_drain_begin in BlockDriver.  It's where the
> "last" requests are sent to file_node after we know that qcow2_node
> won't get any more requests.

That's not really how .bdrv_co_drain_begin works. It is called first,
potentially long before we know that the parent won't send new requests
any more. All it does is prevent the node from creating new original
requests, i.e. internal requests that are not triggered by a parent
request.

> >    Also, note that after this series, the driver callbacks are called
> >    asynchronously, but I don't think it makes a big difference here.
> > 
> > 5. The file_node request completes. file_node doesn't have any requests
> >    in flight any more, but in theory it could still get new requests
> >    from qcow2_node. Anyway, let's say this is the last request, so I
> >    think we'd call its requests concluded?
> 
> No, if it can still get more requests, they're not concluded.

Okay. In that case, we can't really determine any order, because the
whole subtree concludes when the root node concludes. (Ignoring any
additional parents, but the effect is the same: Children always conclude
at the same time as their last parent.)

> That's why we first need to ensure that qcow2_node is quiescent, and
> before that we need to ensure that the BlockBackends are quiescent (in
> this case meaning the job has reached its pause point).  Only then can
> we look at file_node.  In this case we'll see that we have nothing to
> do---file_node is already quiescent---and bdrv_drained_begin() can
> return.

bdrv_drained_begin(qcow2_node) actually never looks at file_node. Only
bdrv_subtree_drained_begin(qcow2_node) would do so.

But what you're saying is essentially that for a subtree drain, we want
the following order in bdrv_drained_poll():

1. Check if a parent still has pending requests. If so, don't bother
   checking the rest.

2. Check the node the drain request was for. If it still has requests in
   flight, don't bother looking at the children.

3. Finally, if all parents and the node itself are drained, look at the
   children.

This is already the order we have there. What is probably different from
what you envision is that after the parents have concluded, we still
check that they are still quiescent in every iteration.

What we could easily do is introduce a bool bs->quiesce_concluded or
something that bdrv_drain_poll() sets to true the first time that it
returns false. It also gets a shortcut so that it returns false
immediately if bs->quiesce_concluded is true. The field is reset in
bdrv_do_drained_end() when bs->quiesce_counter reaches 0.
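
Roughly something like this - only a sketch to show the idea, with the
order from above folded in; I'm simplifying the signature, and
bdrv_parent_drained_poll() is a made-up helper for "does any parent
still have pending activity":

static bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent)
{
    BdrvChild *child;

    /* Shortcut: this node was already found quiescent once during the
     * current drained section, no need to recurse into it again. */
    if (bs->quiesce_concluded) {
        return false;
    }

    /* 1. Parents first; if one of them is still busy, stop here. */
    if (bdrv_parent_drained_poll(bs, ignore_parent)) {
        return true;
    }

    /* 2. Then the node itself. */
    if (atomic_read(&bs->in_flight)) {
        return true;
    }

    /* 3. Only now look at the children (for the subtree case). */
    QLIST_FOREACH(child, &bs->children, next) {
        if (bdrv_drain_poll(child->bs, child)) {
            return true;
        }
    }

    /* Remembered until bdrv_do_drained_end() resets it. */
    bs->quiesce_concluded = true;
    return false;
}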

That would be a rather simple optimisation that could be done as the
final patch in this series, and would ensure that we don't recurse back
to parents that are already quiescent.

Would you be happy with this change?

Kevin


* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 14:25                     ` Kevin Wolf
@ 2018-04-12 20:44                       ` Paolo Bonzini
  2018-04-13  8:01                         ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-12 20:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

On 12/04/2018 16:25, Kevin Wolf wrote:
> This is already the order we have there. What is probably different from
> what you envision is that after the parents have concluded, we still
> check that they are still quiescent in every iteration.

Yes, and that's the quadratic part.

> What we could easily do is introduce a bool bs->quiesce_concluded or
> something that bdrv_drain_poll() sets to true the first time that it
> returns false. It also gets a shortcut so that it returns false
> immediately if bs->quiesce_concluded is true. The field is reset in
> bdrv_do_drained_end() when bs->quiesce_counter reaches 0.

Or bs->quiescent, for the sake of bikeshedding.

> That would be a rather simple optimisation that could be done as the
> final patch in this series, and would ensure that we don't recurse back
> to parents that are already quiescent.
> 
> Would you be happy with this change?

Yes, I'll leave the organization of the series to you of course.

Paolo


* Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-12 20:44                       ` Paolo Bonzini
@ 2018-04-13  8:01                         ` Kevin Wolf
  2018-04-13 11:05                           ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2018-04-13  8:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, stefanha, qemu-devel

Am 12.04.2018 um 22:44 hat Paolo Bonzini geschrieben:
> On 12/04/2018 16:25, Kevin Wolf wrote:
> > This is already the order we have there. What is probably different from
> > what you envision is that after the parents have concluded, we still
> > check that they are still quiescent in every iteration.
> 
> Yes, and that's the quadratic part.
> 
> > What we could easily do is introduce a bool bs->quiesce_concluded or
> > something that bdrv_drain_poll() sets to true the first time that it
> > returns false. It also gets a shortcut so that it returns false
> > immediately if bs->quiesce_concluded is true. The field is reset in
> > bdrv_do_drained_end() when bs->quiesce_counter reaches 0.
> 
> Or bs->quiescent, for the sake of bikeshedding.

Yes, that sounds better.

The only problem with the proposal as I made it is that it's wrong. We
can't keep bs->quiescent until bdrv_do_drained_end() because the caller
can issue new requests and then have a nested drained section that needs
to wait for all requests again instead of deciding that everything is
already quiescent.
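
To make the problem concrete, the sequence I'm worried about is
something like this (schematic caller code, the write just stands for
any new request):

    bdrv_drained_begin(bs);   /* eventually marks bs quiescent */

    /* the caller is allowed to do more I/O inside its drained section */
    bdrv_pwrite(child, 0, buf, len);

    bdrv_drained_begin(bs);   /* nested drain: has to wait for that write,
                               * but a sticky bs->quiescent would make it
                               * skip the wait */
    bdrv_drained_end(bs);
    bdrv_drained_end(bs);     /* only here would the flag be reset */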

Maybe where we should really reset it is in the initial recursion of
bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
which is called by both the parent and the child recursion.

There don't seem to be completely obviously correct solutions (can't an
I/O thread be draining a specific node while the main loop runs
drain_all?), but this would probably be the most obvious one.

Hm... Or actually, reset bs->quiescent in bdrv_inc_in_flight()? Would
this be enough?
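
I mean literally just this (untested, and assuming the bs->quiescent
field from above):

void bdrv_inc_in_flight(BlockDriverState *bs)
{
    /* Any new activity invalidates an earlier "quiescent" verdict */
    bs->quiescent = false;
    atomic_inc(&bs->in_flight);
}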

Kevin


* Re: [Qemu-devel] [Qemu-block] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-13  8:01                         ` Kevin Wolf
@ 2018-04-13 11:05                           ` Paolo Bonzini
  2018-04-13 12:40                             ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2018-04-13 11:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, famz, stefanha, qemu-block, mreitz

On 13/04/2018 10:01, Kevin Wolf wrote:
>> Or bs->quiescent, for the sake of bikeshedding.
> Yes, that sounds better.
> 
> The only problem with the proposal as I made it is that it's wrong. We
> can't keep bs->quiescent until bdrv_do_drained_end() because the caller
> can issue new requests and then have a nested drained section that needs
> to wait for all requests again instead of deciding that everything is
> already quiescent.
> 
> Maybe where we should really reset it is in the initial recursion of
> bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
> which is called by both the parent and the child recursion.
> 
> There don't seem to be completely obviously correct solutions (can't an
> I/O thread be draining a specific node while the main loop runs
> drain_all?), but this would probably be the most obvious one.

Or use a hash table?

Paolo


* Re: [Qemu-devel] [Qemu-block] [PATCH 07/19] block: Really pause block jobs on drain
  2018-04-13 11:05                           ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
@ 2018-04-13 12:40                             ` Kevin Wolf
  0 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-04-13 12:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, stefanha, qemu-block, mreitz

Am 13.04.2018 um 13:05 hat Paolo Bonzini geschrieben:
> On 13/04/2018 10:01, Kevin Wolf wrote:
> >> Or bs->quiescent, for the sake of bikeshedding.
> > Yes, that sounds better.
> > 
> > The only problem with the proposal as I made it is that it's wrong. We
> > can't keep bs->quiescent until bdrv_do_drained_end() because the caller
> > can issue new requests and then have a nested drained section that needs
> > to wait for all requests again instead of deciding that everything is
> > already quiescent.
> > 
> > Maybe where we should really reset it is in the initial recursion of
> > bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
> > which is called by both the parent and the child recursion.
> > 
> > There don't seem to be completely obviously correct solutions (can't an
> > I/O thread be draining a specific node while the main loop runs
> > drain_all?), but this would probably be the most obvious one.
> 
> Or use a hash table?

I don't think it would be any more obvious, but it would bring in quite
a bit of additional complexity (structural, not computational): multiple
places would have to create a hash table, which would then have to be
passed down to all functions involved in the recursion, etc.

The fundamental question would stay the same as with bool quiescent:
When do you have to enter a node in the hash table, and when do you have
to remove it again?

The first question is easy: you mark a node quiescent when
bdrv_drain_poll() returns false for it. The second is a bit harder, but
resetting the quiescent state in bdrv_do_drained_begin_quiesce() feels
like the best place. It's much simpler than recursively resetting it in
all places that start new activity (which would include BlockBackend
users, not only nodes).
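
In code that would roughly look like this (sketch only, simplified
signature, and the rest of the function stays as it is today):

static void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                          BdrvChild *parent)
{
    /* A new drained section starts quiescing this node, so any earlier
     * "quiescent" verdict is stale from this point on. */
    bs->quiescent = false;

    /* ... existing quiescing steps: parent callbacks, driver callback ... */
}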

Kevin


* Re: [Qemu-devel] [Qemu-block] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
@ 2018-04-20  7:07   ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:23PM +0200, Kevin Wolf wrote:
> bdrv_do_drain_begin/end() already implement everything that
> bdrv_drain_all_begin/end() need and currently still do manually: Disable
> external events, call parent drain callbacks, call block driver
> callbacks.
> 
> It also does two more things:
> 
> The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
> stood out in the test case by behaving differently from the other drain
> variants. Adding this is not only safe, but in fact a bug fix.
> 
> The second is calling bdrv_drain_recurse(). We already do that later in
> the same function in a loop, so basically doing an early first iteration
> doesn't hurt.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c              | 10 ++--------
>  tests/test-bdrv-drain.c | 14 ++++----------
>  2 files changed, 6 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
@ 2018-04-20  7:09   ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:24PM +0200, Kevin Wolf wrote:
> All callers pass false for the 'recursive' parameter now. Remove it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
  2018-04-11 18:32   ` Eric Blake
@ 2018-04-20  7:11   ` Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:11 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:25PM +0200, Kevin Wolf wrote:
> All involved nodes are already idle; we called bdrv_do_drained_begin() on
> them.
> 
> The comment in the code suggested that this was not correct because the
> completion of a request on one node could spawn a new request on a
> different node (which might have been drained before, so we wouldn't
> drain the new request). In reality, new requests to different nodes
> aren't spawned out of nothing, but only in the context of a parent
> request, and they aren't submitted to random nodes, but only to child
> nodes. As long as we still poll for the completion of the parent request
> (which we do), draining each root node separately is good enough.
> 
> Remove the additional polling code from bdrv_drain_all_begin() and
> replace it with an assertion that all nodes are already idle after we
> drained them separately.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 41 ++++++++++++-----------------------------
>  1 file changed, 12 insertions(+), 29 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
  2018-04-11 18:33   ` Eric Blake
@ 2018-04-20  7:12   ` Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:26PM +0200, Kevin Wolf wrote:
> Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
> coroutine context is automatically left with a BH, preventing the
> deadlocks that made bdrv_drain_all*() unsafe in coroutine context. We
> can consider it compatible now at the latest, after having removed the old
> polling code as dead code.
> 
> Enable the coroutine test cases for bdrv_drain_all().
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/test-bdrv-drain.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
  2018-04-11 17:33   ` Su Hang
@ 2018-04-20  7:17   ` Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:27PM +0200, Kevin Wolf wrote:
> Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE()
> in order to make sure that all pending BHs are executed on drain. This
> was the wrong place to make the fix, as it is useless overhead for all
> other users of the macro and unnecessarily complicates the mechanism.
> 
> This patch effectively reverts said commit (the context has changed a
> bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
> loop condition for drain.
> 
> The effect is probably hard to measure in any real-world use case
> because actual I/O will dominate, but if I run only the initialisation
> part of 'qemu-img convert' where it calls bdrv_block_status() for the
> whole image to find out how much data there is to copy, this phase actually
> needs only roughly half the time after this patch.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/aio-wait.h | 22 ++++++++--------------
>  block/io.c               | 11 ++++++++++-
>  2 files changed, 18 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 08/19] block: Remove bdrv_drain_recurse()
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
  2018-04-12  8:39   ` Paolo Bonzini
@ 2018-04-20  7:20   ` Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:29PM +0200, Kevin Wolf wrote:
> For bdrv_drain(), recursively waiting for child node requests is
> pointless because we didn't quiesce their parents, so new requests could
> come in anyway. Letting the function work only on a single node makes it
> more consistent.
> 
> For subtree drains and drain_all, we already have the recursion in
> bdrv_do_drained_begin(), so the extra recursion doesn't add anything
> either.
> 
> Remove the useless code.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 36 +++---------------------------------
>  1 file changed, 3 insertions(+), 33 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 09/19] test-bdrv-drain: Add test for node deletion
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
@ 2018-04-20  7:32   ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:30PM +0200, Kevin Wolf wrote:
> +static void do_test_delete_by_drain(bool detach_instead_of_delete)
> +{
> +    BlockBackend *blk;
> +    BlockDriverState *bs, *child_bs, *null_bs;
> +    BDRVTestTopState *tts;
> +    TestCoDeleteByDrainData dbdd;
> +    Coroutine *co;
> +
> +    bs = bdrv_new_open_driver(&bdrv_test_top_driver, "top", BDRV_O_RDWR,
> +                              &error_abort);
> +    bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
> +    tts = bs->opaque;
> +
> +    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +                        &error_abort);
> +    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
> +
> +    /* This child will be the one to pass requests through to, and
> +     * it will stall until a drain occurs */
> +    child_bs = bdrv_new_open_driver(&bdrv_test, "child", BDRV_O_RDWR,
> +                                    &error_abort);
> +    child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
> +    /* Takes our reference to child_bs */
> +    tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child", &child_file,
> +                                        &error_abort);
> +
> +    /* This child is just there to be deleted
> +     * (for detach_instead_of_delete == true) */
> +    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +                        &error_abort);
> +    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);

Why is null_bs created twice (with the exact same "null-child" name)?
I'm surprised that bdrv_attach_child() succeeds with a duplicate name.
Anyway, I find the duplicate null_bs confusing and I'm not sure if it's
a copy-paste mistake.


* Re: [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3
  2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (19 preceding siblings ...)
  2018-04-11 17:05 ` [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 no-reply
@ 2018-04-20  7:35 ` Stefan Hajnoczi
  20 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  7:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, famz, qemu-devel, mreitz, stefanha, pbonzini

On Wed, Apr 11, 2018 at 06:39:21PM +0200, Kevin Wolf wrote:
> This is the third and hopefully for now last part of my work to fix
> drain. The main goal of this series is to make drain robust against
> graph changes that happen in any callbacks of in-flight requests while
> we drain a block node.

I have reviewed the first half of this series and will review the rest
when v2 is posted since I want to see how the changes suggested by Paolo
play out first.

Stefan


* Re: [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase
  2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
@ 2018-06-27 14:30   ` Max Reitz
  2018-06-29 15:14     ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Max Reitz @ 2018-06-27 14:30 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: pbonzini, famz, stefanha, qemu-devel

On 2018-04-11 18:39, Kevin Wolf wrote:
> We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> done with propagating the drain through the graph and are doing the
> single final BDRV_POLL_WHILE().
> 
> Just schedule the coroutine with the callback and increase bs->in_flight
> to make sure that the polling phase will wait for it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)

According to bisect, this breaks blockdev-snapshot with QED:

$ ./qemu-img create -f qed foo.qed 64M
Formatting 'foo.qed', fmt=qed size=67108864 cluster_size=65536
$ echo "{'execute':'qmp_capabilities'}
        {'execute':'blockdev-snapshot',
         'arguments':{'node':'backing','overlay':'overlay'}}
        {'execute':'quit'}" | \
    x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults \
        -blockdev "{'node-name':'backing','driver':'null-co'}" \
        -blockdev "{'node-name':'overlay','driver':'qed',
                    'file':{'driver':'file','filename':'foo.qed'}}"
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2},
"package": "v2.12.0-1422-g0109e7e6f8"}, "capabilities": []}}
{"return": {}}
qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion
`!atomic_read(&to->in_flight)' failed.
[1]    5252 done                 echo  |
       5253 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -nodefaults -blockdev  -blockdev

Max



* Re: [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase
  2018-06-27 14:30   ` Max Reitz
@ 2018-06-29 15:14     ` Kevin Wolf
  0 siblings, 0 replies; 52+ messages in thread
From: Kevin Wolf @ 2018-06-29 15:14 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, pbonzini, famz, stefanha, qemu-devel

Am 27.06.2018 um 16:30 hat Max Reitz geschrieben:
> On 2018-04-11 18:39, Kevin Wolf wrote:
> > We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
> > done with propagating the drain through the graph and are doing the
> > single final BDRV_POLL_WHILE().
> > 
> > Just schedule the coroutine with the callback and increase bs->in_flight
> > to make sure that the polling phase will wait for it.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/io.c | 28 +++++++++++++++++++++++-----
> >  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> According to bisect, this breaks blockdev-snapshot with QED:

I have no idea why it would trigger only with qed, but I think the real
bug is in commit dcf94a23b1 ('block: Don't poll in parent drain
callbacks').

The crash is specifically in the place where the new overlay needs to be
drained because its new backing file is drained. First,
bdrv_drain_invoke() creates new activity on the overlay when it gets the
old active layer attached as a backing file:

#0  0x000055b90eb73b88 in bdrv_drain_invoke (bs=0x55b910fd8440, begin=<optimized out>) at block/io.c:217
#1  0x000055b90eb1e2b0 in bdrv_replace_child_noperm (child=0x55b911f32450, new_bs=<optimized out>) at block.c:2069
#2  0x000055b90eb20875 in bdrv_replace_child (child=<optimized out>, new_bs=0x55b910fd2130) at block.c:2098
#3  0x000055b90eb20c53 in bdrv_root_attach_child (child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, perm=0, shared_perm=31, opaque=opaque@entry=0x55b910fd8440, errp=0x7fffb8943620) at block.c:2141
#4  0x000055b90eb20d60 in bdrv_attach_child (parent_bs=parent_bs@entry=0x55b910fd8440, child_bs=child_bs@entry=0x55b910fd2130, child_name=child_name@entry=0x55b90eda75e5 "backing", child_role=child_role@entry=0x55b90f3d3300 <child_backing>, errp=0x7fffb8943620) at block.c:2162
#5  0x000055b90eb23c30 in bdrv_set_backing_hd (bs=bs@entry=0x55b910fd8440, backing_hd=backing_hd@entry=0x55b910fd2130, errp=errp@entry=0x7fffb8943620) at block.c:2249
#6  0x000055b90eb24a76 in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3535
#7  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

And then, when trying to move all users of the old active layer to the
new overlay, it turns out that the bdrv_drain_invoke() BH hasn't been
executed yet:

#0  0x00007fdfcef5d9fb in raise () at /lib64/libc.so.6
#1  0x00007fdfcef5f800 in abort () at /lib64/libc.so.6
#2  0x00007fdfcef560da in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fdfcef56152 in  () at /lib64/libc.so.6
#4  0x000055b90eb24867 in bdrv_replace_node (from=<optimized out>, to=0x55b910fd8440, errp=0x7fffb8943620) at block.c:3470
#5  0x000055b90eb24abe in bdrv_append (bs_new=0x55b910fd8440, bs_top=0x55b910fd2130, errp=errp@entry=0x7fffb8943680) at block.c:3541
#6  0x000055b90e937a89 in external_snapshot_prepare (common=0x55b910f5cf90, errp=0x7fffb89436f8) at blockdev.c:1680

Not polling in bdrv_child_cb_drained_begin() is great if the original
drain is still polling, but if the original drain_begin has already
returned, we obviously do need to poll so we don't enter a drained
section while requests are still running.
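
One possible shape of a fix, just to illustrate the direction (the
condition for "someone else is still going to poll for us" is made up,
and the real solution may well look different):

static void bdrv_child_cb_drained_begin(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;

    bdrv_do_drained_begin_quiesce(bs, NULL);

    /* If no outer drain_begin is going to poll for this node any more,
     * we have to wait for it to become quiescent ourselves. */
    if (!bs->outer_drain_is_polling) {          /* made-up condition */
        BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, NULL));
    }
}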

Another thing I noticed is that qmp_transaction() only calls a one-time
bdrv_drain_all() instead of using a proper begin/end section. I suppose
this should be fixed as well.
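
I.e. something like this (sketch):

    bdrv_drain_all_begin();
    /* ... prepare and then commit/abort all transaction actions ... */
    bdrv_drain_all_end();

instead of a single bdrv_drain_all() at the start of the transaction.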

Kevin


end of thread (newest message: 2018-06-29 15:15 UTC)

Thread overview: 52+ messages
2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
2018-04-20  7:07   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
2018-04-20  7:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
2018-04-11 18:32   ` Eric Blake
2018-04-20  7:11   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
2018-04-11 18:33   ` Eric Blake
2018-04-20  7:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
2018-04-11 17:33   ` Su Hang
2018-04-20  7:17   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
2018-04-12  8:37   ` Paolo Bonzini
2018-04-12  9:51     ` Kevin Wolf
2018-04-12 10:12       ` Paolo Bonzini
2018-04-12 11:11         ` Kevin Wolf
2018-04-12 11:30           ` Paolo Bonzini
2018-04-12 11:53             ` Kevin Wolf
2018-04-12 12:02               ` Paolo Bonzini
2018-04-12 13:27                 ` Kevin Wolf
2018-04-12 13:42                   ` Paolo Bonzini
2018-04-12 14:25                     ` Kevin Wolf
2018-04-12 20:44                       ` Paolo Bonzini
2018-04-13  8:01                         ` Kevin Wolf
2018-04-13 11:05                           ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
2018-04-13 12:40                             ` Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
2018-04-12  8:39   ` Paolo Bonzini
2018-04-20  7:20   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
2018-04-20  7:32   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
2018-04-12  8:41   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
2018-06-27 14:30   ` Max Reitz
2018-06-29 15:14     ` Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
2018-04-12  8:43   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
2018-04-12  8:47   ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
2018-04-11 17:05 ` [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 no-reply
2018-04-20  7:35 ` Stefan Hajnoczi
