* [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

This is the third and, for now, hopefully the last part of my work to
fix drain. The main goal of this series is to make drain robust against
graph changes that happen in any callback of an in-flight request while
we drain a block node.

The individual patches describe the details, but the rough plan is to
change all three drain types (single node, subtree and all) to work like
this:

1. First call all the necessary callbacks to quiesce external sources
   of new requests. This includes the block driver callbacks, the child
   node callbacks and disabling external AioContext events. This is done
   recursively.

   Much of the trouble we had with drain resulted from the fact that the
   graph changed while we were traversing the graph recursively. None of
   the callbacks called in this phase may change the graph.

2. Then do a single AIO_WAIT_WHILE() to drain the requests of all
   affected nodes. The aio_poll() called by it is where graph changes
   can happen and we need to be careful.

   However, while evaluating the loop condition, the graph can't change,
   so we can safely call all necessary callbacks, if needed recursively,
   to determine whether there are still pending requests in any affected
   nodes. We just need to make sure that we don't rely on the set of
   nodes being the same between any two evaluations of the condition.

There are a few more smaller, mostly self-contained changes needed
before we're actually safe, but this is the main mechanism that will
help you understand what we're working towards during the series.
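
To make the shape concrete, here is a minimal sketch of the intended
structure. This is not the actual QEMU code: quiesce_recursive() and
drain_poll_recursive() are hypothetical names standing in for the real
recursive helpers.

    /* Hedged sketch of the two-phase drain described above */
    static void drained_begin_sketch(BlockDriverState *bs)
    {
        /* Phase 1: quiesce all external sources of new requests,
         * recursively. No callback invoked here may change the graph. */
        quiesce_recursive(bs);

        /* Phase 2: a single polling loop for the whole subtree. Graph
         * changes can happen inside aio_poll(), but not while the loop
         * condition is evaluated, so redoing the recursion in the
         * condition is safe. */
        BDRV_POLL_WHILE(bs, drain_poll_recursive(bs));
    }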

v2:

- Rebased on top of current master (e.g. including Job infrastructure)

- Avoid unnecessary parent callbacks for .drained_begin/poll/end:
  * subtree drains: Don't propagate the drain to the parent that we came
                    from recursively
  * drain_all:      Don't propagate the drain to BDS parents (which are
                    already separately drained), but only to non-BDS
                    parents like BBs or Jobs

- Separate bdrv_drain_poll_top_level() function instead of having a
  top_level parameter for bdrv_drain_poll().

- A few commit message and comment improvements


Kevin Wolf (19):
  test-bdrv-drain: bdrv_drain() works with cross-AioContext events
  block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
  block: Remove 'recursive' parameter from bdrv_drain_invoke()
  block: Don't manually poll in bdrv_drain_all()
  tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
  block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
  block: Really pause block jobs on drain
  block: Remove bdrv_drain_recurse()
  block: Drain recursively with a single BDRV_POLL_WHILE()
  test-bdrv-drain: Test node deletion in subtree recursion
  block: Don't poll in parent drain callbacks
  test-bdrv-drain: Graph change through parent callback
  block: Defer .bdrv_drain_begin callback to polling phase
  test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
  block: Allow AIO_WAIT_WHILE with NULL ctx
  block: Move bdrv_drain_all_begin() out of coroutine context
  block: ignore_bds_parents parameter for drain functions
  block: Allow graph changes in bdrv_drain_all_begin/end sections
  test-bdrv-drain: Test graph changes in drain_all section

Max Reitz (1):
  test-bdrv-drain: Add test for node deletion

 include/block/aio-wait.h     |  25 +-
 include/block/block.h        |  31 +-
 include/block/block_int.h    |  14 +
 include/block/blockjob_int.h |   8 +
 block.c                      |  52 +++-
 block/io.c                   | 332 ++++++++++++--------
 block/mirror.c               |   8 +
 block/vvfat.c                |   1 +
 blockjob.c                   |  23 ++
 tests/test-bdrv-drain.c      | 705 +++++++++++++++++++++++++++++++++++++++++--
 10 files changed, 1032 insertions(+), 167 deletions(-)

-- 
2.13.6


* [Qemu-devel] [PATCH v2 01/20] test-bdrv-drain: bdrv_drain() works with cross-AioContext events
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.
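
For illustration, the caller-side pattern this enables looks roughly
like this (a sketch, not code from this patch). As the remaining
comment in bdrv_drain() notes, the caller still has to hold the
AioContext of the node itself:

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_drain(bs);   /* aio_poll() on ctx is woken when the root
                       * request completes, even if it detoured through
                       * a BH in another AioContext */
    aio_context_release(ctx);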

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c              |   4 --
 tests/test-bdrv-drain.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 186 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index ca96b487eb..1e4e2f40ea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -370,10 +370,6 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
- *
- * Only this BlockDriverState's AioContext is run, so in-flight requests must
- * not depend on events in other AioContexts.  In that case, use
- * bdrv_drain_all() instead.
  */
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 2cba63b881..1682e2b72d 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -27,9 +27,13 @@
 #include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
+#include "iothread.h"
+
+static QemuEvent done_event;
 
 typedef struct BDRVTestState {
     int drain_count;
+    AioContext *bh_indirection_ctx;
 } BDRVTestState;
 
 static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
@@ -50,16 +54,29 @@ static void bdrv_test_close(BlockDriverState *bs)
     g_assert_cmpint(s->drain_count, >, 0);
 }
 
+static void co_reenter_bh(void *opaque)
+{
+    aio_co_wake(opaque);
+}
+
 static int coroutine_fn bdrv_test_co_preadv(BlockDriverState *bs,
                                             uint64_t offset, uint64_t bytes,
                                             QEMUIOVector *qiov, int flags)
 {
+    BDRVTestState *s = bs->opaque;
+
     /* We want this request to stay until the polling loop in drain waits for
      * it to complete. We need to sleep a while as bdrv_drain_invoke() comes
      * first and polls its result, too, but it shouldn't accidentally complete
      * this request yet. */
     qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
 
+    if (s->bh_indirection_ctx) {
+        aio_bh_schedule_oneshot(s->bh_indirection_ctx, co_reenter_bh,
+                                qemu_coroutine_self());
+        qemu_coroutine_yield();
+    }
+
     return 0;
 }
 
@@ -490,6 +507,164 @@ static void test_graph_change(void)
     blk_unref(blk_b);
 }
 
+struct test_iothread_data {
+    BlockDriverState *bs;
+    enum drain_type drain_type;
+    int *aio_ret;
+};
+
+static void test_iothread_drain_entry(void *opaque)
+{
+    struct test_iothread_data *data = opaque;
+
+    aio_context_acquire(bdrv_get_aio_context(data->bs));
+    do_drain_begin(data->drain_type, data->bs);
+    g_assert_cmpint(*data->aio_ret, ==, 0);
+    do_drain_end(data->drain_type, data->bs);
+    aio_context_release(bdrv_get_aio_context(data->bs));
+
+    qemu_event_set(&done_event);
+}
+
+static void test_iothread_aio_cb(void *opaque, int ret)
+{
+    int *aio_ret = opaque;
+    *aio_ret = ret;
+    qemu_event_set(&done_event);
+}
+
+/*
+ * Starts an AIO request on a BDS that runs in the AioContext of iothread 1.
+ * The request involves a BH on iothread 2 before it can complete.
+ *
+ * @drain_thread = 0 means that do_drain_begin/end are called from the main
+ * thread, @drain_thread = 1 means that they are called from iothread 1. Drain
+ * for this BDS cannot be called from iothread 2 because only the main thread
+ * may do cross-AioContext polling.
+ */
+static void test_iothread_common(enum drain_type drain_type, int drain_thread)
+{
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    BDRVTestState *s;
+    BlockAIOCB *acb;
+    int aio_ret;
+    struct test_iothread_data data;
+
+    IOThread *a = iothread_new();
+    IOThread *b = iothread_new();
+    AioContext *ctx_a = iothread_get_aio_context(a);
+    AioContext *ctx_b = iothread_get_aio_context(b);
+
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = NULL,
+        .iov_len = 0,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* bdrv_drain_all() may only be called from the main loop thread */
+    if (drain_type == BDRV_DRAIN_ALL && drain_thread != 0) {
+        goto out;
+    }
+
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+    s = bs->opaque;
+    blk_insert_bs(blk, bs, &error_abort);
+
+    blk_set_aio_context(blk, ctx_a);
+    aio_context_acquire(ctx_a);
+
+    s->bh_indirection_ctx = ctx_b;
+
+    aio_ret = -EINPROGRESS;
+    if (drain_thread == 0) {
+        acb = blk_aio_preadv(blk, 0, &qiov, 0, test_iothread_aio_cb, &aio_ret);
+    } else {
+        acb = blk_aio_preadv(blk, 0, &qiov, 0, aio_ret_cb, &aio_ret);
+    }
+    g_assert(acb != NULL);
+    g_assert_cmpint(aio_ret, ==, -EINPROGRESS);
+
+    aio_context_release(ctx_a);
+
+    data = (struct test_iothread_data) {
+        .bs         = bs,
+        .drain_type = drain_type,
+        .aio_ret    = &aio_ret,
+    };
+
+    switch (drain_thread) {
+    case 0:
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_acquire(ctx_a);
+        }
+
+        /* The request is running in IOThread a. Draining its block device
+         * will make sure that it has completed as far as the BDS is concerned,
+         * but the drain in this thread can continue immediately after
+         * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
+         * later. */
+        qemu_event_reset(&done_event);
+        do_drain_begin(drain_type, bs);
+        g_assert_cmpint(bs->in_flight, ==, 0);
+
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_release(ctx_a);
+        }
+        qemu_event_wait(&done_event);
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_acquire(ctx_a);
+        }
+
+        g_assert_cmpint(aio_ret, ==, 0);
+        do_drain_end(drain_type, bs);
+
+        if (drain_type != BDRV_DRAIN_ALL) {
+            aio_context_release(ctx_a);
+        }
+        break;
+    case 1:
+        qemu_event_reset(&done_event);
+        aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data);
+        qemu_event_wait(&done_event);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    aio_context_acquire(ctx_a);
+    blk_set_aio_context(blk, qemu_get_aio_context());
+    aio_context_release(ctx_a);
+
+    bdrv_unref(bs);
+    blk_unref(blk);
+
+out:
+    iothread_join(a);
+    iothread_join(b);
+}
+
+static void test_iothread_drain_all(void)
+{
+    test_iothread_common(BDRV_DRAIN_ALL, 0);
+    test_iothread_common(BDRV_DRAIN_ALL, 1);
+}
+
+static void test_iothread_drain(void)
+{
+    test_iothread_common(BDRV_DRAIN, 0);
+    test_iothread_common(BDRV_DRAIN, 1);
+}
+
+static void test_iothread_drain_subtree(void)
+{
+    test_iothread_common(BDRV_SUBTREE_DRAIN, 0);
+    test_iothread_common(BDRV_SUBTREE_DRAIN, 1);
+}
+
 
 typedef struct TestBlockJob {
     BlockJob common;
@@ -618,10 +793,13 @@ static void test_blockjob_drain_subtree(void)
 
 int main(int argc, char **argv)
 {
+    int ret;
+
     bdrv_init();
     qemu_init_main_loop(&error_abort);
 
     g_test_init(&argc, &argv, NULL);
+    qemu_event_init(&done_event, false);
 
     g_test_add_func("/bdrv-drain/driver-cb/drain_all", test_drv_cb_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/drain", test_drv_cb_drain);
@@ -648,10 +826,17 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/multiparent", test_multiparent);
     g_test_add_func("/bdrv-drain/graph-change", test_graph_change);
 
+    g_test_add_func("/bdrv-drain/iothread/drain_all", test_iothread_drain_all);
+    g_test_add_func("/bdrv-drain/iothread/drain", test_iothread_drain);
+    g_test_add_func("/bdrv-drain/iothread/drain_subtree",
+                    test_iothread_drain_subtree);
+
     g_test_add_func("/bdrv-drain/blockjob/drain_all", test_blockjob_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/drain", test_blockjob_drain);
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
-    return g_test_run();
+    ret = g_test_run();
+    qemu_event_destroy(&done_event);
+    return ret;
 }
-- 
2.13.6


* [Qemu-devel] [PATCH v2 02/20] block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

bdrv_do_drain_begin/end() already implement everything that
bdrv_drain_all_begin/end() needs and currently still does manually:
disabling external events, calling parent drain callbacks, and calling
block driver callbacks.

They also do two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving differently from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so basically doing an early first iteration
doesn't hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c              | 10 ++--------
 tests/test-bdrv-drain.c | 14 ++++----------
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1e4e2f40ea..73579b31cf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -413,11 +413,8 @@ void bdrv_drain_all_begin(void)
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
-        /* Stop things in parent-to-child order */
         aio_context_acquire(aio_context);
-        aio_disable_external(aio_context);
-        bdrv_parent_drained_begin(bs, NULL);
-        bdrv_drain_invoke(bs, true, true);
+        bdrv_do_drained_begin(bs, true, NULL);
         aio_context_release(aio_context);
 
         if (!g_slist_find(aio_ctxs, aio_context)) {
@@ -458,11 +455,8 @@ void bdrv_drain_all_end(void)
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
-        /* Re-enable things in child-to-parent order */
         aio_context_acquire(aio_context);
-        bdrv_drain_invoke(bs, false, true);
-        bdrv_parent_drained_end(bs, NULL);
-        aio_enable_external(aio_context);
+        bdrv_do_drained_end(bs, true, NULL);
         aio_context_release(aio_context);
     }
 }
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1682e2b72d..4c3e93de0b 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -276,8 +276,7 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
 static void test_quiesce_drain_all(void)
 {
-    // XXX drain_all doesn't quiesce
-    //test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce_common(BDRV_DRAIN_ALL, true);
 }
 
 static void test_quiesce_drain(void)
@@ -319,12 +318,7 @@ static void test_nested(void)
 
     for (outer = 0; outer < DRAIN_TYPE_MAX; outer++) {
         for (inner = 0; inner < DRAIN_TYPE_MAX; inner++) {
-            /* XXX bdrv_drain_all() doesn't increase the quiesce_counter */
-            int bs_quiesce      = (outer != BDRV_DRAIN_ALL) +
-                                  (inner != BDRV_DRAIN_ALL);
-            int backing_quiesce = (outer == BDRV_SUBTREE_DRAIN) +
-                                  (inner == BDRV_SUBTREE_DRAIN);
-            int backing_cb_cnt  = (outer != BDRV_DRAIN) +
+            int backing_quiesce = (outer != BDRV_DRAIN) +
                                   (inner != BDRV_DRAIN);
 
             g_assert_cmpint(bs->quiesce_counter, ==, 0);
@@ -335,10 +329,10 @@ static void test_nested(void)
             do_drain_begin(outer, bs);
             do_drain_begin(inner, bs);
 
-            g_assert_cmpint(bs->quiesce_counter, ==, bs_quiesce);
+            g_assert_cmpint(bs->quiesce_counter, ==, 2);
             g_assert_cmpint(backing->quiesce_counter, ==, backing_quiesce);
             g_assert_cmpint(s->drain_count, ==, 2);
-            g_assert_cmpint(backing_s->drain_count, ==, backing_cb_cnt);
+            g_assert_cmpint(backing_s->drain_count, ==, backing_quiesce);
 
             do_drain_end(inner, bs);
             do_drain_end(outer, bs);
-- 
2.13.6


* [Qemu-devel] [PATCH v2 03/20] block: Remove 'recursive' parameter from bdrv_drain_invoke()
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

All callers pass false for the 'recursive' parameter now. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index 73579b31cf..48d4d4df22 100644
--- a/block/io.c
+++ b/block/io.c
@@ -168,9 +168,8 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin, bool recursive)
+static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvChild *child, *tmp;
     BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
@@ -181,12 +180,6 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin, bool recursive)
     data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
     bdrv_coroutine_enter(bs, data.co);
     BDRV_POLL_WHILE(bs, !data.done);
-
-    if (recursive) {
-        QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-            bdrv_drain_invoke(child->bs, begin, true);
-        }
-    }
 }
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
@@ -287,7 +280,7 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
     }
 
     bdrv_parent_drained_begin(bs, parent);
-    bdrv_drain_invoke(bs, true, false);
+    bdrv_drain_invoke(bs, true);
     bdrv_drain_recurse(bs);
 
     if (recursive) {
@@ -322,7 +315,7 @@ void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false, false);
+    bdrv_drain_invoke(bs, false);
     bdrv_parent_drained_end(bs, parent);
     if (old_quiesce_counter == 1) {
         aio_enable_external(bdrv_get_aio_context(bs));
-- 
2.13.6


* [Qemu-devel] [PATCH v2 04/20] block: Don't manually poll in bdrv_drain_all()
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

All involved nodes are already idle; we have called
bdrv_do_drained_begin() on them.

The comment in the code suggested that this was not correct because the
completion of a request on one node could spawn a new request on a
different node (which might have been drained before, so we wouldn't
drain the new request). In reality, new requests to different nodes
aren't spawned out of nothing, but only in the context of a parent
request, and they aren't submitted to random nodes, but only to child
nodes. As long as we still poll for the completion of the parent request
(which we do), draining each root node separately is good enough.

Remove the additional polling code from bdrv_drain_all_begin() and
replace it with an assertion that all nodes are already idle after we
drained them separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/block/io.c b/block/io.c
index 48d4d4df22..58f50ba253 100644
--- a/block/io.c
+++ b/block/io.c
@@ -377,6 +377,16 @@ void bdrv_drain(BlockDriverState *bs)
     bdrv_drained_end(bs);
 }
 
+static void bdrv_drain_assert_idle(BlockDriverState *bs)
+{
+    BdrvChild *child, *next;
+
+    assert(atomic_read(&bs->in_flight) == 0);
+    QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+        bdrv_drain_assert_idle(child->bs);
+    }
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -391,11 +401,8 @@ void bdrv_drain(BlockDriverState *bs)
  */
 void bdrv_drain_all_begin(void)
 {
-    /* Always run first iteration so any pending completion BHs run */
-    bool waited = true;
     BlockDriverState *bs;
     BdrvNextIterator it;
-    GSList *aio_ctxs = NULL, *ctx;
 
     /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
      * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
@@ -409,35 +416,11 @@ void bdrv_drain_all_begin(void)
         aio_context_acquire(aio_context);
         bdrv_do_drained_begin(bs, true, NULL);
         aio_context_release(aio_context);
-
-        if (!g_slist_find(aio_ctxs, aio_context)) {
-            aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
-        }
     }
 
-    /* Note that completion of an asynchronous I/O operation can trigger any
-     * number of other I/O operations on other devices---for example a
-     * coroutine can submit an I/O request to another device in response to
-     * request completion.  Therefore we must keep looping until there was no
-     * more activity rather than simply draining each device independently.
-     */
-    while (waited) {
-        waited = false;
-
-        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
-            AioContext *aio_context = ctx->data;
-
-            aio_context_acquire(aio_context);
-            for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-                if (aio_context == bdrv_get_aio_context(bs)) {
-                    waited |= bdrv_drain_recurse(bs);
-                }
-            }
-            aio_context_release(aio_context);
-        }
+    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+        bdrv_drain_assert_idle(bs);
     }
-
-    g_slist_free(aio_ctxs);
 }
 
 void bdrv_drain_all_end(void)
-- 
2.13.6


* [Qemu-devel] [PATCH v2 05/20] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
coroutine context is automatically left with a BH, preventing the
deadlocks that made bdrv_drain_all*() unsafe in coroutine context. Now
that the old polling code has even been removed as dead code, it's
obvious that bdrv_drain_all*() is compatible with coroutine context.

Enable the coroutine test cases for bdrv_drain_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/test-bdrv-drain.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 4c3e93de0b..17127e63e4 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -233,6 +233,11 @@ static void test_drv_cb_drain_subtree(void)
     test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
 }
 
+static void test_drv_cb_co_drain_all(void)
+{
+    call_in_coroutine(test_drv_cb_drain_all);
+}
+
 static void test_drv_cb_co_drain(void)
 {
     call_in_coroutine(test_drv_cb_drain);
@@ -289,6 +294,11 @@ static void test_quiesce_drain_subtree(void)
     test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
 }
 
+static void test_quiesce_co_drain_all(void)
+{
+    call_in_coroutine(test_quiesce_drain_all);
+}
+
 static void test_quiesce_co_drain(void)
 {
     call_in_coroutine(test_quiesce_drain);
@@ -800,7 +810,8 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/driver-cb/drain_subtree",
                     test_drv_cb_drain_subtree);
 
-    // XXX bdrv_drain_all() doesn't work in coroutine context
+    g_test_add_func("/bdrv-drain/driver-cb/co/drain_all",
+                    test_drv_cb_co_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/co/drain", test_drv_cb_co_drain);
     g_test_add_func("/bdrv-drain/driver-cb/co/drain_subtree",
                     test_drv_cb_co_drain_subtree);
@@ -811,7 +822,8 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/quiesce/drain_subtree",
                     test_quiesce_drain_subtree);
 
-    // XXX bdrv_drain_all() doesn't work in coroutine context
+    g_test_add_func("/bdrv-drain/quiesce/co/drain_all",
+                    test_quiesce_co_drain_all);
     g_test_add_func("/bdrv-drain/quiesce/co/drain", test_quiesce_co_drain);
     g_test_add_func("/bdrv-drain/quiesce/co/drain_subtree",
                     test_quiesce_co_drain_subtree);
-- 
2.13.6


* [Qemu-devel] [PATCH v2 06/20] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE()
in order to make sure that all pending BHs are executed on drain. This
was the wrong place to make the fix, as it is useless overhead for all
other users of the macro and unnecessarily complicates the mechanism.

This patch effectively reverts said commit (the context has changed a
bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
loop condition for drain.

The effect is probably hard to measure in any real-world use case
because actual I/O will dominate, but if I run only the initialisation
part of 'qemu-img convert', where it calls bdrv_block_status() for the
whole image to find out how much data there is to copy, this phase
actually needs only roughly half the time after this patch.
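
The resulting division of labour, sketched under the assumption of a
waiter whose condition can only become false after pending BHs have run
(as is the case for drain; my_drain_cond() is a hypothetical name): the
condition itself executes pending BHs, non-blocking, before checking
what it is actually waiting for.

    static bool my_drain_cond(BlockDriverState *bs)
    {
        /* Execute all currently pending BHs first, repeating as long
         * as progress is made... */
        while (aio_poll(bs->aio_context, false)) {
        }
        /* ...and only then evaluate the real wait condition */
        return atomic_read(&bs->in_flight) > 0;
    }

AIO_WAIT_WHILE() then only does a blocking aio_poll() while the
condition is true, instead of unconditionally polling one extra time
for every user.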

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio-wait.h | 22 ++++++++--------------
 block/io.c               | 11 ++++++++++-
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 8c90a2e66e..783d3678dd 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -73,29 +73,23 @@ typedef struct {
  */
 #define AIO_WAIT_WHILE(wait, ctx, cond) ({                         \
     bool waited_ = false;                                          \
-    bool busy_ = true;                                             \
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
     if (in_aio_context_home_thread(ctx_)) {                        \
-        while ((cond) || busy_) {                                  \
-            busy_ = aio_poll(ctx_, (cond));                        \
-            waited_ |= !!(cond) | busy_;                           \
+        while ((cond)) {                                           \
+            aio_poll(ctx_, true);                                  \
+            waited_ = true;                                        \
         }                                                          \
     } else {                                                       \
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
         /* Increment wait_->num_waiters before evaluating cond. */ \
         atomic_inc(&wait_->num_waiters);                           \
-        while (busy_) {                                            \
-            if ((cond)) {                                          \
-                waited_ = busy_ = true;                            \
-                aio_context_release(ctx_);                         \
-                aio_poll(qemu_get_aio_context(), true);            \
-                aio_context_acquire(ctx_);                         \
-            } else {                                               \
-                busy_ = aio_poll(ctx_, false);                     \
-                waited_ |= busy_;                                  \
-            }                                                      \
+        while ((cond)) {                                           \
+            aio_context_release(ctx_);                             \
+            aio_poll(qemu_get_aio_context(), true);                \
+            aio_context_acquire(ctx_);                             \
+            waited_ = true;                                        \
         }                                                          \
         atomic_dec(&wait_->num_waiters);                           \
     }                                                              \
diff --git a/block/io.c b/block/io.c
index 58f50ba253..aa973f1d4a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -182,13 +182,22 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
     BDRV_POLL_WHILE(bs, !data.done);
 }
 
+/* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
+static bool bdrv_drain_poll(BlockDriverState *bs)
+{
+    /* Execute pending BHs first and check everything else only after the BHs
+     * have executed. */
+    while (aio_poll(bs->aio_context, false));
+    return atomic_read(&bs->in_flight);
+}
+
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
     BdrvChild *child, *tmp;
     bool waited;
 
     /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
+    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
-- 
2.13.6


* [Qemu-devel] [PATCH v2 07/20] block: Really pause block jobs on drain
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

We already requested that block jobs be paused in .drained_begin, but
no guarantee was made that the job was actually inactive at the point
where bdrv_drained_begin() returned.

This introduces a new callback, BdrvChildRole.drained_poll(), and uses
it to make bdrv_drain_poll() consider block jobs that are using the
node when deciding whether it is fully drained.
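
To illustrate the new contract, a driver-side implementation can be as
simple as reporting whether the job still has requests in flight;
mirror_drained_poll() below has exactly this shape (MyJob is a
hypothetical job type here):

    /* Polled after .drained_begin() until it returns false */
    static bool my_job_drained_poll(BlockJob *job)
    {
        MyJob *s = container_of(job, MyJob, common);

        /* true while requests submitted by the job are in flight */
        return s->in_flight > 0;
    }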

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h        |  8 ++++++++
 include/block/block_int.h    |  7 +++++++
 include/block/blockjob_int.h |  8 ++++++++
 block.c                      |  9 +++++++++
 block/io.c                   | 40 ++++++++++++++++++++++++++++++++++------
 block/mirror.c               |  8 ++++++++
 blockjob.c                   | 23 +++++++++++++++++++++++
 tests/test-bdrv-drain.c      | 18 ++++++++++--------
 8 files changed, 107 insertions(+), 14 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 3894edda9d..52f07da3d3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -566,6 +566,14 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore);
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
 
 /**
+ * bdrv_drain_poll:
+ *
+ * Poll for pending requests in @bs and its parents (except for
+ * @ignore_parent). This is part of bdrv_drained_begin.
+ */
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent);
+
+/**
  * bdrv_drained_begin:
  *
  * Begin a quiesced section for exclusive access to the BDS, by disabling
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6c0927bce3..feba86888e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -573,6 +573,13 @@ struct BdrvChildRole {
     void (*drained_begin)(BdrvChild *child);
     void (*drained_end)(BdrvChild *child);
 
+    /*
+     * Returns whether the parent has pending requests for the child. This
+     * callback is polled after .drained_begin() has been called until all
+     * activity on the child has stopped.
+     */
+    bool (*drained_poll)(BdrvChild *child);
+
     /* Notifies the parent that the child has been activated/inactivated (e.g.
      * when migration is completing) and it can start/stop requesting
      * permissions and doing I/O on it. */
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 5cd50c6639..e4a318dd15 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -39,6 +39,14 @@ struct BlockJobDriver {
     JobDriver job_driver;
 
     /*
+     * Returns whether the job has pending requests for the child or will
+     * submit new requests before the next pause point. This callback is polled
+     * in the context of draining a job node after requesting that the job be
+     * paused, until all activity on the child has stopped.
+     */
+    bool (*drained_poll)(BlockJob *job);
+
+    /*
      * If the callback is not NULL, it will be invoked before the job is
      * resumed in a new AioContext.  This is the place to move any resources
      * besides job->blk to the new AioContext.
diff --git a/block.c b/block.c
index 501b64c819..3c654a8a63 100644
--- a/block.c
+++ b/block.c
@@ -820,6 +820,12 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
     bdrv_drained_begin(bs);
 }
 
+static bool bdrv_child_cb_drained_poll(BdrvChild *child)
+{
+    BlockDriverState *bs = child->opaque;
+    return bdrv_drain_poll(bs, NULL);
+}
+
 static void bdrv_child_cb_drained_end(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
@@ -904,6 +910,7 @@ const BdrvChildRole child_file = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
@@ -928,6 +935,7 @@ const BdrvChildRole child_format = {
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_fmt_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
@@ -1047,6 +1055,7 @@ const BdrvChildRole child_backing = {
     .detach          = bdrv_backing_detach,
     .inherit_options = bdrv_backing_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
+    .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
     .inactivate      = bdrv_child_cb_inactivate,
     .update_filename = bdrv_backing_update_filename,
diff --git a/block/io.c b/block/io.c
index aa973f1d4a..b76ef92009 100644
--- a/block/io.c
+++ b/block/io.c
@@ -69,6 +69,23 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
+static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore)
+{
+    BdrvChild *c, *next;
+    bool busy = false;
+
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
+        if (c == ignore) {
+            continue;
+        }
+        if (c->role->drained_poll) {
+            busy |= c->role->drained_poll(c);
+        }
+    }
+
+    return busy;
+}
+
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
     dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
@@ -183,21 +200,32 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-static bool bdrv_drain_poll(BlockDriverState *bs)
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent)
+{
+    if (bdrv_parent_drained_poll(bs, ignore_parent)) {
+        return true;
+    }
+
+    return atomic_read(&bs->in_flight);
+}
+
+static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
+                                      BdrvChild *ignore_parent)
 {
     /* Execute pending BHs first and check everything else only after the BHs
      * have executed. */
     while (aio_poll(bs->aio_context, false));
-    return atomic_read(&bs->in_flight);
+
+    return bdrv_drain_poll(bs, ignore_parent);
 }
 
-static bool bdrv_drain_recurse(BlockDriverState *bs)
+static bool bdrv_drain_recurse(BlockDriverState *bs, BdrvChild *parent)
 {
     BdrvChild *child, *tmp;
     bool waited;
 
     /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs));
+    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
 
     QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
         BlockDriverState *bs = child->bs;
@@ -214,7 +242,7 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
              */
             bdrv_ref(bs);
         }
-        waited |= bdrv_drain_recurse(bs);
+        waited |= bdrv_drain_recurse(bs, child);
         if (in_main_loop) {
             bdrv_unref(bs);
         }
@@ -290,7 +318,7 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
-    bdrv_drain_recurse(bs);
+    bdrv_drain_recurse(bs, parent);
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
diff --git a/block/mirror.c b/block/mirror.c
index dcb66ec3be..fb1d686ff6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -964,6 +964,12 @@ static void mirror_pause(Job *job)
     mirror_wait_for_all_io(s);
 }
 
+static bool mirror_drained_poll(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    return !!s->in_flight;
+}
+
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
@@ -997,6 +1003,7 @@ static const BlockJobDriver mirror_job_driver = {
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
+    .drained_poll           = mirror_drained_poll,
     .attached_aio_context   = mirror_attached_aio_context,
     .drain                  = mirror_drain,
 };
@@ -1012,6 +1019,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
+    .drained_poll           = mirror_drained_poll,
     .attached_aio_context   = mirror_attached_aio_context,
     .drain                  = mirror_drain,
 };
diff --git a/blockjob.c b/blockjob.c
index 0306533a2e..be5903aa96 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -155,6 +155,28 @@ static void child_job_drained_begin(BdrvChild *c)
     job_pause(&job->job);
 }
 
+static bool child_job_drained_poll(BdrvChild *c)
+{
+    BlockJob *bjob = c->opaque;
+    Job *job = &bjob->job;
+    const BlockJobDriver *drv = block_job_driver(bjob);
+
+    /* An inactive or completed job doesn't have any pending requests. Jobs
+     * with !job->busy are either already paused or have a pause point after
+     * being reentered, so no job driver code will run before they pause. */
+    if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop) {
+        return false;
+    }
+
+    /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
+     * override this assumption. */
+    if (drv->drained_poll) {
+        return drv->drained_poll(bjob);
+    } else {
+        return true;
+    }
+}
+
 static void child_job_drained_end(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
@@ -164,6 +186,7 @@ static void child_job_drained_end(BdrvChild *c)
 static const BdrvChildRole child_job = {
     .get_parent_desc    = child_job_get_parent_desc,
     .drained_begin      = child_job_drained_begin,
+    .drained_poll       = child_job_drained_poll,
     .drained_end        = child_job_drained_end,
     .stay_at_node       = true,
 };
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 17127e63e4..756e7a1a7f 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -686,7 +686,11 @@ static void coroutine_fn test_job_start(void *opaque)
 
     job_transition_to_ready(&s->common.job);
     while (!s->should_complete) {
-        job_sleep_ns(&s->common.job, 100000);
+        /* Avoid block_job_sleep_ns() because it marks the job as !busy. We
+         * want to emulate some actual activity (probably some I/O) here so
+         * that drain has to wait for this activity to stop. */
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+        job_pause_point(&s->common.job);
     }
 
     job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
@@ -733,7 +737,7 @@ static void test_blockjob_common(enum drain_type drain_type)
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, src);
 
@@ -743,15 +747,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->job.pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->job.paused); */
+    g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end(drain_type, src);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin(drain_type, target);
 
@@ -761,15 +764,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     } else {
         g_assert_cmpint(job->job.pause_count, ==, 1);
     }
-    /* XXX We don't wait until the job is actually paused. Is this okay? */
-    /* g_assert_true(job->job.paused); */
+    g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end(drain_type, target);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
-    g_assert_false(job->job.busy); /* We're in job_sleep_ns() */
+    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     ret = job_complete_sync(&job->job, &error_abort);
     g_assert_cmpint(ret, ==, 0);
-- 
2.13.6


* [Qemu-devel] [PATCH v2 08/20] block: Remove bdrv_drain_recurse()
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

For bdrv_drain(), recursively waiting for child node requests is
pointless because we didn't quiesce their parents, so new requests could
come in anyway. Letting the function work only on a single node makes it
more consistent.

For subtree drains and drain_all, we already have the recursion in
bdrv_do_drained_begin(), so the extra recursion doesn't add anything
either.

Remove the useless code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 36 +++---------------------------------
 1 file changed, 3 insertions(+), 33 deletions(-)

diff --git a/block/io.c b/block/io.c
index b76ef92009..8303aebb58 100644
--- a/block/io.c
+++ b/block/io.c
@@ -219,38 +219,6 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
     return bdrv_drain_poll(bs, ignore_parent);
 }
 
-static bool bdrv_drain_recurse(BlockDriverState *bs, BdrvChild *parent)
-{
-    BdrvChild *child, *tmp;
-    bool waited;
-
-    /* Wait for drained requests to finish */
-    waited = BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
-
-    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-        BlockDriverState *bs = child->bs;
-        bool in_main_loop =
-            qemu_get_current_aio_context() == qemu_get_aio_context();
-        assert(bs->refcnt > 0);
-        if (in_main_loop) {
-            /* In case the recursive bdrv_drain_recurse processes a
-             * block_job_defer_to_main_loop BH and modifies the graph,
-             * let's hold a reference to bs until we are done.
-             *
-             * IOThread doesn't have such a BH, and it is not safe to call
-             * bdrv_unref without BQL, so skip doing it there.
-             */
-            bdrv_ref(bs);
-        }
-        waited |= bdrv_drain_recurse(bs, child);
-        if (in_main_loop) {
-            bdrv_unref(bs);
-        }
-    }
-
-    return waited;
-}
-
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
@@ -318,7 +286,9 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
-    bdrv_drain_recurse(bs, parent);
+
+    /* Wait for drained requests to finish */
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
-- 
2.13.6


* [Qemu-devel] [PATCH v2 09/20] test-bdrv-drain: Add test for node deletion
From: Kevin Wolf @ 2018-05-29 17:21 UTC
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

From: Max Reitz <mreitz@redhat.com>

This patch adds two bdrv-drain tests for what happens if some BDS goes
away during the drainage.

The basic idea is that you have a parent BDS with some child nodes.
Then, you drain one of the children.  Because of that, the party who
actually owns the parent decides to (A) delete it, or (B) detach all its
children from it -- both while the child is still being drained.

A real-world case where this can happen is the mirror block job, which
may exit if you drain one of its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 756e7a1a7f..df438f0084 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -797,6 +797,172 @@ static void test_blockjob_drain_subtree(void)
     test_blockjob_common(BDRV_SUBTREE_DRAIN);
 }
 
+
+typedef struct BDRVTestTopState {
+    BdrvChild *wait_child;
+} BDRVTestTopState;
+
+static void bdrv_test_top_close(BlockDriverState *bs)
+{
+    BdrvChild *c, *next_c;
+    QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+        bdrv_unref_child(bs, c);
+    }
+}
+
+static int coroutine_fn bdrv_test_top_co_preadv(BlockDriverState *bs,
+                                                uint64_t offset, uint64_t bytes,
+                                                QEMUIOVector *qiov, int flags)
+{
+    BDRVTestTopState *tts = bs->opaque;
+    return bdrv_co_preadv(tts->wait_child, offset, bytes, qiov, flags);
+}
+
+static BlockDriver bdrv_test_top_driver = {
+    .format_name            = "test_top_driver",
+    .instance_size          = sizeof(BDRVTestTopState),
+
+    .bdrv_close             = bdrv_test_top_close,
+    .bdrv_co_preadv         = bdrv_test_top_co_preadv,
+
+    .bdrv_child_perm        = bdrv_format_default_perms,
+};
+
+typedef struct TestCoDeleteByDrainData {
+    BlockBackend *blk;
+    bool detach_instead_of_delete;
+    bool done;
+} TestCoDeleteByDrainData;
+
+static void coroutine_fn test_co_delete_by_drain(void *opaque)
+{
+    TestCoDeleteByDrainData *dbdd = opaque;
+    BlockBackend *blk = dbdd->blk;
+    BlockDriverState *bs = blk_bs(blk);
+    BDRVTestTopState *tts = bs->opaque;
+    void *buffer = g_malloc(65536);
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = buffer,
+        .iov_len  = 65536,
+    };
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Pretend some internal I/O operation from parent to child.
+     * Important: We have to read from the child, not from the parent!
+     * Draining works by first propagating it all up the tree to the
+     * root and then waiting for drainage from root to the leaves
+     * (protocol nodes).  If we have a request waiting on the root,
+     * everything will be drained before we go back down the tree, but
+     * we do not want that.  We want to be in the middle of draining
+     * when the following request returns. */
+    bdrv_co_preadv(tts->wait_child, 0, 65536, &qiov, 0);
+
+    g_assert_cmpint(bs->refcnt, ==, 1);
+
+    if (!dbdd->detach_instead_of_delete) {
+        blk_unref(blk);
+    } else {
+        BdrvChild *c, *next_c;
+        QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+            bdrv_unref_child(bs, c);
+        }
+    }
+
+    dbdd->done = true;
+}
+
+/**
+ * Test what happens when some BDS has some children, you drain one of
+ * them and this results in the BDS being deleted.
+ *
+ * If @detach_instead_of_delete is set, the BDS is not going to be
+ * deleted but will only detach all of its children.
+ */
+static void do_test_delete_by_drain(bool detach_instead_of_delete)
+{
+    BlockBackend *blk;
+    BlockDriverState *bs, *child_bs, *null_bs;
+    BDRVTestTopState *tts;
+    TestCoDeleteByDrainData dbdd;
+    Coroutine *co;
+
+    bs = bdrv_new_open_driver(&bdrv_test_top_driver, "top", BDRV_O_RDWR,
+                              &error_abort);
+    bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    tts = bs->opaque;
+
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    /* This child will be the one to pass requests through to, and
+     * it will stall until a drain occurs */
+    child_bs = bdrv_new_open_driver(&bdrv_test, "child", BDRV_O_RDWR,
+                                    &error_abort);
+    child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    /* Takes our reference to child_bs */
+    tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child", &child_file,
+                                        &error_abort);
+
+    /* This child is just there to be deleted
+     * (for detach_instead_of_delete == true) */
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    blk_insert_bs(blk, bs, &error_abort);
+
+    /* Referenced by blk now */
+    bdrv_unref(bs);
+
+    g_assert_cmpint(bs->refcnt, ==, 1);
+    g_assert_cmpint(child_bs->refcnt, ==, 1);
+    g_assert_cmpint(null_bs->refcnt, ==, 1);
+
+
+    dbdd = (TestCoDeleteByDrainData){
+        .blk = blk,
+        .detach_instead_of_delete = detach_instead_of_delete,
+        .done = false,
+    };
+    co = qemu_coroutine_create(test_co_delete_by_drain, &dbdd);
+    qemu_coroutine_enter(co);
+
+    /* Drain the child while the read operation is still pending.
+     * This should result in the operation finishing and
+     * test_co_delete_by_drain() resuming.  Thus, @bs will be deleted
+     * and the coroutine will exit while this drain operation is still
+     * in progress. */
+    bdrv_ref(child_bs);
+    bdrv_drain(child_bs);
+    bdrv_unref(child_bs);
+
+    while (!dbdd.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
+    if (detach_instead_of_delete) {
+        /* Here, the reference has not passed over to the coroutine,
+         * so we have to delete the BB ourselves */
+        blk_unref(blk);
+    }
+}
+
+
+static void test_delete_by_drain(void)
+{
+    do_test_delete_by_drain(false);
+}
+
+static void test_detach_by_drain(void)
+{
+    do_test_delete_by_drain(true);
+}
+
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -844,6 +1010,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/deletion", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach", test_detach_by_drain);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.13.6

* [Qemu-devel] [PATCH v2 10/20] block: Drain recursively with a single BDRV_POLL_WHILE()
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (8 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 09/20] test-bdrv-drain: Add test for node deletion Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 11/20] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

Anything can happen inside BDRV_POLL_WHILE(), including graph
changes that may interfere with its callers (e.g. child list iteration
in recursive callers of bdrv_do_drained_begin).

Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
end of bdrv_do_drained_begin() to avoid such effects. The recursion
now happens inside the loop condition. As the graph can only change
between bdrv_drain_poll() calls, but not within one, doing the
recursion there is safe.
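
A condensed sketch of the resulting control flow (the helper names in
this sketch are placeholders for illustration; the exact code is in the
diff below):

    static void drained_begin_sketch(BlockDriverState *bs, bool recursive,
                                     BdrvChild *parent, bool poll)
    {
        /* Phase 1: quiesce this node (parent callbacks, driver callback,
         * external events), then recurse. No polling happens here. */
        quiesce_node(bs, parent);
        if (recursive) {
            BdrvChild *child, *next;
            QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
                drained_begin_sketch(child->bs, true, child, false);
            }
        }

        /* Phase 2: a single top-level poll. The condition re-runs the
         * recursive bdrv_drain_poll() check, so the set of nodes may
         * change between two evaluations of the condition. */
        if (poll) {
            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
        }
    }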

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h |  9 +++++---
 block.c               |  2 +-
 block/io.c            | 63 ++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 52f07da3d3..56962e7ee7 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -568,10 +568,13 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
 /**
  * bdrv_drain_poll:
  *
- * Poll for pending requests in @bs and its parents (except for
- * @ignore_parent). This is part of bdrv_drained_begin.
+ * Poll for pending requests in @bs, its parents (except for @ignore_parent),
+ * and if @recursive is true its children as well.
+ *
+ * This is part of bdrv_drained_begin.
  */
-bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent);
+bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
+                     BdrvChild *ignore_parent);
 
 /**
  * bdrv_drained_begin:
diff --git a/block.c b/block.c
index 3c654a8a63..8ca2cf27b4 100644
--- a/block.c
+++ b/block.c
@@ -823,7 +823,7 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    return bdrv_drain_poll(bs, NULL);
+    return bdrv_drain_poll(bs, false, NULL);
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 8303aebb58..01784d1115 100644
--- a/block/io.c
+++ b/block/io.c
@@ -165,6 +165,7 @@ typedef struct {
     bool done;
     bool begin;
     bool recursive;
+    bool poll;
     BdrvChild *parent;
 } BdrvCoDrainData;
 
@@ -200,27 +201,42 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent)
+bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
+                     BdrvChild *ignore_parent)
 {
+    BdrvChild *child, *next;
+
     if (bdrv_parent_drained_poll(bs, ignore_parent)) {
         return true;
     }
 
-    return atomic_read(&bs->in_flight);
+    if (atomic_read(&bs->in_flight)) {
+        return true;
+    }
+
+    if (recursive) {
+        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+            if (bdrv_drain_poll(child->bs, recursive, child)) {
+                return true;
+            }
+        }
+    }
+
+    return false;
 }
 
-static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
+static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                       BdrvChild *ignore_parent)
 {
     /* Execute pending BHs first and check everything else only after the BHs
      * have executed. */
     while (aio_poll(bs->aio_context, false));
 
-    return bdrv_drain_poll(bs, ignore_parent);
+    return bdrv_drain_poll(bs, recursive, ignore_parent);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent);
+                                  BdrvChild *parent, bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent);
 
@@ -232,7 +248,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
     bdrv_dec_in_flight(bs);
     if (data->begin) {
-        bdrv_do_drained_begin(bs, data->recursive, data->parent);
+        bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
     } else {
         bdrv_do_drained_end(bs, data->recursive, data->parent);
     }
@@ -243,7 +259,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
-                                                BdrvChild *parent)
+                                                BdrvChild *parent, bool poll)
 {
     BdrvCoDrainData data;
 
@@ -258,6 +274,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .begin = begin,
         .recursive = recursive,
         .parent = parent,
+        .poll = poll,
     };
     bdrv_inc_in_flight(bs);
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
@@ -270,12 +287,12 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
 }
 
 void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                           BdrvChild *parent)
+                           BdrvChild *parent, bool poll)
 {
     BdrvChild *child, *next;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent);
+        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
         return;
     }
 
@@ -287,25 +304,35 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
 
-    /* Wait for drained requests to finish */
-    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
-
     if (recursive) {
         bs->recursive_quiesce_counter++;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_begin(child->bs, true, child);
+            bdrv_do_drained_begin(child->bs, true, child, false);
         }
     }
+
+    /*
+     * Wait for drained requests to finish.
+     *
+     * Calling BDRV_POLL_WHILE() only once for the top-level node is okay: The
+     * call is needed so things in this AioContext can make progress even
+     * though we don't return to the main AioContext loop - this automatically
+     * includes other nodes in the same AioContext and therefore all child
+     * nodes.
+     */
+    if (poll) {
+        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+    }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, false, NULL);
+    bdrv_do_drained_begin(bs, false, NULL, true);
 }
 
 void bdrv_subtree_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, true, NULL);
+    bdrv_do_drained_begin(bs, true, NULL, true);
 }
 
 void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
@@ -315,7 +342,7 @@ void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, recursive, parent);
+        bdrv_co_yield_to_drain(bs, false, recursive, parent, false);
         return;
     }
     assert(bs->quiesce_counter > 0);
@@ -351,7 +378,7 @@ void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
     int i;
 
     for (i = 0; i < new_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_begin(child->bs, true, child);
+        bdrv_do_drained_begin(child->bs, true, child, true);
     }
 }
 
@@ -421,7 +448,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, true, NULL);
+        bdrv_do_drained_begin(bs, true, NULL, true);
         aio_context_release(aio_context);
     }
 
-- 
2.13.6

* [Qemu-devel] [PATCH v2 11/20] test-bdrv-drain: Test node deletion in subtree recursion
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (9 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 10/20] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 12/20] block: Don't poll in parent drain callbacks Kevin Wolf
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

If bdrv_do_drained_begin() polls during its subtree recursion, the graph
can change and mess up the bs->children iteration. Test that this
doesn't happen.
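
The hazard can be sketched like this (illustrative pseudocode, not part
of the patch; quiesce_one_child() is a placeholder). QLIST_FOREACH_SAFE
only caches the next element, so it survives removal of the current
child, but not graph changes that touch the cached one:

    BdrvChild *child, *next;
    QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
        quiesce_one_child(child->bs);
        /* If this step polls, a request callback may run and detach or
         * delete arbitrary children, including 'next': */
        aio_poll(bdrv_get_aio_context(bs), true);   /* unsafe mid-iteration */
    }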

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index df438f0084..4b788e323e 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -880,7 +880,8 @@ static void coroutine_fn test_co_delete_by_drain(void *opaque)
  * If @detach_instead_of_delete is set, the BDS is not going to be
  * deleted but will only detach all of its children.
  */
-static void do_test_delete_by_drain(bool detach_instead_of_delete)
+static void do_test_delete_by_drain(bool detach_instead_of_delete,
+                                    enum drain_type drain_type)
 {
     BlockBackend *blk;
     BlockDriverState *bs, *child_bs, *null_bs;
@@ -936,9 +937,23 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete)
      * test_co_delete_by_drain() resuming.  Thus, @bs will be deleted
      * and the coroutine will exit while this drain operation is still
      * in progress. */
-    bdrv_ref(child_bs);
-    bdrv_drain(child_bs);
-    bdrv_unref(child_bs);
+    switch (drain_type) {
+    case BDRV_DRAIN:
+        bdrv_ref(child_bs);
+        bdrv_drain(child_bs);
+        bdrv_unref(child_bs);
+        break;
+    case BDRV_SUBTREE_DRAIN:
+        /* Would have to ref/unref bs here for !detach_instead_of_delete, but
+         * then the whole test becomes pointless because the graph changes
+         * don't occur during the drain any more. */
+        assert(detach_instead_of_delete);
+        bdrv_subtree_drained_begin(bs);
+        bdrv_subtree_drained_end(bs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
 
     while (!dbdd.done) {
         aio_poll(qemu_get_aio_context(), true);
@@ -951,15 +966,19 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete)
     }
 }
 
-
 static void test_delete_by_drain(void)
 {
-    do_test_delete_by_drain(false);
+    do_test_delete_by_drain(false, BDRV_DRAIN);
 }
 
 static void test_detach_by_drain(void)
 {
-    do_test_delete_by_drain(true);
+    do_test_delete_by_drain(true, BDRV_DRAIN);
+}
+
+static void test_detach_by_drain_subtree(void)
+{
+    do_test_delete_by_drain(true, BDRV_SUBTREE_DRAIN);
 }
 
 
@@ -1010,8 +1029,9 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
-    g_test_add_func("/bdrv-drain/deletion", test_delete_by_drain);
-    g_test_add_func("/bdrv-drain/detach", test_detach_by_drain);
+    g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

* [Qemu-devel] [PATCH v2 12/20] block: Don't poll in parent drain callbacks
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (10 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 11/20] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 13/20] test-bdrv-drain: Graph change through parent callback Kevin Wolf
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

bdrv_do_drained_begin() is only safe if we have a single
BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
parent callbacks to introduce a nested polling loop that could cause
graph changes while we're traversing the graph.

Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
node without waiting for its requests to complete. These requests will
be waited for in the BDRV_POLL_WHILE() call down the call chain.
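
Condensed from the diff below: the parent callback now only quiesces and
leaves the waiting to the outermost drain.

    static void bdrv_child_cb_drained_begin(BdrvChild *child)
    {
        BlockDriverState *bs = child->opaque;
        /* Quiesce only; the already running requests are awaited later,
         * in the single BDRV_POLL_WHILE() of the top-level drain. */
        bdrv_do_drained_begin_quiesce(bs, NULL);
    }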

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h |  9 +++++++++
 block.c               |  2 +-
 block/io.c            | 24 ++++++++++++++++--------
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 56962e7ee7..6026c1385e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -589,6 +589,15 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
 void bdrv_drained_begin(BlockDriverState *bs);
 
 /**
+ * bdrv_do_drained_begin_quiesce:
+ *
+ * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
+ * running requests to complete.
+ */
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+                                   BdrvChild *parent);
+
+/**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for
  * exclusive access to all child nodes as well.
  */
diff --git a/block.c b/block.c
index 8ca2cf27b4..1405d1180b 100644
--- a/block.c
+++ b/block.c
@@ -817,7 +817,7 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_drained_begin(bs);
+    bdrv_do_drained_begin_quiesce(bs, NULL);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 01784d1115..af14227af8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -286,15 +286,10 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     assert(data.done);
 }
 
-void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                           BdrvChild *parent, bool poll)
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+                                   BdrvChild *parent)
 {
-    BdrvChild *child, *next;
-
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
-        return;
-    }
+    assert(!qemu_in_coroutine());
 
     /* Stop things in parent-to-child order */
     if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
@@ -303,6 +298,19 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     bdrv_parent_drained_begin(bs, parent);
     bdrv_drain_invoke(bs, true);
+}
+
+static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
+                                  BdrvChild *parent, bool poll)
+{
+    BdrvChild *child, *next;
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
+        return;
+    }
+
+    bdrv_do_drained_begin_quiesce(bs, parent);
 
     if (recursive) {
         bs->recursive_quiesce_counter++;
-- 
2.13.6

* [Qemu-devel] [PATCH v2 13/20] test-bdrv-drain: Graph change through parent callback
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (11 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 12/20] block: Don't poll in parent drain callbacks Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 14/20] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 4b788e323e..ca7b6a3fdf 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -982,6 +982,135 @@ static void test_detach_by_drain_subtree(void)
 }
 
 
+struct detach_by_parent_data {
+    BlockDriverState *parent_b;
+    BdrvChild *child_b;
+    BlockDriverState *c;
+    BdrvChild *child_c;
+};
+
+static void detach_by_parent_aio_cb(void *opaque, int ret)
+{
+    struct detach_by_parent_data *data = opaque;
+
+    g_assert_cmpint(ret, ==, 0);
+    bdrv_unref_child(data->parent_b, data->child_b);
+
+    bdrv_ref(data->c);
+    data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
+                                      &child_file, &error_abort);
+}
+
+/*
+ * Initial graph:
+ *
+ * PA     PB
+ *    \ /   \
+ *     A     B     C
+ *
+ * PA has a pending write request whose callback changes the child nodes of PB:
+ * It removes B and adds C instead. The subtree of PB is drained, which will
+ * indirectly drain the write request, too.
+ */
+static void test_detach_by_parent_cb(void)
+{
+    BlockBackend *blk;
+    BlockDriverState *parent_a, *parent_b, *a, *b, *c;
+    BdrvChild *child_a, *child_b;
+    BlockAIOCB *acb;
+    struct detach_by_parent_data data;
+
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = NULL,
+        .iov_len = 0,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Create all involved nodes */
+    parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
+                                    &error_abort);
+    parent_b = bdrv_new_open_driver(&bdrv_test, "parent-b", 0,
+                                    &error_abort);
+
+    a = bdrv_new_open_driver(&bdrv_test, "a", BDRV_O_RDWR, &error_abort);
+    b = bdrv_new_open_driver(&bdrv_test, "b", BDRV_O_RDWR, &error_abort);
+    c = bdrv_new_open_driver(&bdrv_test, "c", BDRV_O_RDWR, &error_abort);
+
+    /* blk is a BB for parent-a */
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    blk_insert_bs(blk, parent_a, &error_abort);
+    bdrv_unref(parent_a);
+
+    /* Set child relationships */
+    bdrv_ref(b);
+    bdrv_ref(a);
+    child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_file, &error_abort);
+    child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);
+
+    bdrv_ref(a);
+    bdrv_attach_child(parent_a, a, "PA-A", &child_file, &error_abort);
+
+    g_assert_cmpint(parent_a->refcnt, ==, 1);
+    g_assert_cmpint(parent_b->refcnt, ==, 1);
+    g_assert_cmpint(a->refcnt, ==, 3);
+    g_assert_cmpint(b->refcnt, ==, 2);
+    g_assert_cmpint(c->refcnt, ==, 1);
+
+    g_assert(QLIST_FIRST(&parent_b->children) == child_a);
+    g_assert(QLIST_NEXT(child_a, next) == child_b);
+    g_assert(QLIST_NEXT(child_b, next) == NULL);
+
+    /* Start the evil write request */
+    data = (struct detach_by_parent_data) {
+        .parent_b = parent_b,
+        .child_b = child_b,
+        .c = c,
+    };
+    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, &data);
+    g_assert(acb != NULL);
+
+    /* Drain and check the expected result */
+    bdrv_subtree_drained_begin(parent_b);
+
+    g_assert(data.child_c != NULL);
+
+    g_assert_cmpint(parent_a->refcnt, ==, 1);
+    g_assert_cmpint(parent_b->refcnt, ==, 1);
+    g_assert_cmpint(a->refcnt, ==, 3);
+    g_assert_cmpint(b->refcnt, ==, 1);
+    g_assert_cmpint(c->refcnt, ==, 2);
+
+    g_assert(QLIST_FIRST(&parent_b->children) == data.child_c);
+    g_assert(QLIST_NEXT(data.child_c, next) == child_a);
+    g_assert(QLIST_NEXT(child_a, next) == NULL);
+
+    g_assert_cmpint(parent_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(parent_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a->quiesce_counter, ==, 1);
+    g_assert_cmpint(b->quiesce_counter, ==, 0);
+    g_assert_cmpint(c->quiesce_counter, ==, 1);
+
+    bdrv_subtree_drained_end(parent_b);
+
+    bdrv_unref(parent_b);
+    blk_unref(blk);
+
+    /* XXX Once bdrv_close() unref's children instead of just detaching them,
+     * this won't be necessary any more. */
+    bdrv_unref(a);
+    bdrv_unref(a);
+    bdrv_unref(c);
+
+    g_assert_cmpint(a->refcnt, ==, 1);
+    g_assert_cmpint(b->refcnt, ==, 1);
+    g_assert_cmpint(c->refcnt, ==, 1);
+    bdrv_unref(a);
+    bdrv_unref(b);
+    bdrv_unref(c);
+}
+
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1032,6 +1161,7 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
+    g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

* [Qemu-devel] [PATCH v2 14/20] block: Defer .bdrv_drain_begin callback to polling phase
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (12 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 13/20] test-bdrv-drain: Graph change through parent callback Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 15/20] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.
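
Condensed from the diff below, the begin path becomes fire-and-forget;
bs->in_flight is what makes the final poll wait for the callback:

    /* Make sure the driver callback completes during the polling phase */
    bdrv_inc_in_flight(bs);
    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
    /* No BDRV_POLL_WHILE() here for begin=true; bdrv_drain_invoke_entry()
     * calls bdrv_dec_in_flight(bs) once .bdrv_co_drain_begin is done. */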

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index af14227af8..a5b899723a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -182,22 +182,40 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 
     /* Set data->done before reading bs->wakeup.  */
     atomic_mb_set(&data->done, true);
-    bdrv_wakeup(bs);
+    bdrv_dec_in_flight(bs);
+
+    if (data->begin) {
+        g_free(data);
+    }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
 static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
-    BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
+    BdrvCoDrainData *data;
 
     if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
             (!begin && !bs->drv->bdrv_co_drain_end)) {
         return;
     }
 
-    data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
-    bdrv_coroutine_enter(bs, data.co);
-    BDRV_POLL_WHILE(bs, !data.done);
+    data = g_new(BdrvCoDrainData, 1);
+    *data = (BdrvCoDrainData) {
+        .bs = bs,
+        .done = false,
+        .begin = begin
+    };
+
+    /* Make sure the driver callback completes during the polling phase for
+     * drain_begin. */
+    bdrv_inc_in_flight(bs);
+    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
+    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
+
+    if (!begin) {
+        BDRV_POLL_WHILE(bs, !data->done);
+        g_free(data);
+    }
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-- 
2.13.6

* [Qemu-devel] [PATCH v2 15/20] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (13 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 14/20] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 16/20] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

This adds a test case that goes wrong if bdrv_drain_invoke() calls
aio_poll().
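
The trick, condensed from the test below: the parent role's
.drained_begin schedules a BH that rewrites the graph, while the
driver's .bdrv_co_drain_begin sleeps so that it can only finish if
somebody polls. If bdrv_drain_invoke() polled, the BH would already run
during the drain recursion instead of in the final BDRV_POLL_WHILE().

    static void detach_by_driver_cb_drained_begin(BdrvChild *child)
    {
        /* The graph change must only happen once the polling phase runs */
        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                detach_indirect_bh, &detach_by_parent_data);
        child_file.drained_begin(child);
    }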

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 102 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 88 insertions(+), 14 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index ca7b6a3fdf..7b0454a193 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -34,12 +34,16 @@ static QemuEvent done_event;
 typedef struct BDRVTestState {
     int drain_count;
     AioContext *bh_indirection_ctx;
+    bool sleep_in_drain_begin;
 } BDRVTestState;
 
 static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
 {
     BDRVTestState *s = bs->opaque;
     s->drain_count++;
+    if (s->sleep_in_drain_begin) {
+        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+    }
 }
 
 static void coroutine_fn bdrv_test_co_drain_end(BlockDriverState *bs)
@@ -80,6 +84,22 @@ static int coroutine_fn bdrv_test_co_preadv(BlockDriverState *bs,
     return 0;
 }
 
+static void bdrv_test_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                 const BdrvChildRole *role,
+                                 BlockReopenQueue *reopen_queue,
+                                 uint64_t perm, uint64_t shared,
+                                 uint64_t *nperm, uint64_t *nshared)
+{
+    /* bdrv_format_default_perms() accepts only these two, so disguise
+     * detach_by_driver_cb_role as one of them. */
+    if (role != &child_file && role != &child_backing) {
+        role = &child_file;
+    }
+
+    bdrv_format_default_perms(bs, c, role, reopen_queue, perm, shared,
+                              nperm, nshared);
+}
+
 static BlockDriver bdrv_test = {
     .format_name            = "test",
     .instance_size          = sizeof(BDRVTestState),
@@ -90,7 +110,7 @@ static BlockDriver bdrv_test = {
     .bdrv_co_drain_begin    = bdrv_test_co_drain_begin,
     .bdrv_co_drain_end      = bdrv_test_co_drain_end,
 
-    .bdrv_child_perm        = bdrv_format_default_perms,
+    .bdrv_child_perm        = bdrv_test_child_perm,
 };
 
 static void aio_ret_cb(void *opaque, int ret)
@@ -987,13 +1007,14 @@ struct detach_by_parent_data {
     BdrvChild *child_b;
     BlockDriverState *c;
     BdrvChild *child_c;
+    bool by_parent_cb;
 };
+static struct detach_by_parent_data detach_by_parent_data;
 
-static void detach_by_parent_aio_cb(void *opaque, int ret)
+static void detach_indirect_bh(void *opaque)
 {
     struct detach_by_parent_data *data = opaque;
 
-    g_assert_cmpint(ret, ==, 0);
     bdrv_unref_child(data->parent_b, data->child_b);
 
     bdrv_ref(data->c);
@@ -1001,6 +1022,25 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
                                       &child_file, &error_abort);
 }
 
+static void detach_by_parent_aio_cb(void *opaque, int ret)
+{
+    struct detach_by_parent_data *data = &detach_by_parent_data;
+
+    g_assert_cmpint(ret, ==, 0);
+    if (data->by_parent_cb) {
+        detach_indirect_bh(data);
+    }
+}
+
+static void detach_by_driver_cb_drained_begin(BdrvChild *child)
+{
+    aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
+                            detach_indirect_bh, &detach_by_parent_data);
+    child_file.drained_begin(child);
+}
+
+static BdrvChildRole detach_by_driver_cb_role;
+
 /*
  * Initial graph:
  *
@@ -1008,17 +1048,25 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
  *    \ /   \
  *     A     B     C
  *
- * PA has a pending write request whose callback changes the child nodes of PB:
- * It removes B and adds C instead. The subtree of PB is drained, which will
- * indirectly drain the write request, too.
+ * by_parent_cb == true:  Test that parent callbacks don't poll
+ *
+ *     PA has a pending write request whose callback changes the child nodes of
+ *     PB: It removes B and adds C instead. The subtree of PB is drained, which
+ *     will indirectly drain the write request, too.
+ *
+ * by_parent_cb == false: Test that bdrv_drain_invoke() doesn't poll
+ *
+ *     PA's BdrvChildRole has a .drained_begin callback that schedules a BH
+ *     that does the same graph change. If bdrv_drain_invoke() calls it, the
+ *     state is messed up, but if it is only polled in the single
+ *     BDRV_POLL_WHILE() at the end of the drain, this should work fine.
  */
-static void test_detach_by_parent_cb(void)
+static void test_detach_indirect(bool by_parent_cb)
 {
     BlockBackend *blk;
     BlockDriverState *parent_a, *parent_b, *a, *b, *c;
     BdrvChild *child_a, *child_b;
     BlockAIOCB *acb;
-    struct detach_by_parent_data data;
 
     QEMUIOVector qiov;
     struct iovec iov = {
@@ -1027,6 +1075,12 @@ static void test_detach_by_parent_cb(void)
     };
     qemu_iovec_init_external(&qiov, &iov, 1);
 
+    if (!by_parent_cb) {
+        detach_by_driver_cb_role = child_file;
+        detach_by_driver_cb_role.drained_begin =
+            detach_by_driver_cb_drained_begin;
+    }
+
     /* Create all involved nodes */
     parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
                                     &error_abort);
@@ -1042,6 +1096,13 @@ static void test_detach_by_parent_cb(void)
     blk_insert_bs(blk, parent_a, &error_abort);
     bdrv_unref(parent_a);
 
+    /* If we want to get bdrv_drain_invoke() to call aio_poll(), the driver
+     * callback must not return immediately. */
+    if (!by_parent_cb) {
+        BDRVTestState *s = parent_a->opaque;
+        s->sleep_in_drain_begin = true;
+    }
+
     /* Set child relationships */
     bdrv_ref(b);
     bdrv_ref(a);
@@ -1049,7 +1110,9 @@ static void test_detach_by_parent_cb(void)
     child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);
 
     bdrv_ref(a);
-    bdrv_attach_child(parent_a, a, "PA-A", &child_file, &error_abort);
+    bdrv_attach_child(parent_a, a, "PA-A",
+                      by_parent_cb ? &child_file : &detach_by_driver_cb_role,
+                      &error_abort);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1062,18 +1125,19 @@ static void test_detach_by_parent_cb(void)
     g_assert(QLIST_NEXT(child_b, next) == NULL);
 
     /* Start the evil write request */
-    data = (struct detach_by_parent_data) {
+    detach_by_parent_data = (struct detach_by_parent_data) {
         .parent_b = parent_b,
         .child_b = child_b,
         .c = c,
+        .by_parent_cb = by_parent_cb,
     };
-    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, &data);
+    acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
 
-    g_assert(data.child_c != NULL);
+    g_assert(detach_by_parent_data.child_c != NULL);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1081,8 +1145,8 @@ static void test_detach_by_parent_cb(void)
     g_assert_cmpint(b->refcnt, ==, 1);
     g_assert_cmpint(c->refcnt, ==, 2);
 
-    g_assert(QLIST_FIRST(&parent_b->children) == data.child_c);
-    g_assert(QLIST_NEXT(data.child_c, next) == child_a);
+    g_assert(QLIST_FIRST(&parent_b->children) == detach_by_parent_data.child_c);
+    g_assert(QLIST_NEXT(detach_by_parent_data.child_c, next) == child_a);
     g_assert(QLIST_NEXT(child_a, next) == NULL);
 
     g_assert_cmpint(parent_a->quiesce_counter, ==, 1);
@@ -1110,6 +1174,15 @@ static void test_detach_by_parent_cb(void)
     bdrv_unref(c);
 }
 
+static void test_detach_by_parent_cb(void)
+{
+    test_detach_indirect(true);
+}
+
+static void test_detach_by_driver_cb(void)
+{
+    test_detach_indirect(false);
+}
 
 int main(int argc, char **argv)
 {
@@ -1162,6 +1235,7 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
     g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
+    g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
     ret = g_test_run();
     qemu_event_destroy(&done_event);
-- 
2.13.6

* [Qemu-devel] [PATCH v2 16/20] block: Allow AIO_WAIT_WHILE with NULL ctx
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (14 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 15/20] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 17/20] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

bdrv_drain_all() wants to have a single polling loop for draining the
in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
condition relies on activity in multiple AioContexts, which is polled
from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
the mainloop thread and use the AioWait notification mechanism.

Just randomly picking the AioContext of any non-mainloop thread would
work, but instead of bothering to find such a context in the caller, we
can just as well accept NULL for ctx.
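
A hypothetical usage sketch (any_requests_pending() is a placeholder for
a condition that inspects nodes in several AioContexts):

    AioWait wait = { };

    /* Runs in the main loop thread; with ctx == NULL, no AioContext lock
     * is released and re-acquired around the aio_poll() calls. */
    AIO_WAIT_WHILE(&wait, NULL, any_requests_pending());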

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/aio-wait.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 783d3678dd..c85a62f798 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -57,7 +57,8 @@ typedef struct {
 /**
  * AIO_WAIT_WHILE:
  * @wait: the aio wait object
- * @ctx: the aio context
+ * @ctx: the aio context, or NULL if multiple aio contexts (for which the
+ *       caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
  *
  * Wait while a condition is true.  Use this to implement synchronous
@@ -75,7 +76,7 @@ typedef struct {
     bool waited_ = false;                                          \
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
-    if (in_aio_context_home_thread(ctx_)) {                        \
+    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
             waited_ = true;                                        \
@@ -86,9 +87,13 @@ typedef struct {
         /* Increment wait_->num_waiters before evaluating cond. */ \
         atomic_inc(&wait_->num_waiters);                           \
         while ((cond)) {                                           \
-            aio_context_release(ctx_);                             \
+            if (ctx_) {                                            \
+                aio_context_release(ctx_);                         \
+            }                                                      \
             aio_poll(qemu_get_aio_context(), true);                \
-            aio_context_acquire(ctx_);                             \
+            if (ctx_) {                                            \
+                aio_context_acquire(ctx_);                         \
+            }                                                      \
             waited_ = true;                                        \
         }                                                          \
         atomic_dec(&wait_->num_waiters);                           \
-- 
2.13.6

* [Qemu-devel] [PATCH v2 17/20] block: Move bdrv_drain_all_begin() out of coroutine context
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (15 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 16/20] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 18/20] block: ignore_bds_parents parameter for drain functions Kevin Wolf
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

Before we can introduce a single polling loop for all nodes in
bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
context like we already do for bdrv_do_drained_begin().
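
Condensed from the diff below: a NULL bs in BdrvCoDrainData now marks
the drain-all case, so the bottom half can re-enter the operation
outside of coroutine context.

    if (qemu_in_coroutine()) {
        /* bs == NULL and begin == true are handled by bdrv_co_drain_bh_cb(),
         * which calls bdrv_drain_all_begin() from the scheduled BH. */
        bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
        return;
    }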

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index a5b899723a..2c06aaf6d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -264,11 +264,16 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     Coroutine *co = data->co;
     BlockDriverState *bs = data->bs;
 
-    bdrv_dec_in_flight(bs);
-    if (data->begin) {
-        bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
+    if (bs) {
+        bdrv_dec_in_flight(bs);
+        if (data->begin) {
+            bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
+        } else {
+            bdrv_do_drained_end(bs, data->recursive, data->parent);
+        }
     } else {
-        bdrv_do_drained_end(bs, data->recursive, data->parent);
+        assert(data->begin);
+        bdrv_drain_all_begin();
     }
 
     data->done = true;
@@ -294,7 +299,9 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .parent = parent,
         .poll = poll,
     };
-    bdrv_inc_in_flight(bs);
+    if (bs) {
+        bdrv_inc_in_flight(bs);
+    }
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                             bdrv_co_drain_bh_cb, &data);
 
@@ -464,6 +471,11 @@ void bdrv_drain_all_begin(void)
     BlockDriverState *bs;
     BdrvNextIterator it;
 
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
+        return;
+    }
+
     /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
      * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
      * nodes in several different AioContexts, so make sure we're in the main
-- 
2.13.6

* [Qemu-devel] [PATCH v2 18/20] block: ignore_bds_parents parameter for drain functions
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (16 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 17/20] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 19/20] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

In the future, bdrv_drain_all_begin/end() will drain all individual
nodes separately rather than whole subtrees. This means that we don't
want to propagate the drain to all parents any more: If the parent is a
BDS, it will already be drained separately. Recursing to all parents is
unnecessary work and would make it an O(n²) operation.

Prepare the drain function for the changed drain_all by adding an
ignore_bds_parents parameter to the internal implementation that
prevents the propagation of the drain to BDS parents. We still (have to)
propagate it to non-BDS parents like BlockBackends or Jobs because those
are not drained separately.
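
Condensed from the diff below, the filter that all three parent
iteration helpers now apply looks like this:

    BdrvChild *c, *next;

    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
        if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
            /* BDS parents are drained separately by drain_all */
            continue;
        }
        /* ... invoke the role's drained_begin/poll/end callback ... */
    }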

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h     | 16 ++++++---
 include/block/block_int.h |  6 ++++
 block.c                   | 11 +++---
 block/io.c                | 88 ++++++++++++++++++++++++++++-------------------
 block/vvfat.c             |  1 +
 5 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 6026c1385e..a678556608 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -555,7 +555,8 @@ void bdrv_io_unplug(BlockDriverState *bs);
  * Begin a quiesced section of all users of @bs. This is part of
  * bdrv_drained_begin.
  */
-void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore);
+void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
+                               bool ignore_bds_parents);
 
 /**
  * bdrv_parent_drained_end:
@@ -563,18 +564,23 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore);
  * End a quiesced section of all users of @bs. This is part of
  * bdrv_drained_end.
  */
-void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore);
+void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
+                             bool ignore_bds_parents);
 
 /**
  * bdrv_drain_poll:
  *
  * Poll for pending requests in @bs, its parents (except for @ignore_parent),
- * and if @recursive is true its children as well.
+ * and if @recursive is true its children as well (used for subtree drain).
+ *
+ * If @ignore_bds_parents is true, parents that are BlockDriverStates must
+ * ignore the drain request because they will be drained separately (used for
+ * drain_all).
  *
  * This is part of bdrv_drained_begin.
  */
 bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
-                     BdrvChild *ignore_parent);
+                     BdrvChild *ignore_parent, bool ignore_bds_parents);
 
 /**
  * bdrv_drained_begin:
@@ -595,7 +601,7 @@ void bdrv_drained_begin(BlockDriverState *bs);
  * running requests to complete.
  */
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent);
+                                   BdrvChild *parent, bool ignore_bds_parents);
 
 /**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for
diff --git a/include/block/block_int.h b/include/block/block_int.h
index feba86888e..edb16df015 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -545,6 +545,12 @@ struct BdrvChildRole {
      * points to. */
     bool stay_at_node;
 
+    /* If true, the parent is a BlockDriverState and bdrv_next_all_states()
+     * will return it. This information is used for drain_all, where every node
+     * will be drained separately, so the drain only needs to be propagated to
+     * non-BDS parents. */
+    bool parent_is_bds;
+
     void (*inherit_options)(int *child_flags, QDict *child_options,
                             int parent_flags, QDict *parent_options);
 
diff --git a/block.c b/block.c
index 1405d1180b..ce6eb53ad9 100644
--- a/block.c
+++ b/block.c
@@ -817,13 +817,13 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL);
+    bdrv_do_drained_begin_quiesce(bs, NULL, false);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    return bdrv_drain_poll(bs, false, NULL);
+    return bdrv_drain_poll(bs, false, NULL, false);
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
@@ -907,6 +907,7 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
 }
 
 const BdrvChildRole child_file = {
+    .parent_is_bds   = true,
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
@@ -932,6 +933,7 @@ static void bdrv_inherited_fmt_options(int *child_flags, QDict *child_options,
 }
 
 const BdrvChildRole child_format = {
+    .parent_is_bds   = true,
     .get_parent_desc = bdrv_child_get_parent_desc,
     .inherit_options = bdrv_inherited_fmt_options,
     .drained_begin   = bdrv_child_cb_drained_begin,
@@ -1050,6 +1052,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
 }
 
 const BdrvChildRole child_backing = {
+    .parent_is_bds   = true,
     .get_parent_desc = bdrv_child_get_parent_desc,
     .attach          = bdrv_backing_attach,
     .detach          = bdrv_backing_detach,
@@ -4945,7 +4948,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
     AioContext *ctx = bdrv_get_aio_context(bs);
 
     aio_disable_external(ctx);
-    bdrv_parent_drained_begin(bs, NULL);
+    bdrv_parent_drained_begin(bs, NULL, false);
     bdrv_drain(bs); /* ensure there are no in-flight requests */
 
     while (aio_poll(ctx, false)) {
@@ -4959,7 +4962,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
      */
     aio_context_acquire(new_context);
     bdrv_attach_aio_context(bs, new_context);
-    bdrv_parent_drained_end(bs, NULL);
+    bdrv_parent_drained_end(bs, NULL, false);
     aio_enable_external(ctx);
     aio_context_release(new_context);
 }
diff --git a/block/io.c b/block/io.c
index 2c06aaf6d8..d71e278289 100644
--- a/block/io.c
+++ b/block/io.c
@@ -41,12 +41,13 @@
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags);
 
-void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
+void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
+                               bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore) {
+        if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
         if (c->role->drained_begin) {
@@ -55,12 +56,13 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
-void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
+void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
+                             bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore) {
+        if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
         if (c->role->drained_end) {
@@ -69,13 +71,14 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
-static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore)
+static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
+                                     bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
     bool busy = false;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore) {
+        if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
         if (c->role->drained_poll) {
@@ -167,6 +170,7 @@ typedef struct {
     bool recursive;
     bool poll;
     BdrvChild *parent;
+    bool ignore_bds_parents;
 } BdrvCoDrainData;
 
 static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
@@ -220,11 +224,11 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
 bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
-                     BdrvChild *ignore_parent)
+                     BdrvChild *ignore_parent, bool ignore_bds_parents)
 {
     BdrvChild *child, *next;
 
-    if (bdrv_parent_drained_poll(bs, ignore_parent)) {
+    if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) {
         return true;
     }
 
@@ -233,8 +237,9 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
     }
 
     if (recursive) {
+        assert(!ignore_bds_parents);
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            if (bdrv_drain_poll(child->bs, recursive, child)) {
+            if (bdrv_drain_poll(child->bs, recursive, child, false)) {
                 return true;
             }
         }
@@ -250,13 +255,14 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
      * have executed. */
     while (aio_poll(bs->aio_context, false));
 
-    return bdrv_drain_poll(bs, recursive, ignore_parent);
+    return bdrv_drain_poll(bs, recursive, ignore_parent, false);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent, bool poll);
+                                  BdrvChild *parent, bool ignore_bds_parents,
+                                  bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent);
+                                BdrvChild *parent, bool ignore_bds_parents);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -267,9 +273,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     if (bs) {
         bdrv_dec_in_flight(bs);
         if (data->begin) {
-            bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);
+            bdrv_do_drained_begin(bs, data->recursive, data->parent,
+                                  data->ignore_bds_parents, data->poll);
         } else {
-            bdrv_do_drained_end(bs, data->recursive, data->parent);
+            bdrv_do_drained_end(bs, data->recursive, data->parent,
+                                data->ignore_bds_parents);
         }
     } else {
         assert(data->begin);
@@ -282,7 +290,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
-                                                BdrvChild *parent, bool poll)
+                                                BdrvChild *parent,
+                                                bool ignore_bds_parents,
+                                                bool poll)
 {
     BdrvCoDrainData data;
 
@@ -297,6 +307,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .begin = begin,
         .recursive = recursive,
         .parent = parent,
+        .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
     };
     if (bs) {
@@ -312,7 +323,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
 }
 
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent)
+                                   BdrvChild *parent, bool ignore_bds_parents)
 {
     assert(!qemu_in_coroutine());
 
@@ -321,26 +332,30 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
         aio_disable_external(bdrv_get_aio_context(bs));
     }
 
-    bdrv_parent_drained_begin(bs, parent);
+    bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
     bdrv_drain_invoke(bs, true);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent, bool poll)
+                                  BdrvChild *parent, bool ignore_bds_parents,
+                                  bool poll)
 {
     BdrvChild *child, *next;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent, poll);
+        bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
+                               poll);
         return;
     }
 
-    bdrv_do_drained_begin_quiesce(bs, parent);
+    bdrv_do_drained_begin_quiesce(bs, parent, ignore_bds_parents);
 
     if (recursive) {
+        assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter++;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_begin(child->bs, true, child, false);
+            bdrv_do_drained_begin(child->bs, true, child, ignore_bds_parents,
+                                  false);
         }
     }
 
@@ -354,28 +369,30 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
      * nodes.
      */
     if (poll) {
+        assert(!ignore_bds_parents);
         BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
     }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, false, NULL, true);
+    bdrv_do_drained_begin(bs, false, NULL, false, true);
 }
 
 void bdrv_subtree_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, true, NULL, true);
+    bdrv_do_drained_begin(bs, true, NULL, false, true);
 }
 
-void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                         BdrvChild *parent)
+static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
+                                BdrvChild *parent, bool ignore_bds_parents)
 {
     BdrvChild *child, *next;
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, recursive, parent, false);
+        bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
+                               false);
         return;
     }
     assert(bs->quiesce_counter > 0);
@@ -383,27 +400,28 @@ void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
 
     /* Re-enable things in child-to-parent order */
     bdrv_drain_invoke(bs, false);
-    bdrv_parent_drained_end(bs, parent);
+    bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
     if (old_quiesce_counter == 1) {
         aio_enable_external(bdrv_get_aio_context(bs));
     }
 
     if (recursive) {
+        assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter--;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_end(child->bs, true, child);
+            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents);
         }
     }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, false, NULL);
+    bdrv_do_drained_end(bs, false, NULL, false);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, true, NULL);
+    bdrv_do_drained_end(bs, true, NULL, false);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -411,7 +429,7 @@ void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
     int i;
 
     for (i = 0; i < new_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_begin(child->bs, true, child, true);
+        bdrv_do_drained_begin(child->bs, true, child, false, true);
     }
 }
 
@@ -420,7 +438,7 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     int i;
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child);
+        bdrv_do_drained_end(child->bs, true, child, false);
     }
 }
 
@@ -472,7 +490,7 @@ void bdrv_drain_all_begin(void)
     BdrvNextIterator it;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, true);
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, false, true);
         return;
     }
 
@@ -486,7 +504,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, true, NULL, true);
+        bdrv_do_drained_begin(bs, true, NULL, false, true);
         aio_context_release(aio_context);
     }
 
@@ -504,7 +522,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, true, NULL);
+        bdrv_do_drained_end(bs, true, NULL, false);
         aio_context_release(aio_context);
     }
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index 662dca0114..384c954d70 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3133,6 +3133,7 @@ static void vvfat_qcow_options(int *child_flags, QDict *child_options,
 }
 
 static const BdrvChildRole child_vvfat_qcow = {
+    .parent_is_bds      = true,
     .inherit_options    = vvfat_qcow_options,
 };
 
-- 
2.13.6


* [Qemu-devel] [PATCH v2 19/20] block: Allow graph changes in bdrv_drain_all_begin/end sections
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (17 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 18/20] block: ignore_bds_parents parameter for drain functions Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 20/20] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
did a subtree drain for each of them. This works fine as long as the
graph is static, but sadly, reality looks different.

If the graph changes so that root nodes are added or removed, we would
have to compensate for this. Worse, bdrv_next() returns each root node
only once even if it is the root node for multiple BlockBackends or for
a monitor-owned block driver tree, which would only complicate any such
compensation.

The much easier and more obviously correct way is to fundamentally
change the way the functions work: Iterate over all BlockDriverStates,
no matter who owns them, and drain them individually. Compensation is
only necessary when a new BDS is created inside a drain_all section.
Removal of a BDS doesn't require any action because it's gone afterwards
anyway.
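
As a rough sketch (condensed from the diff below, not an addition to
it), the new iteration pattern for the quiesce phase looks like this;
bdrv_next_all_states() simply walks the global all_bdrv_states list:

    BlockDriverState *bs = NULL;

    while ((bs = bdrv_next_all_states(bs))) {
        AioContext *aio_context = bdrv_get_aio_context(bs);

        aio_context_acquire(aio_context);
        /* Quiesce this node only; in-flight requests are polled in a
         * single AIO_WAIT_WHILE() afterwards */
        bdrv_do_drained_begin(bs, false, NULL, true, false);
        aio_context_release(aio_context);
    }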

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h     |  1 +
 include/block/block_int.h |  1 +
 block.c                   | 34 ++++++++++++++++++++++++---
 block/io.c                | 60 ++++++++++++++++++++++++++++++++++++-----------
 4 files changed, 79 insertions(+), 17 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a678556608..1ff13854d8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -419,6 +419,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
                                  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
 BlockDriverState *bdrv_next_node(BlockDriverState *bs);
+BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
 
 typedef struct BdrvNextIterator {
     enum {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index edb16df015..212694a0c2 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -822,6 +822,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
     int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
 
+extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent);
 
diff --git a/block.c b/block.c
index ce6eb53ad9..5c58a19646 100644
--- a/block.c
+++ b/block.c
@@ -332,6 +332,10 @@ BlockDriverState *bdrv_new(void)
 
     qemu_co_queue_init(&bs->flush_queue);
 
+    for (i = 0; i < bdrv_drain_all_count; i++) {
+        bdrv_drained_begin(bs);
+    }
+
     QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
 
     return bs;
@@ -1163,7 +1167,7 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
                             int open_flags, Error **errp)
 {
     Error *local_err = NULL;
-    int ret;
+    int i, ret;
 
     bdrv_assign_node_name(bs, node_name, &local_err);
     if (local_err) {
@@ -1211,6 +1215,12 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
     assert(bdrv_min_mem_align(bs) != 0);
     assert(is_power_of_2(bs->bl.request_alignment));
 
+    for (i = 0; i < bs->quiesce_counter; i++) {
+        if (drv->bdrv_co_drain_begin) {
+            drv->bdrv_co_drain_begin(bs);
+        }
+    }
+
     return 0;
 open_failed:
     bs->drv = NULL;
@@ -2021,7 +2031,12 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
             child->role->detach(child);
         }
         if (old_bs->quiesce_counter && child->role->drained_end) {
-            for (i = 0; i < old_bs->quiesce_counter; i++) {
+            int num = old_bs->quiesce_counter;
+            if (child->role->parent_is_bds) {
+                num -= bdrv_drain_all_count;
+            }
+            assert(num >= 0);
+            for (i = 0; i < num; i++) {
                 child->role->drained_end(child);
             }
         }
@@ -2033,7 +2048,12 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
         if (new_bs->quiesce_counter && child->role->drained_begin) {
-            for (i = 0; i < new_bs->quiesce_counter; i++) {
+            int num = new_bs->quiesce_counter;
+            if (child->role->parent_is_bds) {
+                num -= bdrv_drain_all_count;
+            }
+            assert(num >= 0);
+            for (i = 0; i < num; i++) {
                 child->role->drained_begin(child);
             }
         }
@@ -4037,6 +4057,14 @@ BlockDriverState *bdrv_next_node(BlockDriverState *bs)
     return QTAILQ_NEXT(bs, node_list);
 }
 
+BlockDriverState *bdrv_next_all_states(BlockDriverState *bs)
+{
+    if (!bs) {
+        return QTAILQ_FIRST(&all_bdrv_states);
+    }
+    return QTAILQ_NEXT(bs, bs_list);
+}
+
 const char *bdrv_get_node_name(const BlockDriverState *bs)
 {
     return bs->node_name;
diff --git a/block/io.c b/block/io.c
index d71e278289..08298f815b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -38,6 +38,8 @@
 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
 #define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
 
+static AioWait drain_all_aio_wait;
+
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags);
 
@@ -472,6 +474,29 @@ static void bdrv_drain_assert_idle(BlockDriverState *bs)
     }
 }
 
+unsigned int bdrv_drain_all_count = 0;
+
+static bool bdrv_drain_all_poll(void)
+{
+    BlockDriverState *bs = NULL;
+    bool result = false;
+
+    /* Execute pending BHs first (may modify the graph) and check everything
+     * else only after the BHs have executed. */
+    while (aio_poll(qemu_get_aio_context(), false));
+
+    /* bdrv_drain_poll() can't make changes to the graph and we are holding the
+     * main AioContext lock, so iterating bdrv_next_all_states() is safe. */
+    while ((bs = bdrv_next_all_states(bs))) {
+        AioContext *aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        result |= bdrv_drain_poll(bs, false, NULL, true);
+        aio_context_release(aio_context);
+    }
+
+    return result;
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -486,45 +511,51 @@ static void bdrv_drain_assert_idle(BlockDriverState *bs)
  */
 void bdrv_drain_all_begin(void)
 {
-    BlockDriverState *bs;
-    BdrvNextIterator it;
+    BlockDriverState *bs = NULL;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, false, true);
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true);
         return;
     }
 
-    /* BDRV_POLL_WHILE() for a node can only be called from its own I/O thread
-     * or the main loop AioContext. We potentially use BDRV_POLL_WHILE() on
-     * nodes in several different AioContexts, so make sure we're in the main
-     * context. */
+    /* AIO_WAIT_WHILE() with a NULL context can only be called from the main
+     * loop AioContext, so make sure we're in the main context. */
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+    assert(bdrv_drain_all_count < INT_MAX);
+    bdrv_drain_all_count++;
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    /* Quiesce all nodes, without polling in-flight requests yet. The graph
+     * cannot change during this loop. */
+    while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, true, NULL, false, true);
+        bdrv_do_drained_begin(bs, false, NULL, true, false);
         aio_context_release(aio_context);
     }
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    /* Now poll the in-flight requests */
+    AIO_WAIT_WHILE(&drain_all_aio_wait, NULL, bdrv_drain_all_poll());
+
+    while ((bs = bdrv_next_all_states(bs))) {
         bdrv_drain_assert_idle(bs);
     }
 }
 
 void bdrv_drain_all_end(void)
 {
-    BlockDriverState *bs;
-    BdrvNextIterator it;
+    BlockDriverState *bs = NULL;
 
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+    while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, true, NULL, false);
+        bdrv_do_drained_end(bs, false, NULL, true);
         aio_context_release(aio_context);
     }
+
+    assert(bdrv_drain_all_count > 0);
+    bdrv_drain_all_count--;
 }
 
 void bdrv_drain_all(void)
@@ -647,6 +678,7 @@ void bdrv_inc_in_flight(BlockDriverState *bs)
 void bdrv_wakeup(BlockDriverState *bs)
 {
     aio_wait_kick(bdrv_get_aio_wait(bs));
+    aio_wait_kick(&drain_all_aio_wait);
 }
 
 void bdrv_dec_in_flight(BlockDriverState *bs)
-- 
2.13.6


* [Qemu-devel] [PATCH v2 20/20] test-bdrv-drain: Test graph changes in drain_all section
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (18 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 19/20] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
@ 2018-05-29 17:21 ` Kevin Wolf
  2018-05-29 17:45 ` [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 no-reply
  2018-06-11 12:23 ` Kevin Wolf
  21 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-05-29 17:21 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

This tests both adding and removing a node between bdrv_drain_all_begin()
and bdrv_drain_all_end(), and enables the existing detach test for
drain_all.
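
In API terms, the pattern the new test exercises is roughly this (a
sketch using the test driver from tests/test-bdrv-drain.c; the node
creation stands in for an arbitrary graph change):

    bdrv_drain_all_begin();

    /* Graph changes are allowed inside the drained section: */
    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
    bs = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
                              &error_abort);
    blk_insert_bs(blk, bs, &error_abort);
    /* The new node must come up already quiesced:
     * bs->quiesce_counter == 1 at this point. */

    bdrv_drain_all_end();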

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 7b0454a193..51088f318c 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -452,7 +452,7 @@ static void test_multiparent(void)
     blk_unref(blk_b);
 }
 
-static void test_graph_change(void)
+static void test_graph_change_drain_subtree(void)
 {
     BlockBackend *blk_a, *blk_b;
     BlockDriverState *bs_a, *bs_b, *backing;
@@ -531,6 +531,63 @@ static void test_graph_change(void)
     blk_unref(blk_b);
 }
 
+static void test_graph_change_drain_all(void)
+{
+    BlockBackend *blk_a, *blk_b;
+    BlockDriverState *bs_a, *bs_b;
+    BDRVTestState *a_s, *b_s;
+
+    /* Create node A with a BlockBackend */
+    blk_a = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs_a = bdrv_new_open_driver(&bdrv_test, "test-node-a", BDRV_O_RDWR,
+                                &error_abort);
+    a_s = bs_a->opaque;
+    blk_insert_bs(blk_a, bs_a, &error_abort);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
+    g_assert_cmpint(a_s->drain_count, ==, 0);
+
+    /* Call bdrv_drain_all_begin() */
+    bdrv_drain_all_begin();
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+
+    /* Create node B with a BlockBackend */
+    blk_b = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    bs_b = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
+                                &error_abort);
+    b_s = bs_b->opaque;
+    blk_insert_bs(blk_b, bs_b, &error_abort);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    /* Unref and finally delete node A */
+    blk_unref(blk_a);
+
+    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(a_s->drain_count, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    bdrv_unref(bs_a);
+
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(b_s->drain_count, ==, 1);
+
+    /* End the drained section */
+    bdrv_drain_all_end();
+
+    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
+    g_assert_cmpint(b_s->drain_count, ==, 0);
+
+    bdrv_unref(bs_b);
+    blk_unref(blk_b);
+}
+
 struct test_iothread_data {
     BlockDriverState *bs;
     enum drain_type drain_type;
@@ -971,6 +1028,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
         bdrv_subtree_drained_begin(bs);
         bdrv_subtree_drained_end(bs);
         break;
+    case BDRV_DRAIN_ALL:
+        bdrv_drain_all_begin();
+        bdrv_drain_all_end();
+        break;
     default:
         g_assert_not_reached();
     }
@@ -991,6 +1052,11 @@ static void test_delete_by_drain(void)
     do_test_delete_by_drain(false, BDRV_DRAIN);
 }
 
+static void test_detach_by_drain_all(void)
+{
+    do_test_delete_by_drain(true, BDRV_DRAIN_ALL);
+}
+
 static void test_detach_by_drain(void)
 {
     do_test_delete_by_drain(true, BDRV_DRAIN);
@@ -1219,7 +1285,11 @@ int main(int argc, char **argv)
 
     g_test_add_func("/bdrv-drain/nested", test_nested);
     g_test_add_func("/bdrv-drain/multiparent", test_multiparent);
-    g_test_add_func("/bdrv-drain/graph-change", test_graph_change);
+
+    g_test_add_func("/bdrv-drain/graph-change/drain_subtree",
+                    test_graph_change_drain_subtree);
+    g_test_add_func("/bdrv-drain/graph-change/drain_all",
+                    test_graph_change_drain_all);
 
     g_test_add_func("/bdrv-drain/iothread/drain_all", test_iothread_drain_all);
     g_test_add_func("/bdrv-drain/iothread/drain", test_iothread_drain);
@@ -1232,6 +1302,7 @@ int main(int argc, char **argv)
                     test_blockjob_drain_subtree);
 
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
+    g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
     g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
-- 
2.13.6


* Re: [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (19 preceding siblings ...)
  2018-05-29 17:21 ` [Qemu-devel] [PATCH v2 20/20] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
@ 2018-05-29 17:45 ` no-reply
  2018-06-11 12:23 ` Kevin Wolf
  21 siblings, 0 replies; 24+ messages in thread
From: no-reply @ 2018-05-29 17:45 UTC (permalink / raw)
  To: kwolf; +Cc: famz, qemu-block

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180529172156.29311-1-kwolf@redhat.com
Subject: [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20180529172156.29311-1-kwolf@redhat.com -> patchew/20180529172156.29311-1-kwolf@redhat.com
Switched to a new branch 'test'
eab5e8287f test-bdrv-drain: Test graph changes in drain_all section
3300d94a9b block: Allow graph changes in bdrv_drain_all_begin/end sections
3a37012ee9 block: ignore_bds_parents parameter for drain functions
a0cd9923b2 block: Move bdrv_drain_all_begin() out of coroutine context
7df9cb285f block: Allow AIO_WAIT_WHILE with NULL ctx
fa7d0ac25e test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
3c98182ba5 block: Defer .bdrv_drain_begin callback to polling phase
928f6f85a4 test-bdrv-drain: Graph change through parent callback
75e58be3aa block: Don't poll in parent drain callbacks
84fea14482 test-bdrv-drain: Test node deletion in subtree recursion
6e4d9cda70 block: Drain recursively with a single BDRV_POLL_WHILE()
9531ba32b7 test-bdrv-drain: Add test for node deletion
f6fe14abad block: Remove bdrv_drain_recurse()
aa7ade75cf block: Really pause block jobs on drain
00d52a27b6 block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
c96450896a tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
955c52ae29 block: Don't manually poll in bdrv_drain_all()
09713c9ac7 block: Remove 'recursive' parameter from bdrv_drain_invoke()
e0ccd9c8f2 block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
9e0ea8ed00 test-bdrv-drain: bdrv_drain() works with cross-AioContext events

=== OUTPUT BEGIN ===
Checking PATCH 1/20: test-bdrv-drain: bdrv_drain() works with cross-AioContext events...
Checking PATCH 2/20: block: Use bdrv_do_drain_begin/end in bdrv_drain_all()...
Checking PATCH 3/20: block: Remove 'recursive' parameter from bdrv_drain_invoke()...
Checking PATCH 4/20: block: Don't manually poll in bdrv_drain_all()...
Checking PATCH 5/20: tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now...
Checking PATCH 6/20: block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()...
ERROR: trailing statements should be on next line
#38: FILE: block/io.c:190:
+    while (aio_poll(bs->aio_context, false));

ERROR: braces {} are necessary for all arms of this statement
#38: FILE: block/io.c:190:
+    while (aio_poll(bs->aio_context, false));
[...]

total: 2 errors, 0 warnings, 60 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 7/20: block: Really pause block jobs on drain...
Checking PATCH 8/20: block: Remove bdrv_drain_recurse()...
Checking PATCH 9/20: test-bdrv-drain: Add test for node deletion...
Checking PATCH 10/20: block: Drain recursively with a single BDRV_POLL_WHILE()...
Checking PATCH 11/20: test-bdrv-drain: Test node deletion in subtree recursion...
WARNING: line over 80 characters
#85: FILE: tests/test-bdrv-drain.c:1034:
+    g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);

total: 0 errors, 1 warnings, 68 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 12/20: block: Don't poll in parent drain callbacks...
Checking PATCH 13/20: test-bdrv-drain: Graph change through parent callback...
WARNING: line over 80 characters
#81: FILE: tests/test-bdrv-drain.c:1049:
+    child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_backing, &error_abort);

total: 0 errors, 1 warnings, 142 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 14/20: block: Defer .bdrv_drain_begin callback to polling phase...
Checking PATCH 15/20: test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll...
Checking PATCH 16/20: block: Allow AIO_WAIT_WHILE with NULL ctx...
Checking PATCH 17/20: block: Move bdrv_drain_all_begin() out of coroutine context...
WARNING: line over 80 characters
#27: FILE: block/io.c:270:
+            bdrv_do_drained_begin(bs, data->recursive, data->parent, data->poll);

total: 0 errors, 1 warnings, 41 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 18/20: block: ignore_bds_parents parameter for drain functions...
Checking PATCH 19/20: block: Allow graph changes in bdrv_drain_all_begin/end sections...
ERROR: do not initialise globals to 0 or NULL
#123: FILE: block/io.c:477:
+unsigned int bdrv_drain_all_count = 0;

ERROR: trailing statements should be on next line
#132: FILE: block/io.c:486:
+    while (aio_poll(qemu_get_aio_context(), false));

ERROR: braces {} are necessary for all arms of this statement
#132: FILE: block/io.c:486:
+    while (aio_poll(qemu_get_aio_context(), false));
[...]

total: 3 errors, 0 warnings, 193 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 20/20: test-bdrv-drain: Test graph changes in drain_all section...
=== OUTPUT END ===

Test command exited with code: 1
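
For reference, a checkpatch-clean spelling of the flagged constructs
would be something like the following sketch (the empty loop body made
explicit, and the global left without a redundant initializer):

    unsigned int bdrv_drain_all_count;

    /* Execute pending BHs first (may modify the graph) */
    while (aio_poll(qemu_get_aio_context(), false)) {
        /* Nothing to do; keep polling until no progress is made */
    }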


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3
  2018-05-29 17:21 [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 Kevin Wolf
                   ` (20 preceding siblings ...)
  2018-05-29 17:45 ` [Qemu-devel] [PATCH v2 00/20] Drain fixes and cleanups, part 3 no-reply
@ 2018-06-11 12:23 ` Kevin Wolf
  2018-06-15 16:08   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  21 siblings, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2018-06-11 12:23 UTC (permalink / raw)
  To: qemu-block; +Cc: mreitz, pbonzini, famz, stefanha, eblake, qemu-devel

ping?

Am 29.05.2018 um 19:21 hat Kevin Wolf geschrieben:
> This is the third and hopefully for now last part of my work to fix
> drain. The main goal of this series is to make drain robust against
> graph changes that happen in any callbacks of in-flight requests while
> we drain a block node.
> 
> The individual patches describe the details, but the rough plan is to
> change all three drain types (single node, subtree and all) to work like
> this:
> 
> 1. First call all the necessary callbacks to quiesce external sources
>    for new requests. This includes the block driver callbacks, the child
>    node callbacks and disabling external AioContext events. This is done
>    recursively.
> 
>    Much of the trouble we had with drain resulted from the fact that the
>    graph changed while we were traversing the graph recursively. None of
>    the callbacks called in this phase may change the graph.
> 
> 2. Then do a single AIO_WAIT_WHILE() to drain the requests of all
>    affected nodes. The aio_poll() called by it is where graph changes
>    can happen and we need to be careful.
> 
>    However, while evaluating the loop condition, the graph can't change,
>    so we can safely call all necessary callbacks, if needed recursively,
>    to determine whether there are still pending requests in any affected
>    nodes. We just need to make sure that we don't rely on the set of
>    nodes being the same between any two evaluation of the condition.
> 
> There are a few more smaller, mostly self-contained changes needed
> before we're actually safe, but this is the main mechanism that will
> help you understand what we're working towards during the series.
> 
> v2:
> 
> - Rebased on top of current master (e.g. including Job infrastructure)
> 
> - Avoid unnecessary parent callbacks for .drained_begin/poll/end:
>   * subtree drains: Don't propagate the drain to the parent that we came
>                     from recursively
>   * drain_all:      Don't propagate the drain to BDS parents (which are
>                     already separately drained), but only to non-BDS
>                     parents like BBs or Jobs
> 
> - Separate bdrv_drain_poll_top_level() function instead of having a
>   top_level parameter for bdrv_drain_poll().
> 
> - A few commit message and comment improvements
> 
> 
> Kevin Wolf (19):
>   test-bdrv-drain: bdrv_drain() works with cross-AioContext events
>   block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
>   block: Remove 'recursive' parameter from bdrv_drain_invoke()
>   block: Don't manually poll in bdrv_drain_all()
>   tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
>   block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
>   block: Really pause block jobs on drain
>   block: Remove bdrv_drain_recurse()
>   block: Drain recursively with a single BDRV_POLL_WHILE()
>   test-bdrv-drain: Test node deletion in subtree recursion
>   block: Don't poll in parent drain callbacks
>   test-bdrv-drain: Graph change through parent callback
>   block: Defer .bdrv_drain_begin callback to polling phase
>   test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
>   block: Allow AIO_WAIT_WHILE with NULL ctx
>   block: Move bdrv_drain_all_begin() out of coroutine context
>   block: ignore_bds_parents parameter for drain functions
>   block: Allow graph changes in bdrv_drain_all_begin/end sections
>   test-bdrv-drain: Test graph changes in drain_all section
> 
> Max Reitz (1):
>   test-bdrv-drain: Add test for node deletion
> 
>  include/block/aio-wait.h     |  25 +-
>  include/block/block.h        |  31 +-
>  include/block/block_int.h    |  14 +
>  include/block/blockjob_int.h |   8 +
>  block.c                      |  52 +++-
>  block/io.c                   | 332 ++++++++++++--------
>  block/mirror.c               |   8 +
>  block/vvfat.c                |   1 +
>  blockjob.c                   |  23 ++
>  tests/test-bdrv-drain.c      | 705 +++++++++++++++++++++++++++++++++++++++++--
>  10 files changed, 1032 insertions(+), 167 deletions(-)
> 
> -- 
> 2.13.6
> 


* Re: [Qemu-devel] [Qemu-block] [PATCH v2 00/20] Drain fixes and cleanups, part 3
  2018-06-11 12:23 ` Kevin Wolf
@ 2018-06-15 16:08   ` Kevin Wolf
  0 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2018-06-15 16:08 UTC (permalink / raw)
  To: qemu-block; +Cc: famz, qemu-devel, mreitz, stefanha, pbonzini

Am 11.06.2018 um 14:23 hat Kevin Wolf geschrieben:
> ping?
> 
> Am 29.05.2018 um 19:21 hat Kevin Wolf geschrieben:
> > This is the third and hopefully for now last part of my work to fix
> > drain. The main goal of this series is to make drain robust against
> > graph changes that happen in any callbacks of in-flight requests while
> > we drain a block node.
> > 
> > The individual patches describe the details, but the rough plan is to
> > change all three drain types (single node, subtree and all) to work like
> > this:
> > 
> > 1. First call all the necessary callbacks to quiesce external sources
> >    for new requests. This includes the block driver callbacks, the child
> >    node callbacks and disabling external AioContext events. This is done
> >    recursively.
> > 
> >    Much of the trouble we had with drain resulted from the fact that the
> >    graph changed while we were traversing the graph recursively. None of
> >    the callbacks called in this phase may change the graph.
> > 
> > 2. Then do a single AIO_WAIT_WHILE() to drain the requests of all
> >    affected nodes. The aio_poll() called by it is where graph changes
> >    can happen and we need to be careful.
> > 
> >    However, while evaluating the loop condition, the graph can't change,
> >    so we can safely call all necessary callbacks, if needed recursively,
> >    to determine whether there are still pending requests in any affected
> >    nodes. We just need to make sure that we don't rely on the set of
> >    nodes being the same between any two evaluation of the condition.
> > 
> > There are a few more smaller, mostly self-contained changes needed
> > before we're actually safe, but this is the main mechanism that will
> > help you understand what we're working towards during the series.

Without objection, applied to the block branch.

Kevin

