All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/15] block: Simplify drain
@ 2022-11-18 17:40 Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 01/15] qed: Don't yield in bdrv_qed_co_drain_begin() Kevin Wolf
                   ` (16 more replies)
  0 siblings, 17 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

I'm aware that exactly nobody has been looking forward to a series with
this title, but it has to be. The way drain works means that we need to
poll in bdrv_replace_child_noperm() and that makes things rather messy
with Emanuele's multiqueue work because you must not poll while you hold
the graph lock.

The other reason why it has to be is that drain is way too complex and
there are too many different cases. Some simplification like this will
hopefully make it considerably more maintainable. The diffstat probably
tells something, too.

There are roughly speaking three parts in this series:

1. Make BlockDriver.bdrv_drained_begin/end() non-coroutine_fn again,
   which allows us to not poll on bdrv_drained_end() any more.

2. Remove subtree drains. They are a considerable complication in the
   whole drain machinery (in particular, they require polling in the
   BdrvChildClass.attach/detach() callbacks that are called during
   bdrv_replace_child_noperm()) and none of their users actually has a
   good reason to use them.

3. Finally get rid of polling in bdrv_replace_child_noperm() by
   requiring that the child is already drained by the caller and calling
   callbacks only once and not again for every nested drain section.

If necessary, a prefix of this series can be merged that covers only the
first or the first two parts and it would still make sense.

v2:
- Rebased on master
- Patch 3: Removed left over _co parts in function names
- Patch 4: Updated function comments to reflect that we're not polling
  any more
- Patch 6 (new): Fix inconsistent AioContext locking for reopen code
- Patch 9 (was 8): Added comment to clarify when polling is allowed
  and the graph may change again
- Patch 11 (was 10):
  * Reworded some comments and the commit message.
  * Dropped a now unnecessary assertion that was dropped only in a later
    patch in v1 of the series.
  * Changed 'int parent_quiesce_counter' into 'bool quiesced_parent'
- Patch 12 (was 11): Don't remove ignore_bds_parents from
  bdrv_drain_poll(), it is actually still a valid optimisation there
  that makes polling O(n) instead of O(n²)
- Patch 13 (new): Instead of only removing assert(!qemu_in_coroutine())
  like in v1 of the series, drop out of coroutine context in
  bdrv_do_drained_begin_quiesce() just to be sure that we'll never get
  coroutine surprises in drain code.
- Patch 14 (was 12): More and reworded comments to make things hopefully
  a bit clearer

Kevin Wolf (15):
  qed: Don't yield in bdrv_qed_co_drain_begin()
  test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end()
  block: Revert .bdrv_drained_begin/end to non-coroutine_fn
  block: Remove drained_end_counter
  block: Inline bdrv_drain_invoke()
  block: Fix locking for bdrv_reopen_queue_child()
  block: Drain invidual nodes during reopen
  block: Don't use subtree drains in bdrv_drop_intermediate()
  stream: Replace subtree drain with a single node drain
  block: Remove subtree drains
  block: Call drain callbacks only once
  block: Remove ignore_bds_parents parameter from drain_begin/end.
  block: Drop out of coroutine in bdrv_do_drained_begin_quiesce()
  block: Don't poll in bdrv_replace_child_noperm()
  block: Remove poll parameter from bdrv_parent_drained_begin_single()

 include/block/block-global-state.h |   3 +
 include/block/block-io.h           |  58 ++---
 include/block/block_int-common.h   |  25 +-
 include/block/block_int-io.h       |  12 -
 block.c                            | 185 ++++++++++-----
 block/block-backend.c              |   4 +-
 block/io.c                         | 290 +++++------------------
 block/qed.c                        |  26 +-
 block/replication.c                |   6 -
 block/stream.c                     |  26 +-
 block/throttle.c                   |   8 +-
 blockdev.c                         |  13 -
 blockjob.c                         |   2 +-
 tests/unit/test-bdrv-drain.c       | 369 +++++++----------------------
 14 files changed, 340 insertions(+), 687 deletions(-)

-- 
2.38.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 01/15] qed: Don't yield in bdrv_qed_co_drain_begin()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
@ 2022-11-18 17:40 ` Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 02/15] test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end() Kevin Wolf
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

We want to change .bdrv_co_drained_begin() back to be a non-coroutine
callback, so in preparation, avoid yielding in its implementation.

Because we increase bs->in_flight and bdrv_drained_begin() polls, the
behaviour is unchanged.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 block/qed.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 2f36ad342c..013f826c44 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -282,9 +282,8 @@ static void coroutine_fn qed_unplug_allocating_write_reqs(BDRVQEDState *s)
     qemu_co_mutex_unlock(&s->table_lock);
 }
 
-static void coroutine_fn qed_need_check_timer_entry(void *opaque)
+static void coroutine_fn qed_need_check_timer(BDRVQEDState *s)
 {
-    BDRVQEDState *s = opaque;
     int ret;
 
     trace_qed_need_check_timer_cb(s);
@@ -310,9 +309,20 @@ static void coroutine_fn qed_need_check_timer_entry(void *opaque)
     (void) ret;
 }
 
+static void coroutine_fn qed_need_check_timer_entry(void *opaque)
+{
+    BDRVQEDState *s = opaque;
+
+    qed_need_check_timer(opaque);
+    bdrv_dec_in_flight(s->bs);
+}
+
 static void qed_need_check_timer_cb(void *opaque)
 {
+    BDRVQEDState *s = opaque;
     Coroutine *co = qemu_coroutine_create(qed_need_check_timer_entry, opaque);
+
+    bdrv_inc_in_flight(s->bs);
     qemu_coroutine_enter(co);
 }
 
@@ -363,8 +373,12 @@ static void coroutine_fn bdrv_qed_co_drain_begin(BlockDriverState *bs)
      * header is flushed.
      */
     if (s->need_check_timer && timer_pending(s->need_check_timer)) {
+        Coroutine *co;
+
         qed_cancel_need_check_timer(s);
-        qed_need_check_timer_entry(s);
+        co = qemu_coroutine_create(qed_need_check_timer_entry, s);
+        bdrv_inc_in_flight(bs);
+        aio_co_enter(bdrv_get_aio_context(bs), co);
     }
 }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 02/15] test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 01/15] qed: Don't yield in bdrv_qed_co_drain_begin() Kevin Wolf
@ 2022-11-18 17:40 ` Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 03/15] block: Revert .bdrv_drained_begin/end to non-coroutine_fn Kevin Wolf
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

We want to change .bdrv_co_drained_begin/end() back to be non-coroutine
callbacks, so in preparation, avoid yielding in their implementation.

This does almost the same as the existing logic in bdrv_drain_invoke(),
by creating and entering coroutines internally. However, since the test
case is by far the heaviest user of coroutine code in drain callbacks,
it is preferable to have the complexity in the test case rather than the
drain core, which is already complicated enough without this.

The behaviour for bdrv_drain_begin() is unchanged because we increase
bs->in_flight and this is still polled. However, bdrv_drain_end()
doesn't wait for the spawned coroutine to complete any more. This is
fine, we don't rely on bdrv_drain_end() restarting all operations
immediately before the next aio_poll().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 64 ++++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 18 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 09dc4a4891..24f34e24ad 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -38,12 +38,22 @@ typedef struct BDRVTestState {
     bool sleep_in_drain_begin;
 } BDRVTestState;
 
+static void coroutine_fn sleep_in_drain_begin(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+
+    qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+    bdrv_dec_in_flight(bs);
+}
+
 static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
 {
     BDRVTestState *s = bs->opaque;
     s->drain_count++;
     if (s->sleep_in_drain_begin) {
-        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
+        Coroutine *co = qemu_coroutine_create(sleep_in_drain_begin, bs);
+        bdrv_inc_in_flight(bs);
+        aio_co_enter(bdrv_get_aio_context(bs), co);
     }
 }
 
@@ -1916,6 +1926,21 @@ static int coroutine_fn bdrv_replace_test_co_preadv(BlockDriverState *bs,
     return 0;
 }
 
+static void coroutine_fn bdrv_replace_test_drain_co(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVReplaceTestState *s = bs->opaque;
+
+    /* Keep waking io_co up until it is done */
+    while (s->io_co) {
+        aio_co_wake(s->io_co);
+        s->io_co = NULL;
+        qemu_coroutine_yield();
+    }
+    s->drain_co = NULL;
+    bdrv_dec_in_flight(bs);
+}
+
 /**
  * If .drain_count is 0, wake up .io_co if there is one; and set
  * .was_drained.
@@ -1926,20 +1951,27 @@ static void coroutine_fn bdrv_replace_test_co_drain_begin(BlockDriverState *bs)
     BDRVReplaceTestState *s = bs->opaque;
 
     if (!s->drain_count) {
-        /* Keep waking io_co up until it is done */
-        s->drain_co = qemu_coroutine_self();
-        while (s->io_co) {
-            aio_co_wake(s->io_co);
-            s->io_co = NULL;
-            qemu_coroutine_yield();
-        }
-        s->drain_co = NULL;
-
+        s->drain_co = qemu_coroutine_create(bdrv_replace_test_drain_co, bs);
+        bdrv_inc_in_flight(bs);
+        aio_co_enter(bdrv_get_aio_context(bs), s->drain_co);
         s->was_drained = true;
     }
     s->drain_count++;
 }
 
+static void coroutine_fn bdrv_replace_test_read_entry(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    char data;
+    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
+    int ret;
+
+    /* Queue a read request post-drain */
+    ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
+    g_assert(ret >= 0);
+    bdrv_dec_in_flight(bs);
+}
+
 /**
  * Reduce .drain_count, set .was_undrained once it reaches 0.
  * If .drain_count reaches 0 and the node has a backing file, issue a
@@ -1951,17 +1983,13 @@ static void coroutine_fn bdrv_replace_test_co_drain_end(BlockDriverState *bs)
 
     g_assert(s->drain_count > 0);
     if (!--s->drain_count) {
-        int ret;
-
         s->was_undrained = true;
 
         if (bs->backing) {
-            char data;
-            QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
-
-            /* Queue a read request post-drain */
-            ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
-            g_assert(ret >= 0);
+            Coroutine *co = qemu_coroutine_create(bdrv_replace_test_read_entry,
+                                                  bs);
+            bdrv_inc_in_flight(bs);
+            aio_co_enter(bdrv_get_aio_context(bs), co);
         }
     }
 }
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 03/15] block: Revert .bdrv_drained_begin/end to non-coroutine_fn
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 01/15] qed: Don't yield in bdrv_qed_co_drain_begin() Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 02/15] test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end() Kevin Wolf
@ 2022-11-18 17:40 ` Kevin Wolf
  2022-11-18 17:40 ` [PATCH v2 04/15] block: Remove drained_end_counter Kevin Wolf
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

Polling during bdrv_drained_end() can be problematic (and in the future,
we may get cases for bdrv_drained_begin() where polling is forbidden,
and we don't care about already in-flight requests, but just want to
prevent new requests from arriving).

The .bdrv_drained_begin/end callbacks running in a coroutine is the only
reason why we have to do this polling, so make them non-coroutine
callbacks again. None of the callers actually yield any more.

This means that bdrv_drained_end() effectively doesn't poll any more,
even if AIO_WAIT_WHILE() loops are still there (their condition is false
from the beginning). This is generally not a problem, but in
test-bdrv-drain, some additional explicit aio_poll() calls need to be
added because the test case wants to verify the final state after BHs
have executed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/block_int-common.h | 10 ++++---
 block.c                          |  4 +--
 block/io.c                       | 49 +++++---------------------------
 block/qed.c                      |  6 ++--
 block/throttle.c                 |  8 +++---
 tests/unit/test-bdrv-drain.c     | 18 ++++++------
 6 files changed, 32 insertions(+), 63 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 31ae91e56e..40d646d1ed 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -735,17 +735,19 @@ struct BlockDriver {
     void (*bdrv_io_unplug)(BlockDriverState *bs);
 
     /**
-     * bdrv_co_drain_begin is called if implemented in the beginning of a
+     * bdrv_drain_begin is called if implemented in the beginning of a
      * drain operation to drain and stop any internal sources of requests in
      * the driver.
-     * bdrv_co_drain_end is called if implemented at the end of the drain.
+     * bdrv_drain_end is called if implemented at the end of the drain.
      *
      * They should be used by the driver to e.g. manage scheduled I/O
      * requests, or toggle an internal state. After the end of the drain new
      * requests will continue normally.
+     *
+     * Implementations of both functions must not call aio_poll().
      */
-    void coroutine_fn (*bdrv_co_drain_begin)(BlockDriverState *bs);
-    void coroutine_fn (*bdrv_co_drain_end)(BlockDriverState *bs);
+    void (*bdrv_drain_begin)(BlockDriverState *bs);
+    void (*bdrv_drain_end)(BlockDriverState *bs);
 
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
     bool coroutine_fn (*bdrv_co_can_store_new_dirty_bitmap)(
diff --git a/block.c b/block.c
index a18f052374..cbff7acd97 100644
--- a/block.c
+++ b/block.c
@@ -1715,8 +1715,8 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
     assert(is_power_of_2(bs->bl.request_alignment));
 
     for (i = 0; i < bs->quiesce_counter; i++) {
-        if (drv->bdrv_co_drain_begin) {
-            drv->bdrv_co_drain_begin(bs);
+        if (drv->bdrv_drain_begin) {
+            drv->bdrv_drain_begin(bs);
         }
     }
 
diff --git a/block/io.c b/block/io.c
index b9424024f9..c2ed4b2af9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -252,55 +252,20 @@ typedef struct {
     int *drained_end_counter;
 } BdrvCoDrainData;
 
-static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
-{
-    BdrvCoDrainData *data = opaque;
-    BlockDriverState *bs = data->bs;
-
-    if (data->begin) {
-        bs->drv->bdrv_co_drain_begin(bs);
-    } else {
-        bs->drv->bdrv_co_drain_end(bs);
-    }
-
-    /* Set data->done and decrement drained_end_counter before bdrv_wakeup() */
-    qatomic_mb_set(&data->done, true);
-    if (!data->begin) {
-        qatomic_dec(data->drained_end_counter);
-    }
-    bdrv_dec_in_flight(bs);
-
-    g_free(data);
-}
-
-/* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
+/* Recursively call BlockDriver.bdrv_drain_begin/end callbacks */
 static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
                               int *drained_end_counter)
 {
-    BdrvCoDrainData *data;
-
-    if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
-            (!begin && !bs->drv->bdrv_co_drain_end)) {
+    if (!bs->drv || (begin && !bs->drv->bdrv_drain_begin) ||
+            (!begin && !bs->drv->bdrv_drain_end)) {
         return;
     }
 
-    data = g_new(BdrvCoDrainData, 1);
-    *data = (BdrvCoDrainData) {
-        .bs = bs,
-        .done = false,
-        .begin = begin,
-        .drained_end_counter = drained_end_counter,
-    };
-
-    if (!begin) {
-        qatomic_inc(drained_end_counter);
+    if (begin) {
+        bs->drv->bdrv_drain_begin(bs);
+    } else {
+        bs->drv->bdrv_drain_end(bs);
     }
-
-    /* Make sure the driver callback completes during the polling phase for
-     * drain_begin. */
-    bdrv_inc_in_flight(bs);
-    data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
-    aio_co_schedule(bdrv_get_aio_context(bs), data->co);
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
diff --git a/block/qed.c b/block/qed.c
index 013f826c44..c2691a85b1 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -262,7 +262,7 @@ static bool coroutine_fn qed_plug_allocating_write_reqs(BDRVQEDState *s)
     assert(!s->allocating_write_reqs_plugged);
     if (s->allocating_acb != NULL) {
         /* Another allocating write came concurrently.  This cannot happen
-         * from bdrv_qed_co_drain_begin, but it can happen when the timer runs.
+         * from bdrv_qed_drain_begin, but it can happen when the timer runs.
          */
         qemu_co_mutex_unlock(&s->table_lock);
         return false;
@@ -365,7 +365,7 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
     }
 }
 
-static void coroutine_fn bdrv_qed_co_drain_begin(BlockDriverState *bs)
+static void bdrv_qed_drain_begin(BlockDriverState *bs)
 {
     BDRVQEDState *s = bs->opaque;
 
@@ -1661,7 +1661,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_co_check            = bdrv_qed_co_check,
     .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
     .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
-    .bdrv_co_drain_begin      = bdrv_qed_co_drain_begin,
+    .bdrv_drain_begin         = bdrv_qed_drain_begin,
 };
 
 static void bdrv_qed_init(void)
diff --git a/block/throttle.c b/block/throttle.c
index 131eba3ab4..88851c84f4 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -214,7 +214,7 @@ static void throttle_reopen_abort(BDRVReopenState *reopen_state)
     reopen_state->opaque = NULL;
 }
 
-static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
+static void throttle_drain_begin(BlockDriverState *bs)
 {
     ThrottleGroupMember *tgm = bs->opaque;
     if (qatomic_fetch_inc(&tgm->io_limits_disabled) == 0) {
@@ -222,7 +222,7 @@ static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
     }
 }
 
-static void coroutine_fn throttle_co_drain_end(BlockDriverState *bs)
+static void throttle_drain_end(BlockDriverState *bs)
 {
     ThrottleGroupMember *tgm = bs->opaque;
     assert(tgm->io_limits_disabled);
@@ -261,8 +261,8 @@ static BlockDriver bdrv_throttle = {
     .bdrv_reopen_commit                 =   throttle_reopen_commit,
     .bdrv_reopen_abort                  =   throttle_reopen_abort,
 
-    .bdrv_co_drain_begin                =   throttle_co_drain_begin,
-    .bdrv_co_drain_end                  =   throttle_co_drain_end,
+    .bdrv_drain_begin                   =   throttle_drain_begin,
+    .bdrv_drain_end                     =   throttle_drain_end,
 
     .is_filter                          =   true,
     .strong_runtime_opts                =   throttle_strong_runtime_opts,
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 24f34e24ad..695519ee02 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -46,7 +46,7 @@ static void coroutine_fn sleep_in_drain_begin(void *opaque)
     bdrv_dec_in_flight(bs);
 }
 
-static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
+static void bdrv_test_drain_begin(BlockDriverState *bs)
 {
     BDRVTestState *s = bs->opaque;
     s->drain_count++;
@@ -57,7 +57,7 @@ static void coroutine_fn bdrv_test_co_drain_begin(BlockDriverState *bs)
     }
 }
 
-static void coroutine_fn bdrv_test_co_drain_end(BlockDriverState *bs)
+static void bdrv_test_drain_end(BlockDriverState *bs)
 {
     BDRVTestState *s = bs->opaque;
     s->drain_count--;
@@ -111,8 +111,8 @@ static BlockDriver bdrv_test = {
     .bdrv_close             = bdrv_test_close,
     .bdrv_co_preadv         = bdrv_test_co_preadv,
 
-    .bdrv_co_drain_begin    = bdrv_test_co_drain_begin,
-    .bdrv_co_drain_end      = bdrv_test_co_drain_end,
+    .bdrv_drain_begin       = bdrv_test_drain_begin,
+    .bdrv_drain_end         = bdrv_test_drain_end,
 
     .bdrv_child_perm        = bdrv_default_perms,
 
@@ -1703,6 +1703,7 @@ static void test_blockjob_commit_by_drained_end(void)
     bdrv_drained_begin(bs_child);
     g_assert(!job_has_completed);
     bdrv_drained_end(bs_child);
+    aio_poll(qemu_get_aio_context(), false);
     g_assert(job_has_completed);
 
     bdrv_unref(bs_parents[0]);
@@ -1858,6 +1859,7 @@ static void test_drop_intermediate_poll(void)
 
     g_assert(!job_has_completed);
     ret = bdrv_drop_intermediate(chain[1], chain[0], NULL);
+    aio_poll(qemu_get_aio_context(), false);
     g_assert(ret == 0);
     g_assert(job_has_completed);
 
@@ -1946,7 +1948,7 @@ static void coroutine_fn bdrv_replace_test_drain_co(void *opaque)
  * .was_drained.
  * Increment .drain_count.
  */
-static void coroutine_fn bdrv_replace_test_co_drain_begin(BlockDriverState *bs)
+static void bdrv_replace_test_drain_begin(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
@@ -1977,7 +1979,7 @@ static void coroutine_fn bdrv_replace_test_read_entry(void *opaque)
  * If .drain_count reaches 0 and the node has a backing file, issue a
  * read request.
  */
-static void coroutine_fn bdrv_replace_test_co_drain_end(BlockDriverState *bs)
+static void bdrv_replace_test_drain_end(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
@@ -2002,8 +2004,8 @@ static BlockDriver bdrv_replace_test = {
     .bdrv_close             = bdrv_replace_test_close,
     .bdrv_co_preadv         = bdrv_replace_test_co_preadv,
 
-    .bdrv_co_drain_begin    = bdrv_replace_test_co_drain_begin,
-    .bdrv_co_drain_end      = bdrv_replace_test_co_drain_end,
+    .bdrv_drain_begin       = bdrv_replace_test_drain_begin,
+    .bdrv_drain_end         = bdrv_replace_test_drain_end,
 
     .bdrv_child_perm        = bdrv_default_perms,
 };
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 04/15] block: Remove drained_end_counter
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (2 preceding siblings ...)
  2022-11-18 17:40 ` [PATCH v2 03/15] block: Revert .bdrv_drained_begin/end to non-coroutine_fn Kevin Wolf
@ 2022-11-18 17:40 ` Kevin Wolf
  2022-11-18 17:41 ` [PATCH v2 05/15] block: Inline bdrv_drain_invoke() Kevin Wolf
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

drained_end_counter is unused now, nobody changes its value any more. It
can be removed.

In cases where we had two almost identical functions that only differed
in whether the caller passes drained_end_counter, or whether they would
poll for a local drained_end_counter to reach 0, these become a single
function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/block-io.h         | 24 --------
 include/block/block_int-common.h |  6 +-
 block.c                          |  5 +-
 block/block-backend.c            |  4 +-
 block/io.c                       | 98 ++++++++------------------------
 blockjob.c                       |  2 +-
 6 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index b099d7db45..054e964c9b 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -237,21 +237,6 @@ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
                                     int64_t bytes, BdrvRequestFlags read_flags,
                                     BdrvRequestFlags write_flags);
 
-/**
- * bdrv_drained_end_no_poll:
- *
- * Same as bdrv_drained_end(), but do not poll for the subgraph to
- * actually become unquiesced.  Therefore, no graph changes will occur
- * with this function.
- *
- * *drained_end_counter is incremented for every background operation
- * that is scheduled, and will be decremented for every operation once
- * it settles.  The caller must poll until it reaches 0.  The counter
- * should be accessed using atomic operations only.
- */
-void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
-
-
 /*
  * "I/O or GS" API functions. These functions can run without
  * the BQL, but only in one specific iothread/main loop.
@@ -311,9 +296,6 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
  * bdrv_parent_drained_end_single:
  *
  * End a quiesced section for the parent of @c.
- *
- * This polls @bs's AioContext until all scheduled sub-drained_ends
- * have settled, which may result in graph changes.
  */
 void bdrv_parent_drained_end_single(BdrvChild *c);
 
@@ -361,12 +343,6 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs);
  * bdrv_drained_end:
  *
  * End a quiescent section started by bdrv_drained_begin().
- *
- * This polls @bs's AioContext until all scheduled sub-drained_ends
- * have settled.  On one hand, that may result in graph changes.  On
- * the other, this requires that the caller either runs in the main
- * loop; or that all involved nodes (@bs and all of its parents) are
- * in the caller's AioContext.
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 40d646d1ed..2b97576f6d 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -939,15 +939,11 @@ struct BdrvChildClass {
      * These functions must not change the graph (and therefore also must not
      * call aio_poll(), which could change the graph indirectly).
      *
-     * If drained_end() schedules background operations, it must atomically
-     * increment *drained_end_counter for each such operation and atomically
-     * decrement it once the operation has settled.
-     *
      * Note that this can be nested. If drained_begin() was called twice, new
      * I/O is allowed only after drained_end() was called twice, too.
      */
     void (*drained_begin)(BdrvChild *child);
-    void (*drained_end)(BdrvChild *child, int *drained_end_counter);
+    void (*drained_end)(BdrvChild *child);
 
     /*
      * Returns whether the parent has pending requests for the child. This
diff --git a/block.c b/block.c
index cbff7acd97..214e5055a5 100644
--- a/block.c
+++ b/block.c
@@ -1237,11 +1237,10 @@ static bool bdrv_child_cb_drained_poll(BdrvChild *child)
     return bdrv_drain_poll(bs, false, NULL, false);
 }
 
-static void bdrv_child_cb_drained_end(BdrvChild *child,
-                                      int *drained_end_counter)
+static void bdrv_child_cb_drained_end(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_drained_end_no_poll(bs, drained_end_counter);
+    bdrv_drained_end(bs);
 }
 
 static int bdrv_child_cb_inactivate(BdrvChild *child)
diff --git a/block/block-backend.c b/block/block-backend.c
index b48c91f4e1..742efa7955 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -129,7 +129,7 @@ static void blk_root_inherit_options(BdrvChildRole role, bool parent_is_format,
 }
 static void blk_root_drained_begin(BdrvChild *child);
 static bool blk_root_drained_poll(BdrvChild *child);
-static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter);
+static void blk_root_drained_end(BdrvChild *child);
 
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
@@ -2556,7 +2556,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
     return busy || !!blk->in_flight;
 }
 
-static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
+static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
     assert(blk->quiesce_counter);
diff --git a/block/io.c b/block/io.c
index c2ed4b2af9..f4ca62b034 100644
--- a/block/io.c
+++ b/block/io.c
@@ -58,28 +58,19 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
-static void bdrv_parent_drained_end_single_no_poll(BdrvChild *c,
-                                                   int *drained_end_counter)
+void bdrv_parent_drained_end_single(BdrvChild *c)
 {
+    IO_OR_GS_CODE();
+
     assert(c->parent_quiesce_counter > 0);
     c->parent_quiesce_counter--;
     if (c->klass->drained_end) {
-        c->klass->drained_end(c, drained_end_counter);
+        c->klass->drained_end(c);
     }
 }
 
-void bdrv_parent_drained_end_single(BdrvChild *c)
-{
-    int drained_end_counter = 0;
-    AioContext *ctx = bdrv_child_get_parent_aio_context(c);
-    IO_OR_GS_CODE();
-    bdrv_parent_drained_end_single_no_poll(c, &drained_end_counter);
-    AIO_WAIT_WHILE(ctx, qatomic_read(&drained_end_counter) > 0);
-}
-
 static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
-                                    bool ignore_bds_parents,
-                                    int *drained_end_counter)
+                                    bool ignore_bds_parents)
 {
     BdrvChild *c;
 
@@ -87,7 +78,7 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->klass->parent_is_bds)) {
             continue;
         }
-        bdrv_parent_drained_end_single_no_poll(c, drained_end_counter);
+        bdrv_parent_drained_end_single(c);
     }
 }
 
@@ -249,12 +240,10 @@ typedef struct {
     bool poll;
     BdrvChild *parent;
     bool ignore_bds_parents;
-    int *drained_end_counter;
 } BdrvCoDrainData;
 
 /* Recursively call BlockDriver.bdrv_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
-                              int *drained_end_counter)
+static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
 {
     if (!bs->drv || (begin && !bs->drv->bdrv_drain_begin) ||
             (!begin && !bs->drv->bdrv_drain_end)) {
@@ -305,8 +294,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent, bool ignore_bds_parents,
                                   bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents,
-                                int *drained_end_counter);
+                                BdrvChild *parent, bool ignore_bds_parents);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -319,14 +307,12 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
         if (data->begin) {
-            assert(!data->drained_end_counter);
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
                                   data->ignore_bds_parents, data->poll);
         } else {
             assert(!data->poll);
             bdrv_do_drained_end(bs, data->recursive, data->parent,
-                                data->ignore_bds_parents,
-                                data->drained_end_counter);
+                                data->ignore_bds_parents);
         }
         aio_context_release(ctx);
     } else {
@@ -342,8 +328,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
                                                 BdrvChild *parent,
                                                 bool ignore_bds_parents,
-                                                bool poll,
-                                                int *drained_end_counter)
+                                                bool poll)
 {
     BdrvCoDrainData data;
     Coroutine *self = qemu_coroutine_self();
@@ -363,7 +348,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .parent = parent,
         .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
-        .drained_end_counter = drained_end_counter,
     };
 
     if (bs) {
@@ -406,7 +390,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     }
 
     bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
-    bdrv_drain_invoke(bs, true, NULL);
+    bdrv_drain_invoke(bs, true);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
@@ -417,7 +401,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
-                               poll, NULL);
+                               poll);
         return;
     }
 
@@ -461,38 +445,24 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
 
 /**
  * This function does not poll, nor must any of its recursively called
- * functions.  The *drained_end_counter pointee will be incremented
- * once for every background operation scheduled, and decremented once
- * the operation settles.  Therefore, the pointer must remain valid
- * until the pointee reaches 0.  That implies that whoever sets up the
- * pointee has to poll until it is 0.
- *
- * We use atomic operations to access *drained_end_counter, because
- * (1) when called from bdrv_set_aio_context_ignore(), the subgraph of
- *     @bs may contain nodes in different AioContexts,
- * (2) bdrv_drain_all_end() uses the same counter for all nodes,
- *     regardless of which AioContext they are in.
+ * functions.
  */
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents,
-                                int *drained_end_counter)
+                                BdrvChild *parent, bool ignore_bds_parents)
 {
     BdrvChild *child;
     int old_quiesce_counter;
 
-    assert(drained_end_counter != NULL);
-
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false, drained_end_counter);
+                               false);
         return;
     }
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false, drained_end_counter);
-    bdrv_parent_drained_end(bs, parent, ignore_bds_parents,
-                            drained_end_counter);
+    bdrv_drain_invoke(bs, false);
+    bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
 
     old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
     if (old_quiesce_counter == 1) {
@@ -503,32 +473,21 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
         assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter--;
         QLIST_FOREACH(child, &bs->children, next) {
-            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents,
-                                drained_end_counter);
+            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents);
         }
     }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    int drained_end_counter = 0;
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, false, NULL, false, &drained_end_counter);
-    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
-}
-
-void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter)
-{
-    IO_CODE();
-    bdrv_do_drained_end(bs, false, NULL, false, drained_end_counter);
+    bdrv_do_drained_end(bs, false, NULL, false);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    int drained_end_counter = 0;
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, true, NULL, false, &drained_end_counter);
-    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
+    bdrv_do_drained_end(bs, true, NULL, false);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -543,16 +502,12 @@ void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
 
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
 {
-    int drained_end_counter = 0;
     int i;
     IO_OR_GS_CODE();
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false,
-                            &drained_end_counter);
+        bdrv_do_drained_end(child->bs, true, child, false);
     }
-
-    BDRV_POLL_WHILE(child->bs, qatomic_read(&drained_end_counter) > 0);
 }
 
 void bdrv_drain(BlockDriverState *bs)
@@ -610,7 +565,7 @@ void bdrv_drain_all_begin(void)
     GLOBAL_STATE_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true, NULL);
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true);
         return;
     }
 
@@ -649,22 +604,19 @@ void bdrv_drain_all_begin(void)
 
 void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
 {
-    int drained_end_counter = 0;
     GLOBAL_STATE_CODE();
 
     g_assert(bs->quiesce_counter > 0);
     g_assert(!bs->refcnt);
 
     while (bs->quiesce_counter) {
-        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+        bdrv_do_drained_end(bs, false, NULL, true);
     }
-    BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
 }
 
 void bdrv_drain_all_end(void)
 {
     BlockDriverState *bs = NULL;
-    int drained_end_counter = 0;
     GLOBAL_STATE_CODE();
 
     /*
@@ -680,13 +632,11 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+        bdrv_do_drained_end(bs, false, NULL, true);
         aio_context_release(aio_context);
     }
 
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
-    AIO_WAIT_WHILE(NULL, qatomic_read(&drained_end_counter) > 0);
-
     assert(bdrv_drain_all_count > 0);
     bdrv_drain_all_count--;
 }
diff --git a/blockjob.c b/blockjob.c
index f51d4e18f3..0ab721e139 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -120,7 +120,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     }
 }
 
-static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
+static void child_job_drained_end(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
     job_resume(&job->job);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 05/15] block: Inline bdrv_drain_invoke()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (3 preceding siblings ...)
  2022-11-18 17:40 ` [PATCH v2 04/15] block: Remove drained_end_counter Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-18 17:41 ` [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child() Kevin Wolf
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

bdrv_drain_invoke() has now two entirely separate cases that share no
code any more and are selected depending on a bool parameter. Each case
has only one caller. Just inline the function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 block/io.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/block/io.c b/block/io.c
index f4ca62b034..a25103be6f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -242,21 +242,6 @@ typedef struct {
     bool ignore_bds_parents;
 } BdrvCoDrainData;
 
-/* Recursively call BlockDriver.bdrv_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
-{
-    if (!bs->drv || (begin && !bs->drv->bdrv_drain_begin) ||
-            (!begin && !bs->drv->bdrv_drain_end)) {
-        return;
-    }
-
-    if (begin) {
-        bs->drv->bdrv_drain_begin(bs);
-    } else {
-        bs->drv->bdrv_drain_end(bs);
-    }
-}
-
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
 bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
                      BdrvChild *ignore_parent, bool ignore_bds_parents)
@@ -390,7 +375,9 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     }
 
     bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
-    bdrv_drain_invoke(bs, true);
+    if (bs->drv && bs->drv->bdrv_drain_begin) {
+        bs->drv->bdrv_drain_begin(bs);
+    }
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
@@ -461,7 +448,9 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false);
+    if (bs->drv && bs->drv->bdrv_drain_end) {
+        bs->drv->bdrv_drain_end(bs);
+    }
     bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
 
     old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (4 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 05/15] block: Inline bdrv_drain_invoke() Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 12:56   ` Vladimir Sementsov-Ogievskiy
  2022-11-18 17:41 ` [PATCH v2 07/15] block: Drain invidual nodes during reopen Kevin Wolf
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

Callers don't agree whether bdrv_reopen_queue_child() should be called
with the AioContext lock held or not. Standardise on holding the lock
(as done by QMP blockdev-reopen and the replication block driver) and
fix bdrv_reopen() to do the same.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 214e5055a5..191dfc5d0c 100644
--- a/block.c
+++ b/block.c
@@ -4153,6 +4153,8 @@ static bool bdrv_recurse_has_child(BlockDriverState *bs,
  * bs_queue, or the existing bs_queue being used.
  *
  * bs must be drained between bdrv_reopen_queue() and bdrv_reopen_multiple().
+ *
+ * To be called with bs->aio_context locked.
  */
 static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
                                                  BlockDriverState *bs,
@@ -4311,6 +4313,7 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
     return bs_queue;
 }
 
+/* To be called with bs->aio_context locked */
 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
                                     BlockDriverState *bs,
                                     QDict *options, bool keep_old_opts)
@@ -4475,11 +4478,11 @@ int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
     GLOBAL_STATE_CODE();
 
     bdrv_subtree_drained_begin(bs);
+    queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
+
     if (ctx != qemu_get_aio_context()) {
         aio_context_release(ctx);
     }
-
-    queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
     ret = bdrv_reopen_multiple(queue, errp);
 
     if (ctx != qemu_get_aio_context()) {
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 07/15] block: Drain invidual nodes during reopen
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (5 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child() Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 13:10   ` Vladimir Sementsov-Ogievskiy
  2022-11-18 17:41 ` [PATCH v2 08/15] block: Don't use subtree drains in bdrv_drop_intermediate() Kevin Wolf
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

bdrv_reopen() and friends use subtree drains as a lazy way of covering
all the nodes they touch. Turns out that this lazy way is a lot more
complicated than just draining the nodes individually, even not
accounting for the additional complexity in the drain mechanism itself.

Simplify the code by switching to draining the individual nodes that are
already managed in the BlockReopenQueue anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c             | 16 +++++++++-------
 block/replication.c |  6 ------
 blockdev.c          | 13 -------------
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 191dfc5d0c..59eafcc54c 100644
--- a/block.c
+++ b/block.c
@@ -4152,7 +4152,7 @@ static bool bdrv_recurse_has_child(BlockDriverState *bs,
  * returns a pointer to bs_queue, which is either the newly allocated
  * bs_queue, or the existing bs_queue being used.
  *
- * bs must be drained between bdrv_reopen_queue() and bdrv_reopen_multiple().
+ * bs is drained here and undrained by bdrv_reopen_queue_free().
  *
  * To be called with bs->aio_context locked.
  */
@@ -4174,12 +4174,10 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
     int flags;
     QemuOpts *opts;
 
-    /* Make sure that the caller remembered to use a drained section. This is
-     * important to avoid graph changes between the recursive queuing here and
-     * bdrv_reopen_multiple(). */
-    assert(bs->quiesce_counter > 0);
     GLOBAL_STATE_CODE();
 
+    bdrv_drained_begin(bs);
+
     if (bs_queue == NULL) {
         bs_queue = g_new0(BlockReopenQueue, 1);
         QTAILQ_INIT(bs_queue);
@@ -4330,6 +4328,12 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
     if (bs_queue) {
         BlockReopenQueueEntry *bs_entry, *next;
         QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
+            AioContext *ctx = bdrv_get_aio_context(bs_entry->state.bs);
+
+            aio_context_acquire(ctx);
+            bdrv_drained_end(bs_entry->state.bs);
+            aio_context_release(ctx);
+
             qobject_unref(bs_entry->state.explicit_options);
             qobject_unref(bs_entry->state.options);
             g_free(bs_entry);
@@ -4477,7 +4481,6 @@ int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
 
     GLOBAL_STATE_CODE();
 
-    bdrv_subtree_drained_begin(bs);
     queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
 
     if (ctx != qemu_get_aio_context()) {
@@ -4488,7 +4491,6 @@ int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
     if (ctx != qemu_get_aio_context()) {
         aio_context_acquire(ctx);
     }
-    bdrv_subtree_drained_end(bs);
 
     return ret;
 }
diff --git a/block/replication.c b/block/replication.c
index f1eed25e43..c62f48a874 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -374,9 +374,6 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
         s->orig_secondary_read_only = bdrv_is_read_only(secondary_disk->bs);
     }
 
-    bdrv_subtree_drained_begin(hidden_disk->bs);
-    bdrv_subtree_drained_begin(secondary_disk->bs);
-
     if (s->orig_hidden_read_only) {
         QDict *opts = qdict_new();
         qdict_put_bool(opts, BDRV_OPT_READ_ONLY, !writable);
@@ -401,9 +398,6 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
             aio_context_acquire(ctx);
         }
     }
-
-    bdrv_subtree_drained_end(hidden_disk->bs);
-    bdrv_subtree_drained_end(secondary_disk->bs);
 }
 
 static void backup_job_cleanup(BlockDriverState *bs)
diff --git a/blockdev.c b/blockdev.c
index 3f1dec6242..8ffb3d9537 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3547,8 +3547,6 @@ fail:
 void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
 {
     BlockReopenQueue *queue = NULL;
-    GSList *drained = NULL;
-    GSList *p;
 
     /* Add each one of the BDS that we want to reopen to the queue */
     for (; reopen_list != NULL; reopen_list = reopen_list->next) {
@@ -3585,9 +3583,7 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
         ctx = bdrv_get_aio_context(bs);
         aio_context_acquire(ctx);
 
-        bdrv_subtree_drained_begin(bs);
         queue = bdrv_reopen_queue(queue, bs, qdict, false);
-        drained = g_slist_prepend(drained, bs);
 
         aio_context_release(ctx);
     }
@@ -3598,15 +3594,6 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
 
 fail:
     bdrv_reopen_queue_free(queue);
-    for (p = drained; p; p = p->next) {
-        BlockDriverState *bs = p->data;
-        AioContext *ctx = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(ctx);
-        bdrv_subtree_drained_end(bs);
-        aio_context_release(ctx);
-    }
-    g_slist_free(drained);
 }
 
 void qmp_blockdev_del(const char *node_name, Error **errp)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 08/15] block: Don't use subtree drains in bdrv_drop_intermediate()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (6 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 07/15] block: Drain invidual nodes during reopen Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-18 17:41 ` [PATCH v2 09/15] stream: Replace subtree drain with a single node drain Kevin Wolf
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

Instead of using a subtree drain from the top node (which also drains
child nodes of base that we're not even interested in), use a normal
drain for base, which automatically drains all of the parents, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 59eafcc54c..298954d514 100644
--- a/block.c
+++ b/block.c
@@ -5599,7 +5599,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
     GLOBAL_STATE_CODE();
 
     bdrv_ref(top);
-    bdrv_subtree_drained_begin(top);
+    bdrv_drained_begin(base);
 
     if (!top->drv || !base->drv) {
         goto exit;
@@ -5672,7 +5672,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
 
     ret = 0;
 exit:
-    bdrv_subtree_drained_end(top);
+    bdrv_drained_end(base);
     bdrv_unref(top);
     return ret;
 }
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 09/15] stream: Replace subtree drain with a single node drain
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (7 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 08/15] block: Don't use subtree drains in bdrv_drop_intermediate() Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-18 17:41 ` [PATCH v2 10/15] block: Remove subtree drains Kevin Wolf
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/block-global-state.h |  3 +++
 block.c                            | 17 ++++++++++++++---
 block/stream.c                     | 26 ++++++++++++++++----------
 3 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index c7bd4a2088..00e0cf8aea 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -82,6 +82,9 @@ int bdrv_open_file_child(const char *filename,
 BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
 int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
                         Error **errp);
+int bdrv_set_backing_hd_drained(BlockDriverState *bs,
+                                BlockDriverState *backing_hd,
+                                Error **errp);
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
                            const char *bdref_key, Error **errp);
 BlockDriverState *bdrv_open(const char *filename, const char *reference,
diff --git a/block.c b/block.c
index 298954d514..01a27bd77f 100644
--- a/block.c
+++ b/block.c
@@ -3405,14 +3405,15 @@ static int bdrv_set_backing_noperm(BlockDriverState *bs,
     return bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
 }
 
-int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
-                        Error **errp)
+int bdrv_set_backing_hd_drained(BlockDriverState *bs,
+                                BlockDriverState *backing_hd,
+                                Error **errp)
 {
     int ret;
     Transaction *tran = tran_new();
 
     GLOBAL_STATE_CODE();
-    bdrv_drained_begin(bs);
+    assert(bs->quiesce_counter > 0);
 
     ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
     if (ret < 0) {
@@ -3422,7 +3423,17 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     ret = bdrv_refresh_perms(bs, errp);
 out:
     tran_finalize(tran, ret);
+    return ret;
+}
 
+int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
+                        Error **errp)
+{
+    int ret;
+    GLOBAL_STATE_CODE();
+
+    bdrv_drained_begin(bs);
+    ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
     bdrv_drained_end(bs);
 
     return ret;
diff --git a/block/stream.c b/block/stream.c
index 694709bd25..8744ad103f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -64,13 +64,16 @@ static int stream_prepare(Job *job)
     bdrv_cor_filter_drop(s->cor_filter_bs);
     s->cor_filter_bs = NULL;
 
-    bdrv_subtree_drained_begin(s->above_base);
+    /*
+     * bdrv_set_backing_hd() requires that unfiltered_bs is drained. Drain
+     * already here and use bdrv_set_backing_hd_drained() instead because
+     * the polling during drained_begin() might change the graph, and if we do
+     * this only later, we may end up working with the wrong base node (or it
+     * might even have gone away by the time we want to use it).
+     */
+    bdrv_drained_begin(unfiltered_bs);
 
     base = bdrv_filter_or_cow_bs(s->above_base);
-    if (base) {
-        bdrv_ref(base);
-    }
-
     unfiltered_base = bdrv_skip_filters(base);
 
     if (bdrv_cow_child(unfiltered_bs)) {
@@ -82,7 +85,13 @@ static int stream_prepare(Job *job)
             }
         }
 
-        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
+        bdrv_set_backing_hd_drained(unfiltered_bs, base, &local_err);
+
+        /*
+         * This call will do I/O, so the graph can change again from here on.
+         * We have already completed the graph change, so we are not in danger
+         * of operating on the wrong node any more if this happens.
+         */
         ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, false);
         if (local_err) {
             error_report_err(local_err);
@@ -92,10 +101,7 @@ static int stream_prepare(Job *job)
     }
 
 out:
-    if (base) {
-        bdrv_unref(base);
-    }
-    bdrv_subtree_drained_end(s->above_base);
+    bdrv_drained_end(unfiltered_bs);
     return ret;
 }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 10/15] block: Remove subtree drains
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (8 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 09/15] stream: Replace subtree drain with a single node drain Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-18 17:41 ` [PATCH v2 11/15] block: Call drain callbacks only once Kevin Wolf
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

Subtree drains are not used any more. Remove them.

After this, BdrvChildClass.attach/detach() don't poll any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/block-io.h         |  18 +--
 include/block/block_int-common.h |   1 -
 include/block/block_int-io.h     |  12 --
 block.c                          |  20 +--
 block/io.c                       | 121 +++-----------
 tests/unit/test-bdrv-drain.c     | 261 ++-----------------------------
 6 files changed, 44 insertions(+), 389 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 054e964c9b..9c36a16a1f 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -302,8 +302,7 @@ void bdrv_parent_drained_end_single(BdrvChild *c);
 /**
  * bdrv_drain_poll:
  *
- * Poll for pending requests in @bs, its parents (except for @ignore_parent),
- * and if @recursive is true its children as well (used for subtree drain).
+ * Poll for pending requests in @bs and its parents (except for @ignore_parent).
  *
  * If @ignore_bds_parents is true, parents that are BlockDriverStates must
  * ignore the drain request because they will be drained separately (used for
@@ -311,8 +310,8 @@ void bdrv_parent_drained_end_single(BdrvChild *c);
  *
  * This is part of bdrv_drained_begin.
  */
-bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
-                     BdrvChild *ignore_parent, bool ignore_bds_parents);
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
+                     bool ignore_bds_parents);
 
 /**
  * bdrv_drained_begin:
@@ -333,12 +332,6 @@ void bdrv_drained_begin(BlockDriverState *bs);
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                    BdrvChild *parent, bool ignore_bds_parents);
 
-/**
- * Like bdrv_drained_begin, but recursively begins a quiesced section for
- * exclusive access to all child nodes as well.
- */
-void bdrv_subtree_drained_begin(BlockDriverState *bs);
-
 /**
  * bdrv_drained_end:
  *
@@ -346,9 +339,4 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
-/**
- * End a quiescent section started by bdrv_subtree_drained_begin().
- */
-void bdrv_subtree_drained_end(BlockDriverState *bs);
-
 #endif /* BLOCK_IO_H */
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 2b97576f6d..791dddfd7d 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1184,7 +1184,6 @@ struct BlockDriverState {
 
     /* Accessed with atomic ops.  */
     int quiesce_counter;
-    int recursive_quiesce_counter;
 
     unsigned int write_gen;               /* Current data generation */
 
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 4b0b3e17ef..8bc061ebb8 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -179,16 +179,4 @@ void bdrv_bsc_invalidate_range(BlockDriverState *bs,
  */
 void bdrv_bsc_fill(BlockDriverState *bs, int64_t offset, int64_t bytes);
 
-
-/*
- * "I/O or GS" API functions. These functions can run without
- * the BQL, but only in one specific iothread/main loop.
- *
- * See include/block/block-io.h for more information about
- * the "I/O or GS" API.
- */
-
-void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
-void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent);
-
 #endif /* BLOCK_INT_IO_H */
diff --git a/block.c b/block.c
index 01a27bd77f..c589c3432b 100644
--- a/block.c
+++ b/block.c
@@ -1234,7 +1234,7 @@ static void bdrv_child_cb_drained_begin(BdrvChild *child)
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    return bdrv_drain_poll(bs, false, NULL, false);
+    return bdrv_drain_poll(bs, NULL, false);
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
@@ -1484,8 +1484,6 @@ static void bdrv_child_cb_attach(BdrvChild *child)
         assert(!bs->file);
         bs->file = child;
     }
-
-    bdrv_apply_subtree_drain(child, bs);
 }
 
 static void bdrv_child_cb_detach(BdrvChild *child)
@@ -1496,8 +1494,6 @@ static void bdrv_child_cb_detach(BdrvChild *child)
         bdrv_backing_detach(child);
     }
 
-    bdrv_unapply_subtree_drain(child, bs);
-
     assert_bdrv_graph_writable(bs);
     QLIST_REMOVE(child, next);
     if (child == bs->backing) {
@@ -2853,9 +2849,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     }
 
     if (old_bs) {
-        /* Detach first so that the recursive drain sections coming from @child
-         * are already gone and we only end the drain sections that came from
-         * elsewhere. */
         if (child->klass->detach) {
             child->klass->detach(child);
         }
@@ -2870,17 +2863,14 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
 
         /*
-         * Detaching the old node may have led to the new node's
-         * quiesce_counter having been decreased.  Not a problem, we
-         * just need to recognize this here and then invoke
-         * drained_end appropriately more often.
+         * Polling in bdrv_parent_drained_begin_single() may have led to the new
+         * node's quiesce_counter having been decreased.  Not a problem, we just
+         * need to recognize this here and then invoke drained_end appropriately
+         * more often.
          */
         assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
         drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
 
-        /* Attach only after starting new drained sections, so that recursive
-         * drain sections coming from @child don't get an extra .drained_begin
-         * callback. */
         if (child->klass->attach) {
             child->klass->attach(child);
         }
diff --git a/block/io.c b/block/io.c
index a25103be6f..75224480d0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -236,17 +236,15 @@ typedef struct {
     BlockDriverState *bs;
     bool done;
     bool begin;
-    bool recursive;
     bool poll;
     BdrvChild *parent;
     bool ignore_bds_parents;
 } BdrvCoDrainData;
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
-                     BdrvChild *ignore_parent, bool ignore_bds_parents)
+bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
+                     bool ignore_bds_parents)
 {
-    BdrvChild *child, *next;
     IO_OR_GS_CODE();
 
     if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) {
@@ -257,29 +255,19 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
         return true;
     }
 
-    if (recursive) {
-        assert(!ignore_bds_parents);
-        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            if (bdrv_drain_poll(child->bs, recursive, child, false)) {
-                return true;
-            }
-        }
-    }
-
     return false;
 }
 
-static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
+static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
                                       BdrvChild *ignore_parent)
 {
-    return bdrv_drain_poll(bs, recursive, ignore_parent, false);
+    return bdrv_drain_poll(bs, ignore_parent, false);
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent, bool ignore_bds_parents,
-                                  bool poll);
-static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents);
+static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
+                                  bool ignore_bds_parents, bool poll);
+static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
+                                bool ignore_bds_parents);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -292,12 +280,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
         if (data->begin) {
-            bdrv_do_drained_begin(bs, data->recursive, data->parent,
-                                  data->ignore_bds_parents, data->poll);
+            bdrv_do_drained_begin(bs, data->parent, data->ignore_bds_parents,
+                                  data->poll);
         } else {
             assert(!data->poll);
-            bdrv_do_drained_end(bs, data->recursive, data->parent,
-                                data->ignore_bds_parents);
+            bdrv_do_drained_end(bs, data->parent, data->ignore_bds_parents);
         }
         aio_context_release(ctx);
     } else {
@@ -310,7 +297,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 }
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
-                                                bool begin, bool recursive,
+                                                bool begin,
                                                 BdrvChild *parent,
                                                 bool ignore_bds_parents,
                                                 bool poll)
@@ -329,7 +316,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .bs = bs,
         .done = false,
         .begin = begin,
-        .recursive = recursive,
         .parent = parent,
         .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
@@ -380,29 +366,16 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     }
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
-                                  BdrvChild *parent, bool ignore_bds_parents,
-                                  bool poll)
+static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
+                                  bool ignore_bds_parents, bool poll)
 {
-    BdrvChild *child, *next;
-
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
-                               poll);
+        bdrv_co_yield_to_drain(bs, true, parent, ignore_bds_parents, poll);
         return;
     }
 
     bdrv_do_drained_begin_quiesce(bs, parent, ignore_bds_parents);
 
-    if (recursive) {
-        assert(!ignore_bds_parents);
-        bs->recursive_quiesce_counter++;
-        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_begin(child->bs, true, child, ignore_bds_parents,
-                                  false);
-        }
-    }
-
     /*
      * Wait for drained requests to finish.
      *
@@ -414,35 +387,27 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
      */
     if (poll) {
         assert(!ignore_bds_parents);
-        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
     }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-    bdrv_do_drained_begin(bs, false, NULL, false, true);
-}
-
-void bdrv_subtree_drained_begin(BlockDriverState *bs)
-{
-    IO_OR_GS_CODE();
-    bdrv_do_drained_begin(bs, true, NULL, false, true);
+    bdrv_do_drained_begin(bs, NULL, false, true);
 }
 
 /**
  * This function does not poll, nor must any of its recursively called
  * functions.
  */
-static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents)
+static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
+                                bool ignore_bds_parents)
 {
-    BdrvChild *child;
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false);
+        bdrv_co_yield_to_drain(bs, false, parent, ignore_bds_parents, false);
         return;
     }
     assert(bs->quiesce_counter > 0);
@@ -457,46 +422,12 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     if (old_quiesce_counter == 1) {
         aio_enable_external(bdrv_get_aio_context(bs));
     }
-
-    if (recursive) {
-        assert(!ignore_bds_parents);
-        bs->recursive_quiesce_counter--;
-        QLIST_FOREACH(child, &bs->children, next) {
-            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents);
-        }
-    }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, false, NULL, false);
-}
-
-void bdrv_subtree_drained_end(BlockDriverState *bs)
-{
-    IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, true, NULL, false);
-}
-
-void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
-{
-    int i;
-    IO_OR_GS_CODE();
-
-    for (i = 0; i < new_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_begin(child->bs, true, child, false, true);
-    }
-}
-
-void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
-{
-    int i;
-    IO_OR_GS_CODE();
-
-    for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false);
-    }
+    bdrv_do_drained_end(bs, NULL, false);
 }
 
 void bdrv_drain(BlockDriverState *bs)
@@ -529,7 +460,7 @@ static bool bdrv_drain_all_poll(void)
     while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
-        result |= bdrv_drain_poll(bs, false, NULL, true);
+        result |= bdrv_drain_poll(bs, NULL, true);
         aio_context_release(aio_context);
     }
 
@@ -554,7 +485,7 @@ void bdrv_drain_all_begin(void)
     GLOBAL_STATE_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true);
+        bdrv_co_yield_to_drain(NULL, true, NULL, true, true);
         return;
     }
 
@@ -579,7 +510,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, false, NULL, true, false);
+        bdrv_do_drained_begin(bs, NULL, true, false);
         aio_context_release(aio_context);
     }
 
@@ -599,7 +530,7 @@ void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
     g_assert(!bs->refcnt);
 
     while (bs->quiesce_counter) {
-        bdrv_do_drained_end(bs, false, NULL, true);
+        bdrv_do_drained_end(bs, NULL, true);
     }
 }
 
@@ -621,7 +552,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true);
+        bdrv_do_drained_end(bs, NULL, true);
         aio_context_release(aio_context);
     }
 
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 695519ee02..dda08de8db 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -156,7 +156,6 @@ static void call_in_coroutine(void (*entry)(void))
 enum drain_type {
     BDRV_DRAIN_ALL,
     BDRV_DRAIN,
-    BDRV_SUBTREE_DRAIN,
     DRAIN_TYPE_MAX,
 };
 
@@ -165,7 +164,6 @@ static void do_drain_begin(enum drain_type drain_type, BlockDriverState *bs)
     switch (drain_type) {
     case BDRV_DRAIN_ALL:        bdrv_drain_all_begin(); break;
     case BDRV_DRAIN:            bdrv_drained_begin(bs); break;
-    case BDRV_SUBTREE_DRAIN:    bdrv_subtree_drained_begin(bs); break;
     default:                    g_assert_not_reached();
     }
 }
@@ -175,7 +173,6 @@ static void do_drain_end(enum drain_type drain_type, BlockDriverState *bs)
     switch (drain_type) {
     case BDRV_DRAIN_ALL:        bdrv_drain_all_end(); break;
     case BDRV_DRAIN:            bdrv_drained_end(bs); break;
-    case BDRV_SUBTREE_DRAIN:    bdrv_subtree_drained_end(bs); break;
     default:                    g_assert_not_reached();
     }
 }
@@ -271,11 +268,6 @@ static void test_drv_cb_drain(void)
     test_drv_cb_common(BDRV_DRAIN, false);
 }
 
-static void test_drv_cb_drain_subtree(void)
-{
-    test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
-}
-
 static void test_drv_cb_co_drain_all(void)
 {
     call_in_coroutine(test_drv_cb_drain_all);
@@ -286,11 +278,6 @@ static void test_drv_cb_co_drain(void)
     call_in_coroutine(test_drv_cb_drain);
 }
 
-static void test_drv_cb_co_drain_subtree(void)
-{
-    call_in_coroutine(test_drv_cb_drain_subtree);
-}
-
 static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 {
     BlockBackend *blk;
@@ -332,11 +319,6 @@ static void test_quiesce_drain(void)
     test_quiesce_common(BDRV_DRAIN, false);
 }
 
-static void test_quiesce_drain_subtree(void)
-{
-    test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
-}
-
 static void test_quiesce_co_drain_all(void)
 {
     call_in_coroutine(test_quiesce_drain_all);
@@ -347,11 +329,6 @@ static void test_quiesce_co_drain(void)
     call_in_coroutine(test_quiesce_drain);
 }
 
-static void test_quiesce_co_drain_subtree(void)
-{
-    call_in_coroutine(test_quiesce_drain_subtree);
-}
-
 static void test_nested(void)
 {
     BlockBackend *blk;
@@ -402,158 +379,6 @@ static void test_nested(void)
     blk_unref(blk);
 }
 
-static void test_multiparent(void)
-{
-    BlockBackend *blk_a, *blk_b;
-    BlockDriverState *bs_a, *bs_b, *backing;
-    BDRVTestState *a_s, *b_s, *backing_s;
-
-    blk_a = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs_a = bdrv_new_open_driver(&bdrv_test, "test-node-a", BDRV_O_RDWR,
-                                &error_abort);
-    a_s = bs_a->opaque;
-    blk_insert_bs(blk_a, bs_a, &error_abort);
-
-    blk_b = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs_b = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
-                                &error_abort);
-    b_s = bs_b->opaque;
-    blk_insert_bs(blk_b, bs_b, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    backing_s = backing->opaque;
-    bdrv_set_backing_hd(bs_a, backing, &error_abort);
-    bdrv_set_backing_hd(bs_b, backing, &error_abort);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
-    g_assert_cmpint(backing->quiesce_counter, ==, 0);
-    g_assert_cmpint(a_s->drain_count, ==, 0);
-    g_assert_cmpint(b_s->drain_count, ==, 0);
-    g_assert_cmpint(backing_s->drain_count, ==, 0);
-
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_a);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
-    g_assert_cmpint(backing->quiesce_counter, ==, 1);
-    g_assert_cmpint(a_s->drain_count, ==, 1);
-    g_assert_cmpint(b_s->drain_count, ==, 1);
-    g_assert_cmpint(backing_s->drain_count, ==, 1);
-
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_b);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 2);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 2);
-    g_assert_cmpint(backing->quiesce_counter, ==, 2);
-    g_assert_cmpint(a_s->drain_count, ==, 2);
-    g_assert_cmpint(b_s->drain_count, ==, 2);
-    g_assert_cmpint(backing_s->drain_count, ==, 2);
-
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_b);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 1);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 1);
-    g_assert_cmpint(backing->quiesce_counter, ==, 1);
-    g_assert_cmpint(a_s->drain_count, ==, 1);
-    g_assert_cmpint(b_s->drain_count, ==, 1);
-    g_assert_cmpint(backing_s->drain_count, ==, 1);
-
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_a);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
-    g_assert_cmpint(backing->quiesce_counter, ==, 0);
-    g_assert_cmpint(a_s->drain_count, ==, 0);
-    g_assert_cmpint(b_s->drain_count, ==, 0);
-    g_assert_cmpint(backing_s->drain_count, ==, 0);
-
-    bdrv_unref(backing);
-    bdrv_unref(bs_a);
-    bdrv_unref(bs_b);
-    blk_unref(blk_a);
-    blk_unref(blk_b);
-}
-
-static void test_graph_change_drain_subtree(void)
-{
-    BlockBackend *blk_a, *blk_b;
-    BlockDriverState *bs_a, *bs_b, *backing;
-    BDRVTestState *a_s, *b_s, *backing_s;
-
-    blk_a = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs_a = bdrv_new_open_driver(&bdrv_test, "test-node-a", BDRV_O_RDWR,
-                                &error_abort);
-    a_s = bs_a->opaque;
-    blk_insert_bs(blk_a, bs_a, &error_abort);
-
-    blk_b = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs_b = bdrv_new_open_driver(&bdrv_test, "test-node-b", BDRV_O_RDWR,
-                                &error_abort);
-    b_s = bs_b->opaque;
-    blk_insert_bs(blk_b, bs_b, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    backing_s = backing->opaque;
-    bdrv_set_backing_hd(bs_a, backing, &error_abort);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
-    g_assert_cmpint(backing->quiesce_counter, ==, 0);
-    g_assert_cmpint(a_s->drain_count, ==, 0);
-    g_assert_cmpint(b_s->drain_count, ==, 0);
-    g_assert_cmpint(backing_s->drain_count, ==, 0);
-
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_a);
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_a);
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_a);
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_b);
-    do_drain_begin(BDRV_SUBTREE_DRAIN, bs_b);
-
-    bdrv_set_backing_hd(bs_b, backing, &error_abort);
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 5);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 5);
-    g_assert_cmpint(backing->quiesce_counter, ==, 5);
-    g_assert_cmpint(a_s->drain_count, ==, 5);
-    g_assert_cmpint(b_s->drain_count, ==, 5);
-    g_assert_cmpint(backing_s->drain_count, ==, 5);
-
-    bdrv_set_backing_hd(bs_b, NULL, &error_abort);
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 3);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 2);
-    g_assert_cmpint(backing->quiesce_counter, ==, 3);
-    g_assert_cmpint(a_s->drain_count, ==, 3);
-    g_assert_cmpint(b_s->drain_count, ==, 2);
-    g_assert_cmpint(backing_s->drain_count, ==, 3);
-
-    bdrv_set_backing_hd(bs_b, backing, &error_abort);
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 5);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 5);
-    g_assert_cmpint(backing->quiesce_counter, ==, 5);
-    g_assert_cmpint(a_s->drain_count, ==, 5);
-    g_assert_cmpint(b_s->drain_count, ==, 5);
-    g_assert_cmpint(backing_s->drain_count, ==, 5);
-
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_b);
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_b);
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_a);
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_a);
-    do_drain_end(BDRV_SUBTREE_DRAIN, bs_a);
-
-    g_assert_cmpint(bs_a->quiesce_counter, ==, 0);
-    g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
-    g_assert_cmpint(backing->quiesce_counter, ==, 0);
-    g_assert_cmpint(a_s->drain_count, ==, 0);
-    g_assert_cmpint(b_s->drain_count, ==, 0);
-    g_assert_cmpint(backing_s->drain_count, ==, 0);
-
-    bdrv_unref(backing);
-    bdrv_unref(bs_a);
-    bdrv_unref(bs_b);
-    blk_unref(blk_a);
-    blk_unref(blk_b);
-}
-
 static void test_graph_change_drain_all(void)
 {
     BlockBackend *blk_a, *blk_b;
@@ -773,12 +598,6 @@ static void test_iothread_drain(void)
     test_iothread_common(BDRV_DRAIN, 1);
 }
 
-static void test_iothread_drain_subtree(void)
-{
-    test_iothread_common(BDRV_SUBTREE_DRAIN, 0);
-    test_iothread_common(BDRV_SUBTREE_DRAIN, 1);
-}
-
 
 typedef struct TestBlockJob {
     BlockJob common;
@@ -863,7 +682,6 @@ enum test_job_result {
 enum test_job_drain_node {
     TEST_JOB_DRAIN_SRC,
     TEST_JOB_DRAIN_SRC_CHILD,
-    TEST_JOB_DRAIN_SRC_PARENT,
 };
 
 static void test_blockjob_common_drain_node(enum drain_type drain_type,
@@ -901,9 +719,6 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     case TEST_JOB_DRAIN_SRC_CHILD:
         drain_bs = src_backing;
         break;
-    case TEST_JOB_DRAIN_SRC_PARENT:
-        drain_bs = src_overlay;
-        break;
     default:
         g_assert_not_reached();
     }
@@ -1055,10 +870,6 @@ static void test_blockjob_common(enum drain_type drain_type, bool use_iothread,
                                     TEST_JOB_DRAIN_SRC);
     test_blockjob_common_drain_node(drain_type, use_iothread, result,
                                     TEST_JOB_DRAIN_SRC_CHILD);
-    if (drain_type == BDRV_SUBTREE_DRAIN) {
-        test_blockjob_common_drain_node(drain_type, use_iothread, result,
-                                        TEST_JOB_DRAIN_SRC_PARENT);
-    }
 }
 
 static void test_blockjob_drain_all(void)
@@ -1071,11 +882,6 @@ static void test_blockjob_drain(void)
     test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_SUCCESS);
 }
 
-static void test_blockjob_drain_subtree(void)
-{
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_SUCCESS);
-}
-
 static void test_blockjob_error_drain_all(void)
 {
     test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_FAIL_RUN);
@@ -1088,12 +894,6 @@ static void test_blockjob_error_drain(void)
     test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_FAIL_PREPARE);
 }
 
-static void test_blockjob_error_drain_subtree(void)
-{
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_RUN);
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_PREPARE);
-}
-
 static void test_blockjob_iothread_drain_all(void)
 {
     test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_SUCCESS);
@@ -1104,11 +904,6 @@ static void test_blockjob_iothread_drain(void)
     test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_SUCCESS);
 }
 
-static void test_blockjob_iothread_drain_subtree(void)
-{
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_SUCCESS);
-}
-
 static void test_blockjob_iothread_error_drain_all(void)
 {
     test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_FAIL_RUN);
@@ -1121,12 +916,6 @@ static void test_blockjob_iothread_error_drain(void)
     test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_FAIL_PREPARE);
 }
 
-static void test_blockjob_iothread_error_drain_subtree(void)
-{
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_RUN);
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_PREPARE);
-}
-
 
 typedef struct BDRVTestTopState {
     BdrvChild *wait_child;
@@ -1273,14 +1062,6 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
         bdrv_drain(child_bs);
         bdrv_unref(child_bs);
         break;
-    case BDRV_SUBTREE_DRAIN:
-        /* Would have to ref/unref bs here for !detach_instead_of_delete, but
-         * then the whole test becomes pointless because the graph changes
-         * don't occur during the drain any more. */
-        assert(detach_instead_of_delete);
-        bdrv_subtree_drained_begin(bs);
-        bdrv_subtree_drained_end(bs);
-        break;
     case BDRV_DRAIN_ALL:
         bdrv_drain_all_begin();
         bdrv_drain_all_end();
@@ -1315,11 +1096,6 @@ static void test_detach_by_drain(void)
     do_test_delete_by_drain(true, BDRV_DRAIN);
 }
 
-static void test_detach_by_drain_subtree(void)
-{
-    do_test_delete_by_drain(true, BDRV_SUBTREE_DRAIN);
-}
-
 
 struct detach_by_parent_data {
     BlockDriverState *parent_b;
@@ -1452,7 +1228,10 @@ static void test_detach_indirect(bool by_parent_cb)
     g_assert(acb != NULL);
 
     /* Drain and check the expected result */
-    bdrv_subtree_drained_begin(parent_b);
+    bdrv_drained_begin(parent_b);
+    bdrv_drained_begin(a);
+    bdrv_drained_begin(b);
+    bdrv_drained_begin(c);
 
     g_assert(detach_by_parent_data.child_c != NULL);
 
@@ -1467,12 +1246,15 @@ static void test_detach_indirect(bool by_parent_cb)
     g_assert(QLIST_NEXT(child_a, next) == NULL);
 
     g_assert_cmpint(parent_a->quiesce_counter, ==, 1);
-    g_assert_cmpint(parent_b->quiesce_counter, ==, 1);
+    g_assert_cmpint(parent_b->quiesce_counter, ==, 3);
     g_assert_cmpint(a->quiesce_counter, ==, 1);
-    g_assert_cmpint(b->quiesce_counter, ==, 0);
+    g_assert_cmpint(b->quiesce_counter, ==, 1);
     g_assert_cmpint(c->quiesce_counter, ==, 1);
 
-    bdrv_subtree_drained_end(parent_b);
+    bdrv_drained_end(parent_b);
+    bdrv_drained_end(a);
+    bdrv_drained_end(b);
+    bdrv_drained_end(c);
 
     bdrv_unref(parent_b);
     blk_unref(blk);
@@ -2202,70 +1984,47 @@ int main(int argc, char **argv)
 
     g_test_add_func("/bdrv-drain/driver-cb/drain_all", test_drv_cb_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/drain", test_drv_cb_drain);
-    g_test_add_func("/bdrv-drain/driver-cb/drain_subtree",
-                    test_drv_cb_drain_subtree);
 
     g_test_add_func("/bdrv-drain/driver-cb/co/drain_all",
                     test_drv_cb_co_drain_all);
     g_test_add_func("/bdrv-drain/driver-cb/co/drain", test_drv_cb_co_drain);
-    g_test_add_func("/bdrv-drain/driver-cb/co/drain_subtree",
-                    test_drv_cb_co_drain_subtree);
-
 
     g_test_add_func("/bdrv-drain/quiesce/drain_all", test_quiesce_drain_all);
     g_test_add_func("/bdrv-drain/quiesce/drain", test_quiesce_drain);
-    g_test_add_func("/bdrv-drain/quiesce/drain_subtree",
-                    test_quiesce_drain_subtree);
 
     g_test_add_func("/bdrv-drain/quiesce/co/drain_all",
                     test_quiesce_co_drain_all);
     g_test_add_func("/bdrv-drain/quiesce/co/drain", test_quiesce_co_drain);
-    g_test_add_func("/bdrv-drain/quiesce/co/drain_subtree",
-                    test_quiesce_co_drain_subtree);
 
     g_test_add_func("/bdrv-drain/nested", test_nested);
-    g_test_add_func("/bdrv-drain/multiparent", test_multiparent);
 
-    g_test_add_func("/bdrv-drain/graph-change/drain_subtree",
-                    test_graph_change_drain_subtree);
     g_test_add_func("/bdrv-drain/graph-change/drain_all",
                     test_graph_change_drain_all);
 
     g_test_add_func("/bdrv-drain/iothread/drain_all", test_iothread_drain_all);
     g_test_add_func("/bdrv-drain/iothread/drain", test_iothread_drain);
-    g_test_add_func("/bdrv-drain/iothread/drain_subtree",
-                    test_iothread_drain_subtree);
 
     g_test_add_func("/bdrv-drain/blockjob/drain_all", test_blockjob_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/drain", test_blockjob_drain);
-    g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
-                    test_blockjob_drain_subtree);
 
     g_test_add_func("/bdrv-drain/blockjob/error/drain_all",
                     test_blockjob_error_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/error/drain",
                     test_blockjob_error_drain);
-    g_test_add_func("/bdrv-drain/blockjob/error/drain_subtree",
-                    test_blockjob_error_drain_subtree);
 
     g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all",
                     test_blockjob_iothread_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/iothread/drain",
                     test_blockjob_iothread_drain);
-    g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree",
-                    test_blockjob_iothread_drain_subtree);
 
     g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_all",
                     test_blockjob_iothread_error_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain",
                     test_blockjob_iothread_error_drain);
-    g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_subtree",
-                    test_blockjob_iothread_error_drain_subtree);
 
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
-    g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
     g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
     g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 11/15] block: Call drain callbacks only once
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (9 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 10/15] block: Remove subtree drains Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 14:59   ` Vladimir Sementsov-Ogievskiy
  2022-11-18 17:41 ` [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end Kevin Wolf
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

We only need to call both the BlockDriver's callback and the parent
callbacks when going from undrained to drained or vice versa. A second
drain section doesn't make a difference for the driver or the parent,
they weren't supposed to send new requests before and after the second
drain.

One thing that gets in the way is the 'ignore_bds_parents' parameter in
bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that
bdrv_drain_all_begin() increases bs->quiesce_counter, but does not
quiesce the parent through BdrvChildClass callbacks. If an additional
drain section is started now, bs->quiesce_counter will be non-zero, but
we would still need to quiesce the parent through BdrvChildClass in
order to keep things consistent (and unquiesce it on the matching
bdrv_drained_end(), even though the counter would reach 0 yet as long as
the bdrv_drain_all() section is still active).

Instead of keeping track of this, let's just get rid of the parameter.
It was introduced in commit 6cd5c9d7b2d as an optimisation so that
during bdrv_drain_all(), we wouldn't recursively drain all parents up to
the root for each node, resulting in quadratic complexity. As it happens,
calling the callbacks only once solves the same problem, so as of this
patch, we'll still have O(n) complexity and ignore_bds_parents is not
needed any more.

This patch only ignores the 'ignore_bds_parents' parameter. It will be
removed in a separate patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/block_int-common.h |  8 ++++----
 block.c                          | 25 +++++++------------------
 block/io.c                       | 30 ++++++++++++++++++------------
 tests/unit/test-bdrv-drain.c     | 16 ++++++++++------
 4 files changed, 39 insertions(+), 40 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 791dddfd7d..a6bc6b7fe9 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -980,13 +980,13 @@ struct BdrvChild {
     bool frozen;
 
     /*
-     * How many times the parent of this child has been drained
+     * True if the parent of this child has been drained by this BdrvChild
      * (through klass->drained_*).
-     * Usually, this is equal to bs->quiesce_counter (potentially
-     * reduced by bdrv_drain_all_count).  It may differ while the
+     *
+     * It is generally true if bs->quiesce_counter > 0. It may differ while the
      * child is entering or leaving a drained section.
      */
-    int parent_quiesce_counter;
+    bool quiesced_parent;
 
     QLIST_ENTRY(BdrvChild) next;
     QLIST_ENTRY(BdrvChild) next_parent;
diff --git a/block.c b/block.c
index c589c3432b..6b41a252e4 100644
--- a/block.c
+++ b/block.c
@@ -2826,7 +2826,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 {
     BlockDriverState *old_bs = child->bs;
     int new_bs_quiesce_counter;
-    int drain_saldo;
 
     assert(!child->frozen);
     assert(old_bs != new_bs);
@@ -2836,16 +2835,13 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
     }
 
-    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
-    drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;
-
     /*
      * If the new child node is drained but the old one was not, flush
      * all outstanding requests to the old child node.
      */
-    while (drain_saldo > 0 && child->klass->drained_begin) {
+    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+    if (new_bs_quiesce_counter && !child->quiesced_parent) {
         bdrv_parent_drained_begin_single(child, true);
-        drain_saldo--;
     }
 
     if (old_bs) {
@@ -2861,16 +2857,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     if (new_bs) {
         assert_bdrv_graph_writable(new_bs);
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-
-        /*
-         * Polling in bdrv_parent_drained_begin_single() may have led to the new
-         * node's quiesce_counter having been decreased.  Not a problem, we just
-         * need to recognize this here and then invoke drained_end appropriately
-         * more often.
-         */
-        assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
-        drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
-
         if (child->klass->attach) {
             child->klass->attach(child);
         }
@@ -2879,10 +2865,13 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     /*
      * If the old child node was drained but the new one is not, allow
      * requests to come in only after the new node has been attached.
+     *
+     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
+     * polls, which could have changed the value.
      */
-    while (drain_saldo < 0 && child->klass->drained_end) {
+    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+    if (!new_bs_quiesce_counter && child->quiesced_parent) {
         bdrv_parent_drained_end_single(child);
-        drain_saldo++;
     }
 }
 
diff --git a/block/io.c b/block/io.c
index 75224480d0..87d6f22ec4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -62,8 +62,9 @@ void bdrv_parent_drained_end_single(BdrvChild *c)
 {
     IO_OR_GS_CODE();
 
-    assert(c->parent_quiesce_counter > 0);
-    c->parent_quiesce_counter--;
+    assert(c->quiesced_parent);
+    c->quiesced_parent = false;
+
     if (c->klass->drained_end) {
         c->klass->drained_end(c);
     }
@@ -110,7 +111,10 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
     AioContext *ctx = bdrv_child_get_parent_aio_context(c);
     IO_OR_GS_CODE();
-    c->parent_quiesce_counter++;
+
+    assert(!c->quiesced_parent);
+    c->quiesced_parent = true;
+
     if (c->klass->drained_begin) {
         c->klass->drained_begin(c);
     }
@@ -358,11 +362,12 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
-    }
 
-    bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
-    if (bs->drv && bs->drv->bdrv_drain_begin) {
-        bs->drv->bdrv_drain_begin(bs);
+        /* TODO Remove ignore_bds_parents, we don't consider it any more */
+        bdrv_parent_drained_begin(bs, parent, false);
+        if (bs->drv && bs->drv->bdrv_drain_begin) {
+            bs->drv->bdrv_drain_begin(bs);
+        }
     }
 }
 
@@ -413,13 +418,14 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
-    if (bs->drv && bs->drv->bdrv_drain_end) {
-        bs->drv->bdrv_drain_end(bs);
-    }
-    bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
-
     old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
     if (old_quiesce_counter == 1) {
+        if (bs->drv && bs->drv->bdrv_drain_end) {
+            bs->drv->bdrv_drain_end(bs);
+        }
+        /* TODO Remove ignore_bds_parents, we don't consider it any more */
+        bdrv_parent_drained_end(bs, parent, false);
+
         aio_enable_external(bdrv_get_aio_context(bs));
     }
 }
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index dda08de8db..172bc6debc 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -296,7 +296,11 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
     do_drain_begin(drain_type, bs);
 
-    g_assert_cmpint(bs->quiesce_counter, ==, 1);
+    if (drain_type == BDRV_DRAIN_ALL) {
+        g_assert_cmpint(bs->quiesce_counter, ==, 2);
+    } else {
+        g_assert_cmpint(bs->quiesce_counter, ==, 1);
+    }
     g_assert_cmpint(backing->quiesce_counter, ==, !!recursive);
 
     do_drain_end(drain_type, bs);
@@ -348,8 +352,8 @@ static void test_nested(void)
 
     for (outer = 0; outer < DRAIN_TYPE_MAX; outer++) {
         for (inner = 0; inner < DRAIN_TYPE_MAX; inner++) {
-            int backing_quiesce = (outer != BDRV_DRAIN) +
-                                  (inner != BDRV_DRAIN);
+            int backing_quiesce = (outer == BDRV_DRAIN_ALL) +
+                                  (inner == BDRV_DRAIN_ALL);
 
             g_assert_cmpint(bs->quiesce_counter, ==, 0);
             g_assert_cmpint(backing->quiesce_counter, ==, 0);
@@ -359,10 +363,10 @@ static void test_nested(void)
             do_drain_begin(outer, bs);
             do_drain_begin(inner, bs);
 
-            g_assert_cmpint(bs->quiesce_counter, ==, 2);
+            g_assert_cmpint(bs->quiesce_counter, ==, 2 + !!backing_quiesce);
             g_assert_cmpint(backing->quiesce_counter, ==, backing_quiesce);
-            g_assert_cmpint(s->drain_count, ==, 2);
-            g_assert_cmpint(backing_s->drain_count, ==, backing_quiesce);
+            g_assert_cmpint(s->drain_count, ==, 1);
+            g_assert_cmpint(backing_s->drain_count, ==, !!backing_quiesce);
 
             do_drain_end(inner, bs);
             do_drain_end(outer, bs);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end.
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (10 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 11/15] block: Call drain callbacks only once Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 15:05   ` Vladimir Sementsov-Ogievskiy
  2022-11-18 17:41 ` [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() Kevin Wolf
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

ignore_bds_parents is now ignored during drain_begin and drain_end, so
we can just remove it there. It is still a valid optimisation for
drain_all in bdrv_drained_poll(), so leave it around there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-io.h |  3 +--
 block.c                  |  2 +-
 block/io.c               | 58 +++++++++++++++-------------------------
 3 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 9c36a16a1f..8f5e75756a 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -329,8 +329,7 @@ void bdrv_drained_begin(BlockDriverState *bs);
  * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
  * running requests to complete.
  */
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent, bool ignore_bds_parents);
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent);
 
 /**
  * bdrv_drained_end:
diff --git a/block.c b/block.c
index 6b41a252e4..d18512944d 100644
--- a/block.c
+++ b/block.c
@@ -1228,7 +1228,7 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL, false);
+    bdrv_do_drained_begin_quiesce(bs, NULL);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 87d6f22ec4..2e9503df6a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -45,13 +45,12 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int64_t bytes, BdrvRequestFlags flags);
 
-static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
-                                      bool ignore_bds_parents)
+static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
 {
     BdrvChild *c, *next;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore || (ignore_bds_parents && c->klass->parent_is_bds)) {
+        if (c == ignore) {
             continue;
         }
         bdrv_parent_drained_begin_single(c, false);
@@ -70,13 +69,12 @@ void bdrv_parent_drained_end_single(BdrvChild *c)
     }
 }
 
-static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
-                                    bool ignore_bds_parents)
+static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
 {
     BdrvChild *c;
 
     QLIST_FOREACH(c, &bs->parents, next_parent) {
-        if (c == ignore || (ignore_bds_parents && c->klass->parent_is_bds)) {
+        if (c == ignore) {
             continue;
         }
         bdrv_parent_drained_end_single(c);
@@ -242,7 +240,6 @@ typedef struct {
     bool begin;
     bool poll;
     BdrvChild *parent;
-    bool ignore_bds_parents;
 } BdrvCoDrainData;
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
@@ -269,9 +266,8 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool ignore_bds_parents, bool poll);
-static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
-                                bool ignore_bds_parents);
+                                  bool poll);
+static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -284,11 +280,10 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
         if (data->begin) {
-            bdrv_do_drained_begin(bs, data->parent, data->ignore_bds_parents,
-                                  data->poll);
+            bdrv_do_drained_begin(bs, data->parent, data->poll);
         } else {
             assert(!data->poll);
-            bdrv_do_drained_end(bs, data->parent, data->ignore_bds_parents);
+            bdrv_do_drained_end(bs, data->parent);
         }
         aio_context_release(ctx);
     } else {
@@ -303,7 +298,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin,
                                                 BdrvChild *parent,
-                                                bool ignore_bds_parents,
                                                 bool poll)
 {
     BdrvCoDrainData data;
@@ -321,7 +315,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .done = false,
         .begin = begin,
         .parent = parent,
-        .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
     };
 
@@ -353,8 +346,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent, bool ignore_bds_parents)
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
 {
     IO_OR_GS_CODE();
     assert(!qemu_in_coroutine());
@@ -362,9 +354,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
-
-        /* TODO Remove ignore_bds_parents, we don't consider it any more */
-        bdrv_parent_drained_begin(bs, parent, false);
+        bdrv_parent_drained_begin(bs, parent);
         if (bs->drv && bs->drv->bdrv_drain_begin) {
             bs->drv->bdrv_drain_begin(bs);
         }
@@ -372,14 +362,14 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool ignore_bds_parents, bool poll)
+                                  bool poll)
 {
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, parent, ignore_bds_parents, poll);
+        bdrv_co_yield_to_drain(bs, true, parent, poll);
         return;
     }
 
-    bdrv_do_drained_begin_quiesce(bs, parent, ignore_bds_parents);
+    bdrv_do_drained_begin_quiesce(bs, parent);
 
     /*
      * Wait for drained requests to finish.
@@ -391,7 +381,6 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
      * nodes.
      */
     if (poll) {
-        assert(!ignore_bds_parents);
         BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
     }
 }
@@ -399,20 +388,19 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
 void bdrv_drained_begin(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-    bdrv_do_drained_begin(bs, NULL, false, true);
+    bdrv_do_drained_begin(bs, NULL, true);
 }
 
 /**
  * This function does not poll, nor must any of its recursively called
  * functions.
  */
-static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
-                                bool ignore_bds_parents)
+static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
 {
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, parent, ignore_bds_parents, false);
+        bdrv_co_yield_to_drain(bs, false, parent, false);
         return;
     }
     assert(bs->quiesce_counter > 0);
@@ -423,9 +411,7 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
         if (bs->drv && bs->drv->bdrv_drain_end) {
             bs->drv->bdrv_drain_end(bs);
         }
-        /* TODO Remove ignore_bds_parents, we don't consider it any more */
-        bdrv_parent_drained_end(bs, parent, false);
-
+        bdrv_parent_drained_end(bs, parent);
         aio_enable_external(bdrv_get_aio_context(bs));
     }
 }
@@ -433,7 +419,7 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent,
 void bdrv_drained_end(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, NULL, false);
+    bdrv_do_drained_end(bs, NULL);
 }
 
 void bdrv_drain(BlockDriverState *bs)
@@ -491,7 +477,7 @@ void bdrv_drain_all_begin(void)
     GLOBAL_STATE_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, NULL, true, true);
+        bdrv_co_yield_to_drain(NULL, true, NULL, true);
         return;
     }
 
@@ -516,7 +502,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, NULL, true, false);
+        bdrv_do_drained_begin(bs, NULL, false);
         aio_context_release(aio_context);
     }
 
@@ -536,7 +522,7 @@ void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
     g_assert(!bs->refcnt);
 
     while (bs->quiesce_counter) {
-        bdrv_do_drained_end(bs, NULL, true);
+        bdrv_do_drained_end(bs, NULL);
     }
 }
 
@@ -558,7 +544,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, NULL, true);
+        bdrv_do_drained_end(bs, NULL);
         aio_context_release(aio_context);
     }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (11 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 15:13   ` Vladimir Sementsov-Ogievskiy
  2022-11-18 17:41 ` [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm() Kevin Wolf
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

The next patch adds a parent drain to bdrv_attach_child_common(), which
shouldn't be, but is currently called from coroutines in some cases (e.g.
.bdrv_co_create implementations generally open new nodes). Therefore,
the assertion that we're not in a coroutine doesn't hold true any more.

We could just remove the assertion because there is nothing in the
function that should be in conflict with running in a coroutine, but
just to be on the safe side, we can reverse the caller relationship
between bdrv_do_drained_begin() and bdrv_do_drained_begin_quiesce() so
that the latter also just drops out of coroutine context and we can
still be certain in the future that any drain code doesn't run in
coroutines.

As a nice side effect, the structure of bdrv_do_drained_begin() is now
symmetrical with bdrv_do_drained_end().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2e9503df6a..5e9150d92c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -346,10 +346,15 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
+static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
+                                  bool poll)
 {
     IO_OR_GS_CODE();
-    assert(!qemu_in_coroutine());
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs, true, parent, poll);
+        return;
+    }
 
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
@@ -359,17 +364,6 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
             bs->drv->bdrv_drain_begin(bs);
         }
     }
-}
-
-static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool poll)
-{
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, parent, poll);
-        return;
-    }
-
-    bdrv_do_drained_begin_quiesce(bs, parent);
 
     /*
      * Wait for drained requests to finish.
@@ -385,6 +379,11 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
     }
 }
 
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
+{
+    bdrv_do_drained_begin(bs, parent, false);
+}
+
 void bdrv_drained_begin(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (12 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 16:07   ` Vladimir Sementsov-Ogievskiy
  2022-12-09 16:53   ` Paolo Bonzini
  2022-11-18 17:41 ` [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single() Kevin Wolf
                   ` (2 subsequent siblings)
  16 siblings, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

In order to make sure that bdrv_replace_child_noperm() doesn't have to
poll any more, get rid of the bdrv_parent_drained_begin_single() call.

This is possible now because we can require that the parent is already
drained through the child in question when the function is called and we
don't call the parent drain callbacks more than once.

The additional drain calls needed in callers cause the test case to run
its code in the drain handler too early (bdrv_attach_child() drains
now), so modify it to only enable the code after the test setup has
completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-io.h     |   8 +++
 block.c                      | 103 ++++++++++++++++++++++++++++++-----
 block/io.c                   |   2 +-
 tests/unit/test-bdrv-drain.c |  10 ++++
 4 files changed, 108 insertions(+), 15 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 8f5e75756a..65e6d2569b 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -292,6 +292,14 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
  */
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
+/**
+ * bdrv_parent_drained_poll_single:
+ *
+ * Returns true if there is any pending activity to cease before @c can be
+ * called quiesced, false otherwise.
+ */
+bool bdrv_parent_drained_poll_single(BdrvChild *c);
+
 /**
  * bdrv_parent_drained_end_single:
  *
diff --git a/block.c b/block.c
index d18512944d..3f12aba6ce 100644
--- a/block.c
+++ b/block.c
@@ -2409,6 +2409,20 @@ static void bdrv_replace_child_abort(void *opaque)
 
     GLOBAL_STATE_CODE();
     /* old_bs reference is transparently moved from @s to @s->child */
+    if (!s->child->bs) {
+        /*
+         * The parents were undrained when removing old_bs from the child. New
+         * requests can't have been made, though, because the child was empty.
+         *
+         * TODO Make bdrv_replace_child_noperm() transactionable to avoid
+         * undraining the parent in the first place. Once this is done, having
+         * new_bs drained when calling bdrv_replace_child_tran() is not a
+         * requirement any more.
+         */
+        bdrv_parent_drained_begin_single(s->child, false);
+        assert(!bdrv_parent_drained_poll_single(s->child));
+    }
+    assert(s->child->quiesced_parent);
     bdrv_replace_child_noperm(s->child, s->old_bs);
     bdrv_unref(new_bs);
 }
@@ -2424,12 +2438,19 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * Note: real unref of old_bs is done only on commit.
  *
+ * Both @child->bs and @new_bs (if non-NULL) must be drained. @new_bs must be
+ * kept drained until the transaction is completed.
+ *
  * The function doesn't update permissions, caller is responsible for this.
  */
 static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
                                     Transaction *tran)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
+
+    assert(child->quiesced_parent);
+    assert(!new_bs || new_bs->quiesce_counter);
+
     *s = (BdrvReplaceChildState) {
         .child = child,
         .old_bs = child->bs,
@@ -2821,6 +2842,14 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
     return permissions[qapi_perm];
 }
 
+/*
+ * Replaces the node that a BdrvChild points to without updating permissions.
+ *
+ * If @new_bs is non-NULL, the parent of @child must already be drained through
+ * @child.
+ *
+ * This function does not poll.
+ */
 static void bdrv_replace_child_noperm(BdrvChild *child,
                                       BlockDriverState *new_bs)
 {
@@ -2828,6 +2857,28 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     int new_bs_quiesce_counter;
 
     assert(!child->frozen);
+
+    /*
+     * If we want to change the BdrvChild to point to a drained node as its new
+     * child->bs, we need to make sure that its new parent is drained, too. In
+     * other words, either child->quiesce_parent must already be true or we must
+     * be able to set it and keep the parent's quiesce_counter consistent with
+     * that, but without polling or starting new requests (this function
+     * guarantees that it doesn't poll, and starting new requests would be
+     * against the invariants of drain sections).
+     *
+     * To keep things simple, we pick the first option (child->quiesce_parent
+     * must already be true). We also generalise the rule a bit to make it
+     * easier to verify in callers and more likely to be covered in test cases:
+     * The parent must be quiesced through this child even if new_bs isn't
+     * currently drained.
+     *
+     * The only exception is for callers that always pass new_bs == NULL. In
+     * this case, we obviously never need to consider the case of a drained
+     * new_bs, so we can keep the callers simpler by allowing them not to drain
+     * the parent.
+     */
+    assert(!new_bs || child->quiesced_parent);
     assert(old_bs != new_bs);
     GLOBAL_STATE_CODE();
 
@@ -2835,15 +2886,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
     }
 
-    /*
-     * If the new child node is drained but the old one was not, flush
-     * all outstanding requests to the old child node.
-     */
-    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
-    if (new_bs_quiesce_counter && !child->quiesced_parent) {
-        bdrv_parent_drained_begin_single(child, true);
-    }
-
     if (old_bs) {
         if (child->klass->detach) {
             child->klass->detach(child);
@@ -2863,11 +2905,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     }
 
     /*
-     * If the old child node was drained but the new one is not, allow
-     * requests to come in only after the new node has been attached.
-     *
-     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
-     * polls, which could have changed the value.
+     * If the parent was drained through this BdrvChild previously, but new_bs
+     * is not drained, allow requests to come in only after the new node has
+     * been attached.
      */
     new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
     if (!new_bs_quiesce_counter && child->quiesced_parent) {
@@ -3004,6 +3044,24 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
+    /*
+     * Let every new BdrvChild start with a drained parent. Inserting the child
+     * in the graph with bdrv_replace_child_noperm() will undrain it if
+     * @child_bs is not drained.
+     *
+     * The child was only just created and is not yet visible in global state
+     * until bdrv_replace_child_noperm() inserts it into the graph, so nobody
+     * could have sent requests and polling is not necessary.
+     *
+     * Note that this means that the parent isn't fully drained yet, we only
+     * stop new requests from coming in. This is fine, we don't care about the
+     * old requests here, they are not for this child. If another place enters a
+     * drain section for the same parent, but wants it to be fully quiesced, it
+     * will not run most of the the code in .drained_begin() again (which is not
+     * a problem, we already did this), but it will still poll until the parent
+     * is fully quiesced, so it will not be negatively affected either.
+     */
+    bdrv_parent_drained_begin_single(new_child, false);
     bdrv_replace_child_noperm(new_child, child_bs);
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
@@ -5061,7 +5119,10 @@ static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
     }
 
     if (child->bs) {
+        BlockDriverState *bs = child->bs;
+        bdrv_drained_begin(bs);
         bdrv_replace_child_tran(child, NULL, tran);
+        bdrv_drained_end(bs);
     }
 
     tran_add(tran, &bdrv_remove_child_drv, child);
@@ -5078,6 +5139,15 @@ static void bdrv_remove_filter_or_cow_child(BlockDriverState *bs,
     bdrv_remove_child(bdrv_filter_or_cow_child(bs), tran);
 }
 
+static void undrain_on_clean_cb(void *opaque)
+{
+    bdrv_drained_end(opaque);
+}
+
+static TransactionActionDrv undrain_on_clean = {
+    .clean = undrain_on_clean_cb,
+};
+
 static int bdrv_replace_node_noperm(BlockDriverState *from,
                                     BlockDriverState *to,
                                     bool auto_skip, Transaction *tran,
@@ -5087,6 +5157,11 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_drained_begin(from);
+    bdrv_drained_begin(to);
+    tran_add(tran, &undrain_on_clean, from);
+    tran_add(tran, &undrain_on_clean, to);
+
     QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
         assert(c->bs == from);
         if (!should_update_child(c, to)) {
diff --git a/block/io.c b/block/io.c
index 5e9150d92c..ae64830eac 100644
--- a/block/io.c
+++ b/block/io.c
@@ -81,7 +81,7 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
-static bool bdrv_parent_drained_poll_single(BdrvChild *c)
+bool bdrv_parent_drained_poll_single(BdrvChild *c)
 {
     if (c->klass->drained_poll) {
         return c->klass->drained_poll(c);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 172bc6debc..2686a8acee 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1654,6 +1654,7 @@ static void test_drop_intermediate_poll(void)
 
 
 typedef struct BDRVReplaceTestState {
+    bool setup_completed;
     bool was_drained;
     bool was_undrained;
     bool has_read;
@@ -1738,6 +1739,10 @@ static void bdrv_replace_test_drain_begin(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
+    if (!s->setup_completed) {
+        return;
+    }
+
     if (!s->drain_count) {
         s->drain_co = qemu_coroutine_create(bdrv_replace_test_drain_co, bs);
         bdrv_inc_in_flight(bs);
@@ -1769,6 +1774,10 @@ static void bdrv_replace_test_drain_end(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
+    if (!s->setup_completed) {
+        return;
+    }
+
     g_assert(s->drain_count > 0);
     if (!--s->drain_count) {
         s->was_undrained = true;
@@ -1867,6 +1876,7 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
     bdrv_ref(old_child_bs);
     bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
                       BDRV_CHILD_COW, &error_abort);
+    parent_s->setup_completed = true;
 
     for (i = 0; i < old_drain_count; i++) {
         bdrv_drained_begin(old_child_bs);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single()
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (13 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm() Kevin Wolf
@ 2022-11-18 17:41 ` Kevin Wolf
  2022-11-25 16:08   ` Vladimir Sementsov-Ogievskiy
  2022-11-24 18:26 ` [PATCH v2 00/15] block: Simplify drain Hanna Reitz
  2022-11-28 13:00 ` Kevin Wolf
  16 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-11-18 17:41 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

All callers of bdrv_parent_drained_begin_single() pass poll=false now,
so we don't need the parameter any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-io.h | 5 ++---
 block.c                  | 4 ++--
 block/io.c               | 8 ++------
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 65e6d2569b..92aaa7c1e9 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -287,10 +287,9 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 /**
  * bdrv_parent_drained_begin_single:
  *
- * Begin a quiesced section for the parent of @c. If @poll is true, wait for
- * any pending activity to cease.
+ * Begin a quiesced section for the parent of @c.
  */
-void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
+void bdrv_parent_drained_begin_single(BdrvChild *c);
 
 /**
  * bdrv_parent_drained_poll_single:
diff --git a/block.c b/block.c
index 3f12aba6ce..8c9f4ee37c 100644
--- a/block.c
+++ b/block.c
@@ -2419,7 +2419,7 @@ static void bdrv_replace_child_abort(void *opaque)
          * new_bs drained when calling bdrv_replace_child_tran() is not a
          * requirement any more.
          */
-        bdrv_parent_drained_begin_single(s->child, false);
+        bdrv_parent_drained_begin_single(s->child);
         assert(!bdrv_parent_drained_poll_single(s->child));
     }
     assert(s->child->quiesced_parent);
@@ -3061,7 +3061,7 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
      * a problem, we already did this), but it will still poll until the parent
      * is fully quiesced, so it will not be negatively affected either.
      */
-    bdrv_parent_drained_begin_single(new_child, false);
+    bdrv_parent_drained_begin_single(new_child);
     bdrv_replace_child_noperm(new_child, child_bs);
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
diff --git a/block/io.c b/block/io.c
index ae64830eac..38e57d1f67 100644
--- a/block/io.c
+++ b/block/io.c
@@ -53,7 +53,7 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
         if (c == ignore) {
             continue;
         }
-        bdrv_parent_drained_begin_single(c, false);
+        bdrv_parent_drained_begin_single(c);
     }
 }
 
@@ -105,9 +105,8 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
     return busy;
 }
 
-void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
+void bdrv_parent_drained_begin_single(BdrvChild *c)
 {
-    AioContext *ctx = bdrv_child_get_parent_aio_context(c);
     IO_OR_GS_CODE();
 
     assert(!c->quiesced_parent);
@@ -116,9 +115,6 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
     if (c->klass->drained_begin) {
         c->klass->drained_begin(c);
     }
-    if (poll) {
-        AIO_WAIT_WHILE(ctx, bdrv_parent_drained_poll_single(c));
-    }
 }
 
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/15] block: Simplify drain
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (14 preceding siblings ...)
  2022-11-18 17:41 ` [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single() Kevin Wolf
@ 2022-11-24 18:26 ` Hanna Reitz
  2022-11-28 13:00 ` Kevin Wolf
  16 siblings, 0 replies; 31+ messages in thread
From: Hanna Reitz @ 2022-11-24 18:26 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block
  Cc: eesposit, stefanha, pbonzini, vsementsov, qemu-devel

On 18.11.22 18:40, Kevin Wolf wrote:
> I'm aware that exactly nobody has been looking forward to a series with
> this title, but it has to be. The way drain works means that we need to
> poll in bdrv_replace_child_noperm() and that makes things rather messy
> with Emanuele's multiqueue work because you must not poll while you hold
> the graph lock.
>
> The other reason why it has to be is that drain is way too complex and
> there are too many different cases. Some simplification like this will
> hopefully make it considerably more maintainable. The diffstat probably
> tells something, too.
>
> There are roughly speaking three parts in this series:
>
> 1. Make BlockDriver.bdrv_drained_begin/end() non-coroutine_fn again,
>     which allows us to not poll on bdrv_drained_end() any more.
>
> 2. Remove subtree drains. They are a considerable complication in the
>     whole drain machinery (in particular, they require polling in the
>     BdrvChildClass.attach/detach() callbacks that are called during
>     bdrv_replace_child_noperm()) and none of their users actually has a
>     good reason to use them.
>
> 3. Finally get rid of polling in bdrv_replace_child_noperm() by
>     requiring that the child is already drained by the caller and calling
>     callbacks only once and not again for every nested drain section.
>
> If necessary, a prefix of this series can be merged that covers only the
> first or the first two parts and it would still make sense.
>
> v2:
> - Rebased on master
> - Patch 3: Removed left over _co parts in function names
> - Patch 4: Updated function comments to reflect that we're not polling
>    any more
> - Patch 6 (new): Fix inconsistent AioContext locking for reopen code
> - Patch 9 (was 8): Added comment to clarify when polling is allowed
>    and the graph may change again
> - Patch 11 (was 10):
>    * Reworded some comments and the commit message.
>    * Dropped a now unnecessary assertion that was dropped only in a later
>      patch in v1 of the series.
>    * Changed 'int parent_quiesce_counter' into 'bool quiesced_parent'
> - Patch 12 (was 11): Don't remove ignore_bds_parents from
>    bdrv_drain_poll(), it is actually still a valid optimisation there
>    that makes polling O(n) instead of O(n²)
> - Patch 13 (new): Instead of only removing assert(!qemu_in_coroutine())
>    like in v1 of the series, drop out of coroutine context in
>    bdrv_do_drained_begin_quiesce() just to be sure that we'll never get
>    coroutine surprises in drain code.
> - Patch 14 (was 12): More and reworded comments to make things hopefully
>    a bit clearer

Thanks!

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child()
  2022-11-18 17:41 ` [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child() Kevin Wolf
@ 2022-11-25 12:56   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 12:56 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> Callers don't agree whether bdrv_reopen_queue_child() should be called
> with the AioContext lock held or not. Standardise on holding the lock
> (as done by QMP blockdev-reopen and the replication block driver) and
> fix bdrv_reopen() to do the same.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/15] block: Drain invidual nodes during reopen
  2022-11-18 17:41 ` [PATCH v2 07/15] block: Drain invidual nodes during reopen Kevin Wolf
@ 2022-11-25 13:10   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 13:10 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

in subject: s/invidual/individual/

On 11/18/22 20:41, Kevin Wolf wrote:
> bdrv_reopen() and friends use subtree drains as a lazy way of covering
> all the nodes they touch. Turns out that this lazy way is a lot more
> complicated than just draining the nodes individually, even not
> accounting for the additional complexity in the drain mechanism itself.
> 
> Simplify the code by switching to draining the individual nodes that are
> already managed in the BlockReopenQueue anyway.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/15] block: Call drain callbacks only once
  2022-11-18 17:41 ` [PATCH v2 11/15] block: Call drain callbacks only once Kevin Wolf
@ 2022-11-25 14:59   ` Vladimir Sementsov-Ogievskiy
  2022-11-28 12:37     ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 14:59 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> We only need to call both the BlockDriver's callback and the parent
> callbacks when going from undrained to drained or vice versa. A second
> drain section doesn't make a difference for the driver or the parent,
> they weren't supposed to send new requests before and after the second
> drain.
> 
> One thing that gets in the way is the 'ignore_bds_parents' parameter in
> bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that
> bdrv_drain_all_begin() increases bs->quiesce_counter, but does not
> quiesce the parent through BdrvChildClass callbacks. If an additional
> drain section is started now, bs->quiesce_counter will be non-zero, but
> we would still need to quiesce the parent through BdrvChildClass in
> order to keep things consistent (and unquiesce it on the matching
> bdrv_drained_end(), even though the counter would reach 0 yet as long as

would reach -> would NOT reach

if I understand correctly)

> the bdrv_drain_all() section is still active).
> 
> Instead of keeping track of this, let's just get rid of the parameter.
> It was introduced in commit 6cd5c9d7b2d as an optimisation so that
> during bdrv_drain_all(), we wouldn't recursively drain all parents up to
> the root for each node, resulting in quadratic complexity. As it happens,
> calling the callbacks only once solves the same problem, so as of this
> patch, we'll still have O(n) complexity and ignore_bds_parents is not
> needed any more.
> 
> This patch only ignores the 'ignore_bds_parents' parameter. It will be
> removed in a separate patch.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Hanna Reitz <hreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>




-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end.
  2022-11-18 17:41 ` [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end Kevin Wolf
@ 2022-11-25 15:05   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 15:05 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> ignore_bds_parents is now ignored during drain_begin and drain_end, so
> we can just remove it there. It is still a valid optimisation for
> drain_all in bdrv_drained_poll(), so leave it around there.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce()
  2022-11-18 17:41 ` [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() Kevin Wolf
@ 2022-11-25 15:13   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 15:13 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> The next patch adds a parent drain to bdrv_attach_child_common(), which
> shouldn't be, but is currently called from coroutines in some cases (e.g.
> .bdrv_co_create implementations generally open new nodes). Therefore,
> the assertion that we're not in a coroutine doesn't hold true any more.
> 
> We could just remove the assertion because there is nothing in the
> function that should be in conflict with running in a coroutine, but
> just to be on the safe side, we can reverse the caller relationship
> between bdrv_do_drained_begin() and bdrv_do_drained_begin_quiesce() so
> that the latter also just drops out of coroutine context and we can
> still be certain in the future that any drain code doesn't run in
> coroutines.
> 
> As a nice side effect, the structure of bdrv_do_drained_begin() is now
> symmetrical with bdrv_do_drained_end().
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-11-18 17:41 ` [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm() Kevin Wolf
@ 2022-11-25 16:07   ` Vladimir Sementsov-Ogievskiy
  2022-11-28 12:59     ` Kevin Wolf
  2022-12-09 16:53   ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 16:07 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> In order to make sure that bdrv_replace_child_noperm() doesn't have to
> poll any more, get rid of the bdrv_parent_drained_begin_single() call.
> 
> This is possible now because we can require that the parent is already
> drained through the child in question when the function is called and we
> don't call the parent drain callbacks more than once.
> 
> The additional drain calls needed in callers cause the test case to run
> its code in the drain handler too early (bdrv_attach_child() drains
> now), so modify it to only enable the code after the test setup has
> completed.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

It's still hard to keep this all in mind, so weak:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

> ---
>   include/block/block-io.h     |   8 +++
>   block.c                      | 103 ++++++++++++++++++++++++++++++-----
>   block/io.c                   |   2 +-
>   tests/unit/test-bdrv-drain.c |  10 ++++
>   4 files changed, 108 insertions(+), 15 deletions(-)
> 
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index 8f5e75756a..65e6d2569b 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -292,6 +292,14 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
>    */
>   void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
>   
> +/**
> + * bdrv_parent_drained_poll_single:
> + *
> + * Returns true if there is any pending activity to cease before @c can be
> + * called quiesced, false otherwise.
> + */
> +bool bdrv_parent_drained_poll_single(BdrvChild *c);
> +
>   /**
>    * bdrv_parent_drained_end_single:
>    *
> diff --git a/block.c b/block.c
> index d18512944d..3f12aba6ce 100644
> --- a/block.c
> +++ b/block.c

[..]

> @@ -2863,11 +2905,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
>       }
>   
>       /*
> -     * If the old child node was drained but the new one is not, allow
> -     * requests to come in only after the new node has been attached.
> -     *
> -     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
> -     * polls, which could have changed the value.
> +     * If the parent was drained through this BdrvChild previously, but new_bs
> +     * is not drained, allow requests to come in only after the new node has
> +     * been attached.

As I understand,the main reason why we MUST do this automatic undrain, is the contract with the user:

User things that:

1. It's enough to drain node X to drain all its parents (thanks to recursion)
2. User should undrain exactly same nodes that was drained by hand, everything that was drained automatically would be automatically undrained.

So here we break the connection between X and its parent, therefore recursion will not help on final undrain. So, we should do the automation here.



I have an idea how to (probably) make things even more simple.

1. drop .quiesced_parent

2. consider the Full graph, including non-bds parents, and support having .quiesce_counter for non-bds parents. (probably need some structure to unify bds and non-bds nodes of the Full graph - Generic ndoe, that's not the first time we are saying about that)

3. drop any recursion and automatic drain/undrain

4. user is responsible to drain all the nodes and their parents as needed to proceed with some block graph modification

5. user keeps the list of all nodes that was drained to undrain them in the end

6. node may be drained only when all its parents are already drained (add an assertion)


And of course we need helpers for the user to do 4-6. That should work similar to permissions update. Add a function to produce a topologically sorted list of Generic nodes (starting from some node and include all its parents and their parents and so on), and  simple functions that to drain/undrain of such list in a loop.



-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single()
  2022-11-18 17:41 ` [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single() Kevin Wolf
@ 2022-11-25 16:08   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-11-25 16:08 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, qemu-devel

On 11/18/22 20:41, Kevin Wolf wrote:
> All callers of bdrv_parent_drained_begin_single() pass poll=false now,
> so we don't need the parameter any more.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/15] block: Call drain callbacks only once
  2022-11-25 14:59   ` Vladimir Sementsov-Ogievskiy
@ 2022-11-28 12:37     ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-28 12:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, eesposit, stefanha, hreitz, pbonzini, qemu-devel

Am 25.11.2022 um 15:59 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11/18/22 20:41, Kevin Wolf wrote:
> > We only need to call both the BlockDriver's callback and the parent
> > callbacks when going from undrained to drained or vice versa. A second
> > drain section doesn't make a difference for the driver or the parent,
> > they weren't supposed to send new requests before and after the second
> > drain.
> > 
> > One thing that gets in the way is the 'ignore_bds_parents' parameter in
> > bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that
> > bdrv_drain_all_begin() increases bs->quiesce_counter, but does not
> > quiesce the parent through BdrvChildClass callbacks. If an additional
> > drain section is started now, bs->quiesce_counter will be non-zero, but
> > we would still need to quiesce the parent through BdrvChildClass in
> > order to keep things consistent (and unquiesce it on the matching
> > bdrv_drained_end(), even though the counter would reach 0 yet as long as
> 
> would reach -> would NOT reach
> 
> if I understand correctly)

Yes, you're right. Thanks for catching it. Fixing this while applying
the series.

Kevin

> > the bdrv_drain_all() section is still active).
> > 
> > Instead of keeping track of this, let's just get rid of the parameter.
> > It was introduced in commit 6cd5c9d7b2d as an optimisation so that
> > during bdrv_drain_all(), we wouldn't recursively drain all parents up to
> > the root for each node, resulting in quadratic complexity. As it happens,
> > calling the callbacks only once solves the same problem, so as of this
> > patch, we'll still have O(n) complexity and ignore_bds_parents is not
> > needed any more.
> > 
> > This patch only ignores the 'ignore_bds_parents' parameter. It will be
> > removed in a separate patch.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Hanna Reitz <hreitz@redhat.com>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> 
> 
> 
> -- 
> Best regards,
> Vladimir
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-11-25 16:07   ` Vladimir Sementsov-Ogievskiy
@ 2022-11-28 12:59     ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-28 12:59 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, eesposit, stefanha, hreitz, pbonzini, qemu-devel

Am 25.11.2022 um 17:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11/18/22 20:41, Kevin Wolf wrote:
> > In order to make sure that bdrv_replace_child_noperm() doesn't have to
> > poll any more, get rid of the bdrv_parent_drained_begin_single() call.
> > 
> > This is possible now because we can require that the parent is already
> > drained through the child in question when the function is called and we
> > don't call the parent drain callbacks more than once.
> > 
> > The additional drain calls needed in callers cause the test case to run
> > its code in the drain handler too early (bdrv_attach_child() drains
> > now), so modify it to only enable the code after the test setup has
> > completed.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> 
> It's still hard to keep this all in mind, so weak:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> > ---
> >   include/block/block-io.h     |   8 +++
> >   block.c                      | 103 ++++++++++++++++++++++++++++++-----
> >   block/io.c                   |   2 +-
> >   tests/unit/test-bdrv-drain.c |  10 ++++
> >   4 files changed, 108 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/block/block-io.h b/include/block/block-io.h
> > index 8f5e75756a..65e6d2569b 100644
> > --- a/include/block/block-io.h
> > +++ b/include/block/block-io.h
> > @@ -292,6 +292,14 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
> >    */
> >   void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
> > +/**
> > + * bdrv_parent_drained_poll_single:
> > + *
> > + * Returns true if there is any pending activity to cease before @c can be
> > + * called quiesced, false otherwise.
> > + */
> > +bool bdrv_parent_drained_poll_single(BdrvChild *c);
> > +
> >   /**
> >    * bdrv_parent_drained_end_single:
> >    *
> > diff --git a/block.c b/block.c
> > index d18512944d..3f12aba6ce 100644
> > --- a/block.c
> > +++ b/block.c
> 
> [..]
> 
> > @@ -2863,11 +2905,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
> >       }
> >       /*
> > -     * If the old child node was drained but the new one is not, allow
> > -     * requests to come in only after the new node has been attached.
> > -     *
> > -     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
> > -     * polls, which could have changed the value.
> > +     * If the parent was drained through this BdrvChild previously, but new_bs
> > +     * is not drained, allow requests to come in only after the new node has
> > +     * been attached.
> 
> As I understand,the main reason why we MUST do this automatic undrain, is the contract with the user:
> 
> User things that:
> 
> 1. It's enough to drain node X to drain all its parents (thanks to recursion)
> 2. User should undrain exactly same nodes that was drained by hand, everything that was drained automatically would be automatically undrained.
> 
> So here we break the connection between X and its parent, therefore recursion will not help on final undrain. So, we should do the automation here.

Yes, I think that's the idea behind our interface.

> I have an idea how to (probably) make things even more simple.
> 
> 1. drop .quiesced_parent
> 
> 2. consider the Full graph, including non-bds parents, and support having .quiesce_counter for non-bds parents. (probably need some structure to unify bds and non-bds nodes of the Full graph - Generic ndoe, that's not the first time we are saying about that)
> 
> 3. drop any recursion and automatic drain/undrain
> 
> 4. user is responsible to drain all the nodes and their parents as needed to proceed with some block graph modification
> 
> 5. user keeps the list of all nodes that was drained to undrain them in the end
> 
> 6. node may be drained only when all its parents are already drained (add an assertion)
> 
> And of course we need helpers for the user to do 4-6. That should work similar to permissions update. Add a function to produce a topologically sorted list of Generic nodes (starting from some node and include all its parents and their parents and so on), and  simple functions that to drain/undrain of such list in a loop.

I understand your idea and it looks nice at the first sight.

However, on second thought, I'm not sure how easy and nice this would
actually turn out: You will lose the invariant that if a node is
drained, its parent will always be drained, too - it depends on the
caller now. You also can't delete a node that is still drained, you need
to keep it around for undraining. Same thing with graph modifications
in a drained section: You won't automatically get a consistent state
regarding drain in the new graph layout, so you would have to manually
make sure that both old and new children are drained.

It feels like this will lead to new complications that might not be any
easier to manage than the old ones.

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/15] block: Simplify drain
  2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
                   ` (15 preceding siblings ...)
  2022-11-24 18:26 ` [PATCH v2 00/15] block: Simplify drain Hanna Reitz
@ 2022-11-28 13:00 ` Kevin Wolf
  16 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2022-11-28 13:00 UTC (permalink / raw)
  To: qemu-block; +Cc: eesposit, stefanha, hreitz, pbonzini, vsementsov, qemu-devel

Am 18.11.2022 um 18:40 hat Kevin Wolf geschrieben:
> I'm aware that exactly nobody has been looking forward to a series with
> this title, but it has to be. The way drain works means that we need to
> poll in bdrv_replace_child_noperm() and that makes things rather messy
> with Emanuele's multiqueue work because you must not poll while you hold
> the graph lock.
> 
> The other reason why it has to be is that drain is way too complex and
> there are too many different cases. Some simplification like this will
> hopefully make it considerably more maintainable. The diffstat probably
> tells something, too.
> 
> There are roughly speaking three parts in this series:
> 
> 1. Make BlockDriver.bdrv_drained_begin/end() non-coroutine_fn again,
>    which allows us to not poll on bdrv_drained_end() any more.
> 
> 2. Remove subtree drains. They are a considerable complication in the
>    whole drain machinery (in particular, they require polling in the
>    BdrvChildClass.attach/detach() callbacks that are called during
>    bdrv_replace_child_noperm()) and none of their users actually has a
>    good reason to use them.
> 
> 3. Finally get rid of polling in bdrv_replace_child_noperm() by
>    requiring that the child is already drained by the caller and calling
>    callbacks only once and not again for every nested drain section.
> 
> If necessary, a prefix of this series can be merged that covers only the
> first or the first two parts and it would still make sense.

Thanks for the review, applied to block-next.

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-11-18 17:41 ` [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm() Kevin Wolf
  2022-11-25 16:07   ` Vladimir Sementsov-Ogievskiy
@ 2022-12-09 16:53   ` Paolo Bonzini
  2022-12-12  9:09     ` Paolo Bonzini
  2022-12-12 15:57     ` Kevin Wolf
  1 sibling, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-12-09 16:53 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, vsementsov, qemu-devel

On 11/18/22 18:41, Kevin Wolf wrote:
> In order to make sure that bdrv_replace_child_noperm() doesn't have to
> poll any more, get rid of the bdrv_parent_drained_begin_single() call.
> 
> This is possible now because we can require that the parent is already
> drained through the child in question when the function is called and we
> don't call the parent drain callbacks more than once.
> 
> The additional drain calls needed in callers cause the test case to run
> its code in the drain handler too early (bdrv_attach_child() drains
> now), so modify it to only enable the code after the test setup has
> completed.
> 
> Signed-off-by: Kevin Wolf<kwolf@redhat.com>

I hate to bear bad news, but this breaks the Windows builds on github 
(msys-32bit, msys-64bit) with an obscure but 100% reproducible

51/88 qemu:unit / test-bdrv-drain 
ERROR           1.30s   (exit status 3221225477 or signal 3221225349 
SIGinvalid)

The exit status is 0xC0000005 aka a Windows SIGSEGV.  With some luck it 
could be reproducible with Wine (but no gdb).

Paolo



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-12-09 16:53   ` Paolo Bonzini
@ 2022-12-12  9:09     ` Paolo Bonzini
  2022-12-12 15:57     ` Kevin Wolf
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-12-12  9:09 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: eesposit, stefanha, hreitz, vsementsov, qemu-devel

On 12/9/22 17:53, Paolo Bonzini wrote:
> On 11/18/22 18:41, Kevin Wolf wrote:
>> In order to make sure that bdrv_replace_child_noperm() doesn't have to
>> poll any more, get rid of the bdrv_parent_drained_begin_single() call.
>>
>> This is possible now because we can require that the parent is already
>> drained through the child in question when the function is called and we
>> don't call the parent drain callbacks more than once.
>>
>> The additional drain calls needed in callers cause the test case to run
>> its code in the drain handler too early (bdrv_attach_child() drains
>> now), so modify it to only enable the code after the test setup has
>> completed.
>>
>> Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> 
> I hate to bear bad news, but this breaks the Windows builds on github 
> (msys-32bit, msys-64bit) with an obscure but 100% reproducible
> 
> 51/88 qemu:unit / test-bdrv-drain ERROR           1.30s   (exit status 
> 3221225477 or signal 3221225349 SIGinvalid)
> 
> The exit status is 0xC0000005 aka a Windows SIGSEGV.  With some luck it 
> could be reproducible with Wine (but no gdb).

I am testing dropping this patch and squashing

diff --git a/block.c b/block.c
index 1a2a8d9de907..bb9c85222168 100644
--- a/block.c
+++ b/block.c
@@ -2870,7 +2870,9 @@ static void bdrv_replace_child_noperm(BdrvChild 
*child,
       */
      new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
      if (new_bs_quiesce_counter && !child->quiesced_parent) {
-        bdrv_parent_drained_begin_single(child, true);
+        bdrv_parent_drained_begin_single(child);
+        AIO_WAIT_WHILE(bdrv_child_get_parent_aio_context(child),
+                       bdrv_parent_drained_poll_single(c));
      }

      if (old_bs) {

in patch 15/15, which should at least allow me to keep a lot of the 
cleanups I had on top of this series.

Paolo



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-12-09 16:53   ` Paolo Bonzini
  2022-12-12  9:09     ` Paolo Bonzini
@ 2022-12-12 15:57     ` Kevin Wolf
  2022-12-12 17:31       ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2022-12-12 15:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-block, eesposit, stefanha, hreitz, vsementsov, qemu-devel

Am 09.12.2022 um 17:53 hat Paolo Bonzini geschrieben:
> On 11/18/22 18:41, Kevin Wolf wrote:
> > In order to make sure that bdrv_replace_child_noperm() doesn't have to
> > poll any more, get rid of the bdrv_parent_drained_begin_single() call.
> > 
> > This is possible now because we can require that the parent is already
> > drained through the child in question when the function is called and we
> > don't call the parent drain callbacks more than once.
> > 
> > The additional drain calls needed in callers cause the test case to run
> > its code in the drain handler too early (bdrv_attach_child() drains
> > now), so modify it to only enable the code after the test setup has
> > completed.
> > 
> > Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> 
> I hate to bear bad news, but this breaks the Windows builds on github
> (msys-32bit, msys-64bit) with an obscure but 100% reproducible
> 
> 51/88 qemu:unit / test-bdrv-drain ERROR           1.30s   (exit status
> 3221225477 or signal 3221225349 SIGinvalid)
> 
> The exit status is 0xC0000005 aka a Windows SIGSEGV.  With some luck it
> could be reproducible with Wine (but no gdb).

I can reproduce it with mingw+wine (and actually gdb! Still a somewhat
limited debugging environment, but not as bad as I imagined.)

I looks to me like this is a problem with the test case rather than the
change per se. It seems to be fixed with this patch that is already
posted as part of the next series:

[PATCH 09/18] test-bdrv-drain: Fix incorrrect drain assumptions
https://lists.gnu.org/archive/html/qemu-block/2022-12/msg00165.html

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm()
  2022-12-12 15:57     ` Kevin Wolf
@ 2022-12-12 17:31       ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2022-12-12 17:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eesposit, stefanha, hreitz, vsementsov, qemu-devel

On 12/12/22 16:57, Kevin Wolf wrote:
> I looks to me like this is a problem with the test case rather than the
> change per se. It seems to be fixed with this patch that is already
> posted as part of the next series:
> 
> [PATCH 09/18] test-bdrv-drain: Fix incorrrect drain assumptions
> https://lists.gnu.org/archive/html/qemu-block/2022-12/msg00165.html

Cool, thanks.  I'll retest once the series hits block-next.  In the 
meanwhile you can ignore the first three patches of 
http://patchew.org/QEMU/20221212125920.248567-1-pbonzini@redhat.com.

Paolo



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-12-12 17:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-18 17:40 [PATCH v2 00/15] block: Simplify drain Kevin Wolf
2022-11-18 17:40 ` [PATCH v2 01/15] qed: Don't yield in bdrv_qed_co_drain_begin() Kevin Wolf
2022-11-18 17:40 ` [PATCH v2 02/15] test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end() Kevin Wolf
2022-11-18 17:40 ` [PATCH v2 03/15] block: Revert .bdrv_drained_begin/end to non-coroutine_fn Kevin Wolf
2022-11-18 17:40 ` [PATCH v2 04/15] block: Remove drained_end_counter Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 05/15] block: Inline bdrv_drain_invoke() Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 06/15] block: Fix locking for bdrv_reopen_queue_child() Kevin Wolf
2022-11-25 12:56   ` Vladimir Sementsov-Ogievskiy
2022-11-18 17:41 ` [PATCH v2 07/15] block: Drain invidual nodes during reopen Kevin Wolf
2022-11-25 13:10   ` Vladimir Sementsov-Ogievskiy
2022-11-18 17:41 ` [PATCH v2 08/15] block: Don't use subtree drains in bdrv_drop_intermediate() Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 09/15] stream: Replace subtree drain with a single node drain Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 10/15] block: Remove subtree drains Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 11/15] block: Call drain callbacks only once Kevin Wolf
2022-11-25 14:59   ` Vladimir Sementsov-Ogievskiy
2022-11-28 12:37     ` Kevin Wolf
2022-11-18 17:41 ` [PATCH v2 12/15] block: Remove ignore_bds_parents parameter from drain_begin/end Kevin Wolf
2022-11-25 15:05   ` Vladimir Sementsov-Ogievskiy
2022-11-18 17:41 ` [PATCH v2 13/15] block: Drop out of coroutine in bdrv_do_drained_begin_quiesce() Kevin Wolf
2022-11-25 15:13   ` Vladimir Sementsov-Ogievskiy
2022-11-18 17:41 ` [PATCH v2 14/15] block: Don't poll in bdrv_replace_child_noperm() Kevin Wolf
2022-11-25 16:07   ` Vladimir Sementsov-Ogievskiy
2022-11-28 12:59     ` Kevin Wolf
2022-12-09 16:53   ` Paolo Bonzini
2022-12-12  9:09     ` Paolo Bonzini
2022-12-12 15:57     ` Kevin Wolf
2022-12-12 17:31       ` Paolo Bonzini
2022-11-18 17:41 ` [PATCH v2 15/15] block: Remove poll parameter from bdrv_parent_drained_begin_single() Kevin Wolf
2022-11-25 16:08   ` Vladimir Sementsov-Ogievskiy
2022-11-24 18:26 ` [PATCH v2 00/15] block: Simplify drain Hanna Reitz
2022-11-28 13:00 ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.