* [PATCH 00/12] More cleanups and fixes for drain
@ 2022-12-12 12:59 Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 01/15] Revert "block: Remove poll parameter from bdrv_parent_drained_begin_single()" Paolo Bonzini
                   ` (16 more replies)
  0 siblings, 17 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

There are a few more lines of code that can be removed around the
draining code, and more importantly the logic can be simplified by
removing unnecessary parameters.

Due to the failure of the block-next branch, the first three patches
drop patches 14+15 of Kevin's drain cleanup series and then redo
patch 15 in a slightly less satisfactory way that still enables the
remaining cleanups.  These reverts are not meant to be applied:
either the offending patches are dropped from the branch, or, if the
issue is fixed, my first three patches can go away.

The next three are taken from Emanuele's old subtree drain attempt
at removing the AioContext.  The main one is the second, which is
needed to avoid test case failures, but I included all three for
simplicity.

Patch 7 fixes another latent bug exposed by the later cleanups; while
looking for a fix I noticed a general lack of thread safety in
BlockBackend's drain code.  Some global properties only need to be
documented and enforced to be set only at creation time (patches 8
and 9), but queued_requests is also not protected by any mutex, which
is fixed in patches 10 and 11.

Finally, patches 12-15 are the actual simplification.

Applies on top of block-next.

Paolo

Emanuele Giuseppe Esposito (3):
  test-bdrv-drain.c: remove test_detach_by_parent_cb()
  tests/unit/test-bdrv-drain.c: graph setup functions can't run in
    coroutines
  tests/qemu-iotests/030: test_stream_parallel should use
    auto_finalize=False

Paolo Bonzini (12):
  Revert "block: Remove poll parameter from
    bdrv_parent_drained_begin_single()"
  Revert "block: Don't poll in bdrv_replace_child_noperm()"
  block: Pull polling out of bdrv_parent_drained_begin_single()
  block-backend: enter aio coroutine only after drain
  nbd: a BlockExport always has a BlockBackend
  block-backend: make global properties write-once
  block-backend: always wait for drain before starting operation
  block-backend: make queued_requests thread-safe
  block: limit bdrv_co_yield_to_drain to drain_begin
  block: second argument of bdrv_do_drained_end is always NULL
  block: second argument of bdrv_do_drained_begin and bdrv_drain_poll is
    always NULL
  block: only get out of coroutine context for polling

 block.c                           | 109 ++++----------------
 block/block-backend.c             |  91 +++++++++-------
 block/commit.c                    |   4 +-
 block/export/export.c             |   2 +-
 block/io.c                        | 136 +++++++++---------------
 block/mirror.c                    |   4 +-
 block/parallels.c                 |   2 +-
 block/qcow.c                      |   2 +-
 block/qcow2.c                     |   2 +-
 block/qed.c                       |   2 +-
 block/stream.c                    |   4 +-
 block/vdi.c                       |   2 +-
 block/vhdx.c                      |   2 +-
 block/vmdk.c                      |   4 +-
 block/vpc.c                       |   2 +-
 include/block/block-io.h          |  29 +-----
 include/block/block_int-io.h      |  21 ++++
 include/sysemu/block-backend-io.h |   6 +-
 nbd/server.c                      |  15 ++-
 tests/qemu-iotests/030            |  12 +--
 tests/unit/test-bdrv-drain.c      | 166 +++++++++++++++---------------
 tests/unit/test-block-iothread.c  |   2 +-
 22 files changed, 258 insertions(+), 361 deletions(-)

-- 
2.38.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/15] Revert "block: Remove poll parameter from bdrv_parent_drained_begin_single()"
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 02/15] Revert "block: Don't poll in bdrv_replace_child_noperm()" Paolo Bonzini
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

This reverts commit dcc5d4bc2abed4268bf31908193c4369e4c9d005.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                  | 4 ++--
 block/io.c               | 8 ++++++--
 include/block/block-io.h | 5 +++--
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 6191ac1f440c..87022f4cd971 100644
--- a/block.c
+++ b/block.c
@@ -2377,7 +2377,7 @@ static void bdrv_replace_child_abort(void *opaque)
          * new_bs drained when calling bdrv_replace_child_tran() is not a
          * requirement any more.
          */
-        bdrv_parent_drained_begin_single(s->child);
+        bdrv_parent_drained_begin_single(s->child, false);
         assert(!bdrv_parent_drained_poll_single(s->child));
     }
     assert(s->child->quiesced_parent);
@@ -3050,7 +3050,7 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
      * a problem, we already did this), but it will still poll until the parent
      * is fully quiesced, so it will not be negatively affected either.
      */
-    bdrv_parent_drained_begin_single(new_child);
+    bdrv_parent_drained_begin_single(new_child, false);
     bdrv_replace_child_noperm(new_child, child_bs);
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
diff --git a/block/io.c b/block/io.c
index f4444b7777d9..aee6e70c1496 100644
--- a/block/io.c
+++ b/block/io.c
@@ -53,7 +53,7 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
         if (c == ignore) {
             continue;
         }
-        bdrv_parent_drained_begin_single(c);
+        bdrv_parent_drained_begin_single(c, false);
     }
 }
 
@@ -105,8 +105,9 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
     return busy;
 }
 
-void bdrv_parent_drained_begin_single(BdrvChild *c)
+void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
+    AioContext *ctx = bdrv_child_get_parent_aio_context(c);
     IO_OR_GS_CODE();
 
     assert(!c->quiesced_parent);
@@ -115,6 +116,9 @@ void bdrv_parent_drained_begin_single(BdrvChild *c)
     if (c->klass->drained_begin) {
         c->klass->drained_begin(c);
     }
+    if (poll) {
+        AIO_WAIT_WHILE(ctx, bdrv_parent_drained_poll_single(c));
+    }
 }
 
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 52869ea08eb5..75d043204355 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -308,9 +308,10 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 /**
  * bdrv_parent_drained_begin_single:
  *
- * Begin a quiesced section for the parent of @c.
+ * Begin a quiesced section for the parent of @c. If @poll is true, wait for
+ * any pending activity to cease.
  */
-void bdrv_parent_drained_begin_single(BdrvChild *c);
+void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
 /**
  * bdrv_parent_drained_poll_single:
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/15] Revert "block: Don't poll in bdrv_replace_child_noperm()"
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 01/15] Revert "block: Remove poll parameter from bdrv_parent_drained_begin_single()" Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 03/15] block: Pull polling out of bdrv_parent_drained_begin_single() Paolo Bonzini
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

This reverts commit a4e5c80a45b22359cf9c187f0df4f8544812c55c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                      | 103 +++++------------------------------
 block/io.c                   |   2 +-
 include/block/block-io.h     |   8 ---
 tests/unit/test-bdrv-drain.c |  10 ----
 4 files changed, 15 insertions(+), 108 deletions(-)

diff --git a/block.c b/block.c
index 87022f4cd971..2f2123f4a4e5 100644
--- a/block.c
+++ b/block.c
@@ -2367,20 +2367,6 @@ static void bdrv_replace_child_abort(void *opaque)
 
     GLOBAL_STATE_CODE();
     /* old_bs reference is transparently moved from @s to @s->child */
-    if (!s->child->bs) {
-        /*
-         * The parents were undrained when removing old_bs from the child. New
-         * requests can't have been made, though, because the child was empty.
-         *
-         * TODO Make bdrv_replace_child_noperm() transactionable to avoid
-         * undraining the parent in the first place. Once this is done, having
-         * new_bs drained when calling bdrv_replace_child_tran() is not a
-         * requirement any more.
-         */
-        bdrv_parent_drained_begin_single(s->child, false);
-        assert(!bdrv_parent_drained_poll_single(s->child));
-    }
-    assert(s->child->quiesced_parent);
     bdrv_replace_child_noperm(s->child, s->old_bs);
     bdrv_unref(new_bs);
 }
@@ -2396,19 +2382,12 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * Note: real unref of old_bs is done only on commit.
  *
- * Both @child->bs and @new_bs (if non-NULL) must be drained. @new_bs must be
- * kept drained until the transaction is completed.
- *
  * The function doesn't update permissions, caller is responsible for this.
  */
 static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
                                     Transaction *tran)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
-
-    assert(child->quiesced_parent);
-    assert(!new_bs || new_bs->quiesce_counter);
-
     *s = (BdrvReplaceChildState) {
         .child = child,
         .old_bs = child->bs,
@@ -2831,14 +2810,6 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
     return permissions[qapi_perm];
 }
 
-/*
- * Replaces the node that a BdrvChild points to without updating permissions.
- *
- * If @new_bs is non-NULL, the parent of @child must already be drained through
- * @child.
- *
- * This function does not poll.
- */
 static void bdrv_replace_child_noperm(BdrvChild *child,
                                       BlockDriverState *new_bs)
 {
@@ -2846,28 +2817,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     int new_bs_quiesce_counter;
 
     assert(!child->frozen);
-
-    /*
-     * If we want to change the BdrvChild to point to a drained node as its new
-     * child->bs, we need to make sure that its new parent is drained, too. In
-     * other words, either child->quiesce_parent must already be true or we must
-     * be able to set it and keep the parent's quiesce_counter consistent with
-     * that, but without polling or starting new requests (this function
-     * guarantees that it doesn't poll, and starting new requests would be
-     * against the invariants of drain sections).
-     *
-     * To keep things simple, we pick the first option (child->quiesce_parent
-     * must already be true). We also generalise the rule a bit to make it
-     * easier to verify in callers and more likely to be covered in test cases:
-     * The parent must be quiesced through this child even if new_bs isn't
-     * currently drained.
-     *
-     * The only exception is for callers that always pass new_bs == NULL. In
-     * this case, we obviously never need to consider the case of a drained
-     * new_bs, so we can keep the callers simpler by allowing them not to drain
-     * the parent.
-     */
-    assert(!new_bs || child->quiesced_parent);
     assert(old_bs != new_bs);
     GLOBAL_STATE_CODE();
 
@@ -2875,6 +2824,15 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
     }
 
+    /*
+     * If the new child node is drained but the old one was not, flush
+     * all outstanding requests to the old child node.
+     */
+    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+    if (new_bs_quiesce_counter && !child->quiesced_parent) {
+        bdrv_parent_drained_begin_single(child, true);
+    }
+
     if (old_bs) {
         if (child->klass->detach) {
             child->klass->detach(child);
@@ -2894,9 +2852,11 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     }
 
     /*
-     * If the parent was drained through this BdrvChild previously, but new_bs
-     * is not drained, allow requests to come in only after the new node has
-     * been attached.
+     * If the old child node was drained but the new one is not, allow
+     * requests to come in only after the new node has been attached.
+     *
+     * Update new_bs_quiesce_counter because bdrv_parent_drained_begin_single()
+     * polls, which could have changed the value.
      */
     new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
     if (!new_bs_quiesce_counter && child->quiesced_parent) {
@@ -3033,24 +2993,6 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
-    /*
-     * Let every new BdrvChild start with a drained parent. Inserting the child
-     * in the graph with bdrv_replace_child_noperm() will undrain it if
-     * @child_bs is not drained.
-     *
-     * The child was only just created and is not yet visible in global state
-     * until bdrv_replace_child_noperm() inserts it into the graph, so nobody
-     * could have sent requests and polling is not necessary.
-     *
-     * Note that this means that the parent isn't fully drained yet, we only
-     * stop new requests from coming in. This is fine, we don't care about the
-     * old requests here, they are not for this child. If another place enters a
-     * drain section for the same parent, but wants it to be fully quiesced, it
-     * will not run most of the the code in .drained_begin() again (which is not
-     * a problem, we already did this), but it will still poll until the parent
-     * is fully quiesced, so it will not be negatively affected either.
-     */
-    bdrv_parent_drained_begin_single(new_child, false);
     bdrv_replace_child_noperm(new_child, child_bs);
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
@@ -5096,24 +5038,12 @@ static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
     }
 
     if (child->bs) {
-        BlockDriverState *bs = child->bs;
-        bdrv_drained_begin(bs);
         bdrv_replace_child_tran(child, NULL, tran);
-        bdrv_drained_end(bs);
     }
 
     tran_add(tran, &bdrv_remove_child_drv, child);
 }
 
-static void undrain_on_clean_cb(void *opaque)
-{
-    bdrv_drained_end(opaque);
-}
-
-static TransactionActionDrv undrain_on_clean = {
-    .clean = undrain_on_clean_cb,
-};
-
 static int bdrv_replace_node_noperm(BlockDriverState *from,
                                     BlockDriverState *to,
                                     bool auto_skip, Transaction *tran,
@@ -5123,11 +5053,6 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
 
     GLOBAL_STATE_CODE();
 
-    bdrv_drained_begin(from);
-    bdrv_drained_begin(to);
-    tran_add(tran, &undrain_on_clean, from);
-    tran_add(tran, &undrain_on_clean, to);
-
     QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
         assert(c->bs == from);
         if (!should_update_child(c, to)) {
diff --git a/block/io.c b/block/io.c
index aee6e70c1496..571ff8c6493a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -81,7 +81,7 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
-bool bdrv_parent_drained_poll_single(BdrvChild *c)
+static bool bdrv_parent_drained_poll_single(BdrvChild *c)
 {
     if (c->klass->drained_poll) {
         return c->klass->drained_poll(c);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 75d043204355..0e0cd1249705 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -313,14 +313,6 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
  */
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
-/**
- * bdrv_parent_drained_poll_single:
- *
- * Returns true if there is any pending activity to cease before @c can be
- * called quiesced, false otherwise.
- */
-bool bdrv_parent_drained_poll_single(BdrvChild *c);
-
 /**
  * bdrv_parent_drained_end_single:
  *
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 2686a8aceee1..172bc6debc23 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1654,7 +1654,6 @@ static void test_drop_intermediate_poll(void)
 
 
 typedef struct BDRVReplaceTestState {
-    bool setup_completed;
     bool was_drained;
     bool was_undrained;
     bool has_read;
@@ -1739,10 +1738,6 @@ static void bdrv_replace_test_drain_begin(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
-    if (!s->setup_completed) {
-        return;
-    }
-
     if (!s->drain_count) {
         s->drain_co = qemu_coroutine_create(bdrv_replace_test_drain_co, bs);
         bdrv_inc_in_flight(bs);
@@ -1774,10 +1769,6 @@ static void bdrv_replace_test_drain_end(BlockDriverState *bs)
 {
     BDRVReplaceTestState *s = bs->opaque;
 
-    if (!s->setup_completed) {
-        return;
-    }
-
     g_assert(s->drain_count > 0);
     if (!--s->drain_count) {
         s->was_undrained = true;
@@ -1876,7 +1867,6 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
     bdrv_ref(old_child_bs);
     bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
                       BDRV_CHILD_COW, &error_abort);
-    parent_s->setup_completed = true;
 
     for (i = 0; i < old_drain_count; i++) {
         bdrv_drained_begin(old_child_bs);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/15] block: Pull polling out of bdrv_parent_drained_begin_single()
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 01/15] Revert "block: Remove poll parameter from bdrv_parent_drained_begin_single()" Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 02/15] Revert "block: Don't poll in bdrv_replace_child_noperm()" Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 04/15] test-bdrv-drain.c: remove test_detach_by_parent_cb() Paolo Bonzini
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

Only one caller of bdrv_parent_drained_begin_single() passes poll=true;
move the polling to that one caller.

While this requires exposing bdrv_parent_drained_poll_single() outside
block/io.c, this is not a big deal, because the bdrv_parent_drained_*_single
functions are really internal between block.c and block/io.c.  So make
that clear while we're at it, by moving their declarations to block_int-io.h.
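
A rough sketch of the resulting call pattern, condensed from the diff
below (assertions and surrounding context omitted):

    /* all other callers just begin the quiesced section, no polling */
    bdrv_parent_drained_begin_single(c);

    /* the one caller in bdrv_replace_child_noperm() polls explicitly */
    bdrv_parent_drained_begin_single(child);
    AIO_WAIT_WHILE(bdrv_child_get_parent_aio_context(child),
                   bdrv_parent_drained_poll_single(child));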

Based on a patch by Kevin Wolf <kwolf@redhat.com>.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                      |  4 +++-
 block/io.c                   | 10 +++-------
 include/block/block-io.h     | 15 ---------------
 include/block/block_int-io.h | 21 +++++++++++++++++++++
 4 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index 2f2123f4a4e5..c542a0a33358 100644
--- a/block.c
+++ b/block.c
@@ -2830,7 +2830,9 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
      */
     new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
     if (new_bs_quiesce_counter && !child->quiesced_parent) {
-        bdrv_parent_drained_begin_single(child, true);
+        bdrv_parent_drained_begin_single(child);
+        AIO_WAIT_WHILE(bdrv_child_get_parent_aio_context(child),
+                       bdrv_parent_drained_poll_single(child));
     }
 
     if (old_bs) {
diff --git a/block/io.c b/block/io.c
index 571ff8c6493a..f4444b7777d9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -53,7 +53,7 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
         if (c == ignore) {
             continue;
         }
-        bdrv_parent_drained_begin_single(c, false);
+        bdrv_parent_drained_begin_single(c);
     }
 }
 
@@ -81,7 +81,7 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
     }
 }
 
-static bool bdrv_parent_drained_poll_single(BdrvChild *c)
+bool bdrv_parent_drained_poll_single(BdrvChild *c)
 {
     if (c->klass->drained_poll) {
         return c->klass->drained_poll(c);
@@ -105,9 +105,8 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
     return busy;
 }
 
-void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
+void bdrv_parent_drained_begin_single(BdrvChild *c)
 {
-    AioContext *ctx = bdrv_child_get_parent_aio_context(c);
     IO_OR_GS_CODE();
 
     assert(!c->quiesced_parent);
@@ -116,9 +115,6 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
     if (c->klass->drained_begin) {
         c->klass->drained_begin(c);
     }
-    if (poll) {
-        AIO_WAIT_WHILE(ctx, bdrv_parent_drained_poll_single(c));
-    }
 }
 
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 0e0cd1249705..10659a3f246c 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -305,21 +305,6 @@ bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int co_wrapper_mixed
 bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
-/**
- * bdrv_parent_drained_begin_single:
- *
- * Begin a quiesced section for the parent of @c. If @poll is true, wait for
- * any pending activity to cease.
- */
-void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
-
-/**
- * bdrv_parent_drained_end_single:
- *
- * End a quiesced section for the parent of @c.
- */
-void bdrv_parent_drained_end_single(BdrvChild *c);
-
 /**
  * bdrv_drain_poll:
  *
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 8bc061ebb895..0ced9c025acb 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -179,4 +179,25 @@ void bdrv_bsc_invalidate_range(BlockDriverState *bs,
  */
 void bdrv_bsc_fill(BlockDriverState *bs, int64_t offset, int64_t bytes);
 
+/**
+ * bdrv_parent_drained_begin_single:
+ *
+ * Begin a quiesced section for the parent of @c.
+ */
+void bdrv_parent_drained_begin_single(BdrvChild *c);
+
+/**
+ * bdrv_parent_drained_poll_single:
+ *
+ * Check whether the parent of @c still has pending activity to cease.
+ */
+bool bdrv_parent_drained_poll_single(BdrvChild *c);
+
+/**
+ * bdrv_parent_drained_end_single:
+ *
+ * End a quiesced section for the parent of @c.
+ */
+void bdrv_parent_drained_end_single(BdrvChild *c);
+
 #endif /* BLOCK_INT_IO_H */
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/15] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (2 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 03/15] block: Pull polling out of bdrv_parent_drained_begin_single() Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 05/15] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines Paolo Bonzini
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf, Stefan Hajnoczi

From: Emanuele Giuseppe Esposito <eesposit@redhat.com>

This test uses a callback of an I/O function (blk_aio_preadv)
to modify the graph with bdrv_attach_child.  This is simply not
allowed anymore: I/O code cannot change the graph.

Before "block/io.c: make bdrv_do_drained_begin_quiesce static
and introduce bdrv_drained_begin_no_poll", the test would simply
be at risk of failure, because if bdrv_replace_child_noperm()
(called to modify the graph) would call a drain,
then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
that specifically asserts that we are not in a coroutine.

Now that we fixed the behavior, the drain will invoke a bh in the
main loop, so we don't have such problem. However, this test is still
illegal and fails because we forbid graph changes from I/O paths.

Once we add the required subtree_drains to protect
bdrv_replace_child_noperm(), the key problem in this test is in:

acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
/* Drain and check the expected result */
bdrv_subtree_drained_begin(parent_b);

because detach_by_parent_aio_cb() calls detach_indirect_bh(), which
modifies the graph and is invoked during bdrv_subtree_drained_begin().
The call stack is the following:
1. blk_aio_preadv() creates a coroutine, increments in_flight counter
and enters the coroutine running blk_aio_read_entry()
2. blk_aio_read_entry() performs the read and then schedules a bh to
   complete (blk_aio_complete)
3. at this point, subtree_drained_begin() kicks in and waits for all
   in_flight requests, polling
4. polling allows the bh to be scheduled, so blk_aio_complete runs
5. blk_aio_complete *first* invokes the callback
   (detach_by_parent_aio_cb) and then decrements the in_flight counter
6. Here we have the problem: detach_by_parent_aio_cb() modifies the
   graph, so both bdrv_unref_child() and bdrv_attach_child() will have
   subtree_drains inside.  This causes a deadlock, because the nested
   drain waits for the in_flight counter to go to zero, which only
   happens once the drain itself finishes.

The story is different for test_detach_by_driver_cb(): in this case,
detach_by_parent_aio_cb() does not call detach_indirect_bh();
instead, detach_indirect_bh() runs as a BH in the main loop,
scheduled by detach_by_driver_cb_drained_begin(), the callback for
.drained_begin().

This test was added in 231281ab42 as part of the series
"Drain fixes and cleanups, part 3"
https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
but, as explained above, I believe it is no longer valid and can be
discarded.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220314131854.2202651-8-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 42 +++++++++---------------------------
 1 file changed, 10 insertions(+), 32 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 172bc6debc23..3be214f30186 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1106,7 +1106,6 @@ struct detach_by_parent_data {
     BdrvChild *child_b;
     BlockDriverState *c;
     BdrvChild *child_c;
-    bool by_parent_cb;
 };
 static struct detach_by_parent_data detach_by_parent_data;
 
@@ -1124,12 +1123,7 @@ static void detach_indirect_bh(void *opaque)
 
 static void detach_by_parent_aio_cb(void *opaque, int ret)
 {
-    struct detach_by_parent_data *data = &detach_by_parent_data;
-
     g_assert_cmpint(ret, ==, 0);
-    if (data->by_parent_cb) {
-        detach_indirect_bh(data);
-    }
 }
 
 static void detach_by_driver_cb_drained_begin(BdrvChild *child)
@@ -1148,33 +1142,26 @@ static BdrvChildClass detach_by_driver_cb_class;
  *    \ /   \
  *     A     B     C
  *
- * by_parent_cb == true:  Test that parent callbacks don't poll
- *
- *     PA has a pending write request whose callback changes the child nodes of
- *     PB: It removes B and adds C instead. The subtree of PB is drained, which
- *     will indirectly drain the write request, too.
- *
- * by_parent_cb == false: Test that bdrv_drain_invoke() doesn't poll
+ * Test that bdrv_drain_invoke() doesn't poll
  *
  *     PA's BdrvChildClass has a .drained_begin callback that schedules a BH
  *     that does the same graph change. If bdrv_drain_invoke() calls it, the
  *     state is messed up, but if it is only polled in the single
  *     BDRV_POLL_WHILE() at the end of the drain, this should work fine.
  */
-static void test_detach_indirect(bool by_parent_cb)
+static void test_detach_indirect(void)
 {
     BlockBackend *blk;
     BlockDriverState *parent_a, *parent_b, *a, *b, *c;
     BdrvChild *child_a, *child_b;
     BlockAIOCB *acb;
+    BDRVTestState *s;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    if (!by_parent_cb) {
-        detach_by_driver_cb_class = child_of_bds;
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
-    }
+    detach_by_driver_cb_class = child_of_bds;
+    detach_by_driver_cb_class.drained_begin =
+        detach_by_driver_cb_drained_begin;
 
     /* Create all involved nodes */
     parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
@@ -1193,10 +1180,8 @@ static void test_detach_indirect(bool by_parent_cb)
 
     /* If we want to get bdrv_drain_invoke() to call aio_poll(), the driver
      * callback must not return immediately. */
-    if (!by_parent_cb) {
-        BDRVTestState *s = parent_a->opaque;
-        s->sleep_in_drain_begin = true;
-    }
+    s = parent_a->opaque;
+    s->sleep_in_drain_begin = true;
 
     /* Set child relationships */
     bdrv_ref(b);
@@ -1208,7 +1193,7 @@ static void test_detach_indirect(bool by_parent_cb)
 
     bdrv_ref(a);
     bdrv_attach_child(parent_a, a, "PA-A",
-                      by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
+                      &detach_by_driver_cb_class,
                       BDRV_CHILD_DATA, &error_abort);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
@@ -1226,7 +1211,6 @@ static void test_detach_indirect(bool by_parent_cb)
         .parent_b = parent_b,
         .child_b = child_b,
         .c = c,
-        .by_parent_cb = by_parent_cb,
     };
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
@@ -1271,14 +1255,9 @@ static void test_detach_indirect(bool by_parent_cb)
     bdrv_unref(c);
 }
 
-static void test_detach_by_parent_cb(void)
-{
-    test_detach_indirect(true);
-}
-
 static void test_detach_by_driver_cb(void)
 {
-    test_detach_indirect(false);
+    test_detach_indirect();
 }
 
 static void test_append_to_drained(void)
@@ -2029,7 +2008,6 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
-    g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
     g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
     g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/15] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (3 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 04/15] test-bdrv-drain.c: remove test_detach_by_parent_cb() Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 06/15] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False Paolo Bonzini
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

From: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Graph initialization functions like blk_new(), bdrv_new() and so on
should not run in a coroutine.  In fact, they might invoke a drain
(for example, blk_insert_bs() eventually calls
bdrv_replace_child_noperm()), which in turn can invoke callbacks like
bdrv_do_drained_begin_quiesce(), which asserts exactly that we are
not in a coroutine.

Move the initialization phase of test_drv_cb and test_quiesce_common
outside the coroutine logic.
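
The resulting shape of the tests, condensed from the diff below (the
backing node, the recursive flag and the assertions are omitted):

    static void test_common_cb(enum drain_type drain_type, bool in_coroutine,
                               void (*cb)(void *))
    {
        TestDriverCBData data = { .drain_type = drain_type };

        /* Graph setup runs outside of coroutine context. */
        data.blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
        data.bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
                                       &error_abort);
        blk_insert_bs(data.blk, data.bs, &error_abort);

        /* Only the drain checks themselves may run in a coroutine. */
        if (in_coroutine) {
            call_in_coroutine(cb, &data);
        } else {
            cb(&data);
        }

        /* Teardown, again outside of coroutine context. */
        bdrv_unref(data.bs);
        blk_unref(data.blk);
    }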

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220314131854.2202651-9-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 110 ++++++++++++++++++++++-------------
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 3be214f30186..f677f1d9fc25 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -126,7 +126,8 @@ static void aio_ret_cb(void *opaque, int ret)
 }
 
 typedef struct CallInCoroutineData {
-    void (*entry)(void);
+    void (*entry)(void *);
+    void *arg;
     bool done;
 } CallInCoroutineData;
 
@@ -134,15 +135,16 @@ static coroutine_fn void call_in_coroutine_entry(void *opaque)
 {
     CallInCoroutineData *data = opaque;
 
-    data->entry();
+    data->entry(data->arg);
     data->done = true;
 }
 
-static void call_in_coroutine(void (*entry)(void))
+static void call_in_coroutine(void (*entry)(void *), void *arg)
 {
     Coroutine *co;
     CallInCoroutineData data = {
         .entry  = entry,
+        .arg    = arg,
         .done   = false,
     };
 
@@ -199,26 +201,28 @@ static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *
     }
 }
 
-static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
-{
+typedef struct TestDriverCBData {
+    enum drain_type drain_type;
+    bool recursive;
     BlockBackend *blk;
     BlockDriverState *bs, *backing;
-    BDRVTestState *s, *backing_s;
+} TestDriverCBData;
+
+static void test_drv_cb_common(void *arg)
+{
+    TestDriverCBData *data = arg;
+    BlockBackend *blk = data->blk;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
+    BDRVTestState *s = bs->opaque;
+    BDRVTestState *backing_s = backing->opaque;
     BlockAIOCB *acb;
     int aio_ret;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    s = bs->opaque;
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    backing_s = backing->opaque;
-    bdrv_set_backing_hd(bs, backing, &error_abort);
-
     /* Simple bdrv_drain_all_begin/end pair, check that CBs are called */
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
@@ -252,44 +256,67 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_common_cb(enum drain_type drain_type, bool in_coroutine,
+                           void (*cb)(void *))
+{
+    TestDriverCBData data;
+
+    data.drain_type = drain_type;
+    data.recursive = (drain_type != BDRV_DRAIN);
+
+    data.blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
+    data.bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+    blk_insert_bs(data.blk, data.bs, &error_abort);
+
+    data.backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
+    bdrv_set_backing_hd(data.bs, data.backing, &error_abort);
+
+    if (in_coroutine) {
+        call_in_coroutine(cb, &data);
+    } else {
+        cb(&data);
+    }
+
+    bdrv_unref(data.backing);
+    bdrv_unref(data.bs);
+    blk_unref(data.blk);
+}
+
+static void test_drv_cb(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_drv_cb_common);
 }
 
 static void test_drv_cb_drain_all(void)
 {
-    test_drv_cb_common(BDRV_DRAIN_ALL, true);
+    test_drv_cb(BDRV_DRAIN_ALL, false);
 }
 
 static void test_drv_cb_drain(void)
 {
-    test_drv_cb_common(BDRV_DRAIN, false);
+    test_drv_cb(BDRV_DRAIN, false);
 }
 
 static void test_drv_cb_co_drain_all(void)
 {
-    call_in_coroutine(test_drv_cb_drain_all);
+    test_drv_cb(BDRV_DRAIN_ALL, true);
 }
 
 static void test_drv_cb_co_drain(void)
 {
-    call_in_coroutine(test_drv_cb_drain);
+    test_drv_cb(BDRV_DRAIN, true);
 }
 
-static void test_quiesce_common(enum drain_type drain_type, bool recursive)
+static void test_quiesce_common(void *arg)
 {
-    BlockBackend *blk;
-    BlockDriverState *bs, *backing;
-
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    bdrv_set_backing_hd(bs, backing, &error_abort);
+    TestDriverCBData *data = arg;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
@@ -307,30 +334,31 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_quiesce(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_quiesce_common);
 }
 
 static void test_quiesce_drain_all(void)
 {
-    test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce(BDRV_DRAIN_ALL, false);
 }
 
 static void test_quiesce_drain(void)
 {
-    test_quiesce_common(BDRV_DRAIN, false);
+    test_quiesce(BDRV_DRAIN, false);
 }
 
 static void test_quiesce_co_drain_all(void)
 {
-    call_in_coroutine(test_quiesce_drain_all);
+    test_quiesce(BDRV_DRAIN_ALL, true);
 }
 
 static void test_quiesce_co_drain(void)
 {
-    call_in_coroutine(test_quiesce_drain);
+    test_quiesce(BDRV_DRAIN, true);
 }
 
 static void test_nested(void)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/15] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (4 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 05/15] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 07/15] block-backend: enter aio coroutine only after drain Paolo Bonzini
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

From: Emanuele Giuseppe Esposito <eesposit@redhat.com>

First, use run_job() instead of the current logic to run the
stream job.  Then, use auto_finalize=False to make sure that
the job is not automatically deleted once it is done.

This way, if a job finishes earlier than expected, it has not been
finalized yet, so the other commands can still execute without
failing.  run_job() then takes care of calling job-finalize.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220314131854.2202651-11-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/qemu-iotests/030 | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 98595d47fec3..e5c13cb5fe4c 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -256,7 +256,7 @@ class TestParallelOps(iotests.QMPTestCase):
             pending_jobs.append(job_id)
             result = self.vm.qmp('block-stream', device=node_name,
                                  job_id=job_id, bottom=f'node{i-1}',
-                                 speed=1024)
+                                 speed=1024, auto_finalize=False)
             self.assert_qmp(result, 'return', {})
 
         # Do this in reverse: After unthrottling them, some jobs may finish
@@ -272,14 +272,8 @@ class TestParallelOps(iotests.QMPTestCase):
             result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
             self.assert_qmp(result, 'return', {})
 
-        # Wait for all jobs to be finished.
-        while len(pending_jobs) > 0:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    job_id = self.dictpath(event, 'data/device')
-                    self.assertTrue(job_id in pending_jobs)
-                    self.assert_qmp_absent(event, 'data/error')
-                    pending_jobs.remove(job_id)
+        for job in pending_jobs:
+            self.vm.run_job(job=job, auto_finalize=False)
 
         self.assert_no_active_block_jobs()
         self.vm.shutdown()
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/15] block-backend: enter aio coroutine only after drain
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (5 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 06/15] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2023-01-16 15:57   ` Kevin Wolf
  2022-12-12 12:59 ` [PATCH 08/15] nbd: a BlockExport always has a BlockBackend Paolo Bonzini
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

When called from within (another) coroutine, aio_co_enter() does not
enter the new coroutine immediately; instead, the new coroutine is
scheduled to run after qemu_coroutine_yield().  This however might
cause the currently-running coroutine to yield without having
incremented blk->in_flight.  If it was a ->drained_begin() callback
that scheduled the coroutine, bdrv_drained_begin() might return
without waiting for the I/O operation to finish.  Right now this is
masked by the unnecessary polling that bdrv_drained_begin() does
after the callbacks return, but it is wrong and a latent bug.

So, ensure that blk_inc_in_flight() and blk_wait_while_drained()
are called before aio_co_enter().  To do so, pull the call to
blk_wait_while_drained() out of the blk_co_do_* functions, which are
called from the AIO coroutines, and place it separately in the public
blk_co_* functions and in blk_aio_prwv().
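
A sketch of the resulting shape of the public wrappers, taken from the
diff below (blk_co_preadv() shown; the other blk_co_* functions and
blk_aio_prwv() follow the same pattern):

    int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                   int64_t bytes, QEMUIOVector *qiov,
                                   BdrvRequestFlags flags)
    {
        int ret;
        IO_OR_GS_CODE();

        blk_inc_in_flight(blk);       /* raise in_flight before any yield */
        blk_wait_while_drained(blk);  /* may yield; in_flight already held */
        ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, 0, flags);
        blk_dec_in_flight(blk);

        return ret;
    }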

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2852a892de6c..c4a884b86c2b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1288,8 +1288,6 @@ blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, int64_t bytes,
     BlockDriverState *bs;
     IO_CODE();
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_preadv(blk, bs, offset, bytes, flags);
@@ -1332,6 +1330,7 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, 0, flags);
     blk_dec_in_flight(blk);
 
@@ -1346,6 +1345,7 @@ int coroutine_fn blk_co_preadv_part(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, qiov_offset, flags);
     blk_dec_in_flight(blk);
 
@@ -1362,8 +1362,6 @@ blk_co_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
     BlockDriverState *bs;
     IO_CODE();
 
-    blk_wait_while_drained(blk);
-
     /* Call blk_bs() only after waiting, the graph may have changed */
     bs = blk_bs(blk);
     trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
@@ -1399,6 +1397,7 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_pwritev_part(blk, offset, bytes, qiov, qiov_offset, flags);
     blk_dec_in_flight(blk);
 
@@ -1543,6 +1542,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
     Coroutine *co;
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1667,8 +1667,6 @@ blk_co_do_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 {
     IO_CODE();
 
-    blk_wait_while_drained(blk);
-
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }
@@ -1683,6 +1681,7 @@ int coroutine_fn blk_co_ioctl(BlockBackend *blk, unsigned long int req,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_ioctl(blk, req, buf);
     blk_dec_in_flight(blk);
 
@@ -1713,8 +1712,6 @@ blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes)
     int ret;
     IO_CODE();
 
-    blk_wait_while_drained(blk);
-
     ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
@@ -1748,6 +1745,7 @@ int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_pdiscard(blk, offset, bytes);
     blk_dec_in_flight(blk);
 
@@ -1757,7 +1755,6 @@ int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
 /* To be called between exactly one pair of blk_inc/dec_in_flight() */
 static int coroutine_fn blk_co_do_flush(BlockBackend *blk)
 {
-    blk_wait_while_drained(blk);
     IO_CODE();
 
     if (!blk_is_available(blk)) {
@@ -1789,6 +1786,7 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
     ret = blk_co_do_flush(blk);
     blk_dec_in_flight(blk);
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/15] nbd: a BlockExport always has a BlockBackend
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (6 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 07/15] block-backend: enter aio coroutine only after drain Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 09/15] block-backend: make global properties write-once Paolo Bonzini
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

exp->common.blk cannot be NULL: nbd_export_delete() is only called
from blk_exp_unref(), and that in turn can only happen after
blk_exp_add() has asserted exp->blk != NULL.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 nbd/server.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index c53c39560e50..462ddb0e4adb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1848,15 +1848,13 @@ static void nbd_export_delete(BlockExport *blk_exp)
     g_free(exp->description);
     exp->description = NULL;
 
-    if (exp->common.blk) {
-        if (exp->eject_notifier_blk) {
-            notifier_remove(&exp->eject_notifier);
-            blk_unref(exp->eject_notifier_blk);
-        }
-        blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
-                                        blk_aio_detach, exp);
-        blk_set_disable_request_queuing(exp->common.blk, false);
+    if (exp->eject_notifier_blk) {
+        notifier_remove(&exp->eject_notifier);
+        blk_unref(exp->eject_notifier_blk);
     }
+    blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
+                                    blk_aio_detach, exp);
+    blk_set_disable_request_queuing(exp->common.blk, false);
 
     for (i = 0; i < exp->nr_export_bitmaps; i++) {
         bdrv_dirty_bitmap_set_busy(exp->export_bitmaps[i], false);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/15] block-backend: make global properties write-once
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (7 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 08/15] nbd: a BlockExport always has a BlockBackend Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 10/15] block-backend: always wait for drain before starting operation Paolo Bonzini
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

The three global properties allow_aio_context_change,
disable_request_queuing and allow_write_beyond_eof are always set
for the whole life of a BlockBackend.  Make this clear by removing
the possibility of clearing them, and by marking the corresponding
functions GLOBAL_STATE_CODE().
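
A sketch of the resulting usage, taken from the block/export change
in the diff below: the property is set once, right after blk_new(),
and can no longer be cleared later.

    blk = blk_new(ctx, perm, BLK_PERM_ALL);

    if (!fixed_iothread) {
        blk_allow_aio_context_change(blk);  /* write-once, global state code */
    }

    ret = blk_insert_bs(blk, bs, errp);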

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c             | 24 ++++++++++++++----------
 block/commit.c                    |  4 ++--
 block/export/export.c             |  2 +-
 block/mirror.c                    |  4 ++--
 block/parallels.c                 |  2 +-
 block/qcow.c                      |  2 +-
 block/qcow2.c                     |  2 +-
 block/qed.c                       |  2 +-
 block/stream.c                    |  4 ++--
 block/vdi.c                       |  2 +-
 block/vhdx.c                      |  2 +-
 block/vmdk.c                      |  4 ++--
 block/vpc.c                       |  2 +-
 include/sysemu/block-backend-io.h |  6 +++---
 nbd/server.c                      |  3 +--
 tests/unit/test-bdrv-drain.c      |  4 ++--
 tests/unit/test-block-iothread.c  |  2 +-
 17 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index c4a884b86c2b..fe42d53d655d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -73,8 +73,13 @@ struct BlockBackend {
     uint64_t shared_perm;
     bool disable_perm;
 
+    /*
+     * Can only become true; should be written before any request is
+     * submitted to the BlockBackend.
+     */
     bool allow_aio_context_change;
     bool allow_write_beyond_eof;
+    bool disable_request_queuing;
 
     /* Protected by BQL */
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
@@ -82,7 +87,6 @@ struct BlockBackend {
 
     int quiesce_counter;
     CoQueue queued_requests;
-    bool disable_request_queuing;
 
     VMChangeStateEntry *vmsh;
     bool force_allow_inactivate;
@@ -1217,22 +1221,22 @@ void blk_iostatus_set_err(BlockBackend *blk, int error)
     }
 }
 
-void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow)
+void blk_allow_write_beyond_eof(BlockBackend *blk)
 {
-    IO_CODE();
-    blk->allow_write_beyond_eof = allow;
+    GLOBAL_STATE_CODE();
+    blk->allow_write_beyond_eof = true;
 }
 
-void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow)
+void blk_allow_aio_context_change(BlockBackend *blk)
 {
-    IO_CODE();
-    blk->allow_aio_context_change = allow;
+    GLOBAL_STATE_CODE();
+    blk->allow_aio_context_change = true;
 }
 
-void blk_set_disable_request_queuing(BlockBackend *blk, bool disable)
+void blk_disable_request_queuing(BlockBackend *blk)
 {
-    IO_CODE();
-    blk->disable_request_queuing = disable;
+    GLOBAL_STATE_CODE();
+    blk->disable_request_queuing = true;
 }
 
 static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
diff --git a/block/commit.c b/block/commit.c
index b34634176797..7a448838301b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -379,7 +379,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
-    blk_set_disable_request_queuing(s->base, true);
+    blk_disable_request_queuing(s->base);
     s->base_bs = base;
 
     /* Required permissions are already taken with block_job_add_bdrv() */
@@ -388,7 +388,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     if (ret < 0) {
         goto fail;
     }
-    blk_set_disable_request_queuing(s->top, true);
+    blk_disable_request_queuing(s->top);
 
     s->backing_file_str = g_strdup(backing_file_str);
     s->on_error = on_error;
diff --git a/block/export/export.c b/block/export/export.c
index 7cc0c25c1c9f..6abfe5becb88 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -155,7 +155,7 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
     blk = blk_new(ctx, perm, BLK_PERM_ALL);
 
     if (!fixed_iothread) {
-        blk_set_allow_aio_context_change(blk, true);
+        blk_allow_aio_context_change(blk);
     }
 
     ret = blk_insert_bs(blk, bs, errp);
diff --git a/block/mirror.c b/block/mirror.c
index 251adc5ae02a..28e76e9dbf55 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1768,8 +1768,8 @@ static BlockJob *mirror_start_job(
          * ensure that. */
         blk_set_force_allow_inactivate(s->target);
     }
-    blk_set_allow_aio_context_change(s->target, true);
-    blk_set_disable_request_queuing(s->target, true);
+    blk_allow_aio_context_change(s->target);
+    blk_disable_request_queuing(s->target);
 
     s->replaces = g_strdup(replaces);
     s->on_source_error = on_source_error;
diff --git a/block/parallels.c b/block/parallels.c
index bbea2f2221ff..44e18a8cc75d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -576,7 +576,7 @@ static int coroutine_fn parallels_co_create(BlockdevCreateOptions* opts,
         ret = -EPERM;
         goto out;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* Create image format */
     bat_entries = DIV_ROUND_UP(total_size, cl_size);
diff --git a/block/qcow.c b/block/qcow.c
index 18e17a5b1235..6165885882dd 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -844,7 +844,7 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts,
         ret = -EPERM;
         goto exit;
     }
-    blk_set_allow_write_beyond_eof(qcow_blk, true);
+    blk_allow_write_beyond_eof(qcow_blk);
 
     /* Create image format */
     memset(&header, 0, sizeof(header));
diff --git a/block/qcow2.c b/block/qcow2.c
index 7cc49a3a6ce3..133dcef16760 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3634,7 +3634,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         ret = -EPERM;
         goto out;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* Write the header */
     QEMU_BUILD_BUG_ON((1 << MIN_CLUSTER_BITS) < sizeof(*header));
diff --git a/block/qed.c b/block/qed.c
index 9d54c8eec5b0..6bcbace6f24a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -687,7 +687,7 @@ static int coroutine_fn bdrv_qed_co_create(BlockdevCreateOptions *opts,
         ret = -EPERM;
         goto out;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* Prepare image format */
     header = (QEDHeader) {
diff --git a/block/stream.c b/block/stream.c
index 8744ad103f71..660d9dd4fa22 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -331,8 +331,8 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * Disable request queuing in the BlockBackend to avoid deadlocks on drain:
      * The job reports that it's busy until it reaches a pause point.
      */
-    blk_set_disable_request_queuing(s->blk, true);
-    blk_set_allow_aio_context_change(s->blk, true);
+    blk_disable_request_queuing(s->blk);
+    blk_allow_aio_context_change(s->blk);
 
     /*
      * Prevent concurrent jobs trying to modify the graph structure here, we
diff --git a/block/vdi.c b/block/vdi.c
index 479bcfe820ed..b5f43f42021a 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -812,7 +812,7 @@ static int coroutine_fn vdi_co_do_create(BlockdevCreateOptions *create_options,
         goto exit;
     }
 
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* We need enough blocks to store the given disk size,
        so always round up. */
diff --git a/block/vhdx.c b/block/vhdx.c
index 4c929800feeb..30904dc73978 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -2001,7 +2001,7 @@ static int coroutine_fn vhdx_co_create(BlockdevCreateOptions *opts,
         ret = -EPERM;
         goto delete_and_exit;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* Create (A) */
 
diff --git a/block/vmdk.c b/block/vmdk.c
index afd347191527..ed9011648ec5 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2307,7 +2307,7 @@ static int coroutine_fn vmdk_create_extent(const char *filename,
         goto exit;
     }
 
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     ret = vmdk_init_extent(blk, filesize, flat, compress, zeroed_grain, errp);
 exit:
@@ -2807,7 +2807,7 @@ static BlockBackend * coroutine_fn vmdk_co_create_cb(int64_t size, int idx,
     if (!blk) {
         return NULL;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
     bdrv_unref(bs);
 
     if (size != -1) {
diff --git a/block/vpc.c b/block/vpc.c
index 6ee95dcb9600..e5c420a8cb76 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -1015,7 +1015,7 @@ static int coroutine_fn vpc_co_create(BlockdevCreateOptions *opts,
         ret = -EPERM;
         goto out;
     }
-    blk_set_allow_write_beyond_eof(blk, true);
+    blk_allow_write_beyond_eof(blk);
 
     /* Get geometry and check that it matches the image size*/
     ret = calculate_rounded_image_size(vpc_opts, &cyls, &heads, &secs_per_cyl,
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 7ec6d978d408..f77adb1fe294 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -26,9 +26,9 @@ const char *blk_name(const BlockBackend *blk);
 
 BlockDriverState *blk_bs(BlockBackend *blk);
 
-void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
-void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow);
-void blk_set_disable_request_queuing(BlockBackend *blk, bool disable);
+void blk_allow_write_beyond_eof(BlockBackend *blk);
+void blk_allow_aio_context_change(BlockBackend *blk);
+void blk_disable_request_queuing(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 
 char *blk_get_attached_dev_id(BlockBackend *blk);
diff --git a/nbd/server.c b/nbd/server.c
index 462ddb0e4adb..5c92bec6d005 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1778,7 +1778,7 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
      * be properly quiesced when entering a drained section, as our coroutines
      * servicing pending requests might enter blk_pread().
      */
-    blk_set_disable_request_queuing(blk, true);
+    blk_disable_request_queuing(blk);
 
     blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
 
@@ -1854,7 +1854,6 @@ static void nbd_export_delete(BlockExport *blk_exp)
     }
     blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
                                     blk_aio_detach, exp);
-    blk_set_disable_request_queuing(exp->common.blk, false);
 
     for (i = 0; i < exp->nr_export_bitmaps; i++) {
         bdrv_dirty_bitmap_set_busy(exp->export_bitmaps[i], false);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index f677f1d9fc25..d44fa1728589 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -541,7 +541,7 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
                               &error_abort);
     s = bs->opaque;
     blk_insert_bs(blk, bs, &error_abort);
-    blk_set_disable_request_queuing(blk, true);
+    blk_disable_request_queuing(blk);
 
     blk_set_aio_context(blk, ctx_a, &error_abort);
     aio_context_acquire(ctx_a);
@@ -767,7 +767,7 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
                                   &error_abort);
     blk_target = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk_target, target, &error_abort);
-    blk_set_allow_aio_context_change(blk_target, true);
+    blk_allow_aio_context_change(blk_target);
 
     aio_context_acquire(ctx);
     tjob = block_job_create("job0", &test_job_driver, NULL, src,
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 8ca5adec5e82..68df827cddd8 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -793,7 +793,7 @@ static void test_propagate_mirror(void)
 
     /* ...unless we explicitly allow it */
     aio_context_acquire(ctx);
-    blk_set_allow_aio_context_change(blk, true);
+    blk_allow_aio_context_change(blk);
     bdrv_try_change_aio_context(target, ctx, NULL, &error_abort);
     aio_context_release(ctx);
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/15] block-backend: always wait for drain before starting operation
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (8 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 09/15] block-backend: make global properties write-once Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2023-01-16 16:48   ` Kevin Wolf
  2022-12-12 12:59 ` [PATCH 11/15] block-backend: make queued_requests thread-safe Paolo Bonzini
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

All I/O operations call blk_wait_while_drained() immediately after
blk_inc_in_flight(); the only exception is blk_abort_aio_request(),
where adding such a call does not hurt.  Merge the two functions into
one, and add a note about a disturbing lack of thread-safety that will
be fixed shortly.

While at it, make the quiesce_counter check a loop.  The check does not
have to be "perfect", in that it only matters that an endless stream of
I/Os is stopped sooner or later, but it is more logical to check the
counter repeatedly until it reaches zero.
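
For reference, every caller then follows the same pattern (an
illustrative sketch only, not part of the diff below; blk_co_foo() and
blk_co_do_foo() are placeholder names for any of the blk_co_*/blk_co_do_*
pairs):

    int coroutine_fn blk_co_foo(BlockBackend *blk /*, ... */)
    {
        int ret;

        /* now also waits for drain to end, unless request queuing is disabled */
        blk_inc_in_flight(blk);
        ret = blk_co_do_foo(blk /*, ... */);
        blk_dec_in_flight(blk);

        return ret;
    }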

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fe42d53d655d..627d491d4155 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1270,18 +1270,6 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     return 0;
 }
 
-/* To be called between exactly one pair of blk_inc/dec_in_flight() */
-static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
-{
-    assert(blk->in_flight > 0);
-
-    if (blk->quiesce_counter && !blk->disable_request_queuing) {
-        blk_dec_in_flight(blk);
-        qemu_co_queue_wait(&blk->queued_requests, NULL);
-        blk_inc_in_flight(blk);
-    }
-}
-
 /* To be called between exactly one pair of blk_inc/dec_in_flight() */
 static int coroutine_fn
 blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, int64_t bytes,
@@ -1334,7 +1322,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, 0, flags);
     blk_dec_in_flight(blk);
 
@@ -1349,7 +1336,6 @@ int coroutine_fn blk_co_preadv_part(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, qiov_offset, flags);
     blk_dec_in_flight(blk);
 
@@ -1401,7 +1387,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_pwritev_part(blk, offset, bytes, qiov, qiov_offset, flags);
     blk_dec_in_flight(blk);
 
@@ -1466,6 +1451,14 @@ void blk_inc_in_flight(BlockBackend *blk)
 {
     IO_CODE();
     qatomic_inc(&blk->in_flight);
+    if (!blk->disable_request_queuing) {
+        /* TODO: this is not thread-safe! */
+        while (blk->quiesce_counter) {
+            qatomic_dec(&blk->in_flight);
+            qemu_co_queue_wait(&blk->queued_requests, NULL);
+            qatomic_inc(&blk->in_flight);
+        }
+    }
 }
 
 void blk_dec_in_flight(BlockBackend *blk)
@@ -1546,7 +1539,6 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
     Coroutine *co;
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1685,7 +1677,6 @@ int coroutine_fn blk_co_ioctl(BlockBackend *blk, unsigned long int req,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_ioctl(blk, req, buf);
     blk_dec_in_flight(blk);
 
@@ -1749,7 +1740,6 @@ int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_pdiscard(blk, offset, bytes);
     blk_dec_in_flight(blk);
 
@@ -1790,7 +1780,6 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
     IO_OR_GS_CODE();
 
     blk_inc_in_flight(blk);
-    blk_wait_while_drained(blk);
     ret = blk_co_do_flush(blk);
     blk_dec_in_flight(blk);
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 11/15] block-backend: make queued_requests thread-safe
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (9 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 10/15] block-backend: always wait for drain before starting operation Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2023-01-11 20:44   ` Stefan Hajnoczi
  2023-01-16 16:55   ` Kevin Wolf
  2022-12-12 12:59 ` [PATCH 12/15] block: limit bdrv_co_yield_to_drain to drain_begin Paolo Bonzini
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

Protect quiesce_counter and queued_requests with a QemuMutex, so that
they are safe from concurrent access by the main thread (for example
blk_root_drained_end() reached from bdrv_drain_all()) and by the
iothread (where any I/O operation will call blk_inc_in_flight()).
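
Condensed to its essence, the scheme looks like this (a sketch only; the
real functions in the diff below also adjust in_flight around the wait
and invoke the dev_ops callbacks):

    /* I/O side, any thread: fast path without the lock, ...              */
    if (!blk->disable_request_queuing &&
        qatomic_read(&blk->quiesce_counter)) {
        QEMU_LOCK_GUARD(&blk->quiesce_lock);
        /* ... final check and wait under the lock, so no wakeup is lost. */
        while (blk->quiesce_counter) {
            qemu_co_queue_wait(&blk->queued_requests, &blk->quiesce_lock);
        }
    }

    /* drained_end side, main thread                                      */
    qemu_mutex_lock(&blk->quiesce_lock);
    qatomic_set(&blk->quiesce_counter, blk->quiesce_counter - 1);
    if (blk->quiesce_counter == 0) {
        qemu_co_enter_all(&blk->queued_requests, &blk->quiesce_lock);
    }
    qemu_mutex_unlock(&blk->quiesce_lock);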

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c | 44 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 627d491d4155..fdf82f1f1599 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -23,6 +23,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
 #include "qemu/id.h"
+#include "qemu/thread.h"
 #include "qemu/main-loop.h"
 #include "qemu/option.h"
 #include "trace.h"
@@ -85,6 +86,8 @@ struct BlockBackend {
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
     QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
 
+    /* Protected by quiesce_lock.  */
+    QemuMutex quiesce_lock;
     int quiesce_counter;
     CoQueue queued_requests;
 
@@ -372,6 +375,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
 
     block_acct_init(&blk->stats);
 
+    qemu_mutex_init(&blk->quiesce_lock);
     qemu_co_queue_init(&blk->queued_requests);
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
@@ -490,6 +494,7 @@ static void blk_delete(BlockBackend *blk)
     assert(QLIST_EMPTY(&blk->insert_bs_notifiers.notifiers));
     assert(QLIST_EMPTY(&blk->aio_notifiers));
     QTAILQ_REMOVE(&block_backends, blk, link);
+    qemu_mutex_destroy(&blk->quiesce_lock);
     drive_info_del(blk->legacy_dinfo);
     block_acct_cleanup(&blk->stats);
     g_free(blk);
@@ -1451,11 +1456,25 @@ void blk_inc_in_flight(BlockBackend *blk)
 {
     IO_CODE();
     qatomic_inc(&blk->in_flight);
-    if (!blk->disable_request_queuing) {
-        /* TODO: this is not thread-safe! */
+
+    /*
+     * Avoid a continuous stream of requests from AIO callbacks, which
+     * call a user-provided callback while blk->in_flight is elevated,
+     * if the BlockBackend is being quiesced.
+     *
+     * This initial test does not have to be perfect: a race might
+     * cause an extra I/O to be queued, but sooner or later a nonzero
+     * quiesce_counter will be observed.
+     */
+    if (!blk->disable_request_queuing && qatomic_read(&blk->quiesce_counter)) {
+        /*
+         * ... on the other hand, it is important that the final check and
+	 * the wait are done under the lock, so that no wakeups are missed.
+         */
+        QEMU_LOCK_GUARD(&blk->quiesce_lock);
         while (blk->quiesce_counter) {
             qatomic_dec(&blk->in_flight);
-            qemu_co_queue_wait(&blk->queued_requests, NULL);
+            qemu_co_queue_wait(&blk->queued_requests, &blk->quiesce_lock);
             qatomic_inc(&blk->in_flight);
         }
     }
@@ -2542,7 +2561,8 @@ static void blk_root_drained_begin(BdrvChild *child)
     BlockBackend *blk = child->opaque;
     ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 
-    if (++blk->quiesce_counter == 1) {
+    qatomic_set(&blk->quiesce_counter, blk->quiesce_counter + 1);
+    if (blk->quiesce_counter == 1) {
         if (blk->dev_ops && blk->dev_ops->drained_begin) {
             blk->dev_ops->drained_begin(blk->dev_opaque);
         }
@@ -2560,6 +2580,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
     bool busy = false;
+
     assert(blk->quiesce_counter);
 
     if (blk->dev_ops && blk->dev_ops->drained_poll) {
@@ -2576,14 +2597,21 @@ static void blk_root_drained_end(BdrvChild *child)
     assert(blk->public.throttle_group_member.io_limits_disabled);
     qatomic_dec(&blk->public.throttle_group_member.io_limits_disabled);
 
-    if (--blk->quiesce_counter == 0) {
+    qemu_mutex_lock(&blk->quiesce_lock);
+    qatomic_set(&blk->quiesce_counter, blk->quiesce_counter - 1);
+    if (blk->quiesce_counter == 0) {
+        /* After this point, new I/O will not sleep on queued_requests.  */
+        qemu_mutex_unlock(&blk->quiesce_lock);
+
         if (blk->dev_ops && blk->dev_ops->drained_end) {
             blk->dev_ops->drained_end(blk->dev_opaque);
         }
-        while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
-            /* Resume all queued requests */
-        }
+
+        /* Now resume all queued requests */
+        qemu_mutex_lock(&blk->quiesce_lock);
+        qemu_co_enter_all(&blk->queued_requests, &blk->quiesce_lock);
     }
+    qemu_mutex_unlock(&blk->quiesce_lock);
 }
 
 bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp)
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 12/15] block: limit bdrv_co_yield_to_drain to drain_begin
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (10 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 11/15] block-backend: make queued_requests thread-safe Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 13/15] block: second argument of bdrv_do_drained_end is always NULL Paolo Bonzini
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

Since bdrv_drained_end no longer polls, it need not jump out of
coroutine context.  This in turn enables removing the "begin" field
of BdrvCoDrainData.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index f4444b7777d9..695c8f3f5faa 100644
--- a/block/io.c
+++ b/block/io.c
@@ -233,7 +233,6 @@ typedef struct {
     Coroutine *co;
     BlockDriverState *bs;
     bool done;
-    bool begin;
     bool poll;
     BdrvChild *parent;
 } BdrvCoDrainData;
@@ -275,15 +274,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         AioContext *ctx = bdrv_get_aio_context(bs);
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
-        if (data->begin) {
-            bdrv_do_drained_begin(bs, data->parent, data->poll);
-        } else {
-            assert(!data->poll);
-            bdrv_do_drained_end(bs, data->parent);
-        }
+        bdrv_do_drained_begin(bs, data->parent, data->poll);
         aio_context_release(ctx);
     } else {
-        assert(data->begin);
         bdrv_drain_all_begin();
     }
 
@@ -292,7 +285,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 }
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
-                                                bool begin,
                                                 BdrvChild *parent,
                                                 bool poll)
 {
@@ -309,7 +301,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .co = self,
         .bs = bs,
         .done = false,
-        .begin = begin,
         .parent = parent,
         .poll = poll,
     };
@@ -348,7 +339,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
     IO_OR_GS_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, true, parent, poll);
+        bdrv_co_yield_to_drain(bs, parent, poll);
         return;
     }
 
@@ -394,10 +385,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
 {
     int old_quiesce_counter;
 
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, parent, false);
-        return;
-    }
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
@@ -472,7 +459,7 @@ void bdrv_drain_all_begin(void)
     GLOBAL_STATE_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, NULL, true);
+        bdrv_co_yield_to_drain(NULL, NULL, true);
         return;
     }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 13/15] block: second argument of bdrv_do_drained_end is always NULL
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (11 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 12/15] block: limit bdrv_co_yield_to_drain to drain_begin Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 14/15] block: second argument of bdrv_do_drained_begin and bdrv_drain_poll " Paolo Bonzini
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

Remove it and unify the function with bdrv_drained_end.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/block/io.c b/block/io.c
index 695c8f3f5faa..c2962adf8d2d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -69,14 +69,11 @@ void bdrv_parent_drained_end_single(BdrvChild *c)
     }
 }
 
-static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore)
+static void bdrv_parent_drained_end(BlockDriverState *bs)
 {
     BdrvChild *c;
 
     QLIST_FOREACH(c, &bs->parents, next_parent) {
-        if (c == ignore) {
-            continue;
-        }
         bdrv_parent_drained_end_single(c);
     }
 }
@@ -262,7 +259,6 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
                                   bool poll);
-static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -381,10 +377,11 @@ void bdrv_drained_begin(BlockDriverState *bs)
  * This function does not poll, nor must any of its recursively called
  * functions.
  */
-static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
+void bdrv_drained_end(BlockDriverState *bs)
 {
     int old_quiesce_counter;
 
+    IO_OR_GS_CODE();
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
@@ -393,17 +390,11 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
         if (bs->drv && bs->drv->bdrv_drain_end) {
             bs->drv->bdrv_drain_end(bs);
         }
-        bdrv_parent_drained_end(bs, parent);
+        bdrv_parent_drained_end(bs);
         aio_enable_external(bdrv_get_aio_context(bs));
     }
 }
 
-void bdrv_drained_end(BlockDriverState *bs)
-{
-    IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, NULL);
-}
-
 void bdrv_drain(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
@@ -504,7 +495,7 @@ void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
     g_assert(!bs->refcnt);
 
     while (bs->quiesce_counter) {
-        bdrv_do_drained_end(bs, NULL);
+        bdrv_drained_end(bs);
     }
 }
 
@@ -526,7 +517,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, NULL);
+        bdrv_drained_end(bs);
         aio_context_release(aio_context);
     }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 14/15] block: second argument of bdrv_do_drained_begin and bdrv_drain_poll is always NULL
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (12 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 13/15] block: second argument of bdrv_do_drained_end is always NULL Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2022-12-12 12:59 ` [PATCH 15/15] block: only get out of coroutine context for polling Paolo Bonzini
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

Remove it from the functions, from callers/callees such as
bdrv_do_drained_begin_quiesce() and bdrv_drain_poll(), and
from bdrv_co_yield_to_drain() and BdrvCoDrainData.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                  |  4 ++--
 block/io.c               | 49 ++++++++++++++++------------------------
 include/block/block-io.h |  7 +++---
 3 files changed, 24 insertions(+), 36 deletions(-)

diff --git a/block.c b/block.c
index c542a0a33358..676bbe0529b0 100644
--- a/block.c
+++ b/block.c
@@ -1186,13 +1186,13 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL);
+    bdrv_do_drained_begin_quiesce(bs);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    return bdrv_drain_poll(bs, NULL, false);
+    return bdrv_drain_poll(bs, false);
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index c2962adf8d2d..a75f42ee13cb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -45,14 +45,11 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int64_t bytes, BdrvRequestFlags flags);
 
-static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
+static void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
     BdrvChild *c, *next;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore) {
-            continue;
-        }
         bdrv_parent_drained_begin_single(c);
     }
 }
@@ -86,14 +83,13 @@ bool bdrv_parent_drained_poll_single(BdrvChild *c)
     return false;
 }
 
-static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
-                                     bool ignore_bds_parents)
+static bool bdrv_parent_drained_poll(BlockDriverState *bs, bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
     bool busy = false;
 
     QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
-        if (c == ignore || (ignore_bds_parents && c->klass->parent_is_bds)) {
+        if (ignore_bds_parents && c->klass->parent_is_bds) {
             continue;
         }
         busy |= bdrv_parent_drained_poll_single(c);
@@ -231,16 +227,14 @@ typedef struct {
     BlockDriverState *bs;
     bool done;
     bool poll;
-    BdrvChild *parent;
 } BdrvCoDrainData;
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
-bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
-                     bool ignore_bds_parents)
+bool bdrv_drain_poll(BlockDriverState *bs, bool ignore_bds_parents)
 {
     IO_OR_GS_CODE();
 
-    if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) {
+    if (bdrv_parent_drained_poll(bs, ignore_bds_parents)) {
         return true;
     }
 
@@ -251,14 +245,12 @@ bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
     return false;
 }
 
-static bool bdrv_drain_poll_top_level(BlockDriverState *bs,
-                                      BdrvChild *ignore_parent)
+static bool bdrv_drain_poll_top_level(BlockDriverState *bs)
 {
-    return bdrv_drain_poll(bs, ignore_parent, false);
+    return bdrv_drain_poll(bs, false);
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool poll);
+static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -270,7 +262,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         AioContext *ctx = bdrv_get_aio_context(bs);
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
-        bdrv_do_drained_begin(bs, data->parent, data->poll);
+        bdrv_do_drained_begin(bs, data->poll);
         aio_context_release(ctx);
     } else {
         bdrv_drain_all_begin();
@@ -281,7 +273,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
 }
 
 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
-                                                BdrvChild *parent,
                                                 bool poll)
 {
     BdrvCoDrainData data;
@@ -297,7 +288,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .co = self,
         .bs = bs,
         .done = false,
-        .parent = parent,
         .poll = poll,
     };
 
@@ -329,20 +319,19 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
-                                  bool poll)
+static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll)
 {
     IO_OR_GS_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, parent, poll);
+        bdrv_co_yield_to_drain(bs, poll);
         return;
     }
 
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
-        bdrv_parent_drained_begin(bs, parent);
+        bdrv_parent_drained_begin(bs);
         if (bs->drv && bs->drv->bdrv_drain_begin) {
             bs->drv->bdrv_drain_begin(bs);
         }
@@ -358,19 +347,19 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
      * nodes.
      */
     if (poll) {
-        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
+        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs));
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, parent, false);
+    bdrv_do_drained_begin(bs, false);
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
-    bdrv_do_drained_begin(bs, NULL, true);
+    bdrv_do_drained_begin(bs, true);
 }
 
 /**
@@ -425,7 +414,7 @@ static bool bdrv_drain_all_poll(void)
     while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
-        result |= bdrv_drain_poll(bs, NULL, true);
+        result |= bdrv_drain_poll(bs, true);
         aio_context_release(aio_context);
     }
 
@@ -450,7 +439,7 @@ void bdrv_drain_all_begin(void)
     GLOBAL_STATE_CODE();
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, NULL, true);
+        bdrv_co_yield_to_drain(NULL, true);
         return;
     }
 
@@ -475,7 +464,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, NULL, false);
+        bdrv_do_drained_begin(bs, false);
         aio_context_release(aio_context);
     }
 
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 10659a3f246c..2f596a56fe0f 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -308,7 +308,7 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 /**
  * bdrv_drain_poll:
  *
- * Poll for pending requests in @bs and its parents (except for @ignore_parent).
+ * Poll for pending requests in @bs and its parents.
  *
  * If @ignore_bds_parents is true, parents that are BlockDriverStates must
  * ignore the drain request because they will be drained separately (used for
@@ -316,8 +316,7 @@ bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
  *
  * This is part of bdrv_drained_begin.
  */
-bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
-                     bool ignore_bds_parents);
+bool bdrv_drain_poll(BlockDriverState *bs, bool ignore_bds_parents);
 
 /**
  * bdrv_drained_begin:
@@ -335,7 +334,7 @@ void bdrv_drained_begin(BlockDriverState *bs);
  * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
  * running requests to complete.
  */
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent);
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs);
 
 /**
  * bdrv_drained_end:
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 15/15] block: only get out of coroutine context for polling
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (13 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 14/15] block: second argument of bdrv_do_drained_begin and bdrv_drain_poll " Paolo Bonzini
@ 2022-12-12 12:59 ` Paolo Bonzini
  2023-01-16 15:51 ` [PATCH 00/12] More cleanups and fixes for drain Kevin Wolf
  2023-01-16 17:25 ` Kevin Wolf
  16 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-12-12 12:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, eesposit, kwolf

The drained_begin callbacks do not poll; all they do is schedule any
operations that must complete before bdrv_drained_end().  As such, they
can be called before bdrv_co_yield_to_drain().  Thus, the only task left
for bdrv_co_drain_bh_cb() is the BDRV_POLL_WHILE() loop.  This patch
extracts it into two new functions, bdrv_drain_wait() and
bdrv_drain_all_wait(), so that it is easily accessible from
bdrv_co_drain_bh_cb().
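
The resulting structure, condensed (a sketch of the code after this
patch; forward declarations, in_flight accounting and AioContext
locking in the bottom half are omitted):

    void bdrv_drained_begin(BlockDriverState *bs)
    {
        bdrv_do_drained_begin_quiesce(bs);  /* callbacks only, no polling */
        bdrv_drain_wait(bs);
    }

    static void bdrv_drain_wait(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* schedules a bottom half that re-enters bdrv_drain_wait() */
            bdrv_co_yield_to_drain(bs);
            return;
        }
        BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, false));
    }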

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 77 +++++++++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 39 deletions(-)

diff --git a/block/io.c b/block/io.c
index a75f42ee13cb..fc4bc38ee9d5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -245,13 +245,8 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool ignore_bds_parents)
     return false;
 }
 
-static bool bdrv_drain_poll_top_level(BlockDriverState *bs)
-{
-    return bdrv_drain_poll(bs, false);
-}
-
-static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll);
-
+static void bdrv_drain_wait(BlockDriverState *bs);
+static void bdrv_drain_all_wait(void);
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
@@ -262,18 +257,17 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         AioContext *ctx = bdrv_get_aio_context(bs);
         aio_context_acquire(ctx);
         bdrv_dec_in_flight(bs);
-        bdrv_do_drained_begin(bs, data->poll);
+        bdrv_drain_wait(bs);
         aio_context_release(ctx);
     } else {
-        bdrv_drain_all_begin();
+        bdrv_drain_all_wait();
     }
 
     data->done = true;
     aio_co_wake(co);
 }
 
-static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
-                                                bool poll)
+static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
 {
     BdrvCoDrainData data;
     Coroutine *self = qemu_coroutine_self();
@@ -288,7 +282,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .co = self,
         .bs = bs,
         .done = false,
-        .poll = poll,
     };
 
     if (bs) {
@@ -319,15 +312,10 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll)
+void bdrv_do_drained_begin_quiesce(BlockDriverState *bs)
 {
     IO_OR_GS_CODE();
 
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, poll);
-        return;
-    }
-
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
@@ -336,6 +324,23 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll)
             bs->drv->bdrv_drain_begin(bs);
         }
     }
+}
+
+void bdrv_drained_begin(BlockDriverState *bs)
+{
+    IO_OR_GS_CODE();
+    bdrv_do_drained_begin_quiesce(bs);
+    bdrv_drain_wait(bs);
+}
+
+static void bdrv_drain_wait(BlockDriverState *bs)
+{
+    IO_OR_GS_CODE();
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs);
+        return;
+    }
 
     /*
      * Wait for drained requests to finish.
@@ -346,20 +351,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool poll)
      * includes other nodes in the same AioContext and therefore all child
      * nodes.
      */
-    if (poll) {
-        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs));
-    }
-}
-
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs)
-{
-    bdrv_do_drained_begin(bs, false);
-}
-
-void bdrv_drained_begin(BlockDriverState *bs)
-{
-    IO_OR_GS_CODE();
-    bdrv_do_drained_begin(bs, true);
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, false));
 }
 
 /**
@@ -438,11 +430,6 @@ void bdrv_drain_all_begin(void)
     BlockDriverState *bs = NULL;
     GLOBAL_STATE_CODE();
 
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true);
-        return;
-    }
-
     /*
      * bdrv queue is managed by record/replay,
      * waiting for finishing the I/O requests may
@@ -464,18 +451,30 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, false);
+        bdrv_do_drained_begin_quiesce(bs);
         aio_context_release(aio_context);
     }
 
     /* Now poll the in-flight requests */
-    AIO_WAIT_WHILE(NULL, bdrv_drain_all_poll());
+    bdrv_drain_all_wait();
 
     while ((bs = bdrv_next_all_states(bs))) {
         bdrv_drain_assert_idle(bs);
     }
 }
 
+static void bdrv_drain_all_wait(void)
+{
+    GLOBAL_STATE_CODE();
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(NULL);
+        return;
+    }
+
+    AIO_WAIT_WHILE(NULL, bdrv_drain_all_poll());
+}
+
 void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
 {
     GLOBAL_STATE_CODE();
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 11/15] block-backend: make queued_requests thread-safe
  2022-12-12 12:59 ` [PATCH 11/15] block-backend: make queued_requests thread-safe Paolo Bonzini
@ 2023-01-11 20:44   ` Stefan Hajnoczi
  2023-01-16 16:55   ` Kevin Wolf
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2023-01-11 20:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit, kwolf

On Mon, Dec 12, 2022 at 01:59:16PM +0100, Paolo Bonzini wrote:
> Protect quiesce_counter and queued_requests with a QemuMutex, so that
> they are protected from concurrent access in the main thread (for example
> blk_root_drained_end() reached from bdrv_drain_all()) and in the iothread
> (where any I/O operation will call blk_inc_in_flight()).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/block-backend.c | 44 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 627d491d4155..fdf82f1f1599 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -23,6 +23,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-block.h"
>  #include "qemu/id.h"
> +#include "qemu/thread.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/option.h"
>  #include "trace.h"
> @@ -85,6 +86,8 @@ struct BlockBackend {
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>      QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
>  
> +    /* Protected by quiesce_lock.  */
> +    QemuMutex quiesce_lock;
>      int quiesce_counter;
>      CoQueue queued_requests;
>  
> @@ -372,6 +375,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
>  
>      block_acct_init(&blk->stats);
>  
> +    qemu_mutex_init(&blk->quiesce_lock);
>      qemu_co_queue_init(&blk->queued_requests);
>      notifier_list_init(&blk->remove_bs_notifiers);
>      notifier_list_init(&blk->insert_bs_notifiers);
> @@ -490,6 +494,7 @@ static void blk_delete(BlockBackend *blk)
>      assert(QLIST_EMPTY(&blk->insert_bs_notifiers.notifiers));
>      assert(QLIST_EMPTY(&blk->aio_notifiers));
>      QTAILQ_REMOVE(&block_backends, blk, link);
> +    qemu_mutex_destroy(&blk->quiesce_lock);
>      drive_info_del(blk->legacy_dinfo);
>      block_acct_cleanup(&blk->stats);
>      g_free(blk);
> @@ -1451,11 +1456,25 @@ void blk_inc_in_flight(BlockBackend *blk)
>  {
>      IO_CODE();
>      qatomic_inc(&blk->in_flight);
> -    if (!blk->disable_request_queuing) {
> -        /* TODO: this is not thread-safe! */
> +
> +    /*
> +     * Avoid a continuous stream of requests from AIO callbacks, which
> +     * call a user-provided callback while blk->in_flight is elevated,
> +     * if the BlockBackend is being quiesced.
> +     *
> +     * This initial test does not have to be perfect: a race might
> +     * cause an extra I/O to be queued, but sooner or later a nonzero
> +     * quiesce_counter will be observed.
> +     */
> +    if (!blk->disable_request_queuing && qatomic_read(&blk->quiesce_counter)) {
> +        /*
> +         * ... on the other hand, it is important that the final check and
> +	 * the wait are done under the lock, so that no wakeups are missed.
> +         */
> +        QEMU_LOCK_GUARD(&blk->quiesce_lock);
>          while (blk->quiesce_counter) {
>              qatomic_dec(&blk->in_flight);
> -            qemu_co_queue_wait(&blk->queued_requests, NULL);
> +            qemu_co_queue_wait(&blk->queued_requests, &blk->quiesce_lock);
>              qatomic_inc(&blk->in_flight);
>          }
>      }
> @@ -2542,7 +2561,8 @@ static void blk_root_drained_begin(BdrvChild *child)
>      BlockBackend *blk = child->opaque;
>      ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
>  
> -    if (++blk->quiesce_counter == 1) {
> +    qatomic_set(&blk->quiesce_counter, blk->quiesce_counter + 1);

Can drained begin race with drained end? If yes, then this atomic set
looks racy because drained end might be updating the counter at around
the same time. We should probably hold quiesce_lock?

> +    if (blk->quiesce_counter == 1) {
>          if (blk->dev_ops && blk->dev_ops->drained_begin) {
>              blk->dev_ops->drained_begin(blk->dev_opaque);
>          }
> @@ -2560,6 +2580,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>      bool busy = false;
> +
>      assert(blk->quiesce_counter);
>  
>      if (blk->dev_ops && blk->dev_ops->drained_poll) {
> @@ -2576,14 +2597,21 @@ static void blk_root_drained_end(BdrvChild *child)
>      assert(blk->public.throttle_group_member.io_limits_disabled);
>      qatomic_dec(&blk->public.throttle_group_member.io_limits_disabled);
>  
> -    if (--blk->quiesce_counter == 0) {
> +    qemu_mutex_lock(&blk->quiesce_lock);
> +    qatomic_set(&blk->quiesce_counter, blk->quiesce_counter - 1);
> +    if (blk->quiesce_counter == 0) {
> +        /* After this point, new I/O will not sleep on queued_requests.  */
> +        qemu_mutex_unlock(&blk->quiesce_lock);
> +
>          if (blk->dev_ops && blk->dev_ops->drained_end) {
>              blk->dev_ops->drained_end(blk->dev_opaque);
>          }
> -        while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
> -            /* Resume all queued requests */
> -        }
> +
> +        /* Now resume all queued requests */
> +        qemu_mutex_lock(&blk->quiesce_lock);
> +        qemu_co_enter_all(&blk->queued_requests, &blk->quiesce_lock);
>      }
> +    qemu_mutex_unlock(&blk->quiesce_lock);
>  }
>  
>  bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp)
> -- 
> 2.38.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 00/12] More cleanups and fixes for drain
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (14 preceding siblings ...)
  2022-12-12 12:59 ` [PATCH 15/15] block: only get out of coroutine context for polling Paolo Bonzini
@ 2023-01-16 15:51 ` Kevin Wolf
  2023-01-16 17:25 ` Kevin Wolf
  16 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2023-01-16 15:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit

Am 12.12.2022 um 13:59 hat Paolo Bonzini geschrieben:
> There are a few more lines of code that can be removed around draining
> code, but especially the logic can be simplified by removing unnecessary
> parameters.
> 
> Due to the failure of the block-next branch, the first three patches
> drop patches 14+15 of Kevin's drain cleanup series, and then redo
> patch 15 in a slightly less satisfactory way that still enables the
> remaining cleanups.  These reverts are not supposed to be applied;
> either the offending patches are dropped from the branch, or if the
> issue is fixed then my first three patches can go away.

Can you remind me which problem this was? The patches are in master now,
but I'm not sure if the latest version fixed whatever you had in mind.

> The next three are taken from Emanuele's old subtree drain attempt
> at removing the AioContext.  The main one is the second, which is needed
> to avoid testcase failures, but I included all of them for simplicity.
> 
> Patch 7 fixes another latent bug exposed by the later cleanups, and while
> looking for a fix I noticed a general lack of thread-safety in BlockBackend's
> drain code.  There are some global properties that only need to be documented
> and enforced to be set only at creation time (patches 8/9), but also
> queued_requests is not protected by any mutex, which is fixed in patch 10.
> 
> Finally, patches 11-15 are the actual simplification.
> 
> Applies on top of block-next.

Not any more. :-)

I found out that it applies on top of 6355f90eef, which may work for
some basic review, but the conflicts when rebasing seem non-trivial, so
we'll need a v2.

Kevin



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 07/15] block-backend: enter aio coroutine only after drain
  2022-12-12 12:59 ` [PATCH 07/15] block-backend: enter aio coroutine only after drain Paolo Bonzini
@ 2023-01-16 15:57   ` Kevin Wolf
  0 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2023-01-16 15:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit

Am 12.12.2022 um 13:59 hat Paolo Bonzini geschrieben:
> When called from within (another) coroutine, aio_co_enter will not
> enter a coroutine immediately; instead the new coroutine is scheduled
> to run after qemu_coroutine_yield().  This however might cause the
> currently-running coroutine to yield without having raised blk->in_flight.

I assume you're talking about the blk_aio_prwv() path here. However,
calling blk_inc_in_flight() is the very first thing it does (before even
calling bdrv_coroutine_enter -> aio_co_enter), so I don't understand how
it could happen that it yields before increasing the counter.

> If it was a ->drained_begin() callback who scheduled the coroutine,

Which one? The one that executes blk_aio_prwv()?

> bdrv_drained_begin() might exit without waiting for the I/O operation
> to finish.  Right now, this is masked by unnecessary polling done by
> bdrv_drained_begin() after the callbacks return, but it is wrong and
> a latent bug.
> 
> So, ensure that blk_inc_in_flight() and blk_wait_while_drained()
> are called before aio_co_enter().  To do so, pull the call to
> blk_wait_while_drained() out of the blk_co_do_* functions, which are
> called from the AIO coroutines, and place them separately in the public
> blk_co_* functions and in blk_aio_prwv.

You can't call blk_wait_while_drained() in blk_aio_prwv() because the
latter isn't a coroutine_fn.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/block-backend.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)

Kevin



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/15] block-backend: always wait for drain before starting operation
  2022-12-12 12:59 ` [PATCH 10/15] block-backend: always wait for drain before starting operation Paolo Bonzini
@ 2023-01-16 16:48   ` Kevin Wolf
  0 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2023-01-16 16:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit

Am 12.12.2022 um 13:59 hat Paolo Bonzini geschrieben:
> All I/O operations call blk_wait_while_drained() immediately after
> blk_inc_in_flight(), except for blk_abort_aio_request() where it
> does not hurt to add such a call.  Merge the two functions into one,
> and add a note about a disturbing lack of thread-safety that will
> be fixed shortly.
> 
> While at it, make the quiesce_counter check a loop.  While the check
> does not have to be "perfect", i.e. it only matters that an endless
> stream of I/Os is stopped sooner or later, it is more logical to check
> the counter repeatedly until it is zero.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/block-backend.c | 27 ++++++++-------------------
>  1 file changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index fe42d53d655d..627d491d4155 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1270,18 +1270,6 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
>      return 0;
>  }
>  
> -/* To be called between exactly one pair of blk_inc/dec_in_flight() */
> -static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
> -{
> -    assert(blk->in_flight > 0);
> -
> -    if (blk->quiesce_counter && !blk->disable_request_queuing) {
> -        blk_dec_in_flight(blk);
> -        qemu_co_queue_wait(&blk->queued_requests, NULL);
> -        blk_inc_in_flight(blk);
> -    }
> -}
> -
>  /* To be called between exactly one pair of blk_inc/dec_in_flight() */
>  static int coroutine_fn
>  blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, int64_t bytes,
> @@ -1334,7 +1322,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
>      IO_OR_GS_CODE();
>  
>      blk_inc_in_flight(blk);
> -    blk_wait_while_drained(blk);
>      ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, 0, flags);
>      blk_dec_in_flight(blk);
>  
> @@ -1349,7 +1336,6 @@ int coroutine_fn blk_co_preadv_part(BlockBackend *blk, int64_t offset,
>      IO_OR_GS_CODE();
>  
>      blk_inc_in_flight(blk);
> -    blk_wait_while_drained(blk);
>      ret = blk_co_do_preadv_part(blk, offset, bytes, qiov, qiov_offset, flags);
>      blk_dec_in_flight(blk);
>  
> @@ -1401,7 +1387,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk, int64_t offset,
>      IO_OR_GS_CODE();
>  
>      blk_inc_in_flight(blk);
> -    blk_wait_while_drained(blk);
>      ret = blk_co_do_pwritev_part(blk, offset, bytes, qiov, qiov_offset, flags);
>      blk_dec_in_flight(blk);
>  
> @@ -1466,6 +1451,14 @@ void blk_inc_in_flight(BlockBackend *blk)
>  {
>      IO_CODE();
>      qatomic_inc(&blk->in_flight);
> +    if (!blk->disable_request_queuing) {
> +        /* TODO: this is not thread-safe! */
> +        while (blk->quiesce_counter) {
> +            qatomic_dec(&blk->in_flight);
> +            qemu_co_queue_wait(&blk->queued_requests, NULL);

blk_inc_in_flight() must be a coroutine_fn now.

blk_abort_aio_request() and blk_aio_prwv() aren't, but still call it.

> +            qatomic_inc(&blk->in_flight);
> +        }
> +    }
>  }

Kevin



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 11/15] block-backend: make queued_requests thread-safe
  2022-12-12 12:59 ` [PATCH 11/15] block-backend: make queued_requests thread-safe Paolo Bonzini
  2023-01-11 20:44   ` Stefan Hajnoczi
@ 2023-01-16 16:55   ` Kevin Wolf
  1 sibling, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2023-01-16 16:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit

Am 12.12.2022 um 13:59 hat Paolo Bonzini geschrieben:
> Protect quiesce_counter and queued_requests with a QemuMutex, so that
> they are protected from concurrent access in the main thread (for example
> blk_root_drained_end() reached from bdrv_drain_all()) and in the iothread
> (where any I/O operation will call blk_inc_in_flight()).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/block-backend.c | 44 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 627d491d4155..fdf82f1f1599 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -23,6 +23,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-block.h"
>  #include "qemu/id.h"
> +#include "qemu/thread.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/option.h"
>  #include "trace.h"
> @@ -85,6 +86,8 @@ struct BlockBackend {
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>      QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
>  
> +    /* Protected by quiesce_lock.  */
> +    QemuMutex quiesce_lock;
>      int quiesce_counter;
>      CoQueue queued_requests;
>  
> @@ -372,6 +375,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
>  
>      block_acct_init(&blk->stats);
>  
> +    qemu_mutex_init(&blk->quiesce_lock);
>      qemu_co_queue_init(&blk->queued_requests);
>      notifier_list_init(&blk->remove_bs_notifiers);
>      notifier_list_init(&blk->insert_bs_notifiers);
> @@ -490,6 +494,7 @@ static void blk_delete(BlockBackend *blk)
>      assert(QLIST_EMPTY(&blk->insert_bs_notifiers.notifiers));
>      assert(QLIST_EMPTY(&blk->aio_notifiers));
>      QTAILQ_REMOVE(&block_backends, blk, link);
> +    qemu_mutex_destroy(&blk->quiesce_lock);
>      drive_info_del(blk->legacy_dinfo);
>      block_acct_cleanup(&blk->stats);
>      g_free(blk);
> @@ -1451,11 +1456,25 @@ void blk_inc_in_flight(BlockBackend *blk)
>  {
>      IO_CODE();
>      qatomic_inc(&blk->in_flight);
> -    if (!blk->disable_request_queuing) {
> -        /* TODO: this is not thread-safe! */
> +
> +    /*
> +     * Avoid a continuous stream of requests from AIO callbacks, which
> +     * call a user-provided callback while blk->in_flight is elevated,
> +     * if the BlockBackend is being quiesced.
> +     *
> +     * This initial test does not have to be perfect: a race might
> +     * cause an extra I/O to be queued, but sooner or later a nonzero
> +     * quiesce_counter will be observed.

This is true in the initial drain phase while we're still polling. But
generally this is not only about avoiding a continuous stream of
requests, but about making sure that absolutely no new requests come in
while a node is drained.

> +     */
> +    if (!blk->disable_request_queuing && qatomic_read(&blk->quiesce_counter)) {

So if no other requests were pending and we didn't even call aio_poll()
because the AIO_WAIT_WHILE() condition was false from the start, could
it be that bdrv_drained_begin() has already returned on the thread that
drains, but another thread still sees the old value here?

Starting a new request after bdrv_drained_begin() has returned would be
a bug.

Kevin



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 00/12] More cleanups and fixes for drain
  2022-12-12 12:59 [PATCH 00/12] More cleanups and fixes for drain Paolo Bonzini
                   ` (15 preceding siblings ...)
  2023-01-16 15:51 ` [PATCH 00/12] More cleanups and fixes for drain Kevin Wolf
@ 2023-01-16 17:25 ` Kevin Wolf
  16 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2023-01-16 17:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eesposit

Am 12.12.2022 um 13:59 hat Paolo Bonzini geschrieben:
> There are a few more lines of code that can be removed around draining
> code, but especially the logic can be simplified by removing unnecessary
> parameters.
> 
> Due to the failure of the block-next branch, the first three patches
> drop patches 14+15 of Kevin's drain cleanup series, and then redo
> patch 15 in a slightly less satisfactory way that still enables the
> remaining cleanups.  These reverts are not supposed to be applied;
> either the offending patches are dropped from the branch, or if the
> issue is fixed then my first three patches can go away.
> 
> The next three are taken from Emanuele's old subtree drain attempt
> at removing the AioContext.  The main one is the second, which is needed
> to avoid testcase failures, but I included all of them for simplicity.
> 
> Patch 7 fixes another latent bug exposed by the later cleanups, and while
> looking for a fix I noticed a general lack of thread-safety in BlockBackend's
> drain code.  There are some global properties that only need to be documented
> and enforced to be set only at creation time (patches 8/9), but also
> queued_requests is not protected by any mutex, which is fixed in patch 10.
> 
> Finally, patches 11-15 are the actual simplification.

Patches 12-15 actually look independent from the rest of the series, and
they look good to me. Could they be applied now even if the rest of the
series needs more discussion and a v2?

Kevin



^ permalink raw reply	[flat|nested] 22+ messages in thread
