* [PATCH 0/6] block: bug fixes in preparation of AioContext removal
@ 2022-02-08 15:36 Emanuele Giuseppe Esposito
  2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This series aims to fix some bugs that came up
when trying to replace the AioContext lock while still
protecting BlockDriverState fields.

They were part of the series "Removal of AioContext lock
through drains: protect bdrv_replace_child_noperm", but since
that series is still a work in progress and these fixes are
pretty much independent, I split it into two separate series.

This series is based on "job: replace AioContext lock with job_mutex"

Based-on: <20220208143513.1077229-1-eesposit@redhat.com>

Emanuele Giuseppe Esposito (6):
  block/io.c: fix bdrv_child_cb_drained_begin invocations from a
    coroutine
  block.c: bdrv_replace_child_noperm: first remove the child, and then
    call ->detach()
  block.c: bdrv_replace_child_noperm: first call ->attach(), and then
    add child
  test-bdrv-drain.c: adapt test to the coming subtree drains
  test-bdrv-drain.c: remove test_detach_by_parent_cb()
  jobs: ensure sleep in job_sleep_ns is fully performed

 block.c                      | 18 +++++++-----
 block/io.c                   |  7 ++++-
 include/block/block-io.h     |  8 ++++--
 job.c                        | 19 +++++++------
 tests/qemu-iotests/030       |  2 +-
 tests/qemu-iotests/151       |  4 +--
 tests/unit/test-bdrv-drain.c | 53 ++++++++++++------------------------
 tests/unit/test-blockjob.c   |  2 +-
 8 files changed, 55 insertions(+), 58 deletions(-)

-- 
2.31.1




* [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:14   ` Stefan Hajnoczi
  2022-02-11 11:54   ` Kevin Wolf
  2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
is not a good idea: the callback might be called when running
a drain in a coroutine, and bdrv_drained_begin_poll() does not
handle that case, resulting in an assertion failure.

Instead, calling bdrv_do_drained_begin() with no recursion and no poll
accomplishes the same thing (it invokes bdrv_do_drained_begin_quiesce())
but first checks whether we are already in a coroutine and, if so,
exits it via bdrv_co_yield_to_drain().

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                  | 2 +-
 block/io.c               | 7 ++++++-
 include/block/block-io.h | 8 +++++---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index e42a99d2af..4551eba2aa 100644
--- a/block.c
+++ b/block.c
@@ -1203,7 +1203,7 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL, false);
+    bdrv_drained_begin_no_poll(bs);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 683a32c955..d3ee3e1a7c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -425,7 +425,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+static void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                    BdrvChild *parent, bool ignore_bds_parents)
 {
     assert(!qemu_in_coroutine());
@@ -487,6 +487,11 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
     bdrv_do_drained_begin(bs, true, NULL, false, true);
 }
 
+void bdrv_drained_begin_no_poll(BlockDriverState *bs)
+{
+    bdrv_do_drained_begin(bs, false, NULL, false, false);
+}
+
 /**
  * This function does not poll, nor must any of its recursively called
  * functions.  The *drained_end_counter pointee will be incremented
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 7c04bc3049..48d3442970 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -308,13 +308,15 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
 void bdrv_drained_begin(BlockDriverState *bs);
 
 /**
- * bdrv_do_drained_begin_quiesce:
+ * bdrv_drained_begin_no_poll:
  *
  * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
  * running requests to complete.
+ * Same as bdrv_drained_begin(), but does not poll for the subgraph to
+ * actually become quiesced. Therefore, no graph changes will occur
+ * with this function.
  */
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent, bool ignore_bds_parents);
+void bdrv_drained_begin_no_poll(BlockDriverState *bs);
 
 /**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for
-- 
2.31.1




* [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach()
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
  2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:16   ` Stefan Hajnoczi
  2022-02-11 12:28   ` Kevin Wolf
  2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Doing the opposite can cause ->detach() (more precisely
bdrv_unapply_subtree_drain() in bdrv_child_cb_detach()) to undo the
subtree drain just performed to protect the removal of the child from
the graph, thus making the fully-enabled assert_bdrv_graph_writable()
fail.

Note that assert_bdrv_graph_writable() is not yet fully enabled.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 4551eba2aa..ec346a7e2e 100644
--- a/block.c
+++ b/block.c
@@ -2854,14 +2854,16 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     if (old_bs) {
-        /* Detach first so that the recursive drain sections coming from @child
+        assert_bdrv_graph_writable(old_bs);
+        QLIST_REMOVE(child, next_parent);
+        /*
+         * Detach first so that the recursive drain sections coming from @child
          * are already gone and we only end the drain sections that came from
-         * elsewhere. */
+         * elsewhere.
+         */
         if (child->klass->detach) {
             child->klass->detach(child);
         }
-        assert_bdrv_graph_writable(old_bs);
-        QLIST_REMOVE(child, next_parent);
     }
 
     child->bs = new_bs;
-- 
2.31.1




* [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
  2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
  2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:16   ` Stefan Hajnoczi
  2022-02-11 12:34   ` Kevin Wolf
  2022-02-08 15:36 ` [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Doing the opposite can mean adding the child node to a non-drained
node, since apply_subtree_drain() is only done in ->attach(), thus
making assert_bdrv_graph_writable() fail.

This can happen for example during a transaction rollback (test 245,
test_io_with_graph_changes):
1. a node is removed from the graph, and is thus undrained
2. then something happens, and we need to roll back the transactions
   through tran_abort()
3. at this point, the current code would first attach the undrained node
   to the graph via QLIST_INSERT_HEAD, and then call ->attach(), which
   takes care of restoring the drain with apply_subtree_drain(),
   leaving the node undrained between the two operations.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index ec346a7e2e..08a6e3a4ef 100644
--- a/block.c
+++ b/block.c
@@ -2872,8 +2872,6 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     if (new_bs) {
-        assert_bdrv_graph_writable(new_bs);
-        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
 
         /*
          * Detaching the old node may have led to the new node's
@@ -2890,6 +2888,10 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
         if (child->klass->attach) {
             child->klass->attach(child);
         }
+
+        assert_bdrv_graph_writable(new_bs);
+        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
+
     }
 
     /*
-- 
2.31.1




* [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:32   ` Stefan Hajnoczi
  2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
  2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
  5 siblings, 1 reply; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

There will be two problems in this test when we add
subtree drains in bdrv_replace_child_noperm():

- First, the test is inconsistent about taking the AioContext lock when
  calling bdrv_replace_child_noperm().  bdrv_replace_child_noperm() is
  reached in two places: from blk_insert_bs() directly, and via
  block_job_create().  Only the second does it with the AioContext lock
  taken, and there seems to be no reason why the lock is needed.
  Move aio_context_acquire() further down so that it only protects
  block_job_add_bdrv().

- Second, test_detach_indirect() is only interested in observing the
  first call to .drained_begin.  In the original test, there was only a
  single subtree drain; however, with the additional drains introduced
  in bdrv_replace_child_noperm(), the test callback would be called too
  early and/or multiple times.
  Override the callback only when we actually want to use it, and put
  the original back after it has been invoked.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 4924ceb562..c52ba2db4e 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -912,12 +912,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     blk_insert_bs(blk_target, target, &error_abort);
     blk_set_allow_aio_context_change(blk_target, true);
 
-    aio_context_acquire(ctx);
     tjob = block_job_create("job0", &test_job_driver, NULL, src,
                             0, BLK_PERM_ALL,
                             0, 0, NULL, NULL, &error_abort);
     tjob->bs = src;
     job = &tjob->common;
+    aio_context_acquire(ctx);
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
 
     switch (result) {
@@ -1342,15 +1342,18 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
     }
 }
 
+static BdrvChildClass detach_by_driver_cb_class;
+
 static void detach_by_driver_cb_drained_begin(BdrvChild *child)
 {
+    /* restore .drained_begin cb, we don't need it anymore. */
+    detach_by_driver_cb_class.drained_begin = child_of_bds.drained_begin;
+
     aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                             detach_indirect_bh, &detach_by_parent_data);
     child_of_bds.drained_begin(child);
 }
 
-static BdrvChildClass detach_by_driver_cb_class;
-
 /*
  * Initial graph:
  *
@@ -1382,8 +1385,6 @@ static void test_detach_indirect(bool by_parent_cb)
 
     if (!by_parent_cb) {
         detach_by_driver_cb_class = child_of_bds;
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
     }
 
     /* Create all involved nodes */
@@ -1441,6 +1442,12 @@ static void test_detach_indirect(bool by_parent_cb)
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
+    if (!by_parent_cb) {
+        /* set .drained_begin cb to run only in the following drain. */
+        detach_by_driver_cb_class.drained_begin =
+            detach_by_driver_cb_drained_begin;
+    }
+
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
 
-- 
2.31.1




* [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2022-02-08 15:36 ` [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:33   ` Stefan Hajnoczi
  2022-02-11 15:38   ` Kevin Wolf
  2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
  5 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This test uses a callback of an I/O function (blk_aio_preadv())
to modify the graph using bdrv_attach_child().
This is simply not allowed anymore: I/O cannot change the graph.

Before "block/io.c: make bdrv_do_drained_begin_quiesce static
and introduce bdrv_drained_begin_no_poll", the test was merely
at risk of failure: if bdrv_replace_child_noperm()
(called to modify the graph) triggered a drain,
one of the .drained_begin() callbacks would be
bdrv_do_drained_begin_quiesce(), which specifically asserts that we
are not in a coroutine.

Now that we have fixed that behavior, the drain invokes a BH in the
main loop, so we no longer have that problem. However, this test is
still invalid and fails because we forbid graph changes from I/O paths.

Once we add the required subtree_drains to protect
bdrv_replace_child_noperm(), the key problem in this test is in:

acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
/* Drain and check the expected result */
bdrv_subtree_drained_begin(parent_b);

because detach_by_parent_aio_cb() calls detach_indirect_bh(), which
modifies the graph and is invoked during bdrv_subtree_drained_begin().
The call stack is the following:
1. blk_aio_preadv() creates a coroutine, increments the in_flight
   counter and enters the coroutine, running blk_aio_read_entry()
2. blk_aio_read_entry() performs the read and then schedules a BH to
   complete (blk_aio_complete)
3. at this point, subtree_drained_begin() kicks in and polls, waiting
   for all in_flight requests
4. polling allows the BH to be scheduled, so blk_aio_complete() runs
5. blk_aio_complete() *first* invokes the callback
   (detach_by_parent_aio_cb) and then decrements the in_flight counter
6. Here we have the problem: detach_by_parent_aio_cb() modifies the
   graph, so both bdrv_unref_child() and bdrv_attach_child() will have
   subtree_drains inside.  And this causes a deadlock, because the
   nested drain will wait for the in_flight counter to go to zero,
   which only happens once the drain itself finishes.

test_detach_by_driver_cb() is a different story: in that case,
detach_by_parent_aio_cb() does not call detach_indirect_bh();
instead, detach_indirect_bh() runs as a BH in the main loop,
scheduled by detach_by_driver_cb_drained_begin(), the callback for
.drained_begin().

This test was added in 231281ab42 as part of the series
"Drain fixes and cleanups, part 3"
https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
but as explained above I believe it is no longer valid and
can be discarded.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 46 +++++++++---------------------------
 1 file changed, 11 insertions(+), 35 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index c52ba2db4e..ef154b6536 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1316,7 +1316,6 @@ struct detach_by_parent_data {
     BdrvChild *child_b;
     BlockDriverState *c;
     BdrvChild *child_c;
-    bool by_parent_cb;
 };
 static struct detach_by_parent_data detach_by_parent_data;
 
@@ -1334,12 +1333,7 @@ static void detach_indirect_bh(void *opaque)
 
 static void detach_by_parent_aio_cb(void *opaque, int ret)
 {
-    struct detach_by_parent_data *data = &detach_by_parent_data;
-
     g_assert_cmpint(ret, ==, 0);
-    if (data->by_parent_cb) {
-        detach_indirect_bh(data);
-    }
 }
 
 static BdrvChildClass detach_by_driver_cb_class;
@@ -1361,31 +1355,24 @@ static void detach_by_driver_cb_drained_begin(BdrvChild *child)
  *    \ /   \
  *     A     B     C
  *
- * by_parent_cb == true:  Test that parent callbacks don't poll
- *
- *     PA has a pending write request whose callback changes the child nodes of
- *     PB: It removes B and adds C instead. The subtree of PB is drained, which
- *     will indirectly drain the write request, too.
- *
- * by_parent_cb == false: Test that bdrv_drain_invoke() doesn't poll
+ * Test that bdrv_drain_invoke() doesn't poll
  *
  *     PA's BdrvChildClass has a .drained_begin callback that schedules a BH
  *     that does the same graph change. If bdrv_drain_invoke() calls it, the
  *     state is messed up, but if it is only polled in the single
  *     BDRV_POLL_WHILE() at the end of the drain, this should work fine.
  */
-static void test_detach_indirect(bool by_parent_cb)
+static void test_detach_indirect(void)
 {
     BlockBackend *blk;
     BlockDriverState *parent_a, *parent_b, *a, *b, *c;
     BdrvChild *child_a, *child_b;
     BlockAIOCB *acb;
+    BDRVTestState *s;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    if (!by_parent_cb) {
-        detach_by_driver_cb_class = child_of_bds;
-    }
+    detach_by_driver_cb_class = child_of_bds;
 
     /* Create all involved nodes */
     parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
@@ -1404,10 +1391,8 @@ static void test_detach_indirect(bool by_parent_cb)
 
     /* If we want to get bdrv_drain_invoke() to call aio_poll(), the driver
      * callback must not return immediately. */
-    if (!by_parent_cb) {
-        BDRVTestState *s = parent_a->opaque;
-        s->sleep_in_drain_begin = true;
-    }
+    s = parent_a->opaque;
+    s->sleep_in_drain_begin = true;
 
     /* Set child relationships */
     bdrv_ref(b);
@@ -1419,7 +1404,7 @@ static void test_detach_indirect(bool by_parent_cb)
 
     bdrv_ref(a);
     bdrv_attach_child(parent_a, a, "PA-A",
-                      by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
+                      &detach_by_driver_cb_class,
                       BDRV_CHILD_DATA, &error_abort);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
@@ -1437,16 +1422,13 @@ static void test_detach_indirect(bool by_parent_cb)
         .parent_b = parent_b,
         .child_b = child_b,
         .c = c,
-        .by_parent_cb = by_parent_cb,
     };
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
-    if (!by_parent_cb) {
-        /* set .drained_begin cb to run only in the following drain. */
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
-    }
+    /* set .drained_begin cb to run only in the following drain. */
+    detach_by_driver_cb_class.drained_begin =
+        detach_by_driver_cb_drained_begin;
 
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
@@ -1482,14 +1464,9 @@ static void test_detach_indirect(bool by_parent_cb)
     bdrv_unref(c);
 }
 
-static void test_detach_by_parent_cb(void)
-{
-    test_detach_indirect(true);
-}
-
 static void test_detach_by_driver_cb(void)
 {
-    test_detach_indirect(false);
+    test_detach_indirect();
 }
 
 static void test_append_to_drained(void)
@@ -2244,7 +2221,6 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
-    g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
     g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
     g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained);
-- 
2.31.1




* [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed
  2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
@ 2022-02-08 15:36 ` Emanuele Giuseppe Esposito
  2022-02-10 14:43   ` Stefan Hajnoczi
  2022-02-10 15:02   ` Vladimir Sementsov-Ogievskiy
  5 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-08 15:36 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

If a drain happens while a job is sleeping, the timeout
gets cancelled and the job continues once the drain ends.
This is especially bad for the sleep performed in commit and stream
jobs, since that sleep is dictated by the ratelimit to maintain a
certain speed.

Basically, the execution path is the following:
1. the job calls job_sleep_ns() and yields with a timer set to fire
   in @ns ns.
2. meanwhile, a drain is executed, and child_job_drained_{begin/end}()
   are executed as the ->drained_begin() and ->drained_end() callbacks.
   child_job_drained_begin() therefore enters the job, which continues
   execution in job_sleep_ns() and calls job_pause_point_locked().
3. job_pause_point_locked() detects that we are in the middle of a
   drain, so it first deletes any existing timer and then yields again,
   waiting for ->drained_end().
4. once draining is finished, child_job_drained_end() runs and resumes
   the job.  At this point, the timer has been lost and we just resume
   without checking whether enough time has passed.

This fix implies that from now on, job_sleep_ns() will force the job
to sleep for @ns, even if it is woken up (purposefully or not) in the
middle of the sleep.  Therefore qemu-iotests might run a little bit
slower, depending on the speed of the job.  Setting a job's speed to
values like "1" is not allowed anymore (unless you want to wait
forever).

Because of this fix, test_stream_parallel() in tests/qemu-iotests/030
takes too long, since the stream job's speed is just 1024 and it
previously skipped most of the waiting thanks to the drains.  Increase
the speed to 256 * 1024.  Exactly the same happens for test 151.

We also need to sleep less in test_cancel_ready() in test-blockjob.c,
so that the job is able to exit the sleep and transition to ready
before the main loop asserts.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 job.c                      | 19 +++++++++++--------
 tests/qemu-iotests/030     |  2 +-
 tests/qemu-iotests/151     |  4 ++--
 tests/unit/test-blockjob.c |  2 +-
 4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/job.c b/job.c
index c7a41d88d6..72d6c462ed 100644
--- a/job.c
+++ b/job.c
@@ -639,19 +639,22 @@ void job_yield(Job *job)
 
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
 {
+    int64_t end_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns;
     JOB_LOCK_GUARD();
     assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled_locked(job)) {
-        return;
-    }
+    do {
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause_locked(job)) {
-        job_do_yield_locked(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
-    }
+        if (!job_should_pause_locked(job)) {
+            job_do_yield_locked(job, end_ns);
+        }
 
-    job_pause_point_locked(job);
+        job_pause_point_locked(job);
+    } while (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_ns);
 }
 
 /* Assumes the job_mutex is held */
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 567bf1da67..969b246d0f 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -248,7 +248,7 @@ class TestParallelOps(iotests.QMPTestCase):
             pending_jobs.append(job_id)
             result = self.vm.qmp('block-stream', device=node_name,
                                  job_id=job_id, bottom=f'node{i-1}',
-                                 speed=1024)
+                                 speed=256*1024)
             self.assert_qmp(result, 'return', {})
 
         # Do this in reverse: After unthrottling them, some jobs may finish
diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index 93d14193d0..5998beb5c4 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -129,7 +129,7 @@ class TestActiveMirror(iotests.QMPTestCase):
                              sync='full',
                              copy_mode='write-blocking',
                              buf_size=(1048576 // 4),
-                             speed=1)
+                             speed=1024*1024)
         self.assert_qmp(result, 'return', {})
 
         # Start an unaligned request to a dirty area
@@ -154,7 +154,7 @@ class TestActiveMirror(iotests.QMPTestCase):
                              target='target-node',
                              sync='full',
                              copy_mode='write-blocking',
-                             speed=1)
+                             speed=1024*1024)
 
         self.vm.hmp_qemu_io('source', 'break write_aio A')
         self.vm.hmp_qemu_io('source', 'aio_write 0 1M')  # 1
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index d6fc52f80a..81ebc7d154 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -184,7 +184,7 @@ static int coroutine_fn cancel_job_run(Job *job, Error **errp)
             job_transition_to_ready(&s->common.job);
         }
 
-        job_sleep_ns(&s->common.job, 100000);
+        job_sleep_ns(&s->common.job, 100);
     }
 
     return 0;
-- 
2.31.1




* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
@ 2022-02-10 14:14   ` Stefan Hajnoczi
  2022-02-11 11:54   ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:14 UTC
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:50AM -0500, Emanuele Giuseppe Esposito wrote:
> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> is not a good idea: the callback might be called when running
> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> handle that case, resulting in an assertion failure.
> 
> Instead, calling bdrv_do_drained_begin() with no recursion and no poll
> accomplishes the same thing (it invokes bdrv_do_drained_begin_quiesce())
> but first checks whether we are already in a coroutine and, if so,
> exits it via bdrv_co_yield_to_drain().
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c                  | 2 +-
>  block/io.c               | 7 ++++++-
>  include/block/block-io.h | 8 +++++---
>  3 files changed, 12 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach()
  2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
@ 2022-02-10 14:16   ` Stefan Hajnoczi
  2022-02-11 12:28   ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:16 UTC
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:51AM -0500, Emanuele Giuseppe Esposito wrote:
> Doing the opposite can cause ->detach() (more precisely
> bdrv_unapply_subtree_drain() in bdrv_child_cb_detach()) to undo the
> subtree drain just performed to protect the removal of the child from
> the graph, thus making the fully-enabled assert_bdrv_graph_writable()
> fail.
> 
> Note that assert_bdrv_graph_writable() is not yet fully enabled.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
@ 2022-02-10 14:16   ` Stefan Hajnoczi
  2022-02-11 12:34   ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:16 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:52AM -0500, Emanuele Giuseppe Esposito wrote:
> Doing the opposite can result in adding the child node to a non-drained
> node, as apply_subtree_drain is only done in ->attach(), and thus make
> assert_bdrv_graph_writable fail.
> 
> This can happen for example during a transaction rollback (test 245,
> test_io_with_graph_changes):
> 1. a node is removed from the graph, thus it is undrained
> 2. then something happens, and we need to roll back the transactions
>    through tran_abort()
> 3. at this point, the current code would first attach the undrained node
>    to the graph via QLIST_INSERT_HEAD, and then call ->attach() that
>    will take care of restoring the drain with apply_subtree_drain(),
>    leaving the node undrained between the two operations.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains
  2022-02-08 15:36 ` [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
@ 2022-02-10 14:32   ` Stefan Hajnoczi
  2022-02-10 16:02     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:32 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:53AM -0500, Emanuele Giuseppe Esposito wrote:
> There will be two problems in this test when we add
> subtree drains in bdrv_replace_child_noperm:
> 
> - First, the test is inconsistent about taking the AioContext lock when
>   calling bdrv_replace_child_noperm.  bdrv_replace_child_noperm is reached
>   in two places: from blk_insert_bs directly, and via block_job_create.
>   Only the second does it with the AioContext lock taken, and there seems
>   to be no reason why the lock is needed.
>   Move aio_context_acquire further down, to just protect block_job_add_bdrv()
> 
> - Second, test_detach_indirect is only interested in observing the first
>   call to .drained_begin. In the original test, there was only a single
>   subtree drain; however, with additional drains introduced in
>   bdrv_replace_child_noperm(), the test callback would be called too early
>   and/or multiple times.
>   Override the callback only when we actually want to use it, and put back
>   the original after it's been invoked.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  tests/unit/test-bdrv-drain.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> index 4924ceb562..c52ba2db4e 100644
> --- a/tests/unit/test-bdrv-drain.c
> +++ b/tests/unit/test-bdrv-drain.c
> @@ -912,12 +912,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
>      blk_insert_bs(blk_target, target, &error_abort);
>      blk_set_allow_aio_context_change(blk_target, true);
>  
> -    aio_context_acquire(ctx);
>      tjob = block_job_create("job0", &test_job_driver, NULL, src,
>                              0, BLK_PERM_ALL,
>                              0, 0, NULL, NULL, &error_abort);
>      tjob->bs = src;
>      job = &tjob->common;
> +    aio_context_acquire(ctx);

block_job_create() uses src's AioContext. In the IOThread case the
AioContext is not qemu_aio_context. My expectation is that src's
AioContext must be acquired before block_job_create() starts using src.

blockdev.c QMP commands acquire the BDS's AioContext before calling the
function that creates the job. It seems strange to do it differently in
this test case.

You mentioned that blk_insert_bs() is called without acquiring an
AioContext. That may be because we know blk_target is in
qemu_aio_context and we assume our thread holds it (even if we don't
explicitly hold it). If you want to fix an inconsistency then maybe fix
that instead of removing the acquire around block_job_create()?

Stefan


* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
@ 2022-02-10 14:33   ` Stefan Hajnoczi
  2022-02-11 15:38   ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:33 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:54AM -0500, Emanuele Giuseppe Esposito wrote:
> This test uses a callback of an I/O function (blk_aio_preadv)
> to modify the graph, using bdrv_attach_child.
> This is simply not allowed anymore. I/O cannot change the graph.
> 
> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
> and introduce bdrv_drained_begin_no_poll", the test would simply
> be at risk of failure, because if bdrv_replace_child_noperm()
> (called to modify the graph) called a drain, then one of the
> .drained_begin() callbacks would be bdrv_do_drained_begin_quiesce,
> which specifically asserts that we are not in a coroutine.
> 
> Now that we fixed the behavior, the drain will invoke a bh in the
> main loop, so we don't have such a problem. However, this test is still
> illegal and fails because we forbid graph changes from I/O paths.
> 
> Once we add the required subtree_drains to protect
> bdrv_replace_child_noperm(), the key problem in this test is in:
> 
> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
> /* Drain and check the expected result */
> bdrv_subtree_drained_begin(parent_b);
> 
> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
> modifies the graph and is invoked during bdrv_subtree_drained_begin().
> The call stack is the following:
> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
> and enters the coroutine running blk_aio_read_entry()
> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>    complete (blk_aio_complete)
> 3. at this point, subtree_drained_begin() kicks in and waits for all
>    in_flight requests, polling
> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
> 5. blk_aio_complete *first* invokes the callback
>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>    so both bdrv_unref_child() and bdrv_attach_child() will have
>    subtree_drains inside. And this causes a deadlock, because the
>    nested drain will wait for in_flight counter to go to zero, which
>    is only happening once the drain itself finishes.
> 
> A different story is test_detach_by_driver_cb(): in this case,
> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
> but it is instead called as a bh running in the main loop by
> detach_by_driver_cb_drained_begin(), the callback for
> .drained_begin().
> 
> This test was added in 231281ab42 as part of the series
> "Drain fixes and cleanups, part 3"
> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
> but as explained above I believe that it is not valid anymore, and
> can be discarded.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  tests/unit/test-bdrv-drain.c | 46 +++++++++---------------------------
>  1 file changed, 11 insertions(+), 35 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed
  2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
@ 2022-02-10 14:43   ` Stefan Hajnoczi
  2022-02-10 15:02   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2022-02-10 14:43 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow

On Tue, Feb 08, 2022 at 10:36:55AM -0500, Emanuele Giuseppe Esposito wrote:
> If a drain happens while a job is sleeping, the timeout
> gets cancelled and the job continues once the drain ends.
> This is especially bad for the sleep performed in commit and stream
> jobs, since that is dictated by ratelimit to maintain a certain speed.
> 
> Basically the execution path is the following:
> 1. the job calls job_sleep_ns() and yields, with a timer set to fire in @ns ns.
> 2. meanwhile, a drain is executed, and
>    child_job_drained_{begin/end} could be executed as ->drained_begin()
>    and ->drained_end() callbacks.
>    Therefore child_job_drained_begin() enters the job, that continues
>    execution in job_sleep_ns() and calls job_pause_point_locked().
> 3. job_pause_point_locked() detects that we are in the middle of a
>    drain, and firstly deletes any existing timer and then yields again,
>    waiting for ->drained_end().
> 4. Once draining is finished, child_job_drained_end() runs and resumes
>    the job. At this point, the timer has been lost and we just resume
>    without checking if enough time has passed.
> 
> This fix implies that from now onwards, job_sleep_ns will force the job
> to sleep @ns, even if it is woken up (purposefully or not) in the middle
> of the sleep. Therefore qemu-iotests tests might run a little slower,
> depending on the speed of the job. Setting a job speed to values like "1"
> is not allowed anymore (unless you want to wait forever).
> 
> Because of this fix, test_stream_parallel() in tests/qemu-iotests/030
> takes too long, since speed of stream job is just 1024 and before
> it was skipping all the wait thanks to the drains. Increase the
> speed to 256 * 1024. Exactly the same happens for test 151.
> 
> Instead we need to sleep less in test_cancel_ready() in test-blockjob.c,
> so that the job will be able to exit the sleep and transition to ready
> before the main loop asserts.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  job.c                      | 19 +++++++++++--------
>  tests/qemu-iotests/030     |  2 +-
>  tests/qemu-iotests/151     |  4 ++--
>  tests/unit/test-blockjob.c |  2 +-
>  4 files changed, 15 insertions(+), 12 deletions(-)

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed
  2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
  2022-02-10 14:43   ` Stefan Hajnoczi
@ 2022-02-10 15:02   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-10 15:02 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Fam Zheng, Paolo Bonzini,
	Stefan Hajnoczi, qemu-devel

08.02.2022 18:36, Emanuele Giuseppe Esposito wrote:
> If a drain happens while a job is sleeping, the timeout
> gets cancelled and the job continues once the drain ends.
> This is especially bad for the sleep performed in commit and stream
> jobs, since that is dictated by ratelimit to maintain a certain speed.
> 
> Basically the execution path is the following:
> 1. the job calls job_sleep_ns() and yields, with a timer set to fire in @ns ns.
> 2. meanwhile, a drain is executed, and
>     child_job_drained_{begin/end} could be executed as ->drained_begin()
>     and ->drained_end() callbacks.
>     Therefore child_job_drained_begin() enters the job, that continues
>     execution in job_sleep_ns() and calls job_pause_point_locked().
> 3. job_pause_point_locked() detects that we are in the middle of a
>     drain, and firstly deletes any existing timer and then yields again,
>     waiting for ->drained_end().
> 4. Once draining is finished, child_job_drained_end() runs and resumes
>     the job. At this point, the timer has been lost and we just resume
>     without checking if enough time has passed.
> 
> This fix implies that from now onwards, job_sleep_ns will force the job
> to sleep @ns, even if it is woken up (purposefully or not) in the middle
> of the sleep. Therefore qemu-iotests tests might run a little slower,
> depending on the speed of the job. Setting a job speed to values like "1"
> is not allowed anymore (unless you want to wait forever).
> 
> Because of this fix, test_stream_parallel() in tests/qemu-iotests/030
> takes too long, since speed of stream job is just 1024 and before
> it was skipping all the wait thanks to the drains. Increase the
> speed to 256 * 1024. Exactly the same happens for test 151.
> 
> Instead we need to sleep less in test_cancel_ready() in test-blockjob.c,
> so that the job will be able to exit the sleep and transition to ready
> before the main loop asserts.
> 
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>

Acked-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains
  2022-02-10 14:32   ` Stefan Hajnoczi
@ 2022-02-10 16:02     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-10 16:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	qemu-devel, Hanna Reitz, Paolo Bonzini, John Snow



On 10/02/2022 15:32, Stefan Hajnoczi wrote:
> On Tue, Feb 08, 2022 at 10:36:53AM -0500, Emanuele Giuseppe Esposito wrote:
>> There will be two problems in this test when we add
>> subtree drains in bdrv_replace_child_noperm:
>>
>> - First, the test is inconsistent about taking the AioContext lock when
>>   calling bdrv_replace_child_noperm.  bdrv_replace_child_noperm is reached
>>   in two places: from blk_insert_bs directly, and via block_job_create.
>>   Only the second does it with the AioContext lock taken, and there seems
>>   to be no reason why the lock is needed.
>>   Move aio_context_acquire further down, to just protect block_job_add_bdrv()
>>
>> - Second, test_detach_indirect is only interested in observing the first
>>   call to .drained_begin. In the original test, there was only a single
>>   subtree drain; however, with additional drains introduced in
>>   bdrv_replace_child_noperm(), the test callback would be called too early
>>   and/or multiple times.
>>   Override the callback only when we actually want to use it, and put back
>>   the original after it's been invoked.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>  tests/unit/test-bdrv-drain.c | 17 ++++++++++++-----
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
>> index 4924ceb562..c52ba2db4e 100644
>> --- a/tests/unit/test-bdrv-drain.c
>> +++ b/tests/unit/test-bdrv-drain.c
>> @@ -912,12 +912,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
>>      blk_insert_bs(blk_target, target, &error_abort);
>>      blk_set_allow_aio_context_change(blk_target, true);
>>  
>> -    aio_context_acquire(ctx);
>>      tjob = block_job_create("job0", &test_job_driver, NULL, src,
>>                              0, BLK_PERM_ALL,
>>                              0, 0, NULL, NULL, &error_abort);
>>      tjob->bs = src;
>>      job = &tjob->common;
>> +    aio_context_acquire(ctx);
> 
> block_job_create() uses src's AioContext. In the IOThread case the
> AioContext is not qemu_aio_context. My expectation is that src's
> AioContext must be acquired before block_job_create() starts using src.
> 
> blockdev.c QMP commands acquire the BDS's AioContext before calling the
> function that creates the job. It seems strange to do it differently in
> this test case.
> 
> You mentioned that blk_insert_bs() is called without acquiring an
> AioContext. That may be because we know blk_target is in
> qemu_aio_context and we assume our thread holds it (even if we don't
> explicitly hold it).

This is an assumption made in all unit tests, so it's not worth changing
only this case.

> If you want to fix an inconsistency then maybe fix
> that instead of removing the acquire around block_job_create()?

Your explanation above makes sense, I think I will drop this change here.

Thank you,
Emanuele




* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
  2022-02-10 14:14   ` Stefan Hajnoczi
@ 2022-02-11 11:54   ` Kevin Wolf
  2022-02-14 10:27     ` Emanuele Giuseppe Esposito
  1 sibling, 1 reply; 28+ messages in thread
From: Kevin Wolf @ 2022-02-11 11:54 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> is not a good idea: the callback might be called when running
> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> handle that case, resulting in an assertion failure.

I remembered that we talked about this only recently on IRC, but it
didn't make any sense to me again when I read this commit message. So I
think we need --verbose.

The .drained_begin callback was always meant to run outside of coroutine
context, so the unexpected part isn't that it calls a function that
can't run in coroutine context, but that it is already called itself in
coroutine context.

The problematic path is bdrv_replace_child_noperm() which then calls
bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
context is dangerous, it can cause deadlocks because the caller of the
coroutine can't make progress. So I believe this call is already wrong
in coroutine context.

Now I don't know the call path up to bdrv_replace_child_noperm(), but as
far as I remember, that was another function that was originally never
run in coroutine context. Maybe we have good reason to change this, I
can't point to anything that would be inherently wrong with it, but I
would still be curious in which context it does run in a coroutine now.

Anyway, whatever the specific place is, I believe we must drop out of
coroutine context _before_ calling bdrv_parent_drained_begin_single(),
not only in callbacks called by it.

> Instead, bdrv_do_drained_begin with no recursion and poll
> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
> but will first check whether we are already in a coroutine, and exit
> from it via bdrv_co_yield_to_drain().
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Kevin




* Re: [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach()
  2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
  2022-02-10 14:16   ` Stefan Hajnoczi
@ 2022-02-11 12:28   ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2022-02-11 12:28 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> Doing the opposite can make ->detach() (more precisely
> bdrv_unapply_subtree_drain() in bdrv_child_cb_detach) undo the subtree_drain
> just performed to protect the removal of the child from the graph,
> thus making the fully-enabled assert_bdrv_graph_writable fail.
> 
> Note that assert_bdrv_graph_writable is not yet fully enabled.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 4551eba2aa..ec346a7e2e 100644
> --- a/block.c
> +++ b/block.c
> @@ -2854,14 +2854,16 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>      }
>  
>      if (old_bs) {
> -        /* Detach first so that the recursive drain sections coming from @child
> +        assert_bdrv_graph_writable(old_bs);
> +        QLIST_REMOVE(child, next_parent);
> +        /*
> +         * Detach first so that the recursive drain sections coming from @child
>           * are already gone and we only end the drain sections that came from
> -         * elsewhere. */
> +         * elsewhere.
> +         */

This comment is confusing, but that's not your fault.

It was originally added in commit d736f119dae and referred to calling
.detach() before calling .drained_end(), which was the very next thing
it would do. Commit debc2927671 moved the .drained_end() code to the end
of the whole operation, but left this comment (and a similar one for
.attach() and .drained_begin()) around even though it doesn't explain
the new code very well any more.

>          if (child->klass->detach) {
>              child->klass->detach(child);
>          }
> -        assert_bdrv_graph_writable(old_bs);
> -        QLIST_REMOVE(child, next_parent);
>      }
>  
>      child->bs = new_bs;

After digging into what the comment really meant, I think it doesn't
refer to the order of QLIST_REMOVE() and .detach(). The change looks
safe to me.

I would just suggest updating the comment to explain the order you're
fixing here instead of the now irrelevant one.

Kevin




* Re: [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
  2022-02-10 14:16   ` Stefan Hajnoczi
@ 2022-02-11 12:34   ` Kevin Wolf
  2022-02-14 10:37     ` Emanuele Giuseppe Esposito
  1 sibling, 1 reply; 28+ messages in thread
From: Kevin Wolf @ 2022-02-11 12:34 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> Doing the opposite can result in adding the child node to a non-drained
> node, as apply_subtree_drain is only done in ->attach(), and thus make
> assert_bdrv_graph_writable fail.
> 
> This can happen for example during a transaction rollback (test 245,
> test_io_with_graph_changes):
> 1. a node is removed from the graph, thus it is undrained
> 2. then something happens, and we need to roll back the transactions
>    through tran_abort()
> 3. at this point, the current code would first attach the undrained node
>    to the graph via QLIST_INSERT_HEAD, and then call ->attach() that
>    will take care of restoring the drain with apply_subtree_drain(),
>    leaving the node undrained between the two operations.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ec346a7e2e..08a6e3a4ef 100644
> --- a/block.c
> +++ b/block.c
> @@ -2872,8 +2872,6 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>      }
>  
>      if (new_bs) {
> -        assert_bdrv_graph_writable(new_bs);
> -        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>  
>          /*
>           * Detaching the old node may have led to the new node's
> @@ -2890,6 +2888,10 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>          if (child->klass->attach) {
>              child->klass->attach(child);
>          }
> +
> +        assert_bdrv_graph_writable(new_bs);
> +        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
> +
>      }

Extra empty line. Looks good otherwise.

Does this also mean that the order in bdrv_child_cb_attach/detach() is
wrong? Or maybe adding a new node to bs->children is okay even when the
child node isn't drained.

Kevin




* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
  2022-02-10 14:33   ` Stefan Hajnoczi
@ 2022-02-11 15:38   ` Kevin Wolf
  2022-02-14 11:11     ` Emanuele Giuseppe Esposito
  2022-02-14 11:42     ` Paolo Bonzini
  1 sibling, 2 replies; 28+ messages in thread
From: Kevin Wolf @ 2022-02-11 15:38 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> This test uses a callback of an I/O function (blk_aio_preadv)
> to modify the graph, using bdrv_attach_child.
> This is simply not allowed anymore. I/O cannot change the graph.

The callback of an I/O function isn't I/O, though. It is code _after_
the I/O has completed. If this doesn't work any more, it feels like this
is a bug.

I think it becomes a bit more obvious when you translate the AIO into
the equivalent coroutine function:

    void coroutine_fn foo(...)
    {
        GLOBAL_STATE_CODE();

        blk_co_preadv(...);
        detach_by_parent_aio_cb(...);
    }

This is obviously correct code that must work. Calling an I/O function
from a GS function is allowed, and of course the GS function may
continue to do GS stuff after the I/O function returned.

(Actually, I'm not sure if it really works when blk is not in the main
AioContext, but your API split patches claim that it does, so let's
assume for the moment that this is already true :-))

> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
> and introduce bdrv_drained_begin_no_poll", the test would simply
> be at risk of failure, because if bdrv_replace_child_noperm()
> (called to modify the graph) called a drain, then one of the
> .drained_begin() callbacks would be bdrv_do_drained_begin_quiesce,
> which specifically asserts that we are not in a coroutine.
> 
> Now that we fixed the behavior, the drain will invoke a bh in the
> main loop, so we don't have such a problem. However, this test is still
> illegal and fails because we forbid graph changes from I/O paths.
> 
> Once we add the required subtree_drains to protect
> bdrv_replace_child_noperm(), the key problem in this test is in:

Probably a question for a different patch, but why is a subtree drain
required instead of just a normal node drain? It feels like a bigger
hammer than what is needed for replacing a single child.

> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
> /* Drain and check the expected result */
> bdrv_subtree_drained_begin(parent_b);
> 
> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
> modifies the graph and is invoked during bdrv_subtree_drained_begin().
> The call stack is the following:
> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
> and enters the coroutine running blk_aio_read_entry()
> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>    complete (blk_aio_complete)
> 3. at this point, subtree_drained_begin() kicks in and waits for all
>    in_flight requests, polling
> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
> 5. blk_aio_complete *first* invokes the callback
>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>    so both bdrv_unref_child() and bdrv_attach_child() will have
>    subtree_drains inside. And this causes a deadlock, because the
>    nested drain will wait for in_flight counter to go to zero, which
>    is only happening once the drain itself finishes.

So the problem has nothing to do with detach_by_parent_aio_cb() being
an I/O function in the sense of locking rules (which it isn't), but with
calling a drain operation while having the in_flight counter increased.

In other words, an AIO callback like detach_by_parent_aio_cb() must
never call drain - and it doesn't before your changes to
bdrv_replace_child_noperm() break it. How confident are we that this
only breaks tests and not real code?

Part of the problem is probably that drain is serving two slightly
different purposes inside the block layer (just make sure that nothing
touches the node any more) and callers outside of the block layer (make
sure that everything has completed and no more callbacks will be
called). This bug sits exactly in the difference between those two
concepts - we're waiting for more than we would have to wait for, and it
causes a deadlock in this combination.

I guess it could be fixed if BlockBackend accounted for requests that
are already completed, but their callback hasn't yet. blk_drain() would
then have to wait for those requests, but blk_root_drained_poll()
wouldn't because these requests don't affect the root node any more.

> A different story is test_detach_by_driver_cb(): in this case,
> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
> but it is instead called as a bh running in the main loop by
> detach_by_driver_cb_drained_begin(), the callback for
> .drained_begin().
> 
> This test was added in 231281ab42 as part of the series
> "Drain fixes and cleanups, part 3"
> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
> but as explained above I believe that it is not valid anymore, and
> can be discarded.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

I feel throwing tests away just because they don't pass any more is a
bit too simple. :-)

Kevin



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-11 11:54   ` Kevin Wolf
@ 2022-02-14 10:27     ` Emanuele Giuseppe Esposito
  2022-02-14 11:57       ` Paolo Bonzini
  2022-02-14 12:03       ` Kevin Wolf
  0 siblings, 2 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-14 10:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow



On 11/02/2022 12:54, Kevin Wolf wrote:
> Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
>> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
>> is not a good idea: the callback might be called when running
>> a drain in a coroutine, and bdrv_drained_begin_poll() does not
>> handle that case, resulting in assertion failure.
> 
> I remembered that we talked about this only recently on IRC, but it
> didn't make any sense to me again when I read this commit message. So I
> think we need --verbose.
> 
> The .drained_begin callback was always meant to run outside of coroutine
> context, so the unexpected part isn't that it calls a function that
> can't run in coroutine context, but that it is already called itself in
> coroutine context.
> 
> The problematic path is bdrv_replace_child_noperm() which then calls
> bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
> context is dangerous, it can cause deadlocks because the caller of the
> coroutine can't make progress. So I believe this call is already wrong
> in coroutine context.

Ok, you added this assertion in dcf94a23, but at that time there was no
bdrv_parent_drained_begin_single, and the polling was only done in
bdrv_do_drained_begin. So I think that to keep the same logic, the
assertion should be moved into bdrv_parent_drained_begin_single()? And
even more specifically, it should only fire when the poll flag is true.

I triggered this by adding additional drains in the callers of
bdrv_replace_child_noperm(), and I think some test (probably unit test)
was failing because of either the drained_begin callback itself called
by the drain, or as you suggested the callbacks called by
bdrv_parent_drained_begin_single from bdrv_replace_child_noperm.

Anyway, I think that in addition to the fix in this patch, we should
also fix bdrv_parent_drained_begin_single(poll=true) in
bdrv_replace_child_noperm(), with something similar to what is done in
bdrv_co_yield_to_drain(): i.e. if we are in a coroutine, schedule a BH
that runs the same logic in the main loop, and then somehow wait for it
to finish before continuing.
Even though at that point we would have a coroutine waiting for the main
loop, which I don't think is something we want.

Alternatively, we could forbid polling in coroutines altogether. The
only place I can see that uses drain in a coroutine is mirror (see
below).


Additional question: I also noticed that there is a bdrv_drained_begin()
call in mirror.c in the JobDriver run() callback. How can this even
work? If a parent uses bdrv_child_cb_drained_begin (which should not be
so rare) it will crash because of the assertion.

Further additional question: I actually also don't understand the
polling logic of mirror (mirror_drained_poll): if we are draining in
the coroutine with in_drain = true, I think we can have a deadlock if
in_flight > 0?

Emanuele

> 
> Now I don't know the call path up to bdrv_replace_child_noperm(), but as
> far as I remember, that was another function that was originally never
> run in coroutine context. Maybe we have good reason to change this, I
> can't point to anything that would be inherently wrong with it, but I
> would still be curious in which context it does run in a coroutine now.
> 
> Anyway, whatever the specific place is, I believe we must drop out of
> coroutine context _before_ calling bdrv_parent_drained_begin_single(),
> not only in callbacks called by it.
> 
>> Instead, bdrv_do_drained_begin with no recursion and poll
>> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
>> but will first check if we are already in a coroutine, and exit
>> from that via bdrv_co_yield_to_drain().
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> Kevin
> 


* Re: [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-02-11 12:34   ` Kevin Wolf
@ 2022-02-14 10:37     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-14 10:37 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow



On 11/02/2022 13:34, Kevin Wolf wrote:
> Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
>> Doing the opposite can result in adding the child node to a non-drained
>> node, as apply_subtree_drain() is only done in ->attach(), and thus make
>> assert_bdrv_graph_writable() fail.
>>
>> This can happen for example during a transaction rollback (test 245,
>> test_io_with_graph_changes):
>> 1. a node is removed from the graph, thus it is undrained
>> 2. then something happens, and we need to roll back the transactions
>>    through tran_abort()
>> 3. at this point, the current code would first attach the undrained node
>>    to the graph via QLIST_INSERT_HEAD, and then call ->attach() that
>>    will take care of restoring the drain with apply_subtree_drain(),
>>    leaving the node undrained between the two operations.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>  block.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ec346a7e2e..08a6e3a4ef 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2872,8 +2872,6 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>>      }
>>  
>>      if (new_bs) {
>> -        assert_bdrv_graph_writable(new_bs);
>> -        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>>  
>>          /*
>>           * Detaching the old node may have led to the new node's
>> @@ -2890,6 +2888,10 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>>          if (child->klass->attach) {
>>              child->klass->attach(child);
>>          }
>> +
>> +        assert_bdrv_graph_writable(new_bs);
>> +        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>> +
>>      }
> 
> Extra empty line. Looks good otherwise.
> 
> Does this also mean that the order in bdrv_child_cb_attach/detach() is
> wrong? Or maybe adding a new node to bs->children is okay even when the
> child node isn't drained.

No, I don't think it's wrong. In fact, if we are just replacing a node
(so old_bs and new_bs are both != NULL), the child will just be removed
and then re-added to the same children list of the same parent
(child->opaque).

Whether adding a new node to bs->children requires a drain or not is
still under debate in the other series with Vladimir. We'll see about
that, but in the meantime this is just a safe fix that makes sure that
*if* drains are added, everything will always stay under proper drain.

Emanuele

> 
> Kevin
> 


* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-11 15:38   ` Kevin Wolf
@ 2022-02-14 11:11     ` Emanuele Giuseppe Esposito
  2022-02-14 11:42     ` Paolo Bonzini
  1 sibling, 0 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-14 11:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow



On 11/02/2022 16:38, Kevin Wolf wrote:
> Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
>> This test uses a callback of an I/O function (blk_aio_preadv)
>> to modify the graph, using bdrv_attach_child.
>> This is simply not allowed anymore. I/O cannot change the graph.
> 
> The callback of an I/O function isn't I/O, though. It is code _after_
> the I/O has completed. If this doesn't work any more, it feels like this
> is a bug.
> 
> I think it becomes a bit more obvious when you translate the AIO into
> the equivalent coroutine function:
> 
>     void coroutine_fn foo(...)
>     {
>         GLOBAL_STATE_CODE();
> 
>         blk_co_preadv(...);
>         detach_by_parent_aio_cb(...);
>     }
> 
> This is obviously correct code that must work. Calling an I/O function
> from a GS function is allowed, and of course the GS function may
> continue to do GS stuff after the I/O function returned.
> 
> (Actually, I'm not sure if it really works when blk is not in the main
> AioContext, but your API split patches claim that it does, so let's
> assume for the moment that this is already true :-))

Uhm, why does my split claim that it does? All blk_aio_* are IO_CODE.
Also, as you said, if blk is not in the main loop, the callback is not
GS either.

> 
>> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
>> and introduce bdrv_drained_begin_no_poll", the test would simply
>> be at risk of failure, because if bdrv_replace_child_noperm()
>> (called to modify the graph) would call a drain,
>> then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
>> that specifically asserts that we are not in a coroutine.
>>
>> Now that we fixed the behavior, the drain will invoke a bh in the
>> main loop, so we don't have such a problem. However, this test is still
>> illegal and fails because we forbid graph changes from I/O paths.
>>
>> Once we add the required subtree_drains to protect
>> bdrv_replace_child_noperm(), the key problem in this test is in:
> 
> Probably a question for a different patch, but why is a subtree drain
> required instead of just a normal node drain? It feels like a bigger
> hammer than what is needed for replacing a single child.
> 
>> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
>> /* Drain and check the expected result */
>> bdrv_subtree_drained_begin(parent_b);
>>
>> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
>> modifies the graph and is invoked during bdrv_subtree_drained_begin().
>> The call stack is the following:
>> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
>> and enters the coroutine running blk_aio_read_entry()
>> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>>    complete (blk_aio_complete)
>> 3. at this point, subtree_drained_begin() kicks in and waits for all
>>    in_flight requests, polling
>> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
>> 5. blk_aio_complete *first* invokes the callback
>>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
>> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>>    so both bdrv_unref_child() and bdrv_attach_child() will have
>>    subtree_drains inside. And this causes a deadlock, because the
>>    nested drain will wait for in_flight counter to go to zero, which
>>    is only happening once the drain itself finishes.
> 
> So the problem has nothing to do with detach_by_parent_aio_cb() being
> an I/O function in the sense of locking rules (which it isn't), but with
> calling a drain operation while having the in_flight counter increased.
> 
> In other words, an AIO callback like detach_by_parent_aio_cb() must
> never call drain - and it doesn't before your changes to
> bdrv_replace_child_noperm() break it. How confident are we that this
> only breaks tests and not real code?
I am not sure. From a quick look, the AIO callback is really used pretty
much everywhere. Maybe we should really find a way to assert
GLOBAL_STATE_CODE and friends?

> 
> Part of the problem is probably that drain is serving two slightly
> different purposes inside the block layer (just make sure that nothing
> touches the node any more) and callers outside of the block layer (make
> sure that everything has completed and no more callbacks will be
> called). This bug sits exactly in the difference between those two
> concepts - we're waiting for more than we would have to wait for, and it
> causes a deadlock in this combination.
> 
> I guess it could be fixed if BlockBackend accounted for requests that
> are already completed, but their callback hasn't yet. blk_drain() would
> then have to wait for those requests, but blk_root_drained_poll()
> wouldn't because these requests don't affect the root node any more.
> 
>> Different story is test_detach_by_driver_cb(): in this case,
>> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
>> but it is instead called as a bh running in the main loop by
>> detach_by_driver_cb_drained_begin(), the callback for
>> .drained_begin().
>>
>> This test was added in 231281ab42 and part of the series
>> "Drain fixes and cleanups, part 3"
>> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
>> but as explained above I believe that it is not valid anymore, and
>> can be discarded.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> I feel throwing tests away just because they don't pass any more is a
> bit too simple. :-)

Well, if the test is doing something it is not supposed to do, why not :)

And anyway, this was discussed on IRC with Paolo and somebody else;
I can't remember who.

Emanuele
> 
> Kevin
> 


* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-11 15:38   ` Kevin Wolf
  2022-02-14 11:11     ` Emanuele Giuseppe Esposito
@ 2022-02-14 11:42     ` Paolo Bonzini
  2022-02-14 15:28       ` Kevin Wolf
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2022-02-14 11:42 UTC (permalink / raw)
  To: Kevin Wolf, Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 2/11/22 16:38, Kevin Wolf wrote:
> The callback of an I/O function isn't I/O, though. It is code _after_
> the I/O has completed. If this doesn't work any more, it feels like this
> is a bug.

The issue is that the I/O has *not* completed yet.  blk_aio_preadv(..., 
cb, opaque) is not equivalent to

	ret = blk_co_preadv(...);
	cb(ret, opaque);

but rather to

	blk_inc_in_flight(blk);
	ret = blk_co_preadv(...);
	cb(ret, opaque);
	blk_dec_in_flight(blk);

Your own commit message (yeah I know we've all been through that :)) 
explains why, and notes that it is now invalid to drain in a callback:

     commit 46aaf2a566e364a62315219255099cbf1c9b990d
     Author: Kevin Wolf <kwolf@redhat.com>
     Date:   Thu Sep 6 17:47:22 2018 +0200

     block-backend: Decrease in_flight only after callback

     Request callbacks can do pretty much anything, including operations
     that will yield from the coroutine (such as draining the backend).
     In that case, a decreased in_flight would be visible to other code
     and could lead to a drain completing while the callback hasn't
     actually completed yet.

     Note that reordering these operations forbids calling drain directly
     inside an AIO callback.

So the questions are:

1) is the above commit wrong though well-intentioned?

2) is it unexpected that bdrv_replace_child_noperm() drains (thus 
becoming invalid from the callback, albeit valid through a bottom half)?


My answer is respectively 1) it's correct, many coroutines do 
inc_in_flight before creation and dec_in_flight at the end, we're just 
saying that it's _always_ the case for callback-based operations; 2) no, 
it's not unexpected and therefore the test is the incorrect one.

Paolo

* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-14 10:27     ` Emanuele Giuseppe Esposito
@ 2022-02-14 11:57       ` Paolo Bonzini
  2022-02-17 15:49         ` Emanuele Giuseppe Esposito
  2022-02-14 12:03       ` Kevin Wolf
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2022-02-14 11:57 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, Kevin Wolf
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 2/14/22 11:27, Emanuele Giuseppe Esposito wrote:
> Anyways, I think that in addition to the fix in this patch, we should
> also fix bdrv_parent_drained_begin_single(poll=true) in
> bdrv_replace_child_noperm, with something similar to what is done in
> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
> runs the same logic but in the main loop, but then somehow wait that it
> finishes before continuing?
> 
> Alternatively, we would forbid polling in coroutines at all. And the
> only place I can see that is using the drain in coroutine is mirror (see
> below).

I think you should first of all see what breaks if you forbid 
bdrv_replace_child_noperm() from coroutine context.

Drain in coroutines does not poll, it gets out of the coroutine through 
a bottom half before polling.  So if bdrv_replace_child_noperm() doesn't 
require it, polling in coroutines can still be forbidden.

This patch is correct nevertheless.

Paolo

* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-14 10:27     ` Emanuele Giuseppe Esposito
  2022-02-14 11:57       ` Paolo Bonzini
@ 2022-02-14 12:03       ` Kevin Wolf
  1 sibling, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2022-02-14 12:03 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Am 14.02.2022 um 11:27 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> On 11/02/2022 12:54, Kevin Wolf wrote:
> > Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> >> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> >> is not a good idea: the callback might be called when running
> >> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> >> handle that case, resulting in assertion failure.
> > 
> > I remembered that we talked about this only recently on IRC, but it
> > didn't make any sense to me again when I read this commit message. So I
> > think we need --verbose.
> > 
> > The .drained_begin callback was always meant to run outside of coroutine
> > context, so the unexpected part isn't that it calls a function that
> > can't run in coroutine context, but that it is already called itself in
> > coroutine context.
> > 
> > The problematic path is bdrv_replace_child_noperm() which then calls
> > bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
> > context is dangerous, it can cause deadlocks because the caller of the
> > coroutine can't make progress. So I believe this call is already wrong
> > in coroutine context.
> 
> Ok, you added this assertion in dcf94a23, but at that time there was no
> bdrv_parent_drained_begin_single, and the polling was only done in
> bdrv_do_drained_begin. So I think that to keep the same logic, the
> assertion should be moved in bdrv_parent_drained_begin_single()? And
> even more specifically, only if the poll flag is true.

I wouldn't necessarily say move, but copying it there makes sense to me.
In order to keep the interface constraints simple, I would assert it
independent of the poll parameter.

> I triggered this by adding additional drains in the callers of
> bdrv_replace_child_noperm(), and I think some test (probably unit test)
> was failing because of either the drained_begin callback itself called
> by the drain, or as you suggested the callbacks called by
> bdrv_parent_drained_begin_single from bdrv_replace_child_noperm.
> 
> Anyways, I think that in addition to the fix in this patch, we should
> also fix bdrv_parent_drained_begin_single(poll=true) in
> bdrv_replace_child_noperm, with something similar to what is done in
> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
> runs the same logic but in the main loop, but then somehow wait that it
> finishes before continuing?
> Even though at that point we would have a coroutine waiting for the main
> loop, which I don't think it's something we want.

Coroutines are waiting for the main loop all the time, why would this be
a problem?

Yes, I think a mechanism similar to bdrv_co_yield_to_drain() is needed
if we want to allow callers to be in coroutine context.

And once we have this mechanism, it's actually not in addition to this
patch, but instead of it, because this patch isn't needed any more when
we know that we can't be in coroutine context.

> Alternatively, we would forbid polling in coroutines at all. And the
> only place I can see that is using the drain in coroutine is mirror
> (see below).

Well, my point is that it is already forbidden because it can deadlock.
Code that polls in coroutine context anyway is probably buggy, unless it
can guarantee very specific circumstances that make a deadlock
impossible.

Maybe we can actually assert this in AIO_WAIT_WHILE().

> Additional question: I also noticed that there is a bdrv_drained_begin()
> call in mirror.c in the JobDriver run() callback. How can this even
> work? If a parent uses bdrv_child_cb_drained_begin (which should not be
> so rare) it will crash because of the assertion.

bdrv_co_yield_to_drain() lets this code run in the main loop.

> Further additional question: actually I don't understand also the
> polling logic of mirror (mirror_drained_poll), as if we are draining in
> the coroutine with in_drain = true I think we can have a deadlock if
> in_flight>0?

You mean for a drain issued by the mirror job itself? The in-flight
requests are still processed by the polling loop, so eventually
in_flight should become 0.

Kevin


* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-14 11:42     ` Paolo Bonzini
@ 2022-02-14 15:28       ` Kevin Wolf
  2022-02-14 17:39         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Kevin Wolf @ 2022-02-14 15:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Emanuele Giuseppe Esposito, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

Am 14.02.2022 um 12:42 hat Paolo Bonzini geschrieben:
> On 2/11/22 16:38, Kevin Wolf wrote:
> > The callback of an I/O function isn't I/O, though. It is code _after_
> > the I/O has completed. If this doesn't work any more, it feels like this
> > is a bug.
> 
> The issue is that the I/O has *not* completed yet.  blk_aio_preadv(..., cb,
> opaque) is not equivalent to
> 
> 	ret = blk_co_preadv(...);
> 	cb(ret, opaque);
> 
> but rather to
> 
> 	blk_inc_in_flight(blk);
> 	ret = blk_co_preadv(...);
> 	cb(ret, opaque);
> 	blk_dec_in_flight(blk);

It depends on what you call "the I/O". The request to blk_bs(blk) has
already completed, but the request to blk itself hasn't yet.

This is exactly the difference I was talking about:
bdrv_replace_child_noperm() really only cares about requests to the
BlockDriverState, but drain currently waits for attached BlockBackends,
too.

The BlockBackend could safely return false from blk_root_drained_poll()
while requests are still in their callbacks (if they do anything that
touches a node, they would increase in_flight again), it just doesn't do
it yet. It's only blk_drain(_all)() that would still have to wait for
those.

> Your own commit message (yeah I know we've all been through that :))
> explains why, and notes that it is now invalid to drain in a callback:
> 
>     commit 46aaf2a566e364a62315219255099cbf1c9b990d
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Thu Sep 6 17:47:22 2018 +0200
> 
>     block-backend: Decrease in_flight only after callback
> 
>     Request callbacks can do pretty much anything, including operations
>     that will yield from the coroutine (such as draining the backend).
>     In that case, a decreased in_flight would be visible to other code
>     and could lead to a drain completing while the callback hasn't
>     actually completed yet.
> 
>     Note that reordering these operations forbids calling drain directly
>     inside an AIO callback.

Yes, I wasn't aware of this any more, but I've come to the same
conclusion in my previous message in this thread.

> So the questions are:
> 
> 1) is the above commit wrong though well-intentioned?
> 
> 2) is it unexpected that bdrv_replace_child_noperm() drains (thus becoming
> invalid from the callback, albeit valid through a bottom half)?
> 
> 
> My answer is respectively 1) it's correct, many coroutines do inc_in_flight
> before creation and dec_in_flight at the end, we're just saying that it's
> _always_ the case for callback-based operations; 2) no, it's not unexpected
> and therefore the test is the incorrect one.

My question isn't really only about the test, though. If it is now
forbidden to call bdrv_replace_child_noperm() in a callback, how do we
verify that the test is the only incorrect one rather than just the
obvious one?

And is it better to throw away the test and find and fix all other
places that are using something that is now forbidden, or wouldn't it be
better to actually allow bdrv_replace_child_noperm() in callbacks?

Kevin


* Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-02-14 15:28       ` Kevin Wolf
@ 2022-02-14 17:39         ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2022-02-14 17:39 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Emanuele Giuseppe Esposito, Fam Zheng,
	Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 2/14/22 16:28, Kevin Wolf wrote:
> The BlockBackend could safely return false from blk_root_drained_poll()
> while requests are still in their callbacks (if they do anything that
> touches a node, they would increase in_flight again), it just doesn't do
> it yet. It's only blk_drain(_all)() that would still have to wait for
> those.

That would be very subtle, especially since it's not clear to me why this 
wouldn't be "a drain completing while the callback hasn't actually 
completed yet".  The drain referred to in the commit message of 
46aaf2a56 could *not* use blk_drain, because it is not coroutine 
friendly; it must use bdrv_drained_begin.

>> My answer is respectively 1) it's correct, many coroutines do inc_in_flight
>> before creation and dec_in_flight at the end, we're just saying that it's
>> _always_  the case for callback-based operations; 2) no, it's not unexpected
>> and therefore the test is the incorrect one.
> My question isn't really only about the test, though. If it is now
> forbidden to call bdrv_replace_child_noperm() in a callback, how do we
> verify that the test is the only incorrect one rather than just the
> obvious one?
> 
> And is it better to throw away the test and find and fix all other
> places that are using something that is now forbidden, or wouldn't it be
> better to actually allow bdrv_replace_child_noperm() in callbacks?

The question is: would you ever need bdrv_replace_child_noperm() in 
callbacks?  The AIO functions are called from any iothread and so are 
the callbacks.  We do have a usecase (in block/mirror.c) for 
bdrv_drained_begin from a coroutine; do we have a usecase for calling 
global-state functions from a callback, in such a way that:

1) going through a bottom half would not be possible

2) it's only needed in the special case of a BlockBackend homed in the 
main event loop (because otherwise you'd have to go through a bottom 
half, and we have excluded that already)?

Thanks,

Paolo

* Re: [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-02-14 11:57       ` Paolo Bonzini
@ 2022-02-17 15:49         ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 28+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-02-17 15:49 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf
  Cc: Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 14/02/2022 12:57, Paolo Bonzini wrote:
> On 2/14/22 11:27, Emanuele Giuseppe Esposito wrote:
>> Anyways, I think that in addition to the fix in this patch, we should
>> also fix bdrv_parent_drained_begin_single(poll=true) in
>> bdrv_replace_child_noperm, with something similar to what is done in
>> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
>> runs the same logic but in the main loop, but then somehow wait that it
>> finishes before continuing?
>>
>> Alternatively, we would forbid polling in coroutines at all. And the
>> only place I can see that is using the drain in coroutine is mirror (see
>> below).
> 
> I think you should first of all see what breaks if you forbid
> bdrv_replace_child_noperm() from coroutine context.

Checked. Basically, if I add the assertion assert(!qemu_in_coroutine())
only in bdrv_parent_drained_begin_single(), iotests and unit tests run
as intended.

If I add the assertion in bdrv_replace_child_noperm(), so that the
function can't be invoked by coroutines, everything breaks. Starting
from qemu-img, as it uses bdrv_create_co_entry() and therefore
qcow2_co_create_opts() for qcow2 format.

On the other hand, if I add subtree_drains throughout the code, we end
up having bdrv_parent_drained_begin_single() called much more
frequently, and since bdrv_replace_child_noperm() *is* called by
coroutines, the drain count won't match and the assertion will fail.

I am testing the fix agreed on with Kevin, i.e. implementing something
very similar to bdrv_co_yield_to_drain() in
bdrv_parent_drained_begin_single(), where we just create a BH to do the
work in the main loop, and it seems to be working. Maybe that's the way
to go for bdrv_replace_child_noperm()?

Should I get rid of this one or have both fixes in two patches? This
patch just fixes part of the problem, but as Paolo said it's correct
nevertheless.

Emanuele

> 
> Drain in coroutines does not poll, it gets out of the coroutine through
> a bottom half before polling.  So if bdrv_replace_child_noperm() doesn't
> require it, polling in coroutines can still be forbidden.
> 
> This patch is correct nevertheless.
> 
> Paolo
> 


end of thread, other threads:[~2022-02-17 15:54 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
2022-02-10 14:14   ` Stefan Hajnoczi
2022-02-11 11:54   ` Kevin Wolf
2022-02-14 10:27     ` Emanuele Giuseppe Esposito
2022-02-14 11:57       ` Paolo Bonzini
2022-02-17 15:49         ` Emanuele Giuseppe Esposito
2022-02-14 12:03       ` Kevin Wolf
2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
2022-02-10 14:16   ` Stefan Hajnoczi
2022-02-11 12:28   ` Kevin Wolf
2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
2022-02-10 14:16   ` Stefan Hajnoczi
2022-02-11 12:34   ` Kevin Wolf
2022-02-14 10:37     ` Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
2022-02-10 14:32   ` Stefan Hajnoczi
2022-02-10 16:02     ` Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
2022-02-10 14:33   ` Stefan Hajnoczi
2022-02-11 15:38   ` Kevin Wolf
2022-02-14 11:11     ` Emanuele Giuseppe Esposito
2022-02-14 11:42     ` Paolo Bonzini
2022-02-14 15:28       ` Kevin Wolf
2022-02-14 17:39         ` Paolo Bonzini
2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
2022-02-10 14:43   ` Stefan Hajnoczi
2022-02-10 15:02   ` Vladimir Sementsov-Ogievskiy
