* [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Hello everyone,

As you know already, my current goal is to try to remove the AioContext lock from the QEMU block layer.
Currently the AioContext lock is used pretty much throughout the whole block layer; it is a little confusing to understand what exactly it protects, and I am starting to think that in some places it is taken just because of the block API's assumptions.
For example, some functions like AIO_WAIT_WHILE() release the lock with the assumption that it is always held, so all callers must take it just to allow the function to release it.
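
A minimal sketch of that calling convention (illustrative only, not actual QEMU code; the polling condition is just a plausible example):

    AioContext *ctx = bdrv_get_aio_context(bs);

    /* Taken only so that AIO_WAIT_WHILE() can drop and re-take the
     * lock while it polls. */
    aio_context_acquire(ctx);
    AIO_WAIT_WHILE(ctx, bdrv_drain_poll(bs, false, NULL, false));
    aio_context_release(ctx);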

Removing the AioContext lock is not a straightforward task: the first step is to understand which functions run in the main loop, and thus under the BQL (Big QEMU Lock), and which are used by the iothreads. We call the former category global state (GS) and the latter I/O.

The patch series "block layer: split block APIs in global state and I/O" aims to do that. Once we can at least (roughly) distinguish what is called by iothreads and what from the main loop, we can start analyzing what needs protection and what doesn't. This series is particularly helpful because by splitting the API we know where each function runs, so it helps us identifying the cases where both the main loop and iothreads read/write the same value/field (and thus need protection) and cases where the same function is used only by the main loop for example, so it shouldn't need protection.
For example, if some BlockDriverState field is read by I/O threads but modified in a GS function, this has to be protected in some way.
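
As a made-up illustration of that rule (hypothetical names, not an existing QEMU structure; atomics are just one possible form of protection):

    typedef struct ExampleState {
        /* Written by GS code under the BQL, read by iothreads:
         * a plain access would be a data race. */
        unsigned long total_bytes;
    } ExampleState;

    /* GS API: runs in the main loop under the BQL */
    void example_set_total_bytes(ExampleState *s, unsigned long bytes)
    {
        assert(qemu_in_main_thread());
        qatomic_set(&s->total_bytes, bytes);
    }

    /* I/O API: may run in any iothread */
    unsigned long example_get_total_bytes(ExampleState *s)
    {
        return qatomic_read(&s->total_bytes);
    }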

Another series I posted, "job: replace AioContext lock with job_mutex", provides a good example of how the AioContext lock can be removed and simply replaced by a fine-grained lock.

Another way to achieve thread safety without the AioContext lock is to rely on the fact that, in some cases, writes to a field are always done in the main loop *and* under drain. That way, we know that no request is reaching the I/O threads, so we can safely modify the field.
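
In other words (a sketch with a hypothetical field; the drained section guarantees that no in-flight request can observe the intermediate state):

    assert(qemu_in_main_thread());    /* BQL: no concurrent GS writer */

    bdrv_subtree_drained_begin(bs);   /* no I/O requests in flight */
    bs->example_field = new_value;    /* hypothetical field */
    bdrv_subtree_drained_end(bs);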

This is exactly what assert_bdrv_graph_writable(), introduced in the block API split-up (patch 9 in v5), is there for, even though it currently only checks that we are in the main loop and does not yet check for drains.

We could then use this assertion to effectively prove that some writes to a field/list are safe, and completely get rid of the AioContext lock.
However, this is not an easy task: for example, if we look at the ->children and ->parents lists in BlockDriverState, we can see that they are modified in BQL functions but also read in I/O.
We therefore ideally need to add some drains (because in the current state, an assert_bdrv_graph_writable() that also checked for drains would fail).

The main function that modifies the ->children and ->parents lists is bdrv_replace_child_noperm.
So ideally we would like to drain both old_bs and new_bs (the function moves a BdrvChild from one bs to another, modifying the respective lists).

A couple of questions to answer:

- which drain to use? My answer would be the bdrv_subtree_drained_* class of functions, because they take care of draining the whole subtree below the node, while bdrv_drained_* does not cover the children of the given node.
This theoretically simplifies the draining requirements, as we can just invoke subtree_drain_* on the two bs involved in bdrv_replace_child_noperm, and we should be guaranteed that the write is safe.

- where to add these drains? Inside the function or delegate to the caller?
According to d736f119da (and my unit tests), it is safe to modify the graph even inside a bdrv_subtree_drained_begin/end() section.
Therefore, wrapping each call of bdrv_replace_child_noperm with a subtree_drain_begin/end pair is (or seems) perfectly valid; see the sketch below.
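
A sketch of that wrapping pattern, roughly as patch 4 below applies it in bdrv_replace_child_tran() (simplified; reference counting and permission handling omitted):

    if (new_bs) {
        bdrv_subtree_drained_begin_unlocked(new_bs);
    }
    bdrv_subtree_drained_begin_unlocked(old_bs);

    /* Both subtrees are quiesced: the ->children and ->parents list
     * updates cannot race with I/O. */
    bdrv_replace_child_noperm(childp, new_bs, false);

    bdrv_subtree_drained_end_unlocked(old_bs);
    if (new_bs) {
        bdrv_subtree_drained_end_unlocked(new_bs);
    }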

Problems met so far (mostly solved):

1) Consider that the drains use BDRV_POLL_WHILE, which in turn releases the AioContext lock. This can create problems because not all callers take the lock. This could easily be fixed by introducing BDRV_POLL_WHILE_UNLOCKED and bdrv_subtree_drained_begin/end_unlocked functions, but when running the unit tests it is easy to find cases where the AioContext lock is not held consistently. For example, in test_blockjob_common_drain_node (tests/unit/test-bdrv-drain.c):

    blk_insert_bs(blk_target, target, &error_abort);
    [...]
    aio_context_acquire(ctx);
    tjob = block_job_create("job0", &test_job_driver, NULL, src,
                            0, BLK_PERM_ALL,
                            0, 0, NULL, NULL, &error_abort);

Both functions eventually call bdrv_replace_child_noperm, but one does so with the AioContext lock held and the other without.
In this case the solution is easy and helpful for our goal, since it reduces the area that the AioContext lock covers.

2) Some tests, like tests/unit/test-bdrv-drain.c, do not expect additional drains. Therefore we might have cases where a specific drain callback (in this case used for testing) is called much earlier than expected because of the additional subtree drains.
Here, too, we can simply modify the test to install the specific callback only when we actually need it. The test I am referring to is test_detach_by_driver_cb().

3) Transactions. I am currently struggling a lot with this, and need a little bit of help figuring out what is happening.
Basically, the test test_update_perm_tree() in tests/unit/test-bdrv-graph-mod.c tests permissions, but indirectly also exercises the abort() procedure of transactions.

The test performs the following (ignoring the permissions):
1. create a blockbackend blk
2. create BlockDriverState nodes "node" and "filter"
3. create BdrvChild edge "root" that represents blk -> node
4. create BdrvChild edge "child" that represents filter -> node

Current graph:
blk ------ root -------v
                      node
filter --- child ------^

5a. bdrv_append: modify "root" child to point blk -> filter
5b. bdrv_append: create BdrvChild edge "backing" that represents filter -> node (redundant edge)
5c. bdrv_append: refresh permissions and, as expected, make bdrv_append fail.

Current graph:
blk ------- root --------v
                       filter
node <---- child --------+
 ^-------- backing ------+

At this point, the transaction machinery kicks in to undo everything: first it restores the BdrvChild "root" to point again to node, and then it deletes "backing".
The problem here is that, despite d736f119da, moving an edge under subtree_drain* in bdrv_replace_child_abort() has side effects, leaving the quiesce_counter, recursive_quiesce_counter and parent_quiesce_counter of the various bs in the graph at non-zero values. This is obviously due to the edge movement between subtree_drained_begin and end, but I am not sure why the drain_saldo mechanism implemented in bdrv_replace_child_noperm (sketched below) is not effective in this case.
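
For reference, the drain_saldo mechanism balances the parent's quiesce references against the quiesce_counter of the node the child is being attached to. Roughly paraphrased from bdrv_replace_child_noperm() in block.c (a simplified sketch; details may differ):

    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
    drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;

    /* The new node is more drained than the child so far: quiesce the
     * parent before attaching, so no new requests slip in. */
    while (drain_saldo > 0 && child->klass->drained_begin) {
        bdrv_parent_drained_begin_single(child, true);
        drain_saldo--;
    }

    /* ... the BdrvChild is detached from old_bs and attached to
     * new_bs here ... */

    /* The new node is less drained than the old one: drop the extra
     * quiesce references. */
    while (drain_saldo < 0 && child->klass->drained_end) {
        bdrv_parent_drained_end_single(child);
        drain_saldo++;
    }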

The failure actually shows up at the next step of the aborted transaction, bdrv_attach_child_common_abort(), but the root cause is the non-zero counters left behind by bdrv_replace_child_abort().

Error message:
test-bdrv-graph-mod: ../block/io.c:63: bdrv_parent_drained_end_single_no_poll: Assertion `c->parent_quiesce_counter > 0' failed.

It is also worth mentioning that I know a way to fix this case:
simply do not call
bdrv_subtree_drained_begin/end_unlocked(s->child->bs)
(where s->child->bs is the filter bs) in bdrv_replace_child_abort().
In this specific case it would work correctly, leaving all counters
at zero once the drain ends, but I think it is not correct when/if
the BdrvChild points into another, separate graph, because we
would need to drain that graph as well.

I even tried to reproduce this case with a unit test, but adding subtree_drain_begin/end around bdrv_append does not reproduce the issue.

So the questions in this RFC are:
- is this the right approach to remove the AioContext lock? I think so.
- are there better options?
- most importantly, any idea or suggestion on why this happens,
  and why the quiesce counters are not properly restored in abort() when the drains are added?

This series is based on "job: replace AioContext lock with job_mutex".

Based-on: <20211104145334.1346363-1-eesposit@redhat.com>

Thank you in advance,
Emanuele

Emanuele Giuseppe Esposito (6):
  tests/unit/test-bdrv-drain.c: graph setup functions can't run in
    coroutines
  introduce BDRV_POLL_WHILE_UNLOCKED
  block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked
  block.c: add subtree_drains where needed
  test-bdrv-drain.c: adapt test to the new subtree drains
  block/io.c: enable assert_bdrv_graph_writable

 block.c                            |  24 +++++
 block/io.c                         |  36 ++++++--
 include/block/block-global-state.h |   5 ++
 include/block/block-io.h           |   2 +
 tests/unit/test-bdrv-drain.c       | 136 ++++++++++++++++++-----------
 5 files changed, 145 insertions(+), 58 deletions(-)

-- 
2.31.1




* [RFC PATCH 1/6] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Graph initialization functions like blk_new(), bdrv_new() and so on
should not run in a coroutine. In fact, they might invoke a drain
(for example, blk_insert_bs() eventually calls bdrv_replace_child_noperm())
that in turn can invoke callbacks like bdrv_do_drained_begin_quiesce(),
which asserts exactly that we are not in a coroutine.

Move the initialization phase of test_drv_cb and test_quiesce_common
outside the coroutine logic.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 118 ++++++++++++++++++++++-------------
 1 file changed, 73 insertions(+), 45 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 83485a33aa..a62e6451a1 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -116,7 +116,8 @@ static void aio_ret_cb(void *opaque, int ret)
 }
 
 typedef struct CallInCoroutineData {
-    void (*entry)(void);
+    void (*entry)(void *);
+    void *arg;
     bool done;
 } CallInCoroutineData;
 
@@ -124,15 +125,16 @@ static coroutine_fn void call_in_coroutine_entry(void *opaque)
 {
     CallInCoroutineData *data = opaque;
 
-    data->entry();
+    data->entry(data->arg);
     data->done = true;
 }
 
-static void call_in_coroutine(void (*entry)(void))
+static void call_in_coroutine(void (*entry)(void *), void *arg)
 {
     Coroutine *co;
     CallInCoroutineData data = {
         .entry  = entry,
+        .arg    = arg,
         .done   = false,
     };
 
@@ -192,26 +194,28 @@ static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *
     }
 }
 
-static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
-{
+typedef struct TestDriverCBData {
+    enum drain_type drain_type;
+    bool recursive;
     BlockBackend *blk;
     BlockDriverState *bs, *backing;
-    BDRVTestState *s, *backing_s;
+} TestDriverCBData;
+
+static void test_drv_cb_common(void *arg)
+{
+    TestDriverCBData *data = arg;
+    BlockBackend *blk = data->blk;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
+    BDRVTestState *s = bs->opaque;
+    BDRVTestState *backing_s = backing->opaque;
     BlockAIOCB *acb;
     int aio_ret;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    s = bs->opaque;
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    backing_s = backing->opaque;
-    bdrv_set_backing_hd(bs, backing, &error_abort);
-
     /* Simple bdrv_drain_all_begin/end pair, check that CBs are called */
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
@@ -245,54 +249,77 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_common_cb(enum drain_type drain_type, bool in_coroutine,
+                           void (*cb)(void *))
+{
+    TestDriverCBData data;
+
+    data.drain_type = drain_type;
+    data.recursive = (drain_type != BDRV_DRAIN);
+
+    data.blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
+    data.bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+    blk_insert_bs(data.blk, data.bs, &error_abort);
+
+    data.backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
+    bdrv_set_backing_hd(data.bs, data.backing, &error_abort);
+
+    if (in_coroutine) {
+        call_in_coroutine(cb, &data);
+    } else {
+        cb(&data);
+    }
+
+    bdrv_unref(data.backing);
+    bdrv_unref(data.bs);
+    blk_unref(data.blk);
+}
+
+static void test_drv_cb(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_drv_cb_common);
 }
 
 static void test_drv_cb_drain_all(void)
 {
-    test_drv_cb_common(BDRV_DRAIN_ALL, true);
+    test_drv_cb(BDRV_DRAIN_ALL, false);
 }
 
 static void test_drv_cb_drain(void)
 {
-    test_drv_cb_common(BDRV_DRAIN, false);
+    test_drv_cb(BDRV_DRAIN, false);
 }
 
 static void test_drv_cb_drain_subtree(void)
 {
-    test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
+    test_drv_cb(BDRV_SUBTREE_DRAIN, false);
 }
 
 static void test_drv_cb_co_drain_all(void)
 {
-    call_in_coroutine(test_drv_cb_drain_all);
+    test_drv_cb(BDRV_DRAIN_ALL, true);
 }
 
 static void test_drv_cb_co_drain(void)
 {
-    call_in_coroutine(test_drv_cb_drain);
+    test_drv_cb(BDRV_DRAIN, true);
 }
 
 static void test_drv_cb_co_drain_subtree(void)
 {
-    call_in_coroutine(test_drv_cb_drain_subtree);
+    test_drv_cb(BDRV_SUBTREE_DRAIN, true);
 }
 
-static void test_quiesce_common(enum drain_type drain_type, bool recursive)
+static void test_quiesce_common(void *arg)
 {
-    BlockBackend *blk;
-    BlockDriverState *bs, *backing;
-
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    bdrv_set_backing_hd(bs, backing, &error_abort);
+    TestDriverCBData *data = arg;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
@@ -306,40 +333,41 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_quiesce(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_quiesce_common);
 }
 
 static void test_quiesce_drain_all(void)
 {
-    test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce(BDRV_DRAIN_ALL, false);
 }
 
 static void test_quiesce_drain(void)
 {
-    test_quiesce_common(BDRV_DRAIN, false);
+    test_quiesce(BDRV_DRAIN, false);
 }
 
 static void test_quiesce_drain_subtree(void)
 {
-    test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
+    test_quiesce(BDRV_SUBTREE_DRAIN, false);
 }
 
 static void test_quiesce_co_drain_all(void)
 {
-    call_in_coroutine(test_quiesce_drain_all);
+    test_quiesce(BDRV_DRAIN_ALL, true);
 }
 
 static void test_quiesce_co_drain(void)
 {
-    call_in_coroutine(test_quiesce_drain);
+    test_quiesce(BDRV_DRAIN, true);
 }
 
 static void test_quiesce_co_drain_subtree(void)
 {
-    call_in_coroutine(test_quiesce_drain_subtree);
+    test_quiesce(BDRV_SUBTREE_DRAIN, true);
 }
 
 static void test_nested(void)
-- 
2.31.1




* [RFC PATCH 2/6] introduce BDRV_POLL_WHILE_UNLOCKED
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Same as BDRV_POLL_WHILE, but uses AIO_WAIT_WHILE_UNLOCKED.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/block-global-state.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 7a6b065101..818164a06b 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -158,6 +158,11 @@ void bdrv_drain_all(void);
     AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),              \
                    cond); })
 
+#define BDRV_POLL_WHILE_UNLOCKED(bs, cond) ({              \
+    BlockDriverState *bs_ = (bs);                          \
+    AIO_WAIT_WHILE_UNLOCKED(bdrv_get_aio_context(bs_),     \
+                            cond); })
+
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
-- 
2.31.1




* [RFC PATCH 3/6] block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Same as the locked versions, but use BDRV_POLL_WHILE_UNLOCKED.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/io.c               | 35 ++++++++++++++++++++++++++---------
 include/block/block-io.h |  2 ++
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3be08cad29..a031691860 100644
--- a/block/io.c
+++ b/block/io.c
@@ -334,7 +334,7 @@ static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent, bool ignore_bds_parents,
-                                  bool poll);
+                                  bool poll, bool unlock);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent, bool ignore_bds_parents,
                                 int *drained_end_counter);
@@ -352,7 +352,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         if (data->begin) {
             assert(!data->drained_end_counter);
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
-                                  data->ignore_bds_parents, data->poll);
+                                  data->ignore_bds_parents, data->poll, false);
         } else {
             assert(!data->poll);
             bdrv_do_drained_end(bs, data->recursive, data->parent,
@@ -441,7 +441,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent, bool ignore_bds_parents,
-                                  bool poll)
+                                  bool poll, bool unlock)
 {
     BdrvChild *child, *next;
 
@@ -458,7 +458,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
         bs->recursive_quiesce_counter++;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
             bdrv_do_drained_begin(child->bs, true, child, ignore_bds_parents,
-                                  false);
+                                  false, false);
         }
     }
 
@@ -473,18 +473,28 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
      */
     if (poll) {
         assert(!ignore_bds_parents);
-        BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+        if (unlock) {
+            BDRV_POLL_WHILE_UNLOCKED(bs,
+                                     bdrv_drain_poll_top_level(bs, recursive, parent));
+        } else {
+            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+        }
     }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, false, NULL, false, true);
+    bdrv_do_drained_begin(bs, false, NULL, false, true, false);
 }
 
 void bdrv_subtree_drained_begin(BlockDriverState *bs)
 {
-    bdrv_do_drained_begin(bs, true, NULL, false, true);
+    bdrv_do_drained_begin(bs, true, NULL, false, true, false);
+}
+
+void bdrv_subtree_drained_begin_unlocked(BlockDriverState *bs)
+{
+    bdrv_do_drained_begin(bs, true, NULL, false, true, true);
 }
 
 /**
@@ -556,12 +566,19 @@ void bdrv_subtree_drained_end(BlockDriverState *bs)
     BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
 }
 
+void bdrv_subtree_drained_end_unlocked(BlockDriverState *bs)
+{
+    int drained_end_counter = 0;
+    bdrv_do_drained_end(bs, true, NULL, false, &drained_end_counter);
+    BDRV_POLL_WHILE_UNLOCKED(bs, qatomic_read(&drained_end_counter) > 0);
+}
+
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
 {
     int i;
 
     for (i = 0; i < new_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_begin(child->bs, true, child, false, true);
+        bdrv_do_drained_begin(child->bs, true, child, false, true, false);
     }
 }
 
@@ -671,7 +688,7 @@ void bdrv_drain_all_begin(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_begin(bs, false, NULL, true, false);
+        bdrv_do_drained_begin(bs, false, NULL, true, false, false);
         aio_context_release(aio_context);
     }
 
diff --git a/include/block/block-io.h b/include/block/block-io.h
index e2789dd344..457e77bfce 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -249,6 +249,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
  * exclusive access to all child nodes as well.
  */
 void bdrv_subtree_drained_begin(BlockDriverState *bs);
+void bdrv_subtree_drained_begin_unlocked(BlockDriverState *bs);
 
 /**
  * bdrv_drained_end:
@@ -281,6 +282,7 @@ void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
 void bdrv_subtree_drained_end(BlockDriverState *bs);
+void bdrv_subtree_drained_end_unlocked(BlockDriverState *bs);
 
 bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                      uint32_t granularity, Error **errp);
-- 
2.31.1




* [RFC PATCH 4/6] block.c: add subtree_drains where needed
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to the .children and
.parents lists of a bs.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/block.c b/block.c
index 3c3c90704c..1aa9e51a98 100644
--- a/block.c
+++ b/block.c
@@ -2369,7 +2369,11 @@ static void bdrv_replace_child_abort(void *opaque)
      * So whether new_bs was NULL or not, we cannot pass s->childp here; and in
      * any case, there is no reason to pass it anyway.
      */
+    bdrv_subtree_drained_begin_unlocked(s->child->bs);
+    bdrv_subtree_drained_begin_unlocked(s->old_bs);
     bdrv_replace_child_noperm(&s->child, s->old_bs, true);
+    bdrv_subtree_drained_end_unlocked(s->old_bs);
+    bdrv_subtree_drained_end_unlocked(s->child->bs);
     /*
      * The child was pre-existing, so s->old_bs must be non-NULL, and
      * s->child thus must not have been freed
@@ -2427,13 +2431,20 @@ static void bdrv_replace_child_tran(BdrvChild **childp,
 
     if (new_bs) {
         bdrv_ref(new_bs);
+        bdrv_subtree_drained_begin_unlocked(new_bs);
     }
     /*
      * Pass free_empty_child=false, we will free the child (if
      * necessary) in bdrv_replace_child_commit() (if our
      * @free_empty_child parameter was true).
      */
+    bdrv_subtree_drained_begin_unlocked(s->old_bs);
     bdrv_replace_child_noperm(childp, new_bs, false);
+    bdrv_subtree_drained_end_unlocked(s->old_bs);
+
+    if (new_bs) {
+        bdrv_subtree_drained_end_unlocked(new_bs);
+    }
     /* old_bs reference is transparently moved from *childp to @s */
 }
 
@@ -2951,7 +2962,9 @@ static void bdrv_attach_child_common_abort(void *opaque)
      * need to keep it as an empty shell (after this function, it will
      * not be attached to any parent, and it will not have a .bs).
      */
+    bdrv_subtree_drained_begin_unlocked(bs);
     bdrv_replace_child_noperm(s->child, NULL, false);
+    bdrv_subtree_drained_end_unlocked(bs);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -3051,7 +3064,10 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
+
+    bdrv_subtree_drained_begin_unlocked(child_bs);
     bdrv_replace_child_noperm(&new_child, child_bs, true);
+    bdrv_subtree_drained_end_unlocked(child_bs);
     /* child_bs was non-NULL, so new_child must not have been freed */
     assert(new_child != NULL);
 
@@ -3114,8 +3130,16 @@ static void bdrv_detach_child(BdrvChild **childp)
     BlockDriverState *old_bs = (*childp)->bs;
 
     assert(qemu_in_main_thread());
+    if (old_bs) {
+        bdrv_subtree_drained_begin(old_bs);
+    }
+
     bdrv_replace_child_noperm(childp, NULL, true);
 
+    if (old_bs) {
+        bdrv_subtree_drained_end(old_bs);
+    }
+
     if (old_bs) {
         /*
          * Update permissions for old node. We're just taking a parent away, so
-- 
2.31.1




* [RFC PATCH 5/6] test-bdrv-drain.c: adapt test to the new subtree drains
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

There are a couple of problems in this test when we add
subtree drains in bdrv_replace_child_noperm:

- First of all, there is an inconsistency in
test_blockjob_common_drain_node: block_job_create is called under the
AioContext lock and internally calls blk_insert_bs (and therefore
bdrv_replace_child_noperm), while another blk_insert_bs is called two
lines above in the same test without the AioContext lock. Since we use
the unlocked subtree_drain, we want to move the aio_context_acquire
further down.

- test_detach_by_parent_cb: this test uses a callback of an I/O
function (blk_aio_preadv) to modify the graph, using bdrv_attach_child.
This is simply not allowed anymore: I/O cannot change the graph.

- test_detach_indirect: here it is simply a matter of the wrong
callbacks being used. In the original test there was only one subtree
drain, so overriding .drained_begin was not a problem. Now that we have
additional subtree drains, we risk calling the test callback too
early, or multiple times. We do not want that, so override the
callback only when we actually want to use it.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index a62e6451a1..9414fc68b7 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -939,10 +939,10 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     blk_insert_bs(blk_target, target, &error_abort);
     blk_set_allow_aio_context_change(blk_target, true);
 
-    aio_context_acquire(ctx);
     tjob = block_job_create("job0", &test_job_driver, NULL, src,
                             0, BLK_PERM_ALL,
                             0, 0, NULL, NULL, &error_abort);
+    aio_context_acquire(ctx);
     job = &tjob->common;
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
 
@@ -1388,8 +1388,6 @@ static void test_detach_indirect(bool by_parent_cb)
 
     if (!by_parent_cb) {
         detach_by_driver_cb_class = child_of_bds;
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
     }
 
     /* Create all involved nodes */
@@ -1447,6 +1445,12 @@ static void test_detach_indirect(bool by_parent_cb)
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
+    if (!by_parent_cb) {
+        /* set .drained_begin cb to run only in the following drain. */
+        detach_by_driver_cb_class.drained_begin =
+            detach_by_driver_cb_drained_begin;
+    }
+
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
 
@@ -1470,6 +1474,12 @@ static void test_detach_indirect(bool by_parent_cb)
 
     bdrv_subtree_drained_end(parent_b);
 
+    if (!by_parent_cb) {
+        /* restore .drained_begin cb, we don't need it anymore. */
+        detach_by_driver_cb_class.drained_begin =
+            child_of_bds.drained_begin;
+    }
+
     bdrv_unref(parent_b);
     blk_unref(blk);
 
@@ -1483,7 +1493,7 @@ static void test_detach_indirect(bool by_parent_cb)
 
 static void test_detach_by_parent_cb(void)
 {
-    test_detach_indirect(true);
+    /* test_detach_indirect(true); */
 }
 
 static void test_detach_by_driver_cb(void)
-- 
2.31.1




* [RFC PATCH 6/6] block/io.c: enable assert_bdrv_graph_writable
From: Emanuele Giuseppe Esposito @ 2021-12-13 10:40 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Emanuele Giuseppe Esposito, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/io.c b/block/io.c
index a031691860..c2f1a494c4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -759,6 +759,7 @@ void assert_bdrv_graph_writable(BlockDriverState *bs)
      * Once the necessary drains are added,
      * assert also for qatomic_read(&bs->quiesce_counter) > 0
      */
+    assert(qatomic_read(&bs->quiesce_counter) > 0);
     assert(qemu_in_main_thread());
 }
 
-- 
2.31.1




* Re: [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Stefan Hajnoczi @ 2021-12-13 14:52 UTC
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Hanna Reitz,
	Paolo Bonzini

On Mon, Dec 13, 2021 at 05:40:08AM -0500, Emanuele Giuseppe Esposito wrote:
> [...]
> 
> - which drain to use? My answer would be the bdrv_subtree_drained_* class of functions, because they take care of draining the whole subtree below the node, while bdrv_drained_* does not cover the children of the given node.
> This theoretically simplifies the draining requirements, as we can just invoke subtree_drain_* on the two bs involved in bdrv_replace_child_noperm, and we should be guaranteed that the write is safe.

Off-topic: I don't understand the difference between the effects of
bdrv_drained_begin() and bdrv_subtree_drained_begin(). Both call
aio_disable_external(aio_context) and aio_poll(). bdrv_drained_begin()
only polls parents and itself, while bdrv_subtree_drained_begin() also
polls children. But why does that distinction matter? I wouldn't know
when to use one over the other.

On-topic: aio_disable_external() does not notify the AioContext. We
probably get away with it since the AioContext lock is currently held,
but it will be necessary to notify the AioContext so it disables
external handlers when the lock is not held:

  static inline void aio_disable_external(AioContext *ctx)
  {
      qatomic_inc(&ctx->external_disable_cnt);
      <--- missing aio_notify() since the AioContext needs to
           re-evaluate handlers
  }

> [...]
> 
> 2) Some tests, like tests/unit/test-bdrv-drain.c, do not expect additional drains. Therefore we might have cases where a specific drain callback (in this case used for testing) is called much earlier than expected because of the additional subtree drains.
> Here, too, we can simply modify the test to install the specific callback only when we actually need it. The test I am referring to is test_detach_by_driver_cb().

I'm not sure what this means, but some tests make assumptions about
internals. They are fragile. Modifying the test sounds reasonable.

> [...]
> 
> So the questions in this RFC are:
> - is this the right approach to remove the AioContext lock? I think so.

Yes, I think using drained sections to quiesce I/O is the right choice.

> - are there better options?

I/O needs to quiesce, at least in some cases, so I don't think we can
avoid drained sections. It may be possible to implement other solutions
on a case-by-case basis, but it would be more complex and still wouldn't
get rid of some of the drained sections that are definitely needed.

> - most importantly, any idea or suggestion on why this happens,
>   and why the quiesce counters are not properly restored in abort() when the drains are added?

Maybe Kevin has an idea here. He wrote bdrv_subtree_drained_begin().

Stefan



* Re: [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Stefan Hajnoczi @ 2021-12-14 16:47 UTC
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Hanna Reitz,
	Paolo Bonzini

Ignore what I said about a missing aio_notify() call in
aio_disable_external(). Skipping the call is an optimization and it's
safe. We only need to call aio_notify() in aio_enable_external().

Stefan



* Re: [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Emanuele Giuseppe Esposito @ 2021-12-14 18:10 UTC
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Hanna Reitz,
	Paolo Bonzini



On 13/12/2021 15:52, Stefan Hajnoczi wrote:
> Off-topic: I don't understand the difference between the effects of
> bdrv_drained_begin() and bdrv_subtree_drained_begin(). Both call
> aio_disable_external(aio_context) and aio_poll(). bdrv_drained_begin()
> only polls parents and itself, while bdrv_subtree_drained_begin() also
> polls children. But why does that distinction matter? I wouldn't know
> when to use one over the other.

Good point. Now I am wondering the same, so it would be great if anyone 
could clarify it.

Emanuele




* Re: [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Hanna Reitz @ 2021-12-15 12:34 UTC
  To: Emanuele Giuseppe Esposito, Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, qemu-block

On 14.12.21 19:10, Emanuele Giuseppe Esposito wrote:
>
>
> On 13/12/2021 15:52, Stefan Hajnoczi wrote:
>> Off-topic: I don't understand the difference between the effects of
>> bdrv_drained_begin() and bdrv_subtree_drained_begin(). Both call
>> aio_disable_external(aio_context) and aio_poll(). bdrv_drained_begin()
>> only polls parents and itself, while bdrv_subtree_drained_begin() also
>> polls children. But why does that distinction matter? I wouldn't know
>> when to use one over the other.
>
> Good point. Now I am wondering the same, so it would be great if 
> anyone could clarify it.

As far as I understand, bdrv_drained_begin() is used to drain and stop 
requests on a single BDS, whereas bdrv_subtree_drained_begin() drains 
the BDS and all of its children.  So when you don’t care about lingering 
requests in child nodes, then bdrv_drained_begin() suffices.

Hanna




* Re: [RFC PATCH 0/6] Removal of AioContext lock and usage of subtree drains in aborted transactions
From: Kevin Wolf @ 2021-12-16 10:37 UTC
  To: Hanna Reitz
  Cc: Emanuele Giuseppe Esposito, Fam Zheng, qemu-block, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On 15.12.2021 at 13:34, Hanna Reitz wrote:
> On 14.12.21 19:10, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > On 13/12/2021 15:52, Stefan Hajnoczi wrote:
> > > Off-topic: I don't understand the difference between the effects of
> > > bdrv_drained_begin() and bdrv_subtree_drained_begin(). Both call
> > > aio_disable_external(aio_context) and aio_poll(). bdrv_drained_begin()
> > > only polls parents and itself, while bdrv_subtree_drained_begin() also
> > > polls children. But why does that distinction matter? I wouldn't know
> > > when to use one over the other.
> > 
> > Good point. Now I am wondering the same, so it would be great if anyone
> > could clarify it.
> 
> As far as I understand, bdrv_drained_begin() is used to drain and stop
> requests on a single BDS, whereas bdrv_subtree_drained_begin() drains the
> BDS and all of its children.  So when you don’t care about lingering
> requests in child nodes, then bdrv_drained_begin() suffices.

Right. This is different in practice when a child node has multiple
parents. Usually, when you want to quiesce one parent, the other parent
can keep using the child without being in the way.

For example, two qcow2 overlays based on a single template:

    vda             vdb
     |               |
     v               v
   qcow2           qcow2
(vda.qcow2)     (vdb.qcow2)
     |               |
     +-----+   +-----+
           |   |
           v   v
           qcow2
      (template.qcow2)

If you drain vdb.qcow2 because you want to safely modify something in
its BlockDriverState, there is nothing that should stop vda.qcow2 from
processing requests.

If you're not sure which one to use, bdrv_drained_begin() is what you
want. If you want bdrv_subtree_drained_begin(), you'll know. (It's
currently only used by reopen and by drop_intermediates, which both
operate on more than one node.)

Kevin


