qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections
@ 2019-07-19  9:26 Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 01/10] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Hi,

This series:

(1) Keeps patch 1, as the previous series, and

(2) Decides whether all *drained_end* functions should poll or not; as
    proposed by Kevin, all that should not poll now get a
    @drained_end_counter pointer, whose pointee they have to increment
    once for every background operation scheduled, and that background
    operation will decrement it once it settles.
    This allows functions that should poll to do so until the counter
    reaches 0, so they don’t have to poll after scheduling every single
    operation but can do so once in a place where it’s safe.

v3:
- Change the design as described above (drained_end_counter instead of a
  list of BdrvCoDrainData objects to poll)
- Added a test to test-bdrv-drain


git-backport-diff against v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/10:[----] [--] 'block: Introduce BdrvChild.parent_quiesce_counter'
002/10:[down] 'tests: Add job commit by drained_end test'
003/10:[down] 'block: Add @drained_end_counter to bdrv_drain_invoke()'
004/10:[down] 'block: Make bdrv_parent_drained_[^_]*() static'
005/10:[down] 'tests: Lock AioContexts in test-block-iothread'
006/10:[down] 'block: Only poll once in bdrv_drained_end()'
007/10:[down] 'tests: Extend commit by drained_end test'
008/10:[down] 'block: Loop unsafely in bdrv*drained_end()'
009/10:[----] [--] 'iotests: Add @has_quit to vm.shutdown()'
010/10:[----] [--] 'iotests: Test commit with a filter on the chain'


Max Reitz (10):
  block: Introduce BdrvChild.parent_quiesce_counter
  tests: Add job commit by drained_end test
  block: Add @drained_end_counter
  block: Make bdrv_parent_drained_[^_]*() static
  tests: Lock AioContexts in test-block-iothread
  block: Do not poll in bdrv_do_drained_end()
  tests: Extend commit by drained_end test
  block: Loop unsafely in bdrv*drained_end()
  iotests: Add @has_quit to vm.shutdown()
  iotests: Test commit with a filter on the chain

 include/block/block.h       |  42 +++++++----
 include/block/block_int.h   |  15 +++-
 block.c                     |  52 ++++++++-----
 block/block-backend.c       |   6 +-
 block/io.c                  | 134 +++++++++++++++++++++++---------
 blockjob.c                  |   2 +-
 tests/test-bdrv-drain.c     | 147 ++++++++++++++++++++++++++++++++++++
 tests/test-block-iothread.c |  40 ++++++----
 python/qemu/machine.py      |   5 +-
 tests/qemu-iotests/040      |  40 +++++++++-
 tests/qemu-iotests/040.out  |   4 +-
 tests/qemu-iotests/255      |   2 +-
 12 files changed, 397 insertions(+), 92 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 01/10] block: Introduce BdrvChild.parent_quiesce_counter
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 02/10] tests: Add job commit by drained_end test Max Reitz
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Commit 5cb2737e925042e6c7cd3fb0b01313950b03cddf laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h     |  7 +++++++
 include/block/block_int.h |  9 +++++++++
 block.c                   | 15 +++++----------
 block/io.c                | 14 +++++++++++---
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 734c9d2f76..bff3317696 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -617,6 +617,13 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
  */
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
+/**
+ * bdrv_parent_drained_end_single:
+ *
+ * End a quiesced section for the parent of @c.
+ */
+void bdrv_parent_drained_end_single(BdrvChild *c);
+
 /**
  * bdrv_parent_drained_end:
  *
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 50902531b7..f5b044fcdb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -729,6 +729,15 @@ struct BdrvChild {
      */
     bool frozen;
 
+    /*
+     * How many times the parent of this child has been drained
+     * (through role->drained_*).
+     * Usually, this is equal to bs->quiesce_counter (potentially
+     * reduced by bdrv_drain_all_count).  It may differ while the
+     * child is entering or leaving a drained section.
+     */
+    int parent_quiesce_counter;
+
     QLIST_ENTRY(BdrvChild) next;
     QLIST_ENTRY(BdrvChild) next_parent;
 };
diff --git a/block.c b/block.c
index 29e931e217..8440712ca0 100644
--- a/block.c
+++ b/block.c
@@ -2251,24 +2251,19 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         if (child->role->detach) {
             child->role->detach(child);
         }
-        if (old_bs->quiesce_counter && child->role->drained_end) {
-            int num = old_bs->quiesce_counter;
-            if (child->role->parent_is_bds) {
-                num -= bdrv_drain_all_count;
-            }
-            assert(num >= 0);
-            for (i = 0; i < num; i++) {
-                child->role->drained_end(child);
-            }
+        while (child->parent_quiesce_counter) {
+            bdrv_parent_drained_end_single(child);
         }
         QLIST_REMOVE(child, next_parent);
+    } else {
+        assert(child->parent_quiesce_counter == 0);
     }
 
     child->bs = new_bs;
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-        if (new_bs->quiesce_counter && child->role->drained_begin) {
+        if (new_bs->quiesce_counter) {
             int num = new_bs->quiesce_counter;
             if (child->role->parent_is_bds) {
                 num -= bdrv_drain_all_count;
diff --git a/block/io.c b/block/io.c
index 24a18759fd..1e618f9a37 100644
--- a/block/io.c
+++ b/block/io.c
@@ -55,6 +55,15 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
+void bdrv_parent_drained_end_single(BdrvChild *c)
+{
+    assert(c->parent_quiesce_counter > 0);
+    c->parent_quiesce_counter--;
+    if (c->role->drained_end) {
+        c->role->drained_end(c);
+    }
+}
+
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
                              bool ignore_bds_parents)
 {
@@ -64,9 +73,7 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
-        if (c->role->drained_end) {
-            c->role->drained_end(c);
-        }
+        bdrv_parent_drained_end_single(c);
     }
 }
 
@@ -96,6 +103,7 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
 
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
+    c->parent_quiesce_counter++;
     if (c->role->drained_begin) {
         c->role->drained_begin(c);
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 02/10] tests: Add job commit by drained_end test
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 01/10] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 03/10] block: Add @drained_end_counter Max Reitz
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/test-bdrv-drain.c | 119 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 12e2ecf517..3503ce3b69 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1527,6 +1527,122 @@ static void test_set_aio_context(void)
     iothread_join(b);
 }
 
+
+typedef struct TestDropBackingBlockJob {
+    BlockJob common;
+    bool should_complete;
+    bool *did_complete;
+} TestDropBackingBlockJob;
+
+static int coroutine_fn test_drop_backing_job_run(Job *job, Error **errp)
+{
+    TestDropBackingBlockJob *s =
+        container_of(job, TestDropBackingBlockJob, common.job);
+
+    while (!s->should_complete) {
+        job_sleep_ns(job, 0);
+    }
+
+    return 0;
+}
+
+static void test_drop_backing_job_commit(Job *job)
+{
+    TestDropBackingBlockJob *s =
+        container_of(job, TestDropBackingBlockJob, common.job);
+
+    bdrv_set_backing_hd(blk_bs(s->common.blk), NULL, &error_abort);
+
+    *s->did_complete = true;
+}
+
+static const BlockJobDriver test_drop_backing_job_driver = {
+    .job_driver = {
+        .instance_size  = sizeof(TestDropBackingBlockJob),
+        .free           = block_job_free,
+        .user_resume    = block_job_user_resume,
+        .drain          = block_job_drain,
+        .run            = test_drop_backing_job_run,
+        .commit         = test_drop_backing_job_commit,
+    }
+};
+
+/**
+ * Creates a child node with three parent nodes on it, and then runs a
+ * block job on the final one, parent-node-2.
+ *
+ * (TODO: parent-node-0 currently serves no purpose, but will as of a
+ * follow-up patch.)
+ *
+ * The job is then asked to complete before a section where the child
+ * is drained.
+ *
+ * Ending this section will undrain the child's parents, first
+ * parent-node-2, then parent-node-1, then parent-node-0 -- the parent
+ * list is in reverse order of how they were added.  Ending the drain
+ * on parent-node-2 will resume the job, thus completing it and
+ * scheduling job_exit().
+ *
+ * Ending the drain on parent-node-1 will poll the AioContext, which
+ * lets job_exit() and thus test_drop_backing_job_commit() run.  That
+ * function removes the child as parent-node-2's backing file.
+ *
+ * In old (and buggy) implementations, there are two problems with
+ * that:
+ * (A) bdrv_drain_invoke() polls for every node that leaves the
+ *     drained section.  This means that job_exit() is scheduled
+ *     before the child has left the drained section.  Its
+ *     quiesce_counter is therefore still 1 when it is removed from
+ *     parent-node-2.
+ *
+ * (B) bdrv_replace_child_noperm() calls drained_end() on the old
+ *     child's parents as many times as the child is quiesced.  This
+ *     means it will call drained_end() on parent-node-2 once.
+ *     Because parent-node-2 is no longer quiesced at this point, this
+ *     will fail.
+ *
+ * bdrv_replace_child_noperm() therefore must call drained_end() on
+ * the parent only if it really is still drained because the child is
+ * drained.
+ */
+static void test_blockjob_commit_by_drained_end(void)
+{
+    BlockDriverState *bs_child, *bs_parents[3];
+    TestDropBackingBlockJob *job;
+    bool job_has_completed = false;
+    int i;
+
+    bs_child = bdrv_new_open_driver(&bdrv_test, "child-node", BDRV_O_RDWR,
+                                    &error_abort);
+
+    for (i = 0; i < 3; i++) {
+        char name[32];
+        snprintf(name, sizeof(name), "parent-node-%i", i);
+        bs_parents[i] = bdrv_new_open_driver(&bdrv_test, name, BDRV_O_RDWR,
+                                             &error_abort);
+        bdrv_set_backing_hd(bs_parents[i], bs_child, &error_abort);
+    }
+
+    job = block_job_create("job", &test_drop_backing_job_driver, NULL,
+                           bs_parents[2], 0, BLK_PERM_ALL, 0, 0, NULL, NULL,
+                           &error_abort);
+
+    job->did_complete = &job_has_completed;
+
+    job_start(&job->common.job);
+
+    job->should_complete = true;
+    bdrv_drained_begin(bs_child);
+    g_assert(!job_has_completed);
+    bdrv_drained_end(bs_child);
+    g_assert(job_has_completed);
+
+    bdrv_unref(bs_parents[0]);
+    bdrv_unref(bs_parents[1]);
+    bdrv_unref(bs_parents[2]);
+    bdrv_unref(bs_child);
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1610,6 +1726,9 @@ int main(int argc, char **argv)
 
     g_test_add_func("/bdrv-drain/set_aio_context", test_set_aio_context);
 
+    g_test_add_func("/bdrv-drain/blockjob/commit_by_drained_end",
+                    test_blockjob_commit_by_drained_end);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 03/10] block: Add @drained_end_counter
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 01/10] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 02/10] tests: Add job commit by drained_end test Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 04/10] block: Make bdrv_parent_drained_[^_]*() static Max Reitz
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Callers can now pass a pointer to an integer that bdrv_drain_invoke()
(and its recursive callees) will increment for every
bdrv_drain_invoke_entry() operation they schedule.
bdrv_drain_invoke_entry() in turn will decrement it once it has invoked
BlockDriver.bdrv_co_drain_end().

We use atomic operations to access the pointee, because the
bdrv_do_drained_end() caller may wish to end drained sections for
multiple nodes in different AioContexts (bdrv_drain_all_end() does, for
example).

This is the first step to moving the polling for BdrvCoDrainData.done to
become true out of bdrv_drain_invoke() and into the root drained_end
function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 58 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1e618f9a37..c42e18b068 100644
--- a/block/io.c
+++ b/block/io.c
@@ -194,6 +194,7 @@ typedef struct {
     bool poll;
     BdrvChild *parent;
     bool ignore_bds_parents;
+    int *drained_end_counter;
 } BdrvCoDrainData;
 
 static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
@@ -211,13 +212,18 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
     atomic_mb_set(&data->done, true);
     bdrv_dec_in_flight(bs);
 
-    if (data->begin) {
+    if (data->drained_end_counter) {
+        atomic_dec(data->drained_end_counter);
+    }
+
+    if (data->begin || data->drained_end_counter) {
         g_free(data);
     }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
+static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
+                              int *drained_end_counter)
 {
     BdrvCoDrainData *data;
 
@@ -230,16 +236,25 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
     *data = (BdrvCoDrainData) {
         .bs = bs,
         .done = false,
-        .begin = begin
+        .begin = begin,
+        .drained_end_counter = drained_end_counter,
     };
 
+    if (!begin && drained_end_counter) {
+        atomic_inc(drained_end_counter);
+    }
+
     /* Make sure the driver callback completes during the polling phase for
      * drain_begin. */
     bdrv_inc_in_flight(bs);
     data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
     aio_co_schedule(bdrv_get_aio_context(bs), data->co);
 
-    if (!begin) {
+    /*
+     * TODO: Drop this and make callers pass @drained_end_counter and poll
+     * themselves
+     */
+    if (!begin && !drained_end_counter) {
         BDRV_POLL_WHILE(bs, !data->done);
         g_free(data);
     }
@@ -281,7 +296,8 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent, bool ignore_bds_parents,
                                   bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents);
+                                BdrvChild *parent, bool ignore_bds_parents,
+                                int *drained_end_counter);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -308,7 +324,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
                                   data->ignore_bds_parents, data->poll);
         } else {
             bdrv_do_drained_end(bs, data->recursive, data->parent,
-                                data->ignore_bds_parents);
+                                data->ignore_bds_parents,
+                                data->drained_end_counter);
         }
         if (ctx == co_ctx) {
             aio_context_release(ctx);
@@ -326,7 +343,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
                                                 BdrvChild *parent,
                                                 bool ignore_bds_parents,
-                                                bool poll)
+                                                bool poll,
+                                                int *drained_end_counter)
 {
     BdrvCoDrainData data;
 
@@ -343,7 +361,9 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .parent = parent,
         .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
+        .drained_end_counter = drained_end_counter,
     };
+
     if (bs) {
         bdrv_inc_in_flight(bs);
     }
@@ -367,7 +387,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     }
 
     bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
-    bdrv_drain_invoke(bs, true);
+    bdrv_drain_invoke(bs, true, NULL);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
@@ -378,7 +398,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
-                               poll);
+                               poll, NULL);
         return;
     }
 
@@ -419,20 +439,21 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
 }
 
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents)
+                                BdrvChild *parent, bool ignore_bds_parents,
+                                int *drained_end_counter)
 {
     BdrvChild *child, *next;
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false);
+                               false, drained_end_counter);
         return;
     }
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false);
+    bdrv_drain_invoke(bs, false, drained_end_counter);
     bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
 
     old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
@@ -444,19 +465,20 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
         assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter--;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents);
+            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents,
+                                drained_end_counter);
         }
     }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, false, NULL, false);
+    bdrv_do_drained_end(bs, false, NULL, false, NULL);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, true, NULL, false);
+    bdrv_do_drained_end(bs, true, NULL, false, NULL);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -473,7 +495,7 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     int i;
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false);
+        bdrv_do_drained_end(child->bs, true, child, false, NULL);
     }
 }
 
@@ -543,7 +565,7 @@ void bdrv_drain_all_begin(void)
     BlockDriverState *bs = NULL;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true);
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true, NULL);
         return;
     }
 
@@ -579,7 +601,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true);
+        bdrv_do_drained_end(bs, false, NULL, true, NULL);
         aio_context_release(aio_context);
     }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 04/10] block: Make bdrv_parent_drained_[^_]*() static
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (2 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 03/10] block: Add @drained_end_counter Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 05/10] tests: Lock AioContexts in test-block-iothread Max Reitz
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

These functions are not used outside of block/io.c, there is no reason
why they should be globally available.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h | 18 ------------------
 block/io.c            |  8 ++++----
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index bff3317696..a81645e8a3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -600,15 +600,6 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 
-/**
- * bdrv_parent_drained_begin:
- *
- * Begin a quiesced section of all users of @bs. This is part of
- * bdrv_drained_begin.
- */
-void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
-                               bool ignore_bds_parents);
-
 /**
  * bdrv_parent_drained_begin_single:
  *
@@ -624,15 +615,6 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
  */
 void bdrv_parent_drained_end_single(BdrvChild *c);
 
-/**
- * bdrv_parent_drained_end:
- *
- * End a quiesced section of all users of @bs. This is part of
- * bdrv_drained_end.
- */
-void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
-                             bool ignore_bds_parents);
-
 /**
  * bdrv_drain_poll:
  *
diff --git a/block/io.c b/block/io.c
index c42e18b068..b0b33174d3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,8 +42,8 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags);
 
-void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
-                               bool ignore_bds_parents)
+static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
+                                      bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
 
@@ -64,8 +64,8 @@ void bdrv_parent_drained_end_single(BdrvChild *c)
     }
 }
 
-void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
-                             bool ignore_bds_parents)
+static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
+                                    bool ignore_bds_parents)
 {
     BdrvChild *c, *next;
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 05/10] tests: Lock AioContexts in test-block-iothread
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (3 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 04/10] block: Make bdrv_parent_drained_[^_]*() static Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 06/10] block: Do not poll in bdrv_do_drained_end() Max Reitz
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

When changing a node's AioContext, the caller must acquire the old
AioContext (unless it currently runs in that old context).  Therefore,
unless the node currently is in the main context, we always have to
acquire the old context around calls that may change a node's
AioContext.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/test-block-iothread.c | 40 ++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/tests/test-block-iothread.c b/tests/test-block-iothread.c
index 79d9cf8a57..1949d5e61a 100644
--- a/tests/test-block-iothread.c
+++ b/tests/test-block-iothread.c
@@ -348,8 +348,8 @@ static void test_sync_op(const void *opaque)
     if (t->blkfn) {
         t->blkfn(blk);
     }
-    aio_context_release(ctx);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
+    aio_context_release(ctx);
 
     bdrv_unref(bs);
     blk_unref(blk);
@@ -476,6 +476,7 @@ static void test_propagate_basic(void)
 {
     IOThread *iothread = iothread_new();
     AioContext *ctx = iothread_get_aio_context(iothread);
+    AioContext *main_ctx;
     BlockBackend *blk;
     BlockDriverState *bs_a, *bs_b, *bs_verify;
     QDict *options;
@@ -504,12 +505,14 @@ static void test_propagate_basic(void)
     g_assert(bdrv_get_aio_context(bs_b) == ctx);
 
     /* Switch the AioContext back */
-    ctx = qemu_get_aio_context();
-    blk_set_aio_context(blk, ctx, &error_abort);
-    g_assert(blk_get_aio_context(blk) == ctx);
-    g_assert(bdrv_get_aio_context(bs_a) == ctx);
-    g_assert(bdrv_get_aio_context(bs_verify) == ctx);
-    g_assert(bdrv_get_aio_context(bs_b) == ctx);
+    main_ctx = qemu_get_aio_context();
+    aio_context_acquire(ctx);
+    blk_set_aio_context(blk, main_ctx, &error_abort);
+    aio_context_release(ctx);
+    g_assert(blk_get_aio_context(blk) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_a) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_verify) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_b) == main_ctx);
 
     bdrv_unref(bs_verify);
     bdrv_unref(bs_b);
@@ -534,6 +537,7 @@ static void test_propagate_diamond(void)
 {
     IOThread *iothread = iothread_new();
     AioContext *ctx = iothread_get_aio_context(iothread);
+    AioContext *main_ctx;
     BlockBackend *blk;
     BlockDriverState *bs_a, *bs_b, *bs_c, *bs_verify;
     QDict *options;
@@ -573,13 +577,15 @@ static void test_propagate_diamond(void)
     g_assert(bdrv_get_aio_context(bs_c) == ctx);
 
     /* Switch the AioContext back */
-    ctx = qemu_get_aio_context();
-    blk_set_aio_context(blk, ctx, &error_abort);
-    g_assert(blk_get_aio_context(blk) == ctx);
-    g_assert(bdrv_get_aio_context(bs_verify) == ctx);
-    g_assert(bdrv_get_aio_context(bs_a) == ctx);
-    g_assert(bdrv_get_aio_context(bs_b) == ctx);
-    g_assert(bdrv_get_aio_context(bs_c) == ctx);
+    main_ctx = qemu_get_aio_context();
+    aio_context_acquire(ctx);
+    blk_set_aio_context(blk, main_ctx, &error_abort);
+    aio_context_release(ctx);
+    g_assert(blk_get_aio_context(blk) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_verify) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_a) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_b) == main_ctx);
+    g_assert(bdrv_get_aio_context(bs_c) == main_ctx);
 
     blk_unref(blk);
     bdrv_unref(bs_verify);
@@ -685,7 +691,9 @@ static void test_attach_second_node(void)
     g_assert(bdrv_get_aio_context(bs) == ctx);
     g_assert(bdrv_get_aio_context(filter) == ctx);
 
+    aio_context_acquire(ctx);
     blk_set_aio_context(blk, main_ctx, &error_abort);
+    aio_context_release(ctx);
     g_assert(blk_get_aio_context(blk) == main_ctx);
     g_assert(bdrv_get_aio_context(bs) == main_ctx);
     g_assert(bdrv_get_aio_context(filter) == main_ctx);
@@ -712,7 +720,9 @@ static void test_attach_preserve_blk_ctx(void)
     g_assert(bdrv_get_aio_context(bs) == ctx);
 
     /* Remove the node again */
+    aio_context_acquire(ctx);
     blk_remove_bs(blk);
+    aio_context_release(ctx);
     g_assert(blk_get_aio_context(blk) == ctx);
     g_assert(bdrv_get_aio_context(bs) == qemu_get_aio_context());
 
@@ -721,7 +731,9 @@ static void test_attach_preserve_blk_ctx(void)
     g_assert(blk_get_aio_context(blk) == ctx);
     g_assert(bdrv_get_aio_context(bs) == ctx);
 
+    aio_context_acquire(ctx);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
+    aio_context_release(ctx);
     bdrv_unref(bs);
     blk_unref(blk);
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 06/10] block: Do not poll in bdrv_do_drained_end()
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (4 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 05/10] tests: Lock AioContexts in test-block-iothread Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 07/10] tests: Extend commit by drained_end test Max Reitz
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

We should never poll anywhere in bdrv_do_drained_end() (including its
recursive callees like bdrv_drain_invoke()), because it does not cope
well with graph changes.  In fact, it has been written based on the
postulation that no graph changes will happen in it.

Instead, the callers that want to poll must poll, i.e. all currently
globally available wrappers: bdrv_drained_end(),
bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and
bdrv_drain_all_end().  Graph changes there do not matter.

They can poll simply by passing a pointer to a drained_end_counter and
wait until it reaches 0.

This patch also adds a non-polling global wrapper for
bdrv_do_drained_end() that takes a drained_end_counter pointer.  We need
such a variant because now no function called anywhere from
bdrv_do_drained_end() must poll.  This includes
BdrvChildRole.drained_end(), which already must not poll according to
its interface documentation, but bdrv_child_cb_drained_end() just
violates that by invoking bdrv_drained_end() (which does poll).
Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter
parameter, which bdrv_child_cb_drained_end() can pass on to the new
bdrv_drained_end_no_poll() function.

Note that we now have a pattern of all drained_end-related functions
either polling or receiving a *drained_end_counter to let the caller
poll based on that.

A problem with a single poll loop is that when the drained section in
bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in
the old contexts, while others are in the new context already.  To let
the collective poll in bdrv_drained_end() work correctly, we must not
hold a lock to the old context, so that the old context can make
progress in case it is different from the current context.

(In the process, remove the comment saying that the current context is
always the old context, because it is wrong.)

In all other places, all nodes in a subtree must be in the same context,
so we can just poll that.  The exception of course is
bdrv_drain_all_end(), but that always runs in the main context, so we
can just poll NULL (like bdrv_drain_all_begin() does).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h     | 25 ++++++++++++
 include/block/block_int.h |  6 ++-
 block.c                   | 37 ++++++++++++++----
 block/block-backend.c     |  6 +--
 block/io.c                | 80 ++++++++++++++++++++++++++++-----------
 blockjob.c                |  2 +-
 6 files changed, 120 insertions(+), 36 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a81645e8a3..60f00479e0 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -612,6 +612,9 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
  * bdrv_parent_drained_end_single:
  *
  * End a quiesced section for the parent of @c.
+ *
+ * This polls @bs's AioContext until all scheduled sub-drained_ends
+ * have settled, which may result in graph changes.
  */
 void bdrv_parent_drained_end_single(BdrvChild *c);
 
@@ -661,9 +664,31 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs);
  * bdrv_drained_end:
  *
  * End a quiescent section started by bdrv_drained_begin().
+ *
+ * This polls @bs's AioContext until all scheduled sub-drained_ends
+ * have settled.  On one hand, that may result in graph changes.  On
+ * the other, this requires that all involved nodes (@bs and all of
+ * its parents) are in the same AioContext, and that the caller has
+ * acquired it.
+ * If there are any nodes that are in different contexts from @bs,
+ * these contexts must not be acquired.
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+/**
+ * bdrv_drained_end_no_poll:
+ *
+ * Same as bdrv_drained_end(), but do not poll for the subgraph to
+ * actually become unquiesced.  Therefore, no graph changes will occur
+ * with this function.
+ *
+ * *drained_end_counter is incremented for every background operation
+ * that is scheduled, and will be decremented for every operation once
+ * it settles.  The caller must poll until it reaches 0.  The counter
+ * should be accessed using atomic operations only.
+ */
+void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
+
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f5b044fcdb..3aa1e832a8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -664,11 +664,15 @@ struct BdrvChildRole {
      * These functions must not change the graph (and therefore also must not
      * call aio_poll(), which could change the graph indirectly).
      *
+     * If drained_end() schedules background operations, it must atomically
+     * increment *drained_end_counter for each such operation and atomically
+     * decrement it once the operation has settled.
+     *
      * Note that this can be nested. If drained_begin() was called twice, new
      * I/O is allowed only after drained_end() was called twice, too.
      */
     void (*drained_begin)(BdrvChild *child);
-    void (*drained_end)(BdrvChild *child);
+    void (*drained_end)(BdrvChild *child, int *drained_end_counter);
 
     /*
      * Returns whether the parent has pending requests for the child. This
diff --git a/block.c b/block.c
index 8440712ca0..9c94f7f28a 100644
--- a/block.c
+++ b/block.c
@@ -911,10 +911,11 @@ static bool bdrv_child_cb_drained_poll(BdrvChild *child)
     return bdrv_drain_poll(bs, false, NULL, false);
 }
 
-static void bdrv_child_cb_drained_end(BdrvChild *child)
+static void bdrv_child_cb_drained_end(BdrvChild *child,
+                                      int *drained_end_counter)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_drained_end(bs);
+    bdrv_drained_end_no_poll(bs, drained_end_counter);
 }
 
 static void bdrv_child_cb_attach(BdrvChild *child)
@@ -5923,9 +5924,11 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
 void bdrv_set_aio_context_ignore(BlockDriverState *bs,
                                  AioContext *new_context, GSList **ignore)
 {
+    AioContext *old_context = bdrv_get_aio_context(bs);
+    AioContext *current_context = qemu_get_current_aio_context();
     BdrvChild *child;
 
-    if (bdrv_get_aio_context(bs) == new_context) {
+    if (old_context == new_context) {
         return;
     }
 
@@ -5949,13 +5952,31 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
 
     bdrv_detach_aio_context(bs);
 
-    /* This function executes in the old AioContext so acquire the new one in
-     * case it runs in a different thread.
-     */
-    aio_context_acquire(new_context);
+    /* Acquire the new context, if necessary */
+    if (current_context != new_context) {
+        aio_context_acquire(new_context);
+    }
+
     bdrv_attach_aio_context(bs, new_context);
+
+    /*
+     * If this function was recursively called from
+     * bdrv_set_aio_context_ignore(), there may be nodes in the
+     * subtree that have not yet been moved to the new AioContext.
+     * Release the old one so bdrv_drained_end() can poll them.
+     */
+    if (current_context != old_context) {
+        aio_context_release(old_context);
+    }
+
     bdrv_drained_end(bs);
-    aio_context_release(new_context);
+
+    if (current_context != old_context) {
+        aio_context_acquire(old_context);
+    }
+    if (current_context != new_context) {
+        aio_context_release(new_context);
+    }
 }
 
 static bool bdrv_parent_can_set_aio_context(BdrvChild *c, AioContext *ctx,
diff --git a/block/block-backend.c b/block/block-backend.c
index a8d160fd5d..0056b526b8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -121,7 +121,7 @@ static void blk_root_inherit_options(int *child_flags, QDict *child_options,
 }
 static void blk_root_drained_begin(BdrvChild *child);
 static bool blk_root_drained_poll(BdrvChild *child);
-static void blk_root_drained_end(BdrvChild *child);
+static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter);
 
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
@@ -1249,7 +1249,7 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
 
     blk_root_drained_begin(blk->root);
     ret = blk_pread(blk, offset, buf, count);
-    blk_root_drained_end(blk->root);
+    blk_root_drained_end(blk->root, NULL);
     return ret;
 }
 
@@ -2236,7 +2236,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
     return !!blk->in_flight;
 }
 
-static void blk_root_drained_end(BdrvChild *child)
+static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
 {
     BlockBackend *blk = child->opaque;
     assert(blk->quiesce_counter);
diff --git a/block/io.c b/block/io.c
index b0b33174d3..8f23cab82e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -55,17 +55,26 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
-void bdrv_parent_drained_end_single(BdrvChild *c)
+static void bdrv_parent_drained_end_single_no_poll(BdrvChild *c,
+                                                   int *drained_end_counter)
 {
     assert(c->parent_quiesce_counter > 0);
     c->parent_quiesce_counter--;
     if (c->role->drained_end) {
-        c->role->drained_end(c);
+        c->role->drained_end(c, drained_end_counter);
     }
 }
 
+void bdrv_parent_drained_end_single(BdrvChild *c)
+{
+    int drained_end_counter = 0;
+    bdrv_parent_drained_end_single_no_poll(c, &drained_end_counter);
+    BDRV_POLL_WHILE(c->bs, atomic_read(&drained_end_counter) > 0);
+}
+
 static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
-                                    bool ignore_bds_parents)
+                                    bool ignore_bds_parents,
+                                    int *drained_end_counter)
 {
     BdrvChild *c, *next;
 
@@ -73,7 +82,7 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
-        bdrv_parent_drained_end_single(c);
+        bdrv_parent_drained_end_single_no_poll(c, drained_end_counter);
     }
 }
 
@@ -212,13 +221,11 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
     atomic_mb_set(&data->done, true);
     bdrv_dec_in_flight(bs);
 
-    if (data->drained_end_counter) {
+    if (!data->begin) {
         atomic_dec(data->drained_end_counter);
     }
 
-    if (data->begin || data->drained_end_counter) {
-        g_free(data);
-    }
+    g_free(data);
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
@@ -240,7 +247,7 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
         .drained_end_counter = drained_end_counter,
     };
 
-    if (!begin && drained_end_counter) {
+    if (!begin) {
         atomic_inc(drained_end_counter);
     }
 
@@ -249,15 +256,6 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
     bdrv_inc_in_flight(bs);
     data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
     aio_co_schedule(bdrv_get_aio_context(bs), data->co);
-
-    /*
-     * TODO: Drop this and make callers pass @drained_end_counter and poll
-     * themselves
-     */
-    if (!begin && !drained_end_counter) {
-        BDRV_POLL_WHILE(bs, !data->done);
-        g_free(data);
-    }
 }
 
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
@@ -320,9 +318,11 @@ static void bdrv_co_drain_bh_cb(void *opaque)
         }
         bdrv_dec_in_flight(bs);
         if (data->begin) {
+            assert(!data->drained_end_counter);
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
                                   data->ignore_bds_parents, data->poll);
         } else {
+            assert(!data->poll);
             bdrv_do_drained_end(bs, data->recursive, data->parent,
                                 data->ignore_bds_parents,
                                 data->drained_end_counter);
@@ -438,6 +438,20 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
     bdrv_do_drained_begin(bs, true, NULL, false, true);
 }
 
+/**
+ * This function does not poll, nor must any of its recursively called
+ * functions.  The *drained_end_counter pointee will be incremented
+ * once for every background operation scheduled, and decremented once
+ * the operation settles.  Therefore, the pointer must remain valid
+ * until the pointee reaches 0.  That implies that whoever sets up the
+ * pointee has to poll until it is 0.
+ *
+ * We use atomic operations to access *drained_end_counter, because
+ * (1) when called from bdrv_set_aio_context_ignore(), the subgraph of
+ *     @bs may contain nodes in different AioContexts,
+ * (2) bdrv_drain_all_end() uses the same counter for all nodes,
+ *     regardless of which AioContext they are in.
+ */
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent, bool ignore_bds_parents,
                                 int *drained_end_counter)
@@ -445,6 +459,8 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     BdrvChild *child, *next;
     int old_quiesce_counter;
 
+    assert(drained_end_counter != NULL);
+
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
                                false, drained_end_counter);
@@ -454,7 +470,8 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
 
     /* Re-enable things in child-to-parent order */
     bdrv_drain_invoke(bs, false, drained_end_counter);
-    bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
+    bdrv_parent_drained_end(bs, parent, ignore_bds_parents,
+                            drained_end_counter);
 
     old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
     if (old_quiesce_counter == 1) {
@@ -473,12 +490,21 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, false, NULL, false, NULL);
+    int drained_end_counter = 0;
+    bdrv_do_drained_end(bs, false, NULL, false, &drained_end_counter);
+    BDRV_POLL_WHILE(bs, atomic_read(&drained_end_counter) > 0);
+}
+
+void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter)
+{
+    bdrv_do_drained_end(bs, false, NULL, false, drained_end_counter);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, true, NULL, false, NULL);
+    int drained_end_counter = 0;
+    bdrv_do_drained_end(bs, true, NULL, false, &drained_end_counter);
+    BDRV_POLL_WHILE(bs, atomic_read(&drained_end_counter) > 0);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -492,11 +518,15 @@ void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
 
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
 {
+    int drained_end_counter = 0;
     int i;
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false, NULL);
+        bdrv_do_drained_end(child->bs, true, child, false,
+                            &drained_end_counter);
     }
+
+    BDRV_POLL_WHILE(child->bs, atomic_read(&drained_end_counter) > 0);
 }
 
 /*
@@ -596,15 +626,19 @@ void bdrv_drain_all_begin(void)
 void bdrv_drain_all_end(void)
 {
     BlockDriverState *bs = NULL;
+    int drained_end_counter = 0;
 
     while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true, NULL);
+        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
         aio_context_release(aio_context);
     }
 
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+    AIO_WAIT_WHILE(NULL, atomic_read(&drained_end_counter) > 0);
+
     assert(bdrv_drain_all_count > 0);
     bdrv_drain_all_count--;
 }
diff --git a/blockjob.c b/blockjob.c
index 458ae76f51..20b7f557da 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -135,7 +135,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     }
 }
 
-static void child_job_drained_end(BdrvChild *c)
+static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
     job_resume(&job->job);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 07/10] tests: Extend commit by drained_end test
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (5 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 06/10] block: Do not poll in bdrv_do_drained_end() Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 08/10] block: Loop unsafely in bdrv*drained_end() Max Reitz
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/test-bdrv-drain.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 3503ce3b69..03fa1142a1 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1532,6 +1532,7 @@ typedef struct TestDropBackingBlockJob {
     BlockJob common;
     bool should_complete;
     bool *did_complete;
+    BlockDriverState *detach_also;
 } TestDropBackingBlockJob;
 
 static int coroutine_fn test_drop_backing_job_run(Job *job, Error **errp)
@@ -1552,6 +1553,7 @@ static void test_drop_backing_job_commit(Job *job)
         container_of(job, TestDropBackingBlockJob, common.job);
 
     bdrv_set_backing_hd(blk_bs(s->common.blk), NULL, &error_abort);
+    bdrv_set_backing_hd(s->detach_also, NULL, &error_abort);
 
     *s->did_complete = true;
 }
@@ -1571,9 +1573,6 @@ static const BlockJobDriver test_drop_backing_job_driver = {
  * Creates a child node with three parent nodes on it, and then runs a
  * block job on the final one, parent-node-2.
  *
- * (TODO: parent-node-0 currently serves no purpose, but will as of a
- * follow-up patch.)
- *
  * The job is then asked to complete before a section where the child
  * is drained.
  *
@@ -1585,7 +1584,7 @@ static const BlockJobDriver test_drop_backing_job_driver = {
  *
  * Ending the drain on parent-node-1 will poll the AioContext, which
  * lets job_exit() and thus test_drop_backing_job_commit() run.  That
- * function removes the child as parent-node-2's backing file.
+ * function first removes the child as parent-node-2's backing file.
  *
  * In old (and buggy) implementations, there are two problems with
  * that:
@@ -1604,6 +1603,34 @@ static const BlockJobDriver test_drop_backing_job_driver = {
  * bdrv_replace_child_noperm() therefore must call drained_end() on
  * the parent only if it really is still drained because the child is
  * drained.
+ *
+ * If removing child from parent-node-2 was successful (as it should
+ * be), test_drop_backing_job_commit() will then also remove the child
+ * from parent-node-0.
+ *
+ * With an old version of our drain infrastructure ((A) above), that
+ * resulted in the following flow:
+ *
+ * 1. child attempts to leave its drained section.  The call recurses
+ *    to its parents.
+ *
+ * 2. parent-node-2 leaves the drained section.  Polling in
+ *    bdrv_drain_invoke() will schedule job_exit().
+ *
+ * 3. parent-node-1 leaves the drained section.  Polling in
+ *    bdrv_drain_invoke() will run job_exit(), thus disconnecting
+ *    parent-node-0 from the child node.
+ *
+ * 4. bdrv_parent_drained_end() uses a QLIST_FOREACH_SAFE() loop to
+ *    iterate over the parents.  Thus, it now accesses the BdrvChild
+ *    object that used to connect parent-node-0 and the child node.
+ *    However, that object no longer exists, so it accesses a dangling
+ *    pointer.
+ *
+ * The solution is to only poll once when running a bdrv_drained_end()
+ * operation, specifically at the end when all drained_end()
+ * operations for all involved nodes have been scheduled.
+ * Note that this also solves (A) above, thus hiding (B).
  */
 static void test_blockjob_commit_by_drained_end(void)
 {
@@ -1627,6 +1654,7 @@ static void test_blockjob_commit_by_drained_end(void)
                            bs_parents[2], 0, BLK_PERM_ALL, 0, 0, NULL, NULL,
                            &error_abort);
 
+    job->detach_also = bs_parents[0];
     job->did_complete = &job_has_completed;
 
     job_start(&job->common.job);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 08/10] block: Loop unsafely in bdrv*drained_end()
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (6 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 07/10] tests: Extend commit by drained_end test Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 09/10] iotests: Add @has_quit to vm.shutdown() Max Reitz
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

The graph must not change in these loops (or a QLIST_FOREACH_SAFE would
not even be enough).  We now ensure this by only polling once in the
root bdrv_drained_end() call, so we can drop the _SAFE suffix.  Doing so
makes it clear that the graph must not change.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 8f23cab82e..b89e155d21 100644
--- a/block/io.c
+++ b/block/io.c
@@ -76,9 +76,9 @@ static void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
                                     bool ignore_bds_parents,
                                     int *drained_end_counter)
 {
-    BdrvChild *c, *next;
+    BdrvChild *c;
 
-    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
+    QLIST_FOREACH(c, &bs->parents, next_parent) {
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
@@ -456,7 +456,7 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent, bool ignore_bds_parents,
                                 int *drained_end_counter)
 {
-    BdrvChild *child, *next;
+    BdrvChild *child;
     int old_quiesce_counter;
 
     assert(drained_end_counter != NULL);
@@ -481,7 +481,7 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     if (recursive) {
         assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter--;
-        QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
+        QLIST_FOREACH(child, &bs->children, next) {
             bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents,
                                 drained_end_counter);
         }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 09/10] iotests: Add @has_quit to vm.shutdown()
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (7 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 08/10] block: Loop unsafely in bdrv*drained_end() Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 10/10] iotests: Test commit with a filter on the chain Max Reitz
  2019-07-19 13:16 ` [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Kevin Wolf
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

If a test has issued a quit command already (which may be useful to do
explicitly because the test wants to show its effects),
QEMUMachine.shutdown() should not do so again.  Otherwise, the VM may
well return an ECONNRESET which will lead QEMUMachine.shutdown() to
killing it, which then turns into a "qemu received signal 9" line.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 python/qemu/machine.py | 5 +++--
 tests/qemu-iotests/255 | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 49445e675b..128a3d1dc2 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -329,13 +329,14 @@ class QEMUMachine(object):
         self._load_io_log()
         self._post_shutdown()
 
-    def shutdown(self):
+    def shutdown(self, has_quit=False):
         """
         Terminate the VM and clean up
         """
         if self.is_running():
             try:
-                self._qmp.cmd('quit')
+                if not has_quit:
+                    self._qmp.cmd('quit')
                 self._qmp.close()
             except:
                 self._popen.kill()
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
index 49433ec122..3632d507d0 100755
--- a/tests/qemu-iotests/255
+++ b/tests/qemu-iotests/255
@@ -132,4 +132,4 @@ with iotests.FilePath('src.qcow2') as src_path, \
     vm.qmp_log('block-job-cancel', device='job0')
     vm.qmp_log('quit')
 
-    vm.shutdown()
+    vm.shutdown(has_quit=True)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 10/10] iotests: Test commit with a filter on the chain
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (8 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 09/10] iotests: Add @has_quit to vm.shutdown() Max Reitz
@ 2019-07-19  9:26 ` Max Reitz
  2019-07-19 13:16 ` [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Kevin Wolf
  10 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2019-07-19  9:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Before the previous patches, the first case resulted in a failed
assertion (which is noted as qemu receiving a SIGABRT in the test
output), and the second usually triggered a segmentation fault.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 40 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/040.out |  4 ++--
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index b81133a474..aa0b1847e3 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -92,9 +92,10 @@ class TestSingleDrive(ImageCommitTestCase):
 
         self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
         self.vm.launch()
+        self.has_quit = False
 
     def tearDown(self):
-        self.vm.shutdown()
+        self.vm.shutdown(has_quit=self.has_quit)
         os.remove(test_img)
         os.remove(mid_img)
         os.remove(backing_img)
@@ -109,6 +110,43 @@ class TestSingleDrive(ImageCommitTestCase):
         self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
         self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
+    def test_commit_with_filter_and_quit(self):
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        # Add a filter outside of the backing chain
+        result = self.vm.qmp('blockdev-add', driver='throttle', node_name='filter', throttle_group='tg', file='mid')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-commit', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        # Quit immediately, thus forcing a simultaneous cancel of the
+        # block job and a bdrv_drain_all()
+        result = self.vm.qmp('quit')
+        self.assert_qmp(result, 'return', {})
+
+        self.has_quit = True
+
+    # Same as above, but this time we add the filter after starting the job
+    def test_commit_plus_filter_and_quit(self):
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-commit', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        # Add a filter outside of the backing chain
+        result = self.vm.qmp('blockdev-add', driver='throttle', node_name='filter', throttle_group='tg', file='mid')
+        self.assert_qmp(result, 'return', {})
+
+        # Quit immediately, thus forcing a simultaneous cancel of the
+        # block job and a bdrv_drain_all()
+        result = self.vm.qmp('quit')
+        self.assert_qmp(result, 'return', {})
+
+        self.has_quit = True
+
     def test_device_not_found(self):
         result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % mid_img)
         self.assert_qmp(result, 'error/class', 'DeviceNotFound')
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 802ffaa0c0..220a5fa82c 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...........................................
+...............................................
 ----------------------------------------------------------------------
-Ran 43 tests
+Ran 47 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections
  2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
                   ` (9 preceding siblings ...)
  2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 10/10] iotests: Test commit with a filter on the chain Max Reitz
@ 2019-07-19 13:16 ` Kevin Wolf
  10 siblings, 0 replies; 12+ messages in thread
From: Kevin Wolf @ 2019-07-19 13:16 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

Am 19.07.2019 um 11:26 hat Max Reitz geschrieben:
> Hi,
> 
> This series:
> 
> (1) Keeps patch 1, as the previous series, and
> 
> (2) Decides whether all *drained_end* functions should poll or not; as
>     proposed by Kevin, all that should not poll now get a
>     @drained_end_counter pointer, whose pointee they have to increment
>     once for every background operation scheduled, and that background
>     operation will decrement it once it settles.
>     This allows functions that should poll to do so until the counter
>     reaches 0, so they don’t have to poll after scheduling every single
>     operation but can do so once in a place where it’s safe.

Thanks, applied to the block branch.

Kevin


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-07-19 13:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-19  9:26 [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 01/10] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 02/10] tests: Add job commit by drained_end test Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 03/10] block: Add @drained_end_counter Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 04/10] block: Make bdrv_parent_drained_[^_]*() static Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 05/10] tests: Lock AioContexts in test-block-iothread Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 06/10] block: Do not poll in bdrv_do_drained_end() Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 07/10] tests: Extend commit by drained_end test Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 08/10] block: Loop unsafely in bdrv*drained_end() Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 09/10] iotests: Add @has_quit to vm.shutdown() Max Reitz
2019-07-19  9:26 ` [Qemu-devel] [PATCH v3 10/10] iotests: Test commit with a filter on the chain Max Reitz
2019-07-19 13:16 ` [Qemu-devel] [PATCH v3 00/10] block: Delay poll when ending drained sections Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).