qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
@ 2019-06-19 15:25 Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Hi,

This is v2 to “block: Keep track of parent quiescing”.

Please read this cover letter, because I’m very unsure about the design
in this series and I’d appreciate some comments.

As Kevin wrote in his reply to that series, the actual problem is that
bdrv_drain_invoke() polls on every node whenever ending a drain.  This
may cause graph changes, which is actually forbidden.

To solve that problem, this series makes the drain code construct a list
of undrain operations that have been initiated, and then polls all of
them on the root level once graph changes are acceptable.
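
For illustration, this is roughly what that looks like at the root
level after patch 4 (a simplified sketch along the lines of
bdrv_drain_all_end(), not the literal patch code):

    GSList *poll_data_objs = NULL;

    /* Initiate all drained_end operations; each bdrv_drain_invoke()
     * prepends its BdrvCoDrainData object to @poll_data_objs instead
     * of polling for it on the spot */
    bdrv_do_drained_end(bs, false, NULL, true, false, &poll_data_objs);

    /* Only here, once graph changes are acceptable again, poll all
     * collected operations until their .done flags are set */
    bdrv_poll_drain_data_objs(&poll_data_objs, true);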

Note that I don’t like this list concept very much, so I’m open to
alternatives.

Furthermore, all BdrvChildRoles with BDS parents have a broken
.drained_end() implementation.  The documentation clearly states that
this function is not allowed to poll, but it does.  So this series
changes it to a variant (using the new code) that does not poll.

There is a catch, which may actually be a problem (I don’t know): The new
variant of that .drained_end() never polls at all.  As
described above, now every bdrv_drain_invoke() returns an object that
describes when it will be done and which can thus be polled for.  These
objects are just discarded when using BdrvChildRole.drained_end().  That
does not feel quite right.  It would probably be more correct to let
BdrvChildRole.drained_end() return these objects so the top level
bdrv_drained_end() can poll for their completion.

I decided not to do this, for two reasons:
(1) Doing so would spill the “list of objects to poll for” design to
    places outside of block/io.c.  I don’t like the design very much as
    it is, but I can live with it as long as it’s constrained to the
    core drain code in block/io.c.
    This is made worse by the fact that currently, those objects are of
    type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
    type that is externally visible (we only need the AioContext and
    whether bdrv_drain_invoke_entry() is done).

(2) It seems to work as it is.

The alternative would be to add the same GSList ** parameter to
BdrvChildRole.drained_end() that I added in the core drain code in patch
2, and then let the .drained_end() implementation fill that with objects
to poll for.  (This would be accomplished by adding a frontend to
bdrv_do_drained_end() that lets bdrv_child_cb_drained_end() pass the
parameter through.)
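
In code, that alternative would look roughly like this (just a
hypothetical sketch; bdrv_drained_end_collect() is a made-up name for
said frontend, and none of this is implemented in this series):

    void (*drained_end)(BdrvChild *child, GSList **data_objs);

    static void bdrv_child_cb_drained_end(BdrvChild *child,
                                          GSList **data_objs)
    {
        BlockDriverState *bs = child->opaque;
        /* Forward @data_objs to the core drain code instead of
         * polling here */
        bdrv_drained_end_collect(bs, data_objs);
    }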

Opinions?


And then we have bdrv_replace_child_noperm(), which actually wants a
polling BdrvChildRole.drained_end().  So this series adds
BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
is any polling to do).

Note that if I implemented the alternative described above
(.drained_end() gets said GSList ** parameter), a
.drained_end_unquiesce() wouldn’t be necessary.
bdrv_parent_drained_end_single() could just poll the list returned by
.drained_end() by itself.


Finally, patches 1, 8, and 9 are unmodified from v1.


git backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/9:[----] [--] 'block: Introduce BdrvChild.parent_quiesce_counter'
002/9:[down] 'block: Add @data_objs to bdrv_drain_invoke()'
003/9:[down] 'block: Add bdrv_poll_drain_data_objs()'
004/9:[down] 'block: Move polling out of bdrv_drain_invoke()'
005/9:[down] 'block: Add @poll to bdrv_parent_drained_end_single()'
006/9:[down] 'block: Add bdrv_drained_end_no_poll()'
007/9:[down] 'block: Fix BDS children's .drained_end()'
008/9:[----] [--] 'iotests: Add @has_quit to vm.shutdown()'
009/9:[----] [--] 'iotests: Test commit with a filter on the chain'


Max Reitz (9):
  block: Introduce BdrvChild.parent_quiesce_counter
  block: Add @data_objs to bdrv_drain_invoke()
  block: Add bdrv_poll_drain_data_objs()
  block: Move polling out of bdrv_drain_invoke()
  block: Add @poll to bdrv_parent_drained_end_single()
  block: Add bdrv_drained_end_no_poll()
  block: Fix BDS children's .drained_end()
  iotests: Add @has_quit to vm.shutdown()
  iotests: Test commit with a filter on the chain

 include/block/block.h      |  22 +++++-
 include/block/block_int.h  |  23 ++++++
 block.c                    |  24 +++---
 block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
 python/qemu/__init__.py    |   5 +-
 tests/qemu-iotests/040     |  40 +++++++++-
 tests/qemu-iotests/040.out |   4 +-
 tests/qemu-iotests/255     |   2 +-
 8 files changed, 231 insertions(+), 44 deletions(-)

-- 
2.21.0




* [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
@ 2019-06-19 15:25 ` Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 2/9] block: Add @data_objs to bdrv_drain_invoke() Max Reitz
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Commit 5cb2737e925042e6c7cd3fb0b01313950b03cddf laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (through c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h     |  7 +++++++
 include/block/block_int.h |  9 +++++++++
 block.c                   | 15 +++++----------
 block/io.c                | 14 +++++++++++---
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index f9415ed740..3c084e8222 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -616,6 +616,13 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
  */
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
+/**
+ * bdrv_parent_drained_end_single:
+ *
+ * End a quiesced section for the parent of @c.
+ */
+void bdrv_parent_drained_end_single(BdrvChild *c);
+
 /**
  * bdrv_parent_drained_end:
  *
diff --git a/include/block/block_int.h b/include/block/block_int.h
index d6415b53c1..7f62907932 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -729,6 +729,15 @@ struct BdrvChild {
      */
     bool frozen;
 
+    /*
+     * How many times the parent of this child has been drained
+     * (through role->drained_*).
+     * Usually, this is equal to bs->quiesce_counter (potentially
+     * reduced by bdrv_drain_all_count).  It may differ while the
+     * child is entering or leaving a drained section.
+     */
+    int parent_quiesce_counter;
+
     QLIST_ENTRY(BdrvChild) next;
     QLIST_ENTRY(BdrvChild) next_parent;
 };
diff --git a/block.c b/block.c
index c139540f2b..f7d7d8ccef 100644
--- a/block.c
+++ b/block.c
@@ -2251,24 +2251,19 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         if (child->role->detach) {
             child->role->detach(child);
         }
-        if (old_bs->quiesce_counter && child->role->drained_end) {
-            int num = old_bs->quiesce_counter;
-            if (child->role->parent_is_bds) {
-                num -= bdrv_drain_all_count;
-            }
-            assert(num >= 0);
-            for (i = 0; i < num; i++) {
-                child->role->drained_end(child);
-            }
+        while (child->parent_quiesce_counter) {
+            bdrv_parent_drained_end_single(child);
         }
         QLIST_REMOVE(child, next_parent);
+    } else {
+        assert(child->parent_quiesce_counter == 0);
     }
 
     child->bs = new_bs;
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-        if (new_bs->quiesce_counter && child->role->drained_begin) {
+        if (new_bs->quiesce_counter) {
             int num = new_bs->quiesce_counter;
             if (child->role->parent_is_bds) {
                 num -= bdrv_drain_all_count;
diff --git a/block/io.c b/block/io.c
index 9ba1bada36..112eed385c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -55,6 +55,15 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
+void bdrv_parent_drained_end_single(BdrvChild *c)
+{
+    assert(c->parent_quiesce_counter > 0);
+    c->parent_quiesce_counter--;
+    if (c->role->drained_end) {
+        c->role->drained_end(c);
+    }
+}
+
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
                              bool ignore_bds_parents)
 {
@@ -64,9 +73,7 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
-        if (c->role->drained_end) {
-            c->role->drained_end(c);
-        }
+        bdrv_parent_drained_end_single(c);
     }
 }
 
@@ -96,6 +103,7 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
 
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
+    c->parent_quiesce_counter++;
     if (c->role->drained_begin) {
         c->role->drained_begin(c);
     }
-- 
2.21.0




* [Qemu-devel] [PATCH v2 2/9] block: Add @data_objs to bdrv_drain_invoke()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
@ 2019-06-19 15:25 ` Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 3/9] block: Add bdrv_poll_drain_data_objs() Max Reitz
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Callers can now pass a pointer to a GSList that bdrv_drain_invoke() (and
its recursive callees) will fill with all BdrvCoDrainData objects they
create.  This will allow us to move the polling for BdrvCoDrainData.done
to become true out of bdrv_drain_invoke() and into the root drain_end
function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 65 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/block/io.c b/block/io.c
index 112eed385c..6b518e5eb0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -194,12 +194,14 @@ typedef struct {
     bool poll;
     BdrvChild *parent;
     bool ignore_bds_parents;
+    GSList **data_objs;
 } BdrvCoDrainData;
 
 static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
     BlockDriverState *bs = data->bs;
+    bool data_owned_by_caller = data->data_objs || !data->begin;
 
     if (data->begin) {
         bs->drv->bdrv_co_drain_begin(bs);
@@ -211,13 +213,14 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
     atomic_mb_set(&data->done, true);
     bdrv_dec_in_flight(bs);
 
-    if (data->begin) {
+    if (!data_owned_by_caller) {
         g_free(data);
     }
 }
 
 /* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
+static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
+                              GSList **data_objs)
 {
     BdrvCoDrainData *data;
 
@@ -230,16 +233,23 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
     *data = (BdrvCoDrainData) {
         .bs = bs,
         .done = false,
-        .begin = begin
+        .begin = begin,
+        .data_objs = data_objs,
     };
 
+    if (data_objs) {
+        *data_objs = g_slist_prepend(*data_objs, data);
+    }
+
     /* Make sure the driver callback completes during the polling phase for
      * drain_begin. */
     bdrv_inc_in_flight(bs);
     data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
     aio_co_schedule(bdrv_get_aio_context(bs), data->co);
 
+    /* TODO: Drop this and make callers pass @data_objs and poll */
     if (!begin) {
+        assert(!data_objs);
         BDRV_POLL_WHILE(bs, !data->done);
         g_free(data);
     }
@@ -281,7 +291,8 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   BdrvChild *parent, bool ignore_bds_parents,
                                   bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents);
+                                BdrvChild *parent, bool ignore_bds_parents,
+                                GSList **data_objs);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -308,7 +319,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
                                   data->ignore_bds_parents, data->poll);
         } else {
             bdrv_do_drained_end(bs, data->recursive, data->parent,
-                                data->ignore_bds_parents);
+                                data->ignore_bds_parents, data->data_objs);
         }
         if (ctx == co_ctx) {
             aio_context_release(ctx);
@@ -326,15 +337,15 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                 bool begin, bool recursive,
                                                 BdrvChild *parent,
                                                 bool ignore_bds_parents,
-                                                bool poll)
+                                                bool poll, GSList **data_objs)
 {
-    BdrvCoDrainData data;
+    BdrvCoDrainData *data = g_new(BdrvCoDrainData, 1);
 
     /* Calling bdrv_drain() from a BH ensures the current coroutine yields and
      * other coroutines run if they were queued by aio_co_enter(). */
 
     assert(qemu_in_coroutine());
-    data = (BdrvCoDrainData) {
+    *data = (BdrvCoDrainData) {
         .co = qemu_coroutine_self(),
         .bs = bs,
         .done = false,
@@ -343,17 +354,27 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         .parent = parent,
         .ignore_bds_parents = ignore_bds_parents,
         .poll = poll,
+        .data_objs = data_objs,
     };
+
+    if (data_objs) {
+        *data_objs = g_slist_prepend(*data_objs, data);
+    }
+
     if (bs) {
         bdrv_inc_in_flight(bs);
     }
     aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
-                            bdrv_co_drain_bh_cb, &data);
+                            bdrv_co_drain_bh_cb, data);
 
     qemu_coroutine_yield();
     /* If we are resumed from some other event (such as an aio completion or a
      * timer callback), it is a bug in the caller that should be fixed. */
-    assert(data.done);
+    assert(data->done);
+
+    if (!data_objs) {
+        g_free(data);
+    }
 }
 
 void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
@@ -367,7 +388,7 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
     }
 
     bdrv_parent_drained_begin(bs, parent, ignore_bds_parents);
-    bdrv_drain_invoke(bs, true);
+    bdrv_drain_invoke(bs, true, NULL);
 }
 
 static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
@@ -378,7 +399,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
-                               poll);
+                               poll, NULL);
         return;
     }
 
@@ -419,20 +440,21 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
 }
 
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
-                                BdrvChild *parent, bool ignore_bds_parents)
+                                BdrvChild *parent, bool ignore_bds_parents,
+                                GSList **data_objs)
 {
     BdrvChild *child, *next;
     int old_quiesce_counter;
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false);
+                               false, data_objs);
         return;
     }
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
-    bdrv_drain_invoke(bs, false);
+    bdrv_drain_invoke(bs, false, data_objs);
     bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
 
     old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
@@ -444,19 +466,20 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
         assert(!ignore_bds_parents);
         bs->recursive_quiesce_counter--;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
-            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents);
+            bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents,
+                                data_objs);
         }
     }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, false, NULL, false);
+    bdrv_do_drained_end(bs, false, NULL, false, NULL);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, true, NULL, false);
+    bdrv_do_drained_end(bs, true, NULL, false, NULL);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -473,7 +496,7 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     int i;
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false);
+        bdrv_do_drained_end(child->bs, true, child, false, NULL);
     }
 }
 
@@ -543,7 +566,7 @@ void bdrv_drain_all_begin(void)
     BlockDriverState *bs = NULL;
 
     if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true);
+        bdrv_co_yield_to_drain(NULL, true, false, NULL, true, true, NULL);
         return;
     }
 
@@ -579,7 +602,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true);
+        bdrv_do_drained_end(bs, false, NULL, true, NULL);
         aio_context_release(aio_context);
     }
 
-- 
2.21.0




* [Qemu-devel] [PATCH v2 3/9] block: Add bdrv_poll_drain_data_objs()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 2/9] block: Add @data_objs to bdrv_drain_invoke() Max Reitz
@ 2019-06-19 15:25 ` Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 4/9] block: Move polling out of bdrv_drain_invoke() Max Reitz
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

This function polls all of the involved AioContexts for a GSList of
BdrvCoDrainData objects until all objects' .done is true.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/block/io.c b/block/io.c
index 6b518e5eb0..eb84774abd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -255,6 +255,63 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
     }
 }
 
+/* TODO: Actually use this function and drop this forward declaration */
+static void bdrv_poll_drain_data_objs(GSList **data_objs, bool acquire_ctx)
+    __attribute__((unused));
+
+/*
+ * Poll the AioContexts in the given list of BdrvCoDrainData objects
+ * until all of those objects are "done" (i.e. their .done field is
+ * true).
+ * Also, free all objects and the list.
+ *
+ * If @acquire_ctx is true, the AioContexts are locked while they are
+ * polled.
+ */
+static void bdrv_poll_drain_data_objs(GSList **data_objs, bool acquire_ctx)
+{
+    GSList *contexts = NULL;
+    GSList *iter;
+
+    /* First collect the contexts while the BDSs are not gone */
+    for (iter = *data_objs; iter; iter = iter->next) {
+        BdrvCoDrainData *drain_data = iter->data;
+        contexts = g_slist_prepend(contexts,
+                                   bdrv_get_aio_context(drain_data->bs));
+    }
+
+    /*
+     * Reverse the list so it is in the same order as *data_objs
+     * (prepend and then reverse has better performance than appending)
+     */
+    contexts = g_slist_reverse(contexts);
+
+    /* The BDSs may disappear here, but we only need their contexts */
+    while (*data_objs) {
+        GSList *next;
+        BdrvCoDrainData *drain_data = (*data_objs)->data;
+        AioContext *ctx = contexts->data;
+
+        if (acquire_ctx) {
+            aio_context_acquire(ctx);
+        }
+        AIO_WAIT_WHILE(ctx, !drain_data->done);
+        if (acquire_ctx) {
+            aio_context_release(ctx);
+        }
+
+        g_free(drain_data);
+
+        next = (*data_objs)->next;
+        g_slist_free_1(*data_objs);
+        *data_objs = next;
+
+        next = contexts->next;
+        g_slist_free_1(contexts);
+        contexts = next;
+    }
+}
+
 /* Returns true if BDRV_POLL_WHILE() should go into a blocking aio_poll() */
 bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
                      BdrvChild *ignore_parent, bool ignore_bds_parents)
-- 
2.21.0




* [Qemu-devel] [PATCH v2 4/9] block: Move polling out of bdrv_drain_invoke()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (2 preceding siblings ...)
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 3/9] block: Add bdrv_poll_drain_data_objs() Max Reitz
@ 2019-06-19 15:25 ` Max Reitz
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 5/9] block: Add @poll to bdrv_parent_drained_end_single() Max Reitz
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Instead, let the root drained_end functions poll after the whole subtree
has been undrained.  These are bdrv_drain_all_end() and sometimes
bdrv_do_drained_end(); the "sometimes" implies that the latter needs a
parameter to tell whether it should poll or not.

Note that bdrv_do_drained_end() now always receives either poll=true
(for the root level) or a pointer to a BdrvCoDrainData object list (as a
recursive call from the root bdrv_do_drained_end() invocation).
In the future, we will have callers that pass poll=false and no list,
because they do not care about polling at all.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/block/io.c b/block/io.c
index eb84774abd..426ad5b4a1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -201,7 +201,7 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
 {
     BdrvCoDrainData *data = opaque;
     BlockDriverState *bs = data->bs;
-    bool data_owned_by_caller = data->data_objs || !data->begin;
+    bool data_owned_by_caller = data->data_objs;
 
     if (data->begin) {
         bs->drv->bdrv_co_drain_begin(bs);
@@ -246,19 +246,8 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin,
     bdrv_inc_in_flight(bs);
     data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
     aio_co_schedule(bdrv_get_aio_context(bs), data->co);
-
-    /* TODO: Drop this and make callers pass @data_objs and poll */
-    if (!begin) {
-        assert(!data_objs);
-        BDRV_POLL_WHILE(bs, !data->done);
-        g_free(data);
-    }
 }
 
-/* TODO: Actually use this function and drop this forward declaration */
-static void bdrv_poll_drain_data_objs(GSList **data_objs, bool acquire_ctx)
-    __attribute__((unused));
-
 /*
  * Poll the AioContexts in the given list of BdrvCoDrainData objects
  * until all of those objects are "done" (i.e. their .done field is
@@ -349,7 +338,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                                   bool poll);
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent, bool ignore_bds_parents,
-                                GSList **data_objs);
+                                bool poll, GSList **data_objs);
 
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
@@ -376,7 +365,8 @@ static void bdrv_co_drain_bh_cb(void *opaque)
                                   data->ignore_bds_parents, data->poll);
         } else {
             bdrv_do_drained_end(bs, data->recursive, data->parent,
-                                data->ignore_bds_parents, data->data_objs);
+                                data->ignore_bds_parents, data->poll,
+                                data->data_objs);
         }
         if (ctx == co_ctx) {
             aio_context_release(ctx);
@@ -498,18 +488,24 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
 
 static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
                                 BdrvChild *parent, bool ignore_bds_parents,
-                                GSList **data_objs)
+                                bool poll, GSList **data_objs)
 {
     BdrvChild *child, *next;
     int old_quiesce_counter;
+    GSList *poll_data_objs = NULL;
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false, data_objs);
+                               poll, data_objs);
         return;
     }
     assert(bs->quiesce_counter > 0);
 
+    if (poll) {
+        assert(data_objs == NULL);
+        data_objs = &poll_data_objs;
+    }
+
     /* Re-enable things in child-to-parent order */
     bdrv_drain_invoke(bs, false, data_objs);
     bdrv_parent_drained_end(bs, parent, ignore_bds_parents);
@@ -524,19 +520,24 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
         bs->recursive_quiesce_counter--;
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
             bdrv_do_drained_end(child->bs, true, child, ignore_bds_parents,
-                                data_objs);
+                                false, data_objs);
         }
     }
+
+    if (poll) {
+        assert(data_objs == &poll_data_objs);
+        bdrv_poll_drain_data_objs(data_objs, false);
+    }
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, false, NULL, false, NULL);
+    bdrv_do_drained_end(bs, false, NULL, false, true, NULL);
 }
 
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
-    bdrv_do_drained_end(bs, true, NULL, false, NULL);
+    bdrv_do_drained_end(bs, true, NULL, false, true, NULL);
 }
 
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent)
@@ -553,7 +554,7 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     int i;
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false, NULL);
+        bdrv_do_drained_end(child->bs, true, child, false, true, NULL);
     }
 }
 
@@ -654,15 +655,18 @@ void bdrv_drain_all_begin(void)
 void bdrv_drain_all_end(void)
 {
     BlockDriverState *bs = NULL;
+    GSList *poll_data_objs = NULL;
 
     while ((bs = bdrv_next_all_states(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true, NULL);
+        bdrv_do_drained_end(bs, false, NULL, true, false, &poll_data_objs);
         aio_context_release(aio_context);
     }
 
+    bdrv_poll_drain_data_objs(&poll_data_objs, true);
+
     assert(bdrv_drain_all_count > 0);
     bdrv_drain_all_count--;
 }
-- 
2.21.0




* [Qemu-devel] [PATCH v2 5/9] block: Add @poll to bdrv_parent_drained_end_single()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (3 preceding siblings ...)
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 4/9] block: Move polling out of bdrv_drain_invoke() Max Reitz
@ 2019-06-19 15:25 ` Max Reitz
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 6/9] block: Add bdrv_drained_end_no_poll() Max Reitz
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

This function has two callers; one wants it to poll, the other does not.

To implement this parameter, we also need a new BdrvChildRole method,
namely .drained_end_unquiesce().  This function differs from
.drained_end() in that it should poll and is thus allowed to modify the
block graph.

Note that currently every BDS child's .drained_end() implementation is
simply broken in exactly that regard; the current implementation is
actually fit for .drained_end_unquiesce().  However, at this point we
cannot implement .drained_end() correctly, so a later patch in this
series will address that issue.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h     |  5 ++++-
 include/block/block_int.h | 14 ++++++++++++++
 block.c                   |  2 +-
 block/io.c                | 10 +++++++---
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 3c084e8222..d98237f0f1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -620,8 +620,11 @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
  * bdrv_parent_drained_end_single:
  *
  * End a quiesced section for the parent of @c.
+ *
+ * If @poll is true, poll until the parent is unquiesced.  This may
+ * lead to changes in the block graph.
  */
-void bdrv_parent_drained_end_single(BdrvChild *c);
+void bdrv_parent_drained_end_single(BdrvChild *c, bool poll);
 
 /**
  * bdrv_parent_drained_end:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7f62907932..fce9a9e7ee 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -670,6 +670,20 @@ struct BdrvChildRole {
     void (*drained_begin)(BdrvChild *child);
     void (*drained_end)(BdrvChild *child);
 
+    /*
+     * Parents that require polling to actually become unquiesced
+     * should implement this function in addition to .drained_end().
+     * In it, the parent should end the drain and aio_poll() until it
+     * is no longer quiesced.
+     *
+     * Thus, in contrast to .drained_end(), this function is allowed
+     * to change the graph.
+     *
+     * (This pointer may be NULL, in which case .drained_end() is
+     * called instead.)
+     */
+    void (*drained_end_unquiesce)(BdrvChild *child);
+
     /*
      * Returns whether the parent has pending requests for the child. This
      * callback is polled after .drained_begin() has been called until all
diff --git a/block.c b/block.c
index f7d7d8ccef..31b85df0f0 100644
--- a/block.c
+++ b/block.c
@@ -2252,7 +2252,7 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
             child->role->detach(child);
         }
         while (child->parent_quiesce_counter) {
-            bdrv_parent_drained_end_single(child);
+            bdrv_parent_drained_end_single(child, true);
         }
         QLIST_REMOVE(child, next_parent);
     } else {
diff --git a/block/io.c b/block/io.c
index 426ad5b4a1..5f30baa8e5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -55,12 +55,16 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
-void bdrv_parent_drained_end_single(BdrvChild *c)
+void bdrv_parent_drained_end_single(BdrvChild *c, bool poll)
 {
     assert(c->parent_quiesce_counter > 0);
     c->parent_quiesce_counter--;
     if (c->role->drained_end) {
-        c->role->drained_end(c);
+        if (poll && c->role->drained_end_unquiesce) {
+            c->role->drained_end_unquiesce(c);
+        } else {
+            c->role->drained_end(c);
+        }
     }
 }
 
@@ -73,7 +77,7 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
-        bdrv_parent_drained_end_single(c);
+        bdrv_parent_drained_end_single(c, false);
     }
 }
 
-- 
2.21.0




* [Qemu-devel] [PATCH v2 6/9] block: Add bdrv_drained_end_no_poll()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (4 preceding siblings ...)
  2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 5/9] block: Add @poll to bdrv_parent_drained_end_single() Max Reitz
@ 2019-06-19 15:26 ` Max Reitz
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 7/9] block: Fix BDS children's .drained_end() Max Reitz
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

In contrast to bdrv_drained_end(), which does poll and may thus lead to
graph changes, this new interface to bdrv_do_drained_end() will not
poll.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h | 12 +++++++++++-
 block/io.c            |  5 +++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index d98237f0f1..422448d773 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -680,10 +680,20 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs);
 /**
  * bdrv_drained_end:
  *
- * End a quiescent section started by bdrv_drained_begin().
+ * End a quiescent section started by bdrv_drained_begin().  This may
+ * result in block graph changes.
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+/**
+ * bdrv_drained_end_no_poll:
+ *
+ * Same as bdrv_drained_end(), but do not poll for the subgraph to
+ * actually become unquiesced.  Therefore, no graph changes will occur
+ * with this function.
+ */
+void bdrv_drained_end_no_poll(BlockDriverState *bs);
+
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/block/io.c b/block/io.c
index 5f30baa8e5..188dea22bc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -539,6 +539,11 @@ void bdrv_drained_end(BlockDriverState *bs)
     bdrv_do_drained_end(bs, false, NULL, false, true, NULL);
 }
 
+void bdrv_drained_end_no_poll(BlockDriverState *bs)
+{
+    bdrv_do_drained_end(bs, false, NULL, false, false, NULL);
+}
+
 void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
     bdrv_do_drained_end(bs, true, NULL, false, true, NULL);
-- 
2.21.0




* [Qemu-devel] [PATCH v2 7/9] block: Fix BDS children's .drained_end()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (5 preceding siblings ...)
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 6/9] block: Add bdrv_drained_end_no_poll() Max Reitz
@ 2019-06-19 15:26 ` Max Reitz
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 8/9] iotests: Add @has_quit to vm.shutdown() Max Reitz
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

BdrvChildRole.drained_end() must not poll, so bdrv_child_cb_drained_end()
should use bdrv_drained_end_no_poll() instead of bdrv_drained_end().

The existing implementation works perfectly well for
.drained_end_unquiesce(), though, so use it there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index 31b85df0f0..1652f3d29b 100644
--- a/block.c
+++ b/block.c
@@ -912,6 +912,12 @@ static bool bdrv_child_cb_drained_poll(BdrvChild *child)
 }
 
 static void bdrv_child_cb_drained_end(BdrvChild *child)
+{
+    BlockDriverState *bs = child->opaque;
+    bdrv_drained_end_no_poll(bs);
+}
+
+static void bdrv_child_cb_drained_end_unquiesce(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
     bdrv_drained_end(bs);
@@ -1014,6 +1020,7 @@ const BdrvChildRole child_file = {
     .drained_begin   = bdrv_child_cb_drained_begin,
     .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
+    .drained_end_unquiesce = bdrv_child_cb_drained_end_unquiesce,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
     .inactivate      = bdrv_child_cb_inactivate,
@@ -1042,6 +1049,7 @@ const BdrvChildRole child_format = {
     .drained_begin   = bdrv_child_cb_drained_begin,
     .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
+    .drained_end_unquiesce = bdrv_child_cb_drained_end_unquiesce,
     .attach          = bdrv_child_cb_attach,
     .detach          = bdrv_child_cb_detach,
     .inactivate      = bdrv_child_cb_inactivate,
@@ -1168,6 +1176,7 @@ const BdrvChildRole child_backing = {
     .drained_begin   = bdrv_child_cb_drained_begin,
     .drained_poll    = bdrv_child_cb_drained_poll,
     .drained_end     = bdrv_child_cb_drained_end,
+    .drained_end_unquiesce = bdrv_child_cb_drained_end_unquiesce,
     .inactivate      = bdrv_child_cb_inactivate,
     .update_filename = bdrv_backing_update_filename,
     .can_set_aio_ctx = bdrv_child_cb_can_set_aio_ctx,
-- 
2.21.0




* [Qemu-devel] [PATCH v2 8/9] iotests: Add @has_quit to vm.shutdown()
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (6 preceding siblings ...)
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 7/9] block: Fix BDS children's .drained_end() Max Reitz
@ 2019-06-19 15:26 ` Max Reitz
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 9/9] iotests: Test commit with a filter on the chain Max Reitz
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

If a test has issued a quit command already (which may be useful to do
explicitly because the test wants to show its effects),
QEMUMachine.shutdown() should not do so again.  Otherwise, the VM may
well return an ECONNRESET which will lead QEMUMachine.shutdown() to
killing it, which then turns into a "qemu received signal 9" line.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 python/qemu/__init__.py | 5 +++--
 tests/qemu-iotests/255  | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
index dbaf8a5311..25207a2970 100644
--- a/python/qemu/__init__.py
+++ b/python/qemu/__init__.py
@@ -332,13 +332,14 @@ class QEMUMachine(object):
         self._load_io_log()
         self._post_shutdown()
 
-    def shutdown(self):
+    def shutdown(self, has_quit=False):
         """
         Terminate the VM and clean up
         """
         if self.is_running():
             try:
-                self._qmp.cmd('quit')
+                if not has_quit:
+                    self._qmp.cmd('quit')
                 self._qmp.close()
             except:
                 self._popen.kill()
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
index 49433ec122..3632d507d0 100755
--- a/tests/qemu-iotests/255
+++ b/tests/qemu-iotests/255
@@ -132,4 +132,4 @@ with iotests.FilePath('src.qcow2') as src_path, \
     vm.qmp_log('block-job-cancel', device='job0')
     vm.qmp_log('quit')
 
-    vm.shutdown()
+    vm.shutdown(has_quit=True)
-- 
2.21.0




* [Qemu-devel] [PATCH v2 9/9] iotests: Test commit with a filter on the chain
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (7 preceding siblings ...)
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 8/9] iotests: Add @has_quit to vm.shutdown() Max Reitz
@ 2019-06-19 15:26 ` Max Reitz
  2019-07-15 13:24 ` [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
  2019-07-16 14:40 ` Kevin Wolf
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-06-19 15:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Before the previous patches, the first of the two new test cases (where
the filter is added before the commit job is started) resulted in a
failed assertion (which is noted as qemu receiving a SIGABRT in the test
output), and the second one (where the filter is added only after the
job has been started) usually triggered a segmentation fault.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 40 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/040.out |  4 ++--
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index b81133a474..aa0b1847e3 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -92,9 +92,10 @@ class TestSingleDrive(ImageCommitTestCase):
 
         self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
         self.vm.launch()
+        self.has_quit = False
 
     def tearDown(self):
-        self.vm.shutdown()
+        self.vm.shutdown(has_quit=self.has_quit)
         os.remove(test_img)
         os.remove(mid_img)
         os.remove(backing_img)
@@ -109,6 +110,43 @@ class TestSingleDrive(ImageCommitTestCase):
         self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
         self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
+    def test_commit_with_filter_and_quit(self):
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        # Add a filter outside of the backing chain
+        result = self.vm.qmp('blockdev-add', driver='throttle', node_name='filter', throttle_group='tg', file='mid')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-commit', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        # Quit immediately, thus forcing a simultaneous cancel of the
+        # block job and a bdrv_drain_all()
+        result = self.vm.qmp('quit')
+        self.assert_qmp(result, 'return', {})
+
+        self.has_quit = True
+
+    # Same as above, but this time we add the filter after starting the job
+    def test_commit_plus_filter_and_quit(self):
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-commit', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        # Add a filter outside of the backing chain
+        result = self.vm.qmp('blockdev-add', driver='throttle', node_name='filter', throttle_group='tg', file='mid')
+        self.assert_qmp(result, 'return', {})
+
+        # Quit immediately, thus forcing a simultaneous cancel of the
+        # block job and a bdrv_drain_all()
+        result = self.vm.qmp('quit')
+        self.assert_qmp(result, 'return', {})
+
+        self.has_quit = True
+
     def test_device_not_found(self):
         result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % mid_img)
         self.assert_qmp(result, 'error/class', 'DeviceNotFound')
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 802ffaa0c0..220a5fa82c 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...........................................
+...............................................
 ----------------------------------------------------------------------
-Ran 43 tests
+Ran 47 tests
 
 OK
-- 
2.21.0




* Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (8 preceding siblings ...)
  2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 9/9] iotests: Test commit with a filter on the chain Max Reitz
@ 2019-07-15 13:24 ` Max Reitz
  2019-07-16 14:40 ` Kevin Wolf
  10 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-07-15 13:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



Ping – as this series fixes an abort and a segfault, I would appreciate
reviews.

(Head over to “Fixes for concurrent block jobs” for even more fixes for
aborts and segfaults.)

On 19.06.19 17:25, Max Reitz wrote:
> Hi,
> 
> This is v2 to “block: Keep track of parent quiescing”.
> 
> Please read this cover letter, because I’m very unsure about the design
> in this series and I’d appreciate some comments.
> 
> As Kevin wrote in his reply to that series, the actual problem is that
> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> may cause graph changes, which is actually forbidden.
> 
> To solve that problem, this series makes the drain code construct a list
> of undrain operations that have been initiated, and then polls all of
> them on the root level once graph changes are acceptable.
> 
> Note that I don’t like this list concept very much, so I’m open to
> alternatives.
> 
> Furthermore, all BdrvChildRoles with BDS parents have a broken
> .drained_end() implementation.  The documentation clearly states that
> this function is not allowed to poll, but it does.  So this series
> changes it to a variant (using the new code) that does not poll.
> 
> There is a catch, which may actually be a problem, I don’t know: The new
> variant of that .drained_end() does not poll at all, never.  As
> described above, now every bdrv_drain_invoke() returns an object that
> describes when it will be done and which can thus be polled for.  These
> objects are just discarded when using BdrvChildRole.drained_end().  That
> does not feel quite right.  It would probably be more correct to let
> BdrvChildRole.drained_end() return these objects so the top level
> bdrv_drained_end() can poll for their completion.
> 
> I decided not to do this, for two reasons:
> (1) Doing so would spill the “list of objects to poll for” design to
>     places outside of block/io.c.  I don’t like the design very much as
>     it is, but I can live with it as long as it’s constrained to the
>     core drain code in block/io.c.
>     This is made worse by the fact that currently, those objects are of
>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>     type that is externally visible (we only need the AioContext and
>     whether bdrv_drain_invoke_entry() is done).
> 
> (2) It seems to work as it is.
> 
> The alternative would be to add the same GSList ** parameter to
> BdrvChildRole.drained_end() that I added in the core drain code in patch
> 2, and then let the .drained_end() implementation fill that with objects
> to poll for.  (Which would be accomplished by adding a frontend to
> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> parameter through.)
> 
> Opinions?
> 
> 
> And then we have bdrv_replace_child_noperm(), which actually wants a
> polling BdrvChildRole.drained_end().  So this series adds
> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
> is any polling to do).
> 
> Note that if I implemented the alternative described above
> (.drained_end() gets said GSList ** parameter), a
> .drained_end_unquiesce() wouldn’t be necessary.
> bdrv_parent_drained_end_single() could just poll the list returned by
> .drained_end() by itself.
> 
> 
> Finally, patches 1, 8, and 9 are unmodified from v1.
> 
> 
> git backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/9:[----] [--] 'block: Introduce BdrvChild.parent_quiesce_counter'
> 002/9:[down] 'block: Add @data_objs to bdrv_drain_invoke()'
> 003/9:[down] 'block: Add bdrv_poll_drain_data_objs()'
> 004/9:[down] 'block: Move polling out of bdrv_drain_invoke()'
> 005/9:[down] 'block: Add @poll to bdrv_parent_drained_end_single()'
> 006/9:[down] 'block: Add bdrv_drained_end_no_poll()'
> 007/9:[down] 'block: Fix BDS children's .drained_end()'
> 008/9:[----] [--] 'iotests: Add @has_quit to vm.shutdown()'
> 009/9:[----] [--] 'iotests: Test commit with a filter on the chain'
> 
> 
> Max Reitz (9):
>   block: Introduce BdrvChild.parent_quiesce_counter
>   block: Add @data_objs to bdrv_drain_invoke()
>   block: Add bdrv_poll_drain_data_objs()
>   block: Move polling out of bdrv_drain_invoke()
>   block: Add @poll to bdrv_parent_drained_end_single()
>   block: Add bdrv_drained_end_no_poll()
>   block: Fix BDS children's .drained_end()
>   iotests: Add @has_quit to vm.shutdown()
>   iotests: Test commit with a filter on the chain
> 
>  include/block/block.h      |  22 +++++-
>  include/block/block_int.h  |  23 ++++++
>  block.c                    |  24 +++---
>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>  python/qemu/__init__.py    |   5 +-
>  tests/qemu-iotests/040     |  40 +++++++++-
>  tests/qemu-iotests/040.out |   4 +-
>  tests/qemu-iotests/255     |   2 +-
>  8 files changed, 231 insertions(+), 44 deletions(-)
> 





* Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
  2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
                   ` (9 preceding siblings ...)
  2019-07-15 13:24 ` [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
@ 2019-07-16 14:40 ` Kevin Wolf
  2019-07-16 16:24   ` Max Reitz
  10 siblings, 1 reply; 15+ messages in thread
From: Kevin Wolf @ 2019-07-16 14:40 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

On 19.06.2019 at 17:25, Max Reitz wrote:
> Hi,
> 
> This is v2 to “block: Keep track of parent quiescing”.
> 
> Please read this cover letter, because I’m very unsure about the design
> in this series and I’d appreciate some comments.
> 
> As Kevin wrote in his reply to that series, the actual problem is that
> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> may cause graph changes, which is actually forbidden.
> 
> To solve that problem, this series makes the drain code construct a list
> of undrain operations that have been initiated, and then polls all of
> them on the root level once graph changes are acceptable.
> 
> Note that I don’t like this list concept very much, so I’m open to
> alternatives.

So drain_end is different from drain_begin in that it wants to wait only
for all bdrv_drain_invoke() calls to complete, but not for other
requests that are in flight. Makes sense.

Though instead of managing a whole list, wouldn't a counter suffice?
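
Something like this, roughly (an untested sketch; plumbing an int *
through bdrv_do_drained_end() instead of the GSList ** is made up
here):

    int drained_end_counter = 0;

    /* Each initiated drained_end operation increments the counter,
     * and the scheduled coroutine decrements it again once it has
     * completed */
    bdrv_do_drained_end(bs, false, NULL, true, false,
                        &drained_end_counter);

    BDRV_POLL_WHILE(bs, atomic_read(&drained_end_counter) > 0);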

> Furthermore, all BdrvChildRoles with BDS parents have a broken
> .drained_end() implementation.  The documentation clearly states that
> this function is not allowed to poll, but it does.  So this series
> changes it to a variant (using the new code) that does not poll.
> 
> There is a catch, which may actually be a problem, I don’t know: The new
> variant of that .drained_end() does not poll at all, never.  As
> described above, now every bdrv_drain_invoke() returns an object that
> describes when it will be done and which can thus be polled for.  These
> objects are just discarded when using BdrvChildRole.drained_end().  That
> does not feel quite right.  It would probably be more correct to let
> BdrvChildRole.drained_end() return these objects so the top level
> bdrv_drained_end() can poll for their completion.
> 
> I decided not to do this, for two reasons:
> (1) Doing so would spill the “list of objects to poll for” design to
>     places outside of block/io.c.  I don’t like the design very much as
>     it is, but I can live with it as long as it’s constrained to the
>     core drain code in block/io.c.
>     This is made worse by the fact that currently, those objects are of
>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>     type that is externally visible (we only need the AioContext and
>     whether bdrv_drain_invoke_entry() is done).
> 
> (2) It seems to work as it is.
> 
> The alternative would be to add the same GSList ** parameter to
> BdrvChildRole.drained_end() that I added in the core drain code in patch
> 2, and then let the .drained_end() implementation fill that with objects
> to poll for.  (Which would be accomplished by adding a frontend to
> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> parameter through.)
> 
> Opinions?

I think I would add an int* to BdrvChildRole.drained_end() so that we
can just increase the counter wherever we need to.

> And then we have bdrv_replace_child_noperm(), which actually wants a
> polling BdrvChildRole.drained_end().  So this series adds
> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
> is any polling to do).
> 
> Note that if I implemented the alternative described above
> (.drained_end() gets said GSList ** parameter), a
> .drained_end_unquiesce() wouldn’t be necessary.
> bdrv_parent_drained_end_single() could just poll the list returned by
> .drained_end() by itself.

The split between .drained_end/.drained_end_unquiesce feels wrong. It
shouldn't be the job of the BdrvChildRole to worry about this. Polling
should be handled inside bdrv_parent_drained_end_single(), like we do in
bdrv_parent_drained_begin_single(), so that the BdrvChildRole never has
to poll.
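
That is, roughly this kind of interface (again only a sketch, with
prototypes simplified and nothing of it tested):

/* block_int.h: .drained_end() must never poll; it only increments
 * *drained_end_counter for each operation it kicks off that the caller
 * still has to wait for. */
struct BdrvChildRole {
    /* ... other callbacks unchanged ... */
    void (*drained_end)(BdrvChild *child, int *drained_end_counter);
};

/* block/io.c: all polling happens here, mirroring what we already do in
 * bdrv_parent_drained_begin_single() */
void bdrv_parent_drained_end_single(BdrvChild *c)
{
    int drained_end_counter = 0;

    if (c->role->drained_end) {
        c->role->drained_end(c, &drained_end_counter);
    }

    BDRV_POLL_WHILE(c->bs, drained_end_counter > 0);
}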

> Finally, patches 1, 8, and 9 are unmodified from v1.
> [...]
> 
>  include/block/block.h      |  22 +++++-
>  include/block/block_int.h  |  23 ++++++
>  block.c                    |  24 +++---
>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>  python/qemu/__init__.py    |   5 +-
>  tests/qemu-iotests/040     |  40 +++++++++-
>  tests/qemu-iotests/040.out |   4 +-
>  tests/qemu-iotests/255     |   2 +-
>  8 files changed, 231 insertions(+), 44 deletions(-)

I feel this series should add something to tests/test-bdrv-drain.c, too.
qemu-iotests can only test high-level block job commands that happen to
trigger the bug today, but that code may change in the future. Unit
tests allow us to test the problematic cases more directly.

Kevin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
  2019-07-16 14:40 ` Kevin Wolf
@ 2019-07-16 16:24   ` Max Reitz
  2019-07-16 16:37     ` Kevin Wolf
  0 siblings, 1 reply; 15+ messages in thread
From: Max Reitz @ 2019-07-16 16:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 5132 bytes --]

On 16.07.19 16:40, Kevin Wolf wrote:
> On 19.06.2019 at 17:25, Max Reitz wrote:
>> Hi,
>>
>> This is v2 to “block: Keep track of parent quiescing”.
>>
>> Please read this cover letter, because I’m very unsure about the design
>> in this series and I’d appreciate some comments.
>>
>> As Kevin wrote in his reply to that series, the actual problem is that
>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>> may cause graph changes, which is actually forbidden.
>>
>> To solve that problem, this series makes the drain code construct a list
>> of undrain operations that have been initiated, and then polls all of
>> them on the root level once graph changes are acceptable.
>>
>> Note that I don’t like this list concept very much, so I’m open to
>> alternatives.
> 
> So drain_end is different from drain_begin in that it wants to wait only
> for all bdrv_drain_invoke() calls to complete, but not for other
> requests that are in flight. Makes sense.
> 
> Though instead of managing a whole list, wouldn't a counter suffice?
> 
>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>> .drained_end() implementation.  The documentation clearly states that
>> this function is not allowed to poll, but it does.  So this series
>> changes it to a variant (using the new code) that does not poll.
>>
>> There is a catch, which may actually be a problem, I don’t know: The new
>> variant of that .drained_end() does not poll at all, never.  As
>> described above, now every bdrv_drain_invoke() returns an object that
>> describes when it will be done and which can thus be polled for.  These
>> objects are just discarded when using BdrvChildRole.drained_end().  That
>> does not feel quite right.  It would probably be more correct to let
>> BdrvChildRole.drained_end() return these objects so the top level
>> bdrv_drained_end() can poll for their completion.
>>
>> I decided not to do this, for two reasons:
>> (1) Doing so would spill the “list of objects to poll for” design to
>>     places outside of block/io.c.  I don’t like the design very much as
>>     it is, but I can live with it as long as it’s constrained to the
>>     core drain code in block/io.c.
>>     This is made worse by the fact that currently, those objects are of
>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>     type that is externally visible (we only need the AioContext and
>>     whether bdrv_drain_invoke_entry() is done).
>>
>> (2) It seems to work as it is.
>>
>> The alternative would be to add the same GSList ** parameter to
>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>> 2, and then let the .drained_end() implementation fill that with objects
>> to poll for.  (Which would be accomplished by adding a frontend to
>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
>> parameter through.)
>>
>> Opinions?
> 
> I think I would add an int* to BdrvChildRole.drained_end() so that we
> can just increase the counter wherever we need to.

So you mean just polling the @bs for which a caller gave poll=true until
the counter reaches 0?  I’ll try, sounds good (if I can get it to work).
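
I.e., at the top level, something in this direction, I suppose (only a
sketch; I am simplifying the bdrv_do_drained_end() parameters here and
assuming the counter gets threaded all the way through):

void bdrv_drained_end(BlockDriverState *bs)
{
    int drained_end_counter = 0;

    /* Nothing below this point polls; callees only bump the counter */
    bdrv_do_drained_end(bs, false, NULL, &drained_end_counter);

    BDRV_POLL_WHILE(bs, drained_end_counter > 0);
}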

>> And then we have bdrv_replace_child_noperm(), which actually wants a
>> polling BdrvChildRole.drained_end().  So this series adds
>> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
>> is any polling to do).
>>
>> Note that if I implemented the alternative described above
>> (.drained_end() gets said GSList ** parameter), a
>> .drained_end_unquiesce() wouldn’t be necessary.
>> bdrv_parent_drained_end_single() could just poll the list returned by
>> .drained_end() by itself.
> 
> The split between .drained_end/.drained_end_unquiesce feels wrong. It
> shouldn't be the job of the BdrvChildRole to worry about this. Polling
> should be handled inside bdrv_parent_drained_end_single(), like we do in
> bdrv_parent_drained_begin_single(), so that the BdrvChildRole never has
> to poll.

If it’s just an int pointer, there’s no point in having two functions, I
suppose.

Thanks for your suggestion!

>> Finally, patches 1, 8, and 9 are unmodified from v1.
>> [...]
>>
>>  include/block/block.h      |  22 +++++-
>>  include/block/block_int.h  |  23 ++++++
>>  block.c                    |  24 +++---
>>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>>  python/qemu/__init__.py    |   5 +-
>>  tests/qemu-iotests/040     |  40 +++++++++-
>>  tests/qemu-iotests/040.out |   4 +-
>>  tests/qemu-iotests/255     |   2 +-
>>  8 files changed, 231 insertions(+), 44 deletions(-)
> 
> I feel this series should add something to tests/test-bdrv-drain.c, too.
> qemu-iotests can only test high-level block job commands that happen to
> trigger the bug today, but that code may change in the future. Unit
> tests allow us to test the problematic cases more directly.

Well, I’m glad if test-bdrv-drain just keeps working. :-)

I’ll try.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
  2019-07-16 16:24   ` Max Reitz
@ 2019-07-16 16:37     ` Kevin Wolf
  2019-07-17 13:20       ` Max Reitz
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Wolf @ 2019-07-16 16:37 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 4025 bytes --]

On 16.07.2019 at 18:24, Max Reitz wrote:
> On 16.07.19 16:40, Kevin Wolf wrote:
> > On 19.06.2019 at 17:25, Max Reitz wrote:
> >> Hi,
> >>
> >> This is v2 to “block: Keep track of parent quiescing”.
> >>
> >> Please read this cover letter, because I’m very unsure about the design
> >> in this series and I’d appreciate some comments.
> >>
> >> As Kevin wrote in his reply to that series, the actual problem is that
> >> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> >> may cause graph changes, which is actually forbidden.
> >>
> >> To solve that problem, this series makes the drain code construct a list
> >> of undrain operations that have been initiated, and then polls all of
> >> them on the root level once graph changes are acceptable.
> >>
> >> Note that I don’t like this list concept very much, so I’m open to
> >> alternatives.
> > 
> > So drain_end is different from drain_begin in that it wants to wait only
> > for all bdrv_drain_invoke() calls to complete, but not for other
> > requests that are in flight. Makes sense.
> > 
> > Though instead of managing a whole list, wouldn't a counter suffice?
> > 
> >> Furthermore, all BdrvChildRoles with BDS parents have a broken
> >> .drained_end() implementation.  The documentation clearly states that
> >> this function is not allowed to poll, but it does.  So this series
> >> changes it to a variant (using the new code) that does not poll.
> >>
> >> There is a catch, which may actually be a problem, I don’t know: The new
> >> variant of that .drained_end() does not poll at all, never.  As
> >> described above, now every bdrv_drain_invoke() returns an object that
> >> describes when it will be done and which can thus be polled for.  These
> >> objects are just discarded when using BdrvChildRole.drained_end().  That
> >> does not feel quite right.  It would probably be more correct to let
> >> BdrvChildRole.drained_end() return these objects so the top level
> >> bdrv_drained_end() can poll for their completion.
> >>
> >> I decided not to do this, for two reasons:
> >> (1) Doing so would spill the “list of objects to poll for” design to
> >>     places outside of block/io.c.  I don’t like the design very much as
> >>     it is, but I can live with it as long as it’s constrained to the
> >>     core drain code in block/io.c.
> >>     This is made worse by the fact that currently, those objects are of
> >>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
> >>     type that is externally visible (we only need the AioContext and
> >>     whether bdrv_drain_invoke_entry() is done).
> >>
> >> (2) It seems to work as it is.
> >>
> >> The alternative would be to add the same GSList ** parameter to
> >> BdrvChildRole.drained_end() that I added in the core drain code in patch
> >> 2, and then let the .drained_end() implementation fill that with objects
> >> to poll for.  (Which would be accomplished by adding a frontend to
> >> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> >> parameter through.)
> >>
> >> Opinions?
> > 
> > I think I would add an int* to BdrvChildRole.drained_end() so that we
> > can just increase the counter wherever we need to.
> 
> So you mean just polling the @bs for which a caller gave poll=true until
> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).

Yes, that's what I have in mind.

We expect graph changes to happen during the polling, but I think the
caller is responsible for making sure that the top-level @bs stays
around, so we don't need to be extra careful here.
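
(If a caller is worried about that, it can always use the usual pattern
around its drained section, e.g.:

    bdrv_ref(bs);
    bdrv_drained_begin(bs);
    /* ... graph manipulation ... */
    bdrv_drained_end(bs);
    bdrv_unref(bs);

but that is the caller's business, not the drain code's.)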

Also, bdrv_drain_invoke() is always called in the same AioContext as the
top-level drain operation, so the whole aio_context_acquire/release
stuff from this series should become unnecessary, and we don't need
atomics to access the counter either.

So I think this should really simplify the series a lot.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
  2019-07-16 16:37     ` Kevin Wolf
@ 2019-07-17 13:20       ` Max Reitz
  0 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-07-17 13:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 5233 bytes --]

On 16.07.19 18:37, Kevin Wolf wrote:
> On 16.07.2019 at 18:24, Max Reitz wrote:
>> On 16.07.19 16:40, Kevin Wolf wrote:
>>> On 19.06.2019 at 17:25, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> This is v2 to “block: Keep track of parent quiescing”.
>>>>
>>>> Please read this cover letter, because I’m very unsure about the design
>>>> in this series and I’d appreciate some comments.
>>>>
>>>> As Kevin wrote in his reply to that series, the actual problem is that
>>>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>>>> may cause graph changes, which is actually forbidden.
>>>>
>>>> To solve that problem, this series makes the drain code construct a list
>>>> of undrain operations that have been initiated, and then polls all of
>>>> them on the root level once graph changes are acceptable.
>>>>
>>>> Note that I don’t like this list concept very much, so I’m open to
>>>> alternatives.
>>>
>>> So drain_end is different from drain_begin in that it wants to wait only
>>> for all bdrv_drain_invoke() calls to complete, but not for other
>>> requests that are in flight. Makes sense.
>>>
>>> Though instead of managing a whole list, wouldn't a counter suffice?
>>>
>>>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>>>> .drained_end() implementation.  The documentation clearly states that
>>>> this function is not allowed to poll, but it does.  So this series
>>>> changes it to a variant (using the new code) that does not poll.
>>>>
>>>> There is a catch, which may actually be a problem, I don’t know: The new
>>>> variant of that .drained_end() does not poll at all, never.  As
>>>> described above, now every bdrv_drain_invoke() returns an object that
>>>> describes when it will be done and which can thus be polled for.  These
>>>> objects are just discarded when using BdrvChildRole.drained_end().  That
>>>> does not feel quite right.  It would probably be more correct to let
>>>> BdrvChildRole.drained_end() return these objects so the top level
>>>> bdrv_drained_end() can poll for their completion.
>>>>
>>>> I decided not to do this, for two reasons:
>>>> (1) Doing so would spill the “list of objects to poll for” design to
>>>>     places outside of block/io.c.  I don’t like the design very much as
>>>>     it is, but I can live with it as long as it’s constrained to the
>>>>     core drain code in block/io.c.
>>>>     This is made worse by the fact that currently, those objects are of
>>>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>>>     type that is externally visible (we only need the AioContext and
>>>>     whether bdrv_drain_invoke_entry() is done).
>>>>
>>>> (2) It seems to work as it is.
>>>>
>>>> The alternative would be to add the same GSList ** parameter to
>>>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>>>> 2, and then let the .drained_end() implementation fill that with objects
>>>> to poll for.  (Which would be accomplished by adding a frontend to
>>>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
>>>> parameter through.)
>>>>
>>>> Opinions?
>>>
>>> I think I would add an int* to BdrvChildRole.drained_end() so that we
>>> can just increase the counter wherever we need to.
>>
>> So you mean just polling the @bs for which a caller gave poll=true until
>> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).
> 
> Yes, that's what I have in mind.
> 
> We expect graph changes to happen during the polling, but I think the
> caller is responsible for making sure that the top-level @bs stays
> around, so we don't need to be extra careful here.
> 
> Also, bdrv_drain_invoke() is always called in the same AioContext as the
> top-level drain operation, so the whole aio_context_acquire/release
> stuff from this series should become unnecessary, and we don't need
> atomics to access the counter either.
> 
> So I think this should really simplify the series a lot.

Hm.  Unfortunately, not all nodes in a chain always have the same
AioContext.

I think they generally should, but there is at least one exception:
bdrv_set_aio_context*() itself.  bdrv_set_aio_context_ignore() drains
the node, then moves the other members of the subgraph into the new
AioContext, and only then the node itself.

Now say this reaches the bottom node.  That node will not recurse
anywhere else, but only change its own AioContext, in a drained section.
So when that section ends, the bottom node will be in a different
AioContext than the other nodes.
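
Schematically, the order of events is something like this (heavily
simplified from memory, with parent handling and the @ignore list
omitted, so take it only as an illustration of the ordering):

void bdrv_set_aio_context_ignore(BlockDriverState *bs,
                                 AioContext *new_context, GSList **ignore)
{
    BdrvChild *child;

    bdrv_drained_begin(bs);

    /* The children move to @new_context first ... */
    QLIST_FOREACH(child, &bs->children, next) {
        bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
    }

    /* ... then @bs itself switches ... */
    bdrv_detach_aio_context(bs);
    bdrv_attach_aio_context(bs, new_context);

    /* ... and ends its drained section while its parents may still be
     * in the old AioContext */
    bdrv_drained_end(bs);
}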

So, er, well.  I have three ideas:

(1) Skip the polling on the top level drained_end if the node still has
another quiesce_counter on it.  Sounds a bit too error-prone to me.

(2) Drop the drained sections in bdrv_set_aio_context_ignore().  Instead
require the root caller to have the whole subtree drained.  That way,
drained_end will never be invoked while the subtree has different
AioContexts.

(3) I need a list after all (one that only contains AioContexts, but still).


I like (3) as little as I did in this series.  (1) seems wrong.  I’ll
try (2) first.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-07-17 13:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-19 15:25 [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 1/9] block: Introduce BdrvChild.parent_quiesce_counter Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 2/9] block: Add @data_objs to bdrv_drain_invoke() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 3/9] block: Add bdrv_poll_drain_data_objs() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 4/9] block: Move polling out of bdrv_drain_invoke() Max Reitz
2019-06-19 15:25 ` [Qemu-devel] [PATCH v2 5/9] block: Add @poll to bdrv_parent_drained_end_single() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 6/9] block: Add bdrv_drained_end_no_poll() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 7/9] block: Fix BDS children's .drained_end() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 8/9] iotests: Add @has_quit to vm.shutdown() Max Reitz
2019-06-19 15:26 ` [Qemu-devel] [PATCH v2 9/9] iotests: Test commit with a filter on the chain Max Reitz
2019-07-15 13:24 ` [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections Max Reitz
2019-07-16 14:40 ` Kevin Wolf
2019-07-16 16:24   ` Max Reitz
2019-07-16 16:37     ` Kevin Wolf
2019-07-17 13:20       ` Max Reitz
