qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] block: Attempt on fixing 030-reported errors
@ 2021-11-04 10:38 Hanna Reitz
  2021-11-04 10:38 ` [PATCH 1/7] stream: Traverse graph after modification Hanna Reitz
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

Hi,

I’ve tried to investigate what causes the iotest 030 to fail.  Here’s
what I found:

(1) stream_prepare() gets the base node by looking up the node below
    above_base.  It then invokes bdrv_cor_filter_drop(), before we
    actually use the base node.
    bdrv_cor_filter_drop() modifies the block graph, which means
    draining, which means other parties might modify the graph, too.
    Therefore, afterwards, the node below above_base might be completely
    different, and the base node we got before might already be gone.

(2) bdrv_replace_child_noperm() can set BdrvChild.bs to NULL.  That’s
    problematic, because most of our code cannot deal with BdrvChild
    objects whose .bs pointer is NULL.  We assume that such objects are
    immediately removed from the BDS.children list, and that they won’t
    appear under bs->backing or bs->file (i.e. that those pointers are
    immediately NULLed when bs->{backing,file}->bs is NULLed).
    After setting BdrvChild.bs to NULL, bdrv_replace_child_noperm() may
    invoke bdrv_parent_drained_end_single() on the BdrvChild.
    Therefore, other code is run at that point, and it might not be
    ready to encounter something like
    `bs->backing != NULL && bs->backing->bs == NULL`.

(3) 030 in one case launches four stream jobs concurrently, all with
    speed=1024.  It then unthrottles them one after each other, but the
    problem is that if one job finishes, the jobs above it will be
    advanced by a step (which is actually 512k); so since we unthrottle
    bottom to top, it’s possible that all jobs below the top job are
    finished before we get to unthrottle the top job.  This will advance
    the top job so far (3 * 512k + 512k = 2M) that it actually finishes
    despite still being throttled.  Attempting to unthrottle it then
    throws an error.


Here’s how I think we can solve these problems:

(1) Invoke bdrv_cor_filter_drop() first, then get the base node
    afterwards, when the graph will no longer change.
    Implemented in patch 1.

(2A) bdrv_replace_child_noperm() should immediately set bs->file or
     bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
     It should also immediately remove any BdrvChild with .bs == NULL
     from the parent’s BDS.children list.
     Implemented in patches 2 through 6.

(2B) Alternatively, we could always keep the whole subgraph drained
     while we manipulate it.  Then, the bdrv_parent_drained_end_single()
     in bdrv_replace_child_noperm() wouldn’t do anything.
     To fix 030, we would need to add a drained section to
     stream_prepare(): Namely we’d need to drain the subgraph below the
     COR filter node.
     This would be a much simpler solution, but I don’t feel like it’s
     the right one.

(3) Just unthrottle the jobs from bottom to top instead of top to
    bottom.


As you can see, I’m not sure which of 2A or 2B is the right solution.  I
decided to investigate both: 2A was much more complicated, but seemed
like the right thing to do; 2B is much simpler, but doesn’t feel as
right.  Therefore, I decided to go with 2A in this first version of this
series.


Hanna Reitz (7):
  stream: Traverse graph after modification
  block: Manipulate children list in .attach/.detach
  block: Unite remove_empty_child and child_free
  block: Drop detached child from ignore list
  block: Pass BdrvChild ** to replace_child_noperm
  block: Let replace_child_noperm free children
  iotests/030: Unthrottle parallel jobs in reverse

 block.c                | 178 +++++++++++++++++++++++++++++------------
 block/stream.c         |   7 +-
 tests/qemu-iotests/030 |  11 ++-
 3 files changed, 144 insertions(+), 52 deletions(-)

-- 
2.33.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] stream: Traverse graph after modification
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-10 12:17   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 10:38 ` [PATCH 2/7] block: Manipulate children list in .attach/.detach Hanna Reitz
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

bdrv_cor_filter_drop() modifies the block graph.  That means that other
parties can also modify the block graph before it returns.  Therefore,
we cannot assume that the result of a graph traversal we did before
remains valid afterwards.

We should thus fetch `base` and `unfiltered_base` afterwards instead of
before.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/stream.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 97bee482dc..e45113aed6 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,8 +54,8 @@ static int stream_prepare(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
-    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
-    BlockDriverState *unfiltered_base = bdrv_skip_filters(base);
+    BlockDriverState *base;
+    BlockDriverState *unfiltered_base;
     Error *local_err = NULL;
     int ret = 0;
 
@@ -63,6 +63,9 @@ static int stream_prepare(Job *job)
     bdrv_cor_filter_drop(s->cor_filter_bs);
     s->cor_filter_bs = NULL;
 
+    base = bdrv_filter_or_cow_bs(s->above_base);
+    unfiltered_base = bdrv_skip_filters(base);
+
     if (bdrv_cow_child(unfiltered_bs)) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (unfiltered_base) {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] block: Manipulate children list in .attach/.detach
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
  2021-11-04 10:38 ` [PATCH 1/7] stream: Traverse graph after modification Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-10 12:46   ` Vladimir Sementsov-Ogievskiy
  2021-11-10 12:51   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 10:38 ` [PATCH 3/7] block: Unite remove_empty_child and child_free Hanna Reitz
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

The children list is specific to BDS parents.  We should not modify it
in the general children modification code, but let BDS parents deal with
it in their .attach() and .detach() methods.

This also has the advantage that a BdrvChild is removed from the
children list before its .bs pointer can become NULL.  BDS parents
generally assume that their children's .bs pointer is never NULL, so
this is actually a bug fix.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 580cb77a70..243ae206b5 100644
--- a/block.c
+++ b/block.c
@@ -1387,6 +1387,8 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
 
+    QLIST_INSERT_HEAD(&bs->children, child, next);
+
     if (child->role & BDRV_CHILD_COW) {
         bdrv_backing_attach(child);
     }
@@ -1403,6 +1405,8 @@ static void bdrv_child_cb_detach(BdrvChild *child)
     }
 
     bdrv_unapply_subtree_drain(child, bs);
+
+    QLIST_REMOVE(child, next);
 }
 
 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
@@ -2747,7 +2751,7 @@ static void bdrv_child_free(void *opaque)
 static void bdrv_remove_empty_child(BdrvChild *child)
 {
     assert(!child->bs);
-    QLIST_SAFE_REMOVE(child, next);
+    assert(!child->next.le_prev); /* not in children list */
     bdrv_child_free(child);
 }
 
@@ -2913,7 +2917,6 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
         return ret;
     }
 
-    QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
     /*
      * child is removed in bdrv_attach_child_common_abort(), so don't care to
      * abort this change separately.
@@ -4851,7 +4854,6 @@ static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
     BdrvRemoveFilterOrCowChild *s = opaque;
     BlockDriverState *parent_bs = s->child->opaque;
 
-    QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
     if (s->is_backing) {
         parent_bs->backing = s->child;
     } else {
@@ -4906,7 +4908,6 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
     };
     tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
 
-    QLIST_SAFE_REMOVE(child, next);
     if (s->is_backing) {
         bs->backing = NULL;
     } else {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] block: Unite remove_empty_child and child_free
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
  2021-11-04 10:38 ` [PATCH 1/7] stream: Traverse graph after modification Hanna Reitz
  2021-11-04 10:38 ` [PATCH 2/7] block: Manipulate children list in .attach/.detach Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-10 12:52   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 10:38 ` [PATCH 4/7] block: Drop detached child from ignore list Hanna Reitz
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

Now that bdrv_remove_empty_child() no longer removes the child from the
parent's children list but only checks that it is not in such a list, it
is only a wrapper around bdrv_child_free() that checks that the child is
empty and unused.  That should apply to all children that we free, so
put those checks into bdrv_child_free() and drop
bdrv_remove_empty_child().

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 243ae206b5..b95f8dcf8f 100644
--- a/block.c
+++ b/block.c
@@ -2740,19 +2740,19 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
     }
 }
 
-static void bdrv_child_free(void *opaque)
-{
-    BdrvChild *c = opaque;
-
-    g_free(c->name);
-    g_free(c);
-}
-
-static void bdrv_remove_empty_child(BdrvChild *child)
+/**
+ * Free the given @child.
+ *
+ * The child must be empty (i.e. `child->bs == NULL`) and it must be
+ * unused (i.e. not in a children list).
+ */
+static void bdrv_child_free(BdrvChild *child)
 {
     assert(!child->bs);
     assert(!child->next.le_prev); /* not in children list */
-    bdrv_child_free(child);
+
+    g_free(child->name);
+    g_free(child);
 }
 
 typedef struct BdrvAttachChildCommonState {
@@ -2786,7 +2786,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
     }
 
     bdrv_unref(bs);
-    bdrv_remove_empty_child(child);
+    bdrv_child_free(child);
     *s->child = NULL;
 }
 
@@ -2859,7 +2859,7 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
 
         if (ret < 0) {
             error_propagate(errp, local_err);
-            bdrv_remove_empty_child(new_child);
+            bdrv_child_free(new_child);
             return ret;
         }
     }
@@ -2930,7 +2930,7 @@ static void bdrv_detach_child(BdrvChild *child)
     BlockDriverState *old_bs = child->bs;
 
     bdrv_replace_child_noperm(child, NULL);
-    bdrv_remove_empty_child(child);
+    bdrv_child_free(child);
 
     if (old_bs) {
         /*
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] block: Drop detached child from ignore list
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (2 preceding siblings ...)
  2021-11-04 10:38 ` [PATCH 3/7] block: Unite remove_empty_child and child_free Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-10 13:38   ` Vladimir Sementsov-Ogievskiy via
  2021-11-04 10:38 ` [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm Hanna Reitz
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

bdrv_attach_child_common_abort() restores the parent's AioContext.  To
do so, the child (which was supposed to be attached, but is now detached
again by this abort handler) is added to the ignore list for the
AioContext changing functions.

However, since we modify a BDS's children list in the BdrvChildClass's
.attach and .detach handlers, the child is already effectively detached
from the parent by this point.  We do not need to put it into the ignore
list.

Use this opportunity to clean up the empty line structure: Keep setting
the ignore list, invoking the AioContext function, and freeing the
ignore list in blocks separated by empty lines.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b95f8dcf8f..6d230ad3d1 100644
--- a/block.c
+++ b/block.c
@@ -2774,14 +2774,16 @@ static void bdrv_attach_child_common_abort(void *opaque)
     }
 
     if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
-        GSList *ignore = g_slist_prepend(NULL, child);
+        GSList *ignore;
 
+        /* No need to ignore `child`, because it has been detached already */
+        ignore = NULL;
         child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
                                       &error_abort);
         g_slist_free(ignore);
-        ignore = g_slist_prepend(NULL, child);
-        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
 
+        ignore = NULL;
+        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
         g_slist_free(ignore);
     }
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (3 preceding siblings ...)
  2021-11-04 10:38 ` [PATCH 4/7] block: Drop detached child from ignore list Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-05 15:15   ` Kevin Wolf
  2021-11-04 10:38 ` [PATCH 6/7] block: Let replace_child_noperm free children Hanna Reitz
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL.  That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL.  We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.

This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 59 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 6d230ad3d1..ff45447686 100644
--- a/block.c
+++ b/block.c
@@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
                                    BlockDriverState *child);
 
-static void bdrv_replace_child_noperm(BdrvChild *child,
+static void bdrv_replace_child_noperm(BdrvChild **child,
                                       BlockDriverState *new_bs);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
                                               BdrvChild *child,
@@ -2254,6 +2254,7 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
 
 typedef struct BdrvReplaceChildState {
     BdrvChild *child;
+    BdrvChild **childp;
     BlockDriverState *old_bs;
 } BdrvReplaceChildState;
 
@@ -2269,8 +2270,8 @@ static void bdrv_replace_child_abort(void *opaque)
     BdrvReplaceChildState *s = opaque;
     BlockDriverState *new_bs = s->child->bs;
 
-    /* old_bs reference is transparently moved from @s to @s->child */
-    bdrv_replace_child_noperm(s->child, s->old_bs);
+    /* old_bs reference is transparently moved from @s to *s->childp */
+    bdrv_replace_child_noperm(s->childp, s->old_bs);
     bdrv_unref(new_bs);
 }
 
@@ -2287,21 +2288,23 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  */
-static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
+static void bdrv_replace_child_tran(BdrvChild **childp,
+                                    BlockDriverState *new_bs,
                                     Transaction *tran)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
     *s = (BdrvReplaceChildState) {
-        .child = child,
-        .old_bs = child->bs,
+        .child = *childp,
+        .childp = childp,
+        .old_bs = (*childp)->bs,
     };
     tran_add(tran, &bdrv_replace_child_drv, s);
 
     if (new_bs) {
         bdrv_ref(new_bs);
     }
-    bdrv_replace_child_noperm(child, new_bs);
-    /* old_bs reference is transparently moved from @child to @s */
+    bdrv_replace_child_noperm(childp, new_bs);
+    /* old_bs reference is transparently moved from *childp to @s */
 }
 
 /*
@@ -2672,9 +2675,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
     return permissions[qapi_perm];
 }
 
-static void bdrv_replace_child_noperm(BdrvChild *child,
+static void bdrv_replace_child_noperm(BdrvChild **childp,
                                       BlockDriverState *new_bs)
 {
+    BdrvChild *child = *childp;
     BlockDriverState *old_bs = child->bs;
     int new_bs_quiesce_counter;
     int drain_saldo;
@@ -2767,7 +2771,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
     BdrvChild *child = *s->child;
     BlockDriverState *bs = child->bs;
 
-    bdrv_replace_child_noperm(child, NULL);
+    bdrv_replace_child_noperm(s->child, NULL);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2867,7 +2871,7 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
-    bdrv_replace_child_noperm(new_child, child_bs);
+    bdrv_replace_child_noperm(&new_child, child_bs);
 
     *child = new_child;
 
@@ -2927,12 +2931,12 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
     return 0;
 }
 
-static void bdrv_detach_child(BdrvChild *child)
+static void bdrv_detach_child(BdrvChild **childp)
 {
-    BlockDriverState *old_bs = child->bs;
+    BlockDriverState *old_bs = (*childp)->bs;
 
-    bdrv_replace_child_noperm(child, NULL);
-    bdrv_child_free(child);
+    bdrv_replace_child_noperm(childp, NULL);
+    bdrv_child_free(*childp);
 
     if (old_bs) {
         /*
@@ -3038,7 +3042,7 @@ void bdrv_root_unref_child(BdrvChild *child)
     BlockDriverState *child_bs;
 
     child_bs = child->bs;
-    bdrv_detach_child(child);
+    bdrv_detach_child(&child);
     bdrv_unref(child_bs);
 }
 
@@ -4891,30 +4895,33 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
                                               BdrvChild *child,
                                               Transaction *tran)
 {
+    BdrvChild **childp;
     BdrvRemoveFilterOrCowChild *s;
 
-    assert(child == bs->backing || child == bs->file);
+    if (child == bs->backing) {
+        childp = &bs->backing;
+    } else if (child == bs->file) {
+        childp = &bs->file;
+    } else {
+        g_assert_not_reached();
+    }
 
     if (!child) {
         return;
     }
 
     if (child->bs) {
-        bdrv_replace_child_tran(child, NULL, tran);
+        bdrv_replace_child_tran(childp, NULL, tran);
     }
 
     s = g_new(BdrvRemoveFilterOrCowChild, 1);
     *s = (BdrvRemoveFilterOrCowChild) {
         .child = child,
-        .is_backing = (child == bs->backing),
+        .is_backing = (childp == &bs->backing),
     };
     tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
 
-    if (s->is_backing) {
-        bs->backing = NULL;
-    } else {
-        bs->file = NULL;
-    }
+    *childp = NULL;
 }
 
 /*
@@ -4950,7 +4957,7 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
                        c->name, from->node_name);
             return -EPERM;
         }
-        bdrv_replace_child_tran(c, to, tran);
+        bdrv_replace_child_tran(&c, to, tran);
     }
 
     return 0;
@@ -5099,7 +5106,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     bdrv_drained_begin(old_bs);
     bdrv_drained_begin(new_bs);
 
-    bdrv_replace_child_tran(child, new_bs, tran);
+    bdrv_replace_child_tran(&child, new_bs, tran);
 
     found = g_hash_table_new(NULL, NULL);
     refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] block: Let replace_child_noperm free children
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (4 preceding siblings ...)
  2021-11-04 10:38 ` [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-05 15:41   ` Kevin Wolf
  2021-11-04 10:38 ` [PATCH 7/7] iotests/030: Unthrottle parallel jobs in reverse Hanna Reitz
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 86 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index ff45447686..0a85ede8dc 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
                                    BlockDriverState *child);
 
+static void bdrv_child_free(BdrvChild *child);
 static void bdrv_replace_child_noperm(BdrvChild **child,
-                                      BlockDriverState *new_bs);
+                                      BlockDriverState *new_bs,
+                                      bool free_empty_child);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
                                               BdrvChild *child,
                                               Transaction *tran);
@@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
     BdrvChild *child;
     BdrvChild **childp;
     BlockDriverState *old_bs;
+    bool free_empty_child;
 } BdrvReplaceChildState;
 
 static void bdrv_replace_child_commit(void *opaque)
 {
     BdrvReplaceChildState *s = opaque;
 
+    if (s->free_empty_child && !s->child->bs) {
+        bdrv_child_free(s->child);
+    }
     bdrv_unref(s->old_bs);
 }
 
@@ -2270,8 +2276,23 @@ static void bdrv_replace_child_abort(void *opaque)
     BdrvReplaceChildState *s = opaque;
     BlockDriverState *new_bs = s->child->bs;
 
-    /* old_bs reference is transparently moved from @s to *s->childp */
-    bdrv_replace_child_noperm(s->childp, s->old_bs);
+    /*
+     * old_bs reference is transparently moved from @s to s->child;
+     * pass &s->child here instead of s->childp, because *s->childp
+     * will have been cleared by bdrv_replace_child_tran()'s
+     * bdrv_replace_child_noperm() call if new_bs is NULL, and we must
+     * not pass a NULL *s->childp here.
+     */
+    bdrv_replace_child_noperm(&s->child, s->old_bs, true);
+    /*
+     * The child was pre-existing, so s->old_bs must be non-NULL, and
+     * s->child thus must not have been freed
+     */
+    assert(s->child != NULL);
+    if (!new_bs) {
+        /* As described above, *s->childp was cleared, so restore it */
+        *s->childp = s->child;
+    }
     bdrv_unref(new_bs);
 }
 
@@ -2287,23 +2308,40 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  * Note: real unref of old_bs is done only on commit.
  *
  * The function doesn't update permissions, caller is responsible for this.
+ *
+ * (*childp)->bs must not be NULL.
+ *
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed (on commit).  @free_empty_child should only be false if the
+ * caller will free the BDrvChild themselves (which may be important
+ * if this is in turn called in another transactional context).
  */
 static void bdrv_replace_child_tran(BdrvChild **childp,
                                     BlockDriverState *new_bs,
-                                    Transaction *tran)
+                                    Transaction *tran,
+                                    bool free_empty_child)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
     *s = (BdrvReplaceChildState) {
         .child = *childp,
         .childp = childp,
         .old_bs = (*childp)->bs,
+        .free_empty_child = free_empty_child,
     };
     tran_add(tran, &bdrv_replace_child_drv, s);
 
+    /* The abort handler relies on this */
+    assert(s->old_bs != NULL);
+
     if (new_bs) {
         bdrv_ref(new_bs);
     }
-    bdrv_replace_child_noperm(childp, new_bs);
+    /*
+     * Pass free_empty_child=false, we will free the child (if
+     * necessary) in bdrv_replace_child_commit() (if our
+     * @free_empty_child parameter was true).
+     */
+    bdrv_replace_child_noperm(childp, new_bs, false);
     /* old_bs reference is transparently moved from *childp to @s */
 }
 
@@ -2675,8 +2713,22 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
     return permissions[qapi_perm];
 }
 
+/**
+ * Replace (*childp)->bs by @new_bs.
+ *
+ * If @new_bs is NULL, *childp will be set to NULL, too: BDS parents
+ * generally cannot handle a BdrvChild with .bs == NULL, so clearing
+ * BdrvChild.bs should generally immediately be followed by the
+ * BdrvChild pointer being cleared as well.
+ *
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed.  @free_empty_child should only be false if the caller will
+ * free the BdrvChild themselves (this may be important in a
+ * transactional context, where it may only be freed on commit).
+ */
 static void bdrv_replace_child_noperm(BdrvChild **childp,
-                                      BlockDriverState *new_bs)
+                                      BlockDriverState *new_bs,
+                                      bool free_empty_child)
 {
     BdrvChild *child = *childp;
     BlockDriverState *old_bs = child->bs;
@@ -2713,6 +2765,9 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     child->bs = new_bs;
+    if (!new_bs) {
+        *childp = NULL;
+    }
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
@@ -2742,6 +2797,10 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
         bdrv_parent_drained_end_single(child);
         drain_saldo++;
     }
+
+    if (free_empty_child && !child->bs) {
+        bdrv_child_free(child);
+    }
 }
 
 /**
@@ -2771,7 +2830,14 @@ static void bdrv_attach_child_common_abort(void *opaque)
     BdrvChild *child = *s->child;
     BlockDriverState *bs = child->bs;
 
-    bdrv_replace_child_noperm(s->child, NULL);
+    /*
+     * Pass free_empty_child=false, because we still need the child
+     * for the AioContext operations on the parent below; those
+     * BdrvChildClass methods all work on a BdrvChild object, so we
+     * need to keep it as an empty shell (after this function, it will
+     * not be attached to any parent, and it will not have a .bs).
+     */
+    bdrv_replace_child_noperm(s->child, NULL, false);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2793,7 +2859,6 @@ static void bdrv_attach_child_common_abort(void *opaque)
 
     bdrv_unref(bs);
     bdrv_child_free(child);
-    *s->child = NULL;
 }
 
 static TransactionActionDrv bdrv_attach_child_common_drv = {
@@ -2871,7 +2936,9 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
-    bdrv_replace_child_noperm(&new_child, child_bs);
+    bdrv_replace_child_noperm(&new_child, child_bs, true);
+    /* child_bs was non-NULL, so new_child must not have been freed */
+    assert(new_child != NULL);
 
     *child = new_child;
 
@@ -2935,8 +3002,7 @@ static void bdrv_detach_child(BdrvChild **childp)
 {
     BlockDriverState *old_bs = (*childp)->bs;
 
-    bdrv_replace_child_noperm(childp, NULL);
-    bdrv_child_free(*childp);
+    bdrv_replace_child_noperm(childp, NULL, true);
 
     if (old_bs) {
         /*
@@ -4911,7 +4977,11 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
     }
 
     if (child->bs) {
-        bdrv_replace_child_tran(childp, NULL, tran);
+        /*
+         * Pass free_empty_child=false, we will free the child in
+         * bdrv_remove_filter_or_cow_child_commit()
+         */
+        bdrv_replace_child_tran(childp, NULL, tran, false);
     }
 
     s = g_new(BdrvRemoveFilterOrCowChild, 1);
@@ -4920,8 +4990,6 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
         .is_backing = (childp == &bs->backing),
     };
     tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
-
-    *childp = NULL;
 }
 
 /*
@@ -4957,7 +5025,7 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
                        c->name, from->node_name);
             return -EPERM;
         }
-        bdrv_replace_child_tran(&c, to, tran);
+        bdrv_replace_child_tran(&c, to, tran, true);
     }
 
     return 0;
@@ -5106,7 +5174,9 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     bdrv_drained_begin(old_bs);
     bdrv_drained_begin(new_bs);
 
-    bdrv_replace_child_tran(&child, new_bs, tran);
+    bdrv_replace_child_tran(&child, new_bs, tran, true);
+    /* @new_bs must have been non-NULL, so @child must not have been freed */
+    assert(child != NULL);
 
     found = g_hash_table_new(NULL, NULL);
     refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] iotests/030: Unthrottle parallel jobs in reverse
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (5 preceding siblings ...)
  2021-11-04 10:38 ` [PATCH 6/7] block: Let replace_child_noperm free children Hanna Reitz
@ 2021-11-04 10:38 ` Hanna Reitz
  2021-11-04 11:58 ` [PATCH 0/7] block: Attempt on fixing 030-reported errors Kevin Wolf
  2021-11-05 15:44 ` Kevin Wolf
  8 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 10:38 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel

See the comment for why this is necessary.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tests/qemu-iotests/030 | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 5fb65b4bef..567bf1da67 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
                                  speed=1024)
             self.assert_qmp(result, 'return', {})
 
-        for job in pending_jobs:
+        # Do this in reverse: After unthrottling them, some jobs may finish
+        # before we have unthrottled all of them.  This will drain their
+        # subgraph, and this will make jobs above them advance (despite those
+        # jobs on top being throttled).  In the worst case, all jobs below the
+        # top one are finished before we can unthrottle it, and this makes it
+        # advance so far that it completes before we can unthrottle it - which
+        # results in an error.
+        # Starting from the top (i.e. in reverse) does not have this problem:
+        # When a job finishes, the ones below it are not advanced.
+        for job in reversed(pending_jobs):
             result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
             self.assert_qmp(result, 'return', {})
 
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (6 preceding siblings ...)
  2021-11-04 10:38 ` [PATCH 7/7] iotests/030: Unthrottle parallel jobs in reverse Hanna Reitz
@ 2021-11-04 11:58 ` Kevin Wolf
  2021-11-04 13:34   ` Hanna Reitz
  2021-11-05 15:44 ` Kevin Wolf
  8 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2021-11-04 11:58 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
> (2A) bdrv_replace_child_noperm() should immediately set bs->file or
>      bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
>      It should also immediately remove any BdrvChild with .bs == NULL
>      from the parent’s BDS.children list.
>      Implemented in patches 2 through 6.
> 
> (2B) Alternatively, we could always keep the whole subgraph drained
>      while we manipulate it.  Then, the bdrv_parent_drained_end_single()
>      in bdrv_replace_child_noperm() wouldn’t do anything.
>      To fix 030, we would need to add a drained section to
>      stream_prepare(): Namely we’d need to drain the subgraph below the
>      COR filter node.
>      This would be a much simpler solution, but I don’t feel like it’s
>      the right one.

> As you can see, I’m not sure which of 2A or 2B is the right solution.  I
> decided to investigate both: 2A was much more complicated, but seemed
> like the right thing to do; 2B is much simpler, but doesn’t feel as
> right.  Therefore, I decided to go with 2A in this first version of this
> series.

I haven't looked at the patches yet, but if I understand correctly the
choice you're presenting here is between protecting code from accessing
invalid state and not creating the invalid state in the first place. I
agree that the latter is preferable as long as it doesn't make things so
complicated that we would be willing to accept the higher risk of
breakage in the former. If it's doable in five patches, it's probably
not complicated enough to make such compromises.

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors
  2021-11-04 11:58 ` [PATCH 0/7] block: Attempt on fixing 030-reported errors Kevin Wolf
@ 2021-11-04 13:34   ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-04 13:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On 04.11.21 12:58, Kevin Wolf wrote:
> Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
>> (2A) bdrv_replace_child_noperm() should immediately set bs->file or
>>       bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
>>       It should also immediately remove any BdrvChild with .bs == NULL
>>       from the parent’s BDS.children list.
>>       Implemented in patches 2 through 6.
>>
>> (2B) Alternatively, we could always keep the whole subgraph drained
>>       while we manipulate it.  Then, the bdrv_parent_drained_end_single()
>>       in bdrv_replace_child_noperm() wouldn’t do anything.
>>       To fix 030, we would need to add a drained section to
>>       stream_prepare(): Namely we’d need to drain the subgraph below the
>>       COR filter node.
>>       This would be a much simpler solution, but I don’t feel like it’s
>>       the right one.
>> As you can see, I’m not sure which of 2A or 2B is the right solution.  I
>> decided to investigate both: 2A was much more complicated, but seemed
>> like the right thing to do; 2B is much simpler, but doesn’t feel as
>> right.  Therefore, I decided to go with 2A in this first version of this
>> series.
> I haven't looked at the patches yet, but if I understand correctly the
> choice you're presenting here is between protecting code from accessing
> invalid state and not creating the invalid state in the first place.

Yes, that’s right.

> I agree that the latter is preferable as long as it doesn't make things
> so complicated that we would be willing to accept the higher risk of
> breakage in the former.

No, I don’t think it’s too complicated.  Just not as sample as a 
drained_begin + drained_end.

> If it's doable in five patches, it's probably
> not complicated enough to make such compromises.

Without the clean-up patches that are patches 3 and 4, it would be 
doable in even fewer patches. :)

Hanna



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm
  2021-11-04 10:38 ` [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm Hanna Reitz
@ 2021-11-05 15:15   ` Kevin Wolf
  2021-11-08  9:58     ` Hanna Reitz
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2021-11-05 15:15 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
> bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
> set it to NULL.  That is dangerous, because BDS parents generally assume
> that their children's .bs pointer is never NULL.  We therefore want to
> let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
> to NULL, too.
> 
> This patch lays the foundation for it by passing a BdrvChild ** pointer
> to bdrv_replace_child_noperm() so that it can later use it to NULL the
> BdrvChild pointer immediately after setting BdrvChild.bs to NULL.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  block.c | 59 ++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 33 insertions(+), 26 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 6d230ad3d1..ff45447686 100644
> --- a/block.c
> +++ b/block.c
> @@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>  static bool bdrv_recurse_has_child(BlockDriverState *bs,
>                                     BlockDriverState *child);
>  
> -static void bdrv_replace_child_noperm(BdrvChild *child,
> +static void bdrv_replace_child_noperm(BdrvChild **child,
>                                        BlockDriverState *new_bs);
>  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>                                                BdrvChild *child,
> @@ -2254,6 +2254,7 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
>  
>  typedef struct BdrvReplaceChildState {
>      BdrvChild *child;
> +    BdrvChild **childp;

Would it be clearer to rename child to old_child now that it can be
changed?

I would also consider having childp first because this is really the
object that we're modifying and potentially rolling back now.

>      BlockDriverState *old_bs;
>  } BdrvReplaceChildState;
>  
> @@ -2269,8 +2270,8 @@ static void bdrv_replace_child_abort(void *opaque)
>      BdrvReplaceChildState *s = opaque;
>      BlockDriverState *new_bs = s->child->bs;
>  
> -    /* old_bs reference is transparently moved from @s to @s->child */
> -    bdrv_replace_child_noperm(s->child, s->old_bs);
> +    /* old_bs reference is transparently moved from @s to *s->childp */
> +    bdrv_replace_child_noperm(s->childp, s->old_bs);
>      bdrv_unref(new_bs);
>  }
>  
> @@ -2287,21 +2288,23 @@ static TransactionActionDrv bdrv_replace_child_drv = {
>   *
>   * The function doesn't update permissions, caller is responsible for this.
>   */
> -static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
> +static void bdrv_replace_child_tran(BdrvChild **childp,
> +                                    BlockDriverState *new_bs,
>                                      Transaction *tran)
>  {
>      BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
>      *s = (BdrvReplaceChildState) {
> -        .child = child,
> -        .old_bs = child->bs,
> +        .child = *childp,
> +        .childp = childp,
> +        .old_bs = (*childp)->bs,
>      };
>      tran_add(tran, &bdrv_replace_child_drv, s);
>  
>      if (new_bs) {
>          bdrv_ref(new_bs);
>      }
> -    bdrv_replace_child_noperm(child, new_bs);
> -    /* old_bs reference is transparently moved from @child to @s */
> +    bdrv_replace_child_noperm(childp, new_bs);
> +    /* old_bs reference is transparently moved from *childp to @s */
>  }
>  
>  /*
> @@ -2672,9 +2675,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
>      return permissions[qapi_perm];
>  }
>  
> -static void bdrv_replace_child_noperm(BdrvChild *child,
> +static void bdrv_replace_child_noperm(BdrvChild **childp,
>                                        BlockDriverState *new_bs)
>  {
> +    BdrvChild *child = *childp;
>      BlockDriverState *old_bs = child->bs;
>      int new_bs_quiesce_counter;
>      int drain_saldo;
> @@ -2767,7 +2771,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
>      BdrvChild *child = *s->child;
>      BlockDriverState *bs = child->bs;
>  
> -    bdrv_replace_child_noperm(child, NULL);
> +    bdrv_replace_child_noperm(s->child, NULL);
>  
>      if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
>          bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
> @@ -2867,7 +2871,7 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
>      }
>  
>      bdrv_ref(child_bs);
> -    bdrv_replace_child_noperm(new_child, child_bs);
> +    bdrv_replace_child_noperm(&new_child, child_bs);
>  
>      *child = new_child;
>  
> @@ -2927,12 +2931,12 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>      return 0;
>  }
>  
> -static void bdrv_detach_child(BdrvChild *child)
> +static void bdrv_detach_child(BdrvChild **childp)
>  {
> -    BlockDriverState *old_bs = child->bs;
> +    BlockDriverState *old_bs = (*childp)->bs;
>  
> -    bdrv_replace_child_noperm(child, NULL);
> -    bdrv_child_free(child);
> +    bdrv_replace_child_noperm(childp, NULL);
> +    bdrv_child_free(*childp);
>  
>      if (old_bs) {
>          /*
> @@ -3038,7 +3042,7 @@ void bdrv_root_unref_child(BdrvChild *child)
>      BlockDriverState *child_bs;
>  
>      child_bs = child->bs;
> -    bdrv_detach_child(child);
> +    bdrv_detach_child(&child);
>      bdrv_unref(child_bs);
>  }
>  
> @@ -4891,30 +4895,33 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>                                                BdrvChild *child,
>                                                Transaction *tran)
>  {
> +    BdrvChild **childp;
>      BdrvRemoveFilterOrCowChild *s;
>  
> -    assert(child == bs->backing || child == bs->file);
> +    if (child == bs->backing) {
> +        childp = &bs->backing;
> +    } else if (child == bs->file) {
> +        childp = &bs->file;
> +    } else {
> +        g_assert_not_reached();
> +    }
>  
>      if (!child) {
>          return;
>      }
>  
>      if (child->bs) {
> -        bdrv_replace_child_tran(child, NULL, tran);
> +        bdrv_replace_child_tran(childp, NULL, tran);
>      }
>  
>      s = g_new(BdrvRemoveFilterOrCowChild, 1);
>      *s = (BdrvRemoveFilterOrCowChild) {
>          .child = child,
> -        .is_backing = (child == bs->backing),
> +        .is_backing = (childp == &bs->backing),
>      };
>      tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
>  
> -    if (s->is_backing) {
> -        bs->backing = NULL;
> -    } else {
> -        bs->file = NULL;
> -    }
> +    *childp = NULL;
>  }

Hmm... This looks a bit dangerous. Is it guaranteed that bs lives until
the end of the transaction? I guess it has to because we want to be able
to roll back, so probably this is okay, though I'm not sure if I would
bet money on it.

But...

> @@ -4950,7 +4957,7 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
>                         c->name, from->node_name);
>              return -EPERM;
>          }
> -        bdrv_replace_child_tran(c, to, tran);
> +        bdrv_replace_child_tran(&c, to, tran);
>      }

...here, c is a local stack variable that is even reused in a loop.
bdrv_replace_child_tran() now keeps a pointer to the same variable in
the transaction state for each BdrvChild in the whole parent list, and
by the time the transaction is finalised, I think they are all dangling
pointers anyway because they pointed to stack variables in a function
that has already returned.

bdrv_replace_child_tran() should probably have a comment like we already
have in bdrv_attach_child_common():

 * @child is saved to a new entry of @tran, so that *@child could be reverted to
 * NULL on abort(). So referenced variable must live at least until transaction
 * end.

>      return 0;
> @@ -5099,7 +5106,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
>      bdrv_drained_begin(old_bs);
>      bdrv_drained_begin(new_bs);
>  
> -    bdrv_replace_child_tran(child, new_bs, tran);
> +    bdrv_replace_child_tran(&child, new_bs, tran);
>  
>      found = g_hash_table_new(NULL, NULL);
>      refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] block: Let replace_child_noperm free children
  2021-11-04 10:38 ` [PATCH 6/7] block: Let replace_child_noperm free children Hanna Reitz
@ 2021-11-05 15:41   ` Kevin Wolf
  2021-11-08 10:12     ` Hanna Reitz
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2021-11-05 15:41 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
> In most of the block layer, especially when traversing down from other
> BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
> it becomes NULL, it is expected that the corresponding BdrvChild pointer
> also becomes NULL and the BdrvChild object is freed.
> 
> Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
> pointer to NULL, it should also immediately set the corresponding
> BdrvChild pointer (like bs->file or bs->backing) to NULL.
> 
> In that context, it also makes sense for this function to free the
> child.  Sometimes we cannot do so, though, because it is called in a
> transactional context where the caller might still want to reinstate the
> child in the abort branch (and free it only on commit), so this behavior
> has to remain optional.
> 
> In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
> that the BdrvChild passed to bdrv_replace_child_tran() must have had a
> non-NULL .bs pointer initially.  Make a note of that and assert it.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  block.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 86 insertions(+), 16 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff45447686..0a85ede8dc 100644
> --- a/block.c
> +++ b/block.c
> @@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>  static bool bdrv_recurse_has_child(BlockDriverState *bs,
>                                     BlockDriverState *child);
>  
> +static void bdrv_child_free(BdrvChild *child);
>  static void bdrv_replace_child_noperm(BdrvChild **child,
> -                                      BlockDriverState *new_bs);
> +                                      BlockDriverState *new_bs,
> +                                      bool free_empty_child);
>  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>                                                BdrvChild *child,
>                                                Transaction *tran);
> @@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
>      BdrvChild *child;
>      BdrvChild **childp;
>      BlockDriverState *old_bs;
> +    bool free_empty_child;
>  } BdrvReplaceChildState;
>  
>  static void bdrv_replace_child_commit(void *opaque)
>  {
>      BdrvReplaceChildState *s = opaque;
>  
> +    if (s->free_empty_child && !s->child->bs) {
> +        bdrv_child_free(s->child);
> +    }
>      bdrv_unref(s->old_bs);
>  }
>  
> @@ -2270,8 +2276,23 @@ static void bdrv_replace_child_abort(void *opaque)
>      BdrvReplaceChildState *s = opaque;
>      BlockDriverState *new_bs = s->child->bs;
>  
> -    /* old_bs reference is transparently moved from @s to *s->childp */
> -    bdrv_replace_child_noperm(s->childp, s->old_bs);
> +    /*
> +     * old_bs reference is transparently moved from @s to s->child;
> +     * pass &s->child here instead of s->childp, because *s->childp
> +     * will have been cleared by bdrv_replace_child_tran()'s
> +     * bdrv_replace_child_noperm() call if new_bs is NULL, and we must
> +     * not pass a NULL *s->childp here.
> +     */
> +    bdrv_replace_child_noperm(&s->child, s->old_bs, true);

Passing free_empty_child=true with a non-NULL new_bs looks a bit
confusing because the child isn't supposed to become empty anyway.

> +    /*
> +     * The child was pre-existing, so s->old_bs must be non-NULL, and
> +     * s->child thus must not have been freed
> +     */
> +    assert(s->child != NULL);
> +    if (!new_bs) {
> +        /* As described above, *s->childp was cleared, so restore it */
> +        *s->childp = s->child;
> +    }

If it wasn't cleared, doesn't it still contain s->child, so this could
just be an unconditional rollback?

>      bdrv_unref(new_bs);
>  }

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors
  2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
                   ` (7 preceding siblings ...)
  2021-11-04 11:58 ` [PATCH 0/7] block: Attempt on fixing 030-reported errors Kevin Wolf
@ 2021-11-05 15:44 ` Kevin Wolf
  8 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2021-11-05 15:44 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
> Hanna Reitz (7):
>   stream: Traverse graph after modification
>   block: Manipulate children list in .attach/.detach
>   block: Unite remove_empty_child and child_free
>   block: Drop detached child from ignore list
>   block: Pass BdrvChild ** to replace_child_noperm
>   block: Let replace_child_noperm free children
>   iotests/030: Unthrottle parallel jobs in reverse

Now I know that I don't aspire to a new career as a full time borrow
checker. :-)

Patches 1-4:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm
  2021-11-05 15:15   ` Kevin Wolf
@ 2021-11-08  9:58     ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-08  9:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On 05.11.21 16:15, Kevin Wolf wrote:
> Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
>> bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
>> set it to NULL.  That is dangerous, because BDS parents generally assume
>> that their children's .bs pointer is never NULL.  We therefore want to
>> let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
>> to NULL, too.
>>
>> This patch lays the foundation for it by passing a BdrvChild ** pointer
>> to bdrv_replace_child_noperm() so that it can later use it to NULL the
>> BdrvChild pointer immediately after setting BdrvChild.bs to NULL.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   block.c | 59 ++++++++++++++++++++++++++++++++-------------------------
>>   1 file changed, 33 insertions(+), 26 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 6d230ad3d1..ff45447686 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>>   static bool bdrv_recurse_has_child(BlockDriverState *bs,
>>                                      BlockDriverState *child);
>>   
>> -static void bdrv_replace_child_noperm(BdrvChild *child,
>> +static void bdrv_replace_child_noperm(BdrvChild **child,
>>                                         BlockDriverState *new_bs);
>>   static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>>                                                 BdrvChild *child,
>> @@ -2254,6 +2254,7 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
>>   
>>   typedef struct BdrvReplaceChildState {
>>       BdrvChild *child;
>> +    BdrvChild **childp;
> Would it be clearer to rename child to old_child now that it can be
> changed?
>
> I would also consider having childp first because this is really the
> object that we're modifying and potentially rolling back now.
>
>>       BlockDriverState *old_bs;
>>   } BdrvReplaceChildState;
>>   
>> @@ -2269,8 +2270,8 @@ static void bdrv_replace_child_abort(void *opaque)
>>       BdrvReplaceChildState *s = opaque;
>>       BlockDriverState *new_bs = s->child->bs;
>>   
>> -    /* old_bs reference is transparently moved from @s to @s->child */
>> -    bdrv_replace_child_noperm(s->child, s->old_bs);
>> +    /* old_bs reference is transparently moved from @s to *s->childp */
>> +    bdrv_replace_child_noperm(s->childp, s->old_bs);
>>       bdrv_unref(new_bs);
>>   }
>>   
>> @@ -2287,21 +2288,23 @@ static TransactionActionDrv bdrv_replace_child_drv = {
>>    *
>>    * The function doesn't update permissions, caller is responsible for this.
>>    */
>> -static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
>> +static void bdrv_replace_child_tran(BdrvChild **childp,
>> +                                    BlockDriverState *new_bs,
>>                                       Transaction *tran)
>>   {
>>       BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
>>       *s = (BdrvReplaceChildState) {
>> -        .child = child,
>> -        .old_bs = child->bs,
>> +        .child = *childp,
>> +        .childp = childp,
>> +        .old_bs = (*childp)->bs,
>>       };
>>       tran_add(tran, &bdrv_replace_child_drv, s);
>>   
>>       if (new_bs) {
>>           bdrv_ref(new_bs);
>>       }
>> -    bdrv_replace_child_noperm(child, new_bs);
>> -    /* old_bs reference is transparently moved from @child to @s */
>> +    bdrv_replace_child_noperm(childp, new_bs);
>> +    /* old_bs reference is transparently moved from *childp to @s */
>>   }
>>   
>>   /*
>> @@ -2672,9 +2675,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
>>       return permissions[qapi_perm];
>>   }
>>   
>> -static void bdrv_replace_child_noperm(BdrvChild *child,
>> +static void bdrv_replace_child_noperm(BdrvChild **childp,
>>                                         BlockDriverState *new_bs)
>>   {
>> +    BdrvChild *child = *childp;
>>       BlockDriverState *old_bs = child->bs;
>>       int new_bs_quiesce_counter;
>>       int drain_saldo;
>> @@ -2767,7 +2771,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
>>       BdrvChild *child = *s->child;
>>       BlockDriverState *bs = child->bs;
>>   
>> -    bdrv_replace_child_noperm(child, NULL);
>> +    bdrv_replace_child_noperm(s->child, NULL);
>>   
>>       if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
>>           bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
>> @@ -2867,7 +2871,7 @@ static int bdrv_attach_child_common(BlockDriverState *child_bs,
>>       }
>>   
>>       bdrv_ref(child_bs);
>> -    bdrv_replace_child_noperm(new_child, child_bs);
>> +    bdrv_replace_child_noperm(&new_child, child_bs);
>>   
>>       *child = new_child;
>>   
>> @@ -2927,12 +2931,12 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>>       return 0;
>>   }
>>   
>> -static void bdrv_detach_child(BdrvChild *child)
>> +static void bdrv_detach_child(BdrvChild **childp)
>>   {
>> -    BlockDriverState *old_bs = child->bs;
>> +    BlockDriverState *old_bs = (*childp)->bs;
>>   
>> -    bdrv_replace_child_noperm(child, NULL);
>> -    bdrv_child_free(child);
>> +    bdrv_replace_child_noperm(childp, NULL);
>> +    bdrv_child_free(*childp);
>>   
>>       if (old_bs) {
>>           /*
>> @@ -3038,7 +3042,7 @@ void bdrv_root_unref_child(BdrvChild *child)
>>       BlockDriverState *child_bs;
>>   
>>       child_bs = child->bs;
>> -    bdrv_detach_child(child);
>> +    bdrv_detach_child(&child);
>>       bdrv_unref(child_bs);
>>   }
>>   
>> @@ -4891,30 +4895,33 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>>                                                 BdrvChild *child,
>>                                                 Transaction *tran)
>>   {
>> +    BdrvChild **childp;
>>       BdrvRemoveFilterOrCowChild *s;
>>   
>> -    assert(child == bs->backing || child == bs->file);
>> +    if (child == bs->backing) {
>> +        childp = &bs->backing;
>> +    } else if (child == bs->file) {
>> +        childp = &bs->file;
>> +    } else {
>> +        g_assert_not_reached();
>> +    }
>>   
>>       if (!child) {
>>           return;
>>       }
>>   
>>       if (child->bs) {
>> -        bdrv_replace_child_tran(child, NULL, tran);
>> +        bdrv_replace_child_tran(childp, NULL, tran);
>>       }
>>   
>>       s = g_new(BdrvRemoveFilterOrCowChild, 1);
>>       *s = (BdrvRemoveFilterOrCowChild) {
>>           .child = child,
>> -        .is_backing = (child == bs->backing),
>> +        .is_backing = (childp == &bs->backing),
>>       };
>>       tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
>>   
>> -    if (s->is_backing) {
>> -        bs->backing = NULL;
>> -    } else {
>> -        bs->file = NULL;
>> -    }
>> +    *childp = NULL;
>>   }
> Hmm... This looks a bit dangerous. Is it guaranteed that bs lives until
> the end of the transaction? I guess it has to because we want to be able
> to roll back, so probably this is okay, though I'm not sure if I would
> bet money on it.

Well, if it should live long enough, a bdrv_ref()+unref() definitely 
shouldn’t hurt.

> But...
>
>> @@ -4950,7 +4957,7 @@ static int bdrv_replace_node_noperm(BlockDriverState *from,
>>                          c->name, from->node_name);
>>               return -EPERM;
>>           }
>> -        bdrv_replace_child_tran(c, to, tran);
>> +        bdrv_replace_child_tran(&c, to, tran);
>>       }
> ...here, c is a local stack variable that is even reused in a loop.
> bdrv_replace_child_tran() now keeps a pointer to the same variable in
> the transaction state for each BdrvChild in the whole parent list, and
> by the time the transaction is finalised, I think they are all dangling
> pointers anyway because they pointed to stack variables in a function
> that has already returned.

Oh no.  Yes, that’s a bit of a problem...

> bdrv_replace_child_tran() should probably have a comment like we already
> have in bdrv_attach_child_common():
>
>   * @child is saved to a new entry of @tran, so that *@child could be reverted to
>   * NULL on abort(). So referenced variable must live at least until transaction
>   * end.

Right.  Now just to figure out how to solve the concrete problem... :/

(Yes, a borrow checker would’ve helped :))

Hanna



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] block: Let replace_child_noperm free children
  2021-11-05 15:41   ` Kevin Wolf
@ 2021-11-08 10:12     ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-08 10:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On 05.11.21 16:41, Kevin Wolf wrote:
> Am 04.11.2021 um 11:38 hat Hanna Reitz geschrieben:
>> In most of the block layer, especially when traversing down from other
>> BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
>> it becomes NULL, it is expected that the corresponding BdrvChild pointer
>> also becomes NULL and the BdrvChild object is freed.
>>
>> Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
>> pointer to NULL, it should also immediately set the corresponding
>> BdrvChild pointer (like bs->file or bs->backing) to NULL.
>>
>> In that context, it also makes sense for this function to free the
>> child.  Sometimes we cannot do so, though, because it is called in a
>> transactional context where the caller might still want to reinstate the
>> child in the abort branch (and free it only on commit), so this behavior
>> has to remain optional.
>>
>> In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
>> that the BdrvChild passed to bdrv_replace_child_tran() must have had a
>> non-NULL .bs pointer initially.  Make a note of that and assert it.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   block.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++---------
>>   1 file changed, 86 insertions(+), 16 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff45447686..0a85ede8dc 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>>   static bool bdrv_recurse_has_child(BlockDriverState *bs,
>>                                      BlockDriverState *child);
>>   
>> +static void bdrv_child_free(BdrvChild *child);
>>   static void bdrv_replace_child_noperm(BdrvChild **child,
>> -                                      BlockDriverState *new_bs);
>> +                                      BlockDriverState *new_bs,
>> +                                      bool free_empty_child);
>>   static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>>                                                 BdrvChild *child,
>>                                                 Transaction *tran);
>> @@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
>>       BdrvChild *child;
>>       BdrvChild **childp;
>>       BlockDriverState *old_bs;
>> +    bool free_empty_child;
>>   } BdrvReplaceChildState;
>>   
>>   static void bdrv_replace_child_commit(void *opaque)
>>   {
>>       BdrvReplaceChildState *s = opaque;
>>   
>> +    if (s->free_empty_child && !s->child->bs) {
>> +        bdrv_child_free(s->child);
>> +    }
>>       bdrv_unref(s->old_bs);
>>   }
>>   
>> @@ -2270,8 +2276,23 @@ static void bdrv_replace_child_abort(void *opaque)
>>       BdrvReplaceChildState *s = opaque;
>>       BlockDriverState *new_bs = s->child->bs;
>>   
>> -    /* old_bs reference is transparently moved from @s to *s->childp */
>> -    bdrv_replace_child_noperm(s->childp, s->old_bs);
>> +    /*
>> +     * old_bs reference is transparently moved from @s to s->child;
>> +     * pass &s->child here instead of s->childp, because *s->childp
>> +     * will have been cleared by bdrv_replace_child_tran()'s
>> +     * bdrv_replace_child_noperm() call if new_bs is NULL, and we must
>> +     * not pass a NULL *s->childp here.
>> +     */
>> +    bdrv_replace_child_noperm(&s->child, s->old_bs, true);
> Passing free_empty_child=true with a non-NULL new_bs looks a bit
> confusing because the child isn't supposed to become empty anyway.

I wasn’t sure what to do.  I decided to make the contract “Caller should 
pass false only if they will deal with freeing the child”, and so I 
ended up passing `true` in cases such as here.  I felt like `true` 
should kind of be the default case, and `false` the exception.

>> +    /*
>> +     * The child was pre-existing, so s->old_bs must be non-NULL, and
>> +     * s->child thus must not have been freed
>> +     */
>> +    assert(s->child != NULL);
>> +    if (!new_bs) {
>> +        /* As described above, *s->childp was cleared, so restore it */
>> +        *s->childp = s->child;
>> +    }
> If it wasn't cleared, doesn't it still contain s->child, so this could
> just be an unconditional rollback?

I think so, yes.  We’d still have to explain why the rollback is 
required, though, so would dropping the condition really make it 
simpler?  (Of course this also goes in the opposite direction: Is making 
it conditional simpler? :))

I just feel like the comment would be something like “Restore *s->childp 
in case it was cleared as described above”, and then I’d find it a bit 
strange if the “in case” isn’t part of the code...  *shrug*

Hanna



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] stream: Traverse graph after modification
  2021-11-04 10:38 ` [PATCH 1/7] stream: Traverse graph after modification Hanna Reitz
@ 2021-11-10 12:17   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-11-10 12:17 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

04.11.2021 13:38, Hanna Reitz wrote:
> bdrv_cor_filter_drop() modifies the block graph.  That means that other
> parties can also modify the block graph before it returns.  Therefore,
> we cannot assume that the result of a graph traversal we did before
> remains valid afterwards.
> 
> We should thus fetch `base` and `unfiltered_base` afterwards instead of
> before.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>

The fact that other parties can modify graph during our graph modification is a global problem.. The patch doesn't fix it, but reduces its effect in specific case.. OK.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


> ---
>   block/stream.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 97bee482dc..e45113aed6 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -54,8 +54,8 @@ static int stream_prepare(Job *job)
>   {
>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>       BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
> -    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> -    BlockDriverState *unfiltered_base = bdrv_skip_filters(base);
> +    BlockDriverState *base;
> +    BlockDriverState *unfiltered_base;
>       Error *local_err = NULL;
>       int ret = 0;
>   
> @@ -63,6 +63,9 @@ static int stream_prepare(Job *job)
>       bdrv_cor_filter_drop(s->cor_filter_bs);
>       s->cor_filter_bs = NULL;
>   
> +    base = bdrv_filter_or_cow_bs(s->above_base);
> +    unfiltered_base = bdrv_skip_filters(base);
> +
>       if (bdrv_cow_child(unfiltered_bs)) {
>           const char *base_id = NULL, *base_fmt = NULL;
>           if (unfiltered_base) {
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] block: Manipulate children list in .attach/.detach
  2021-11-04 10:38 ` [PATCH 2/7] block: Manipulate children list in .attach/.detach Hanna Reitz
@ 2021-11-10 12:46   ` Vladimir Sementsov-Ogievskiy
  2021-11-10 15:05     ` Hanna Reitz
  2021-11-10 12:51   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-11-10 12:46 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

04.11.2021 13:38, Hanna Reitz wrote:
> The children list is specific to BDS parents.  We should not modify it
> in the general children modification code, but let BDS parents deal with
> it in their .attach() and .detach() methods.
> 
> This also has the advantage that a BdrvChild is removed from the
> children list before its .bs pointer can become NULL.  BDS parents
> generally assume that their children's .bs pointer is never NULL, so
> this is actually a bug fix.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>   block.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 580cb77a70..243ae206b5 100644
> --- a/block.c
> +++ b/block.c
> @@ -1387,6 +1387,8 @@ static void bdrv_child_cb_attach(BdrvChild *child)
>   {
>       BlockDriverState *bs = child->opaque;
>   
> +    QLIST_INSERT_HEAD(&bs->children, child, next);
> +
>       if (child->role & BDRV_CHILD_COW) {
>           bdrv_backing_attach(child);
>       }
> @@ -1403,6 +1405,8 @@ static void bdrv_child_cb_detach(BdrvChild *child)
>       }
>   
>       bdrv_unapply_subtree_drain(child, bs);
> +
> +    QLIST_REMOVE(child, next);
>   }
>   
>   static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
> @@ -2747,7 +2751,7 @@ static void bdrv_child_free(void *opaque)
>   static void bdrv_remove_empty_child(BdrvChild *child)
>   {
>       assert(!child->bs);
> -    QLIST_SAFE_REMOVE(child, next);
> +    assert(!child->next.le_prev); /* not in children list */
>       bdrv_child_free(child);
>   }
>   
> @@ -2913,7 +2917,6 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>           return ret;
>       }
>   
> -    QLIST_INSERT_HEAD(&parent_bs->children, *child, next);

The following comment become stale. We should remove it too. With comment removed:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

>       /*
>        * child is removed in bdrv_attach_child_common_abort(), so don't care to
>        * abort this change separately.
> @@ -4851,7 +4854,6 @@ static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
>       BdrvRemoveFilterOrCowChild *s = opaque;
>       BlockDriverState *parent_bs = s->child->opaque;
>   
> -    QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
>       if (s->is_backing) {
>           parent_bs->backing = s->child;
>       } else {
> @@ -4906,7 +4908,6 @@ static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>       };
>       tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
>   
> -    QLIST_SAFE_REMOVE(child, next);
>       if (s->is_backing) {
>           bs->backing = NULL;
>       } else {
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] block: Manipulate children list in .attach/.detach
  2021-11-04 10:38 ` [PATCH 2/7] block: Manipulate children list in .attach/.detach Hanna Reitz
  2021-11-10 12:46   ` Vladimir Sementsov-Ogievskiy
@ 2021-11-10 12:51   ` Vladimir Sementsov-Ogievskiy
  2021-11-10 15:12     ` Hanna Reitz
  1 sibling, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-11-10 12:51 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

04.11.2021 13:38, Hanna Reitz wrote:
> The children list is specific to BDS parents.  We should not modify it
> in the general children modification code, but let BDS parents deal with
> it in their .attach() and .detach() methods.
> 
> This also has the advantage that a BdrvChild is removed from the
> children list before its .bs pointer can become NULL.  BDS parents
> generally assume that their children's .bs pointer is never NULL, so
> this is actually a bug fix.
> 
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>


Interesting that nor child_root neither child_job do similar things in .attach / .detach ... Should we do something with it?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] block: Unite remove_empty_child and child_free
  2021-11-04 10:38 ` [PATCH 3/7] block: Unite remove_empty_child and child_free Hanna Reitz
@ 2021-11-10 12:52   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-11-10 12:52 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

04.11.2021 13:38, Hanna Reitz wrote:
> Now that bdrv_remove_empty_child() no longer removes the child from the
> parent's children list but only checks that it is not in such a list, it
> is only a wrapper around bdrv_child_free() that checks that the child is
> empty and unused.  That should apply to all children that we free, so
> put those checks into bdrv_child_free() and drop
> bdrv_remove_empty_child().
> 
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] block: Drop detached child from ignore list
  2021-11-04 10:38 ` [PATCH 4/7] block: Drop detached child from ignore list Hanna Reitz
@ 2021-11-10 13:38   ` Vladimir Sementsov-Ogievskiy via
  2021-11-10 17:32     ` Hanna Reitz
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy via @ 2021-11-10 13:38 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

04.11.2021 13:38, Hanna Reitz wrote:
> bdrv_attach_child_common_abort() restores the parent's AioContext.  To
> do so, the child (which was supposed to be attached, but is now detached
> again by this abort handler) is added to the ignore list for the
> AioContext changing functions.
> 
> However, since we modify a BDS's children list in the BdrvChildClass's
> .attach and .detach handlers, the child is already effectively detached
> from the parent by this point.  We do not need to put it into the ignore
> list.
> 
> Use this opportunity to clean up the empty line structure: Keep setting
> the ignore list, invoking the AioContext function, and freeing the
> ignore list in blocks separated by empty lines.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


What comes in my mind, is that now bdrv_replace_child_noperm() is more strong in detaching.. But may be not enough strong: we still have link from the child to parent (child->opaque).. Maybe more correct for BdrvChild object to have no relation with parent after detaching. But than we'll need some GenericParent object (as child->class is mostly about parent, not about child and even not about the edge).. Now GenericParent is "included" into BdrvChild, which represents both GenericParent and Edge objects..

> ---
>   block.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block.c b/block.c
> index b95f8dcf8f..6d230ad3d1 100644
> --- a/block.c
> +++ b/block.c
> @@ -2774,14 +2774,16 @@ static void bdrv_attach_child_common_abort(void *opaque)
>       }
>   
>       if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
> -        GSList *ignore = g_slist_prepend(NULL, child);
> +        GSList *ignore;
>   
> +        /* No need to ignore `child`, because it has been detached already */
> +        ignore = NULL;
>           child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
>                                         &error_abort);
>           g_slist_free(ignore);
> -        ignore = g_slist_prepend(NULL, child);
> -        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
>   
> +        ignore = NULL;
> +        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
>           g_slist_free(ignore);
>       }
>   
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] block: Manipulate children list in .attach/.detach
  2021-11-10 12:46   ` Vladimir Sementsov-Ogievskiy
@ 2021-11-10 15:05     ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-10 15:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 10.11.21 13:46, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 13:38, Hanna Reitz wrote:
>> The children list is specific to BDS parents.  We should not modify it
>> in the general children modification code, but let BDS parents deal with
>> it in their .attach() and .detach() methods.
>>
>> This also has the advantage that a BdrvChild is removed from the
>> children list before its .bs pointer can become NULL.  BDS parents
>> generally assume that their children's .bs pointer is never NULL, so
>> this is actually a bug fix.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   block.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 580cb77a70..243ae206b5 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1387,6 +1387,8 @@ static void bdrv_child_cb_attach(BdrvChild *child)
>>   {
>>       BlockDriverState *bs = child->opaque;
>>   +    QLIST_INSERT_HEAD(&bs->children, child, next);
>> +
>>       if (child->role & BDRV_CHILD_COW) {
>>           bdrv_backing_attach(child);
>>       }
>> @@ -1403,6 +1405,8 @@ static void bdrv_child_cb_detach(BdrvChild *child)
>>       }
>>         bdrv_unapply_subtree_drain(child, bs);
>> +
>> +    QLIST_REMOVE(child, next);
>>   }
>>     static int bdrv_child_cb_update_filename(BdrvChild *c, 
>> BlockDriverState *base,
>> @@ -2747,7 +2751,7 @@ static void bdrv_child_free(void *opaque)
>>   static void bdrv_remove_empty_child(BdrvChild *child)
>>   {
>>       assert(!child->bs);
>> -    QLIST_SAFE_REMOVE(child, next);
>> +    assert(!child->next.le_prev); /* not in children list */
>>       bdrv_child_free(child);
>>   }
>>   @@ -2913,7 +2917,6 @@ static int 
>> bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>>           return ret;
>>       }
>>   -    QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
>
> The following comment become stale. We should remove it too. With 
> comment removed:

Ah, right, thanks!

> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] block: Manipulate children list in .attach/.detach
  2021-11-10 12:51   ` Vladimir Sementsov-Ogievskiy
@ 2021-11-10 15:12     ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-10 15:12 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 10.11.21 13:51, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 13:38, Hanna Reitz wrote:
>> The children list is specific to BDS parents.  We should not modify it
>> in the general children modification code, but let BDS parents deal with
>> it in their .attach() and .detach() methods.
>>
>> This also has the advantage that a BdrvChild is removed from the
>> children list before its .bs pointer can become NULL.  BDS parents
>> generally assume that their children's .bs pointer is never NULL, so
>> this is actually a bug fix.
>>
>> Signed-off-by: Hanna Reitz<hreitz@redhat.com>
>
>
> Interesting that nor child_root neither child_job do similar things in 
> .attach / .detach ... Should we do something with it?

Well, it’s up to them, I thought. :)

A BB only has a single child, so it doesn’t need a list.  Jobs do have 
their own child list (BlockJob.nodes).  I thought a bit about this when 
writing this series, and I figured perhaps they don’t need to care about 
that in .attach() and .detach(), because they don’t really expect nodes 
to be detached or attached anyway; child_job.stay_at_node is true, after 
all.

Hanna



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] block: Drop detached child from ignore list
  2021-11-10 13:38   ` Vladimir Sementsov-Ogievskiy via
@ 2021-11-10 17:32     ` Hanna Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Hanna Reitz @ 2021-11-10 17:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 10.11.21 14:38, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 13:38, Hanna Reitz wrote:
>> bdrv_attach_child_common_abort() restores the parent's AioContext.  To
>> do so, the child (which was supposed to be attached, but is now detached
>> again by this abort handler) is added to the ignore list for the
>> AioContext changing functions.
>>
>> However, since we modify a BDS's children list in the BdrvChildClass's
>> .attach and .detach handlers, the child is already effectively detached
>> from the parent by this point.  We do not need to put it into the ignore
>> list.
>>
>> Use this opportunity to clean up the empty line structure: Keep setting
>> the ignore list, invoking the AioContext function, and freeing the
>> ignore list in blocks separated by empty lines.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
>
> What comes in my mind, is that now bdrv_replace_child_noperm() is more 
> strong in detaching.. But may be not enough strong: we still have link 
> from the child to parent (child->opaque).. Maybe more correct for 
> BdrvChild object to have no relation with parent after detaching. But 
> than we'll need some GenericParent object (as child->class is mostly 
> about parent, not about child and even not about the edge).. Now 
> GenericParent is "included" into BdrvChild, which represents both 
> GenericParent and Edge objects..

Yes, I thought about this in exactly this function here 
(bdrv_attach_child_common_abort()).  I was wondering whether I could 
save a pointer to the BdrvChildClass, then just let 
bdrv_replace_child_noperm() free the child, and then invoke the 
BdrvChildClass methods when the child is already gone.  As you say, it’s 
really just about the parent, and child->opaque is effectively just a 
pointer to the parent object, so all of the methods that only use 
child->opaque and nothing else from the BdrvChild object can be invoked 
even after the child is gone.

But doing something about that (like changing some methods like the 
AioContext methods to only take the opaque pointer) was too invasive for 
me for this series, so in case of this function I decided to just 
begrudgingly keep the BdrvChild object around, including its 
back-connection to the parent (via child->opaque), and free it at the 
end of the function.

Besides that back-connection, there’s also to consider that some block 
drivers can keep pointers to special children in their BDS 
“subclasses”.  For example, qcow2 keeps s->data_file.  That’s a problem 
just like bs->backing or bs->file, except that it’s much more unlikely 
to cause problems in practice, and that it would probably be even more 
invasive to fix...

Hanna

>> ---
>>   block.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index b95f8dcf8f..6d230ad3d1 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2774,14 +2774,16 @@ static void 
>> bdrv_attach_child_common_abort(void *opaque)
>>       }
>>         if (bdrv_child_get_parent_aio_context(child) != 
>> s->old_parent_ctx) {
>> -        GSList *ignore = g_slist_prepend(NULL, child);
>> +        GSList *ignore;
>>   +        /* No need to ignore `child`, because it has been detached 
>> already */
>> +        ignore = NULL;
>>           child->klass->can_set_aio_ctx(child, s->old_parent_ctx, 
>> &ignore,
>>                                         &error_abort);
>>           g_slist_free(ignore);
>> -        ignore = g_slist_prepend(NULL, child);
>> -        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
>>   +        ignore = NULL;
>> +        child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
>>           g_slist_free(ignore);
>>       }
>>
>
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-11-10 17:33 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 10:38 [PATCH 0/7] block: Attempt on fixing 030-reported errors Hanna Reitz
2021-11-04 10:38 ` [PATCH 1/7] stream: Traverse graph after modification Hanna Reitz
2021-11-10 12:17   ` Vladimir Sementsov-Ogievskiy
2021-11-04 10:38 ` [PATCH 2/7] block: Manipulate children list in .attach/.detach Hanna Reitz
2021-11-10 12:46   ` Vladimir Sementsov-Ogievskiy
2021-11-10 15:05     ` Hanna Reitz
2021-11-10 12:51   ` Vladimir Sementsov-Ogievskiy
2021-11-10 15:12     ` Hanna Reitz
2021-11-04 10:38 ` [PATCH 3/7] block: Unite remove_empty_child and child_free Hanna Reitz
2021-11-10 12:52   ` Vladimir Sementsov-Ogievskiy
2021-11-04 10:38 ` [PATCH 4/7] block: Drop detached child from ignore list Hanna Reitz
2021-11-10 13:38   ` Vladimir Sementsov-Ogievskiy via
2021-11-10 17:32     ` Hanna Reitz
2021-11-04 10:38 ` [PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm Hanna Reitz
2021-11-05 15:15   ` Kevin Wolf
2021-11-08  9:58     ` Hanna Reitz
2021-11-04 10:38 ` [PATCH 6/7] block: Let replace_child_noperm free children Hanna Reitz
2021-11-05 15:41   ` Kevin Wolf
2021-11-08 10:12     ` Hanna Reitz
2021-11-04 10:38 ` [PATCH 7/7] iotests/030: Unthrottle parallel jobs in reverse Hanna Reitz
2021-11-04 11:58 ` [PATCH 0/7] block: Attempt on fixing 030-reported errors Kevin Wolf
2021-11-04 13:34   ` Hanna Reitz
2021-11-05 15:44 ` Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).