* [PATCH 00/21] Graph locking part 4 (node management)
@ 2023-08-17 12:49 Kevin Wolf
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
                   ` (21 more replies)
  0 siblings, 22 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:49 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The previous parts of the graph locking changes focussed mostly on the
BlockDriver side and taking reader locks while performing I/O. This
series focusses more on the functions managing the graph structure, i.e.
adding, removing and replacing nodes and updating their permissions.

Many of these places actually need to take the writer lock to avoid
readers seeing an inconsistent, half-updated graph state. Therefore,
taking the writer lock is now pulled up from the very low-level
function bdrv_replace_child_noperm() into its higher-level callers.
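
As a rough sketch (not a literal call site from the series; the
function names are those used in the patches below), the resulting
calling convention is:

    bdrv_drained_begin(bs);                    /* drain while still unlocked */
    bdrv_graph_wrlock(bs);
    bdrv_replace_child_noperm(child, new_bs);  /* now marked GRAPH_WRLOCK */
    bdrv_graph_wrunlock();
    bdrv_drained_end(bs);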

Kevin Wolf (21):
  block: Remove unused BlockReopenQueueEntry.perms_checked
  preallocate: Factor out preallocate_truncate_to_real_size()
  preallocate: Don't poll during permission updates
  block: Take AioContext lock for bdrv_append() more consistently
  block: Introduce bdrv_schedule_unref()
  block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
  block-coroutine-wrapper: Allow arbitrary parameter names
  block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
  block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
  block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
  block: Call transaction callbacks with lock held
  block: Mark bdrv_attach_child() GRAPH_WRLOCK
  block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
  block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
  block: Mark bdrv_child_perm() GRAPH_RDLOCK
  block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
  block: Take graph rdlock in bdrv_drop_intermediate()
  block: Take graph rdlock in bdrv_change_aio_context()
  block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
  block: Mark bdrv_unref_child() GRAPH_WRLOCK
  block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK

 include/block/block-common.h                |   4 +
 include/block/block-global-state.h          |  30 +-
 include/block/block_int-common.h            |  34 +-
 include/block/block_int-global-state.h      |  14 +-
 include/sysemu/block-backend-global-state.h |   4 +-
 block.c                                     | 340 ++++++++++++++------
 block/blklogwrites.c                        |   4 +
 block/blkverify.c                           |   2 +
 block/block-backend.c                       |  29 +-
 block/copy-before-write.c                   |  10 +-
 block/crypto.c                              |   6 +-
 block/graph-lock.c                          |  23 +-
 block/mirror.c                              |   8 +
 block/preallocate.c                         | 133 +++++---
 block/qcow2.c                               |   4 +-
 block/quorum.c                              |  23 +-
 block/replication.c                         |   9 +
 block/snapshot.c                            |   2 +
 block/stream.c                              |  20 +-
 block/vmdk.c                                |  13 +
 blockdev.c                                  |  23 +-
 blockjob.c                                  |   2 +
 tests/unit/test-bdrv-drain.c                |  23 +-
 tests/unit/test-bdrv-graph-mod.c            |  20 ++
 tests/unit/test-block-iothread.c            |   3 +
 scripts/block-coroutine-wrapper.py          |  18 +-
 tests/qemu-iotests/051.pc.out               |   6 +-
 27 files changed, 580 insertions(+), 227 deletions(-)

-- 
2.41.0




* [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-17 19:18   ` Eric Blake
                     ` (2 more replies)
  2023-08-17 12:50 ` [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
                   ` (20 subsequent siblings)
  21 siblings, 3 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

This field has been unused since commit 72373e40fbc ('block:
bdrv_reopen_multiple: refresh permissions on updated graph').
Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block.c b/block.c
index a307c151a8..6376452768 100644
--- a/block.c
+++ b/block.c
@@ -2113,7 +2113,6 @@ static int bdrv_fill_options(QDict **options, const char *filename,
 
 typedef struct BlockReopenQueueEntry {
      bool prepared;
-     bool perms_checked;
      BDRVReopenState state;
      QTAILQ_ENTRY(BlockReopenQueueEntry) entry;
 } BlockReopenQueueEntry;
-- 
2.41.0




* [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size()
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-18 15:23   ` Eric Blake
                     ` (2 more replies)
  2023-08-17 12:50 ` [PATCH 03/21] preallocate: Don't poll during permission updates Kevin Wolf
                   ` (19 subsequent siblings)
  21 siblings, 3 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

It's essentially the same code in preallocate_check_perm() and
preallocate_close(), except that the latter ignores errors.
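
After this patch, both places call the new helper; condensed from the
hunks below, the two call sites are (preallocate_close() keeps
ignoring errors by passing a NULL errp):

    preallocate_truncate_to_real_size(bs, NULL);          /* preallocate_close() */
    return preallocate_truncate_to_real_size(bs, errp);   /* preallocate_check_perm() */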

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/preallocate.c | 48 +++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index 4d82125036..3259c51c1b 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -162,26 +162,39 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
     return 0;
 }
 
-static void preallocate_close(BlockDriverState *bs)
+static int preallocate_truncate_to_real_size(BlockDriverState *bs, Error **errp)
 {
-    int ret;
     BDRVPreallocateState *s = bs->opaque;
-
-    if (s->data_end < 0) {
-        return;
-    }
+    int ret;
 
     if (s->file_end < 0) {
         s->file_end = bdrv_getlength(bs->file->bs);
         if (s->file_end < 0) {
-            return;
+            error_setg_errno(errp, -s->file_end, "Failed to get file length");
+            return s->file_end;
         }
     }
 
     if (s->data_end < s->file_end) {
         ret = bdrv_truncate(bs->file, s->data_end, true, PREALLOC_MODE_OFF, 0,
                             NULL);
-        s->file_end = ret < 0 ? ret : s->data_end;
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to drop preallocation");
+            s->file_end = ret;
+            return ret;
+        }
+        s->file_end = s->data_end;
+    }
+
+    return 0;
+}
+
+static void preallocate_close(BlockDriverState *bs)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end >= 0) {
+        preallocate_truncate_to_real_size(bs, NULL);
     }
 }
 
@@ -473,24 +486,7 @@ static int preallocate_check_perm(BlockDriverState *bs,
          * We should truncate in check_perm, as in set_perm bs->file->perm will
          * be already changed, and we should not violate it.
          */
-        if (s->file_end < 0) {
-            s->file_end = bdrv_getlength(bs->file->bs);
-            if (s->file_end < 0) {
-                error_setg(errp, "Failed to get file length");
-                return s->file_end;
-            }
-        }
-
-        if (s->data_end < s->file_end) {
-            int ret = bdrv_truncate(bs->file, s->data_end, true,
-                                    PREALLOC_MODE_OFF, 0, NULL);
-            if (ret < 0) {
-                error_setg(errp, "Failed to drop preallocation");
-                s->file_end = ret;
-                return ret;
-            }
-            s->file_end = s->data_end;
-        }
+        return preallocate_truncate_to_real_size(bs, errp);
     }
 
     return 0;
-- 
2.41.0




* [PATCH 03/21] preallocate: Don't poll during permission updates
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
  2023-08-17 12:50 ` [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-18 15:34   ` Eric Blake
  2023-08-22 18:41   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

When the permission-related BlockDriver callbacks are called, we are in
the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.

So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.

In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.

There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.
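
Condensed from the hunks below, the resulting pattern is:

    /* .bdrv_open: */
    s->drop_resize_bh = qemu_bh_new(preallocate_drop_resize_bh, bs);

    /* .bdrv_set_perm, when write/resize permissions are dropped: */
    qemu_bh_schedule(s->drop_resize_bh);

    /* BH callback, running outside of the graph traversal: */
    preallocate_drop_resize(bs, NULL);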

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/preallocate.c | 89 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 69 insertions(+), 20 deletions(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index 3259c51c1b..ace475a850 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -75,8 +75,14 @@ typedef struct BDRVPreallocateState {
      * be invalid (< 0) when we don't have both exclusive BLK_PERM_RESIZE and
      * BLK_PERM_WRITE permissions on file child.
      */
+
+    /* Gives up the resize permission on children when parents don't need it */
+    QEMUBH *drop_resize_bh;
 } BDRVPreallocateState;
 
+static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
+static void preallocate_drop_resize_bh(void *opaque);
+
 #define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align"
 #define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size"
 static QemuOptsList runtime_opts = {
@@ -142,6 +148,7 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
      * For this to work, mark them invalid.
      */
     s->file_end = s->zero_start = s->data_end = -EINVAL;
+    s->drop_resize_bh = qemu_bh_new(preallocate_drop_resize_bh, bs);
 
     ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
     if (ret < 0) {
@@ -193,6 +200,9 @@ static void preallocate_close(BlockDriverState *bs)
 {
     BDRVPreallocateState *s = bs->opaque;
 
+    qemu_bh_cancel(s->drop_resize_bh);
+    qemu_bh_delete(s->drop_resize_bh);
+
     if (s->data_end >= 0) {
         preallocate_truncate_to_real_size(bs, NULL);
     }
@@ -211,6 +221,7 @@ static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
                                       BlockReopenQueue *queue, Error **errp)
 {
     PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
+    int ret;
 
     if (!preallocate_absorb_opts(opts, reopen_state->options,
                                  reopen_state->bs->file->bs, errp)) {
@@ -218,6 +229,19 @@ static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
         return -EINVAL;
     }
 
+    /*
+     * Drop the preallocation already here if reopening read-only. The child
+     * might also be reopened read-only and then scheduling a BH during the
+     * permission update is too late.
+     */
+    if ((reopen_state->flags & BDRV_O_RDWR) == 0) {
+        ret = preallocate_drop_resize(reopen_state->bs, errp);
+        if (ret < 0) {
+            g_free(opts);
+            return ret;
+        }
+    }
+
     reopen_state->opaque = opts;
 
     return 0;
@@ -475,41 +499,61 @@ preallocate_co_getlength(BlockDriverState *bs)
     return ret;
 }
 
-static int preallocate_check_perm(BlockDriverState *bs,
-                                  uint64_t perm, uint64_t shared, Error **errp)
+static int preallocate_drop_resize(BlockDriverState *bs, Error **errp)
 {
     BDRVPreallocateState *s = bs->opaque;
+    int ret;
 
-    if (s->data_end >= 0 && !can_write_resize(perm)) {
-        /*
-         * Lose permissions.
-         * We should truncate in check_perm, as in set_perm bs->file->perm will
-         * be already changed, and we should not violate it.
-         */
-        return preallocate_truncate_to_real_size(bs, errp);
+    if (s->data_end < 0) {
+        return 0;
+    }
+
+    /*
+     * Before switching children to be read-only, truncate them to remove
+     * the preallocation and let them have the real size.
+     */
+    ret = preallocate_truncate_to_real_size(bs, errp);
+    if (ret < 0) {
+        return ret;
     }
 
+    /*
+     * We'll drop our permissions and will allow other users to take write and
+     * resize permissions (see preallocate_child_perm). Anyone will be able to
+     * change the child, so mark all states invalid. We'll regain control if a
+     * parent requests write access again.
+     */
+    s->data_end = s->file_end = s->zero_start = -EINVAL;
+
+    bdrv_graph_rdlock_main_loop();
+    bdrv_child_refresh_perms(bs, bs->file, NULL);
+    bdrv_graph_rdunlock_main_loop();
+
     return 0;
 }
 
+static void preallocate_drop_resize_bh(void *opaque)
+{
+    /*
+     * In case of errors, we'll simply keep the exclusive lock on the image
+     * indefinitely.
+     */
+    preallocate_drop_resize(opaque, NULL);
+}
+
 static void preallocate_set_perm(BlockDriverState *bs,
                                  uint64_t perm, uint64_t shared)
 {
     BDRVPreallocateState *s = bs->opaque;
 
     if (can_write_resize(perm)) {
+        qemu_bh_cancel(s->drop_resize_bh);
         if (s->data_end < 0) {
             s->data_end = s->file_end = s->zero_start =
-                bdrv_getlength(bs->file->bs);
+                bs->file->bs->total_sectors * BDRV_SECTOR_SIZE;
         }
     } else {
-        /*
-         * We drop our permissions, as well as allow shared
-         * permissions (see preallocate_child_perm), anyone will be able to
-         * change the child, so mark all states invalid. We'll regain control if
-         * get good permissions back.
-         */
-        s->data_end = s->file_end = s->zero_start = -EINVAL;
+        qemu_bh_schedule(s->drop_resize_bh);
     }
 }
 
@@ -517,10 +561,16 @@ static void preallocate_child_perm(BlockDriverState *bs, BdrvChild *c,
     BdrvChildRole role, BlockReopenQueue *reopen_queue,
     uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVPreallocateState *s = bs->opaque;
+
     bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
 
-    if (can_write_resize(perm)) {
-        /* This should come by default, but let's enforce: */
+    /*
+     * We need exclusive write and resize permissions on the child not only when
+     * the parent can write to it, but also after the parent gave up write
+     * permissions until preallocate_drop_resize() has completed.
+     */
+    if (can_write_resize(perm) || s->data_end != -EINVAL) {
         *nperm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
 
         /*
@@ -550,7 +600,6 @@ BlockDriver bdrv_preallocate_filter = {
     .bdrv_co_flush = preallocate_co_flush,
     .bdrv_co_truncate = preallocate_co_truncate,
 
-    .bdrv_check_perm = preallocate_check_perm,
     .bdrv_set_perm = preallocate_set_perm,
     .bdrv_child_perm = preallocate_child_perm,
 
-- 
2.41.0




* [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (2 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 03/21] preallocate: Don't poll during permission updates Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-18 16:07   ` Eric Blake
                     ` (2 more replies)
  2023-08-17 12:50 ` [PATCH 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
                   ` (17 subsequent siblings)
  21 siblings, 3 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The documentation for bdrv_append() says that the caller must hold the
AioContext lock for bs_top. Change all callers to actually adhere to the
contract.
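
For the unit tests, which run everything in the main loop, this boils
down to the following pattern (condensed from the hunks below):

    aio_context_acquire(qemu_get_aio_context());
    bdrv_append(overlay, base, &error_abort);
    aio_context_release(qemu_get_aio_context());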

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/unit/test-bdrv-drain.c     | 3 +++
 tests/unit/test-bdrv-graph-mod.c | 6 ++++++
 tests/unit/test-block-iothread.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ccc453c29e..89c8fa6780 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1359,7 +1359,10 @@ static void test_append_to_drained(void)
     g_assert_cmpint(base_s->drain_count, ==, 1);
     g_assert_cmpint(base->in_flight, ==, 0);
 
+    aio_context_acquire(qemu_get_aio_context());
     bdrv_append(overlay, base, &error_abort);
+    aio_context_release(qemu_get_aio_context());
+
     g_assert_cmpint(base->in_flight, ==, 0);
     g_assert_cmpint(overlay->in_flight, ==, 0);
 
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index c522591531..dc80ee8f85 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -140,8 +140,10 @@ static void test_update_perm_tree(void)
     bdrv_attach_child(filter, bs, "child", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
 
+    aio_context_acquire(qemu_get_aio_context());
     ret = bdrv_append(filter, bs, NULL);
     g_assert_cmpint(ret, <, 0);
+    aio_context_release(qemu_get_aio_context());
 
     bdrv_unref(filter);
     blk_unref(root);
@@ -205,7 +207,9 @@ static void test_should_update_child(void)
     g_assert(target->backing->bs == bs);
     bdrv_attach_child(filter, target, "target", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
+    aio_context_acquire(qemu_get_aio_context());
     bdrv_append(filter, bs, &error_abort);
+    aio_context_release(qemu_get_aio_context());
     g_assert(target->backing->bs == bs);
 
     bdrv_unref(filter);
@@ -410,7 +414,9 @@ static void test_append_greedy_filter(void)
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
 
+    aio_context_acquire(qemu_get_aio_context());
     bdrv_append(fl, base, &error_abort);
+    aio_context_release(qemu_get_aio_context());
     bdrv_unref(fl);
     bdrv_unref(top);
 }
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index d727a5fee8..9155547313 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -756,11 +756,14 @@ static void test_propagate_mirror(void)
                                   &error_abort);
 
     /* Start a mirror job */
+    aio_context_acquire(main_ctx);
     mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
                  MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
+    aio_context_release(main_ctx);
+
     WITH_JOB_LOCK_GUARD() {
         job = job_get_locked("job0");
     }
-- 
2.41.0




* [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (3 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-18 16:23   ` Eric Blake
  2023-08-22 19:01   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

bdrv_unref() is called by a lot of places that need to hold the graph
lock (it naturally happens in the context of operations that change the
graph). However, bdrv_unref() takes the graph writer lock internally, so
it can't actually be called while already holding a graph lock without
causing a deadlock.

bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
node before closing it, and draining requires that the graph is
unlocked.

The solution is to defer deleting the node until we don't hold the lock
any more and draining is possible again.

Note that keeping images open for longer than necessary can create
problems, too: you can't open an image again before it is really closed
(if image locking didn't prevent it, this would cause corruption). Such
an immediate reopen happens at least during bdrv_open() and
bdrv_co_create().

In order to solve this problem, make sure to run the deferred unref in
bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
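
A caller holding the writer lock is then expected to use it roughly
like this (a sketch, not a call site from this patch):

    bdrv_graph_wrlock(NULL);
    /* ... modify the graph ... */
    bdrv_schedule_unref(bs);   /* instead of bdrv_unref(bs) */
    bdrv_graph_wrunlock();     /* runs the deferred unref via aio_bh_poll() */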

The output of iotest 051 is updated because the additional polling
changes the order of HMP output, resulting in a new "(qemu)" prompt in
the test output that was previously on a separate line and filtered out.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h |  1 +
 block.c                            |  9 +++++++++
 block/graph-lock.c                 | 23 ++++++++++++++++-------
 tests/qemu-iotests/051.pc.out      |  6 +++---
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index f347199bff..e570799f85 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_ref(BlockDriverState *bs);
 void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
 void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
+void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
                              BlockDriverState *child_bs,
diff --git a/block.c b/block.c
index 6376452768..9c4f24f4b9 100644
--- a/block.c
+++ b/block.c
@@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
     }
 }
 
+void bdrv_schedule_unref(BlockDriverState *bs)
+{
+    if (!bs) {
+        return;
+    }
+    aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                            (QEMUBHFunc *) bdrv_unref, bs);
+}
+
 struct BdrvOpBlocker {
     Error *reason;
     QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/block/graph-lock.c b/block/graph-lock.c
index 5e66f01ae8..395d387651 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
 void bdrv_graph_wrunlock(void)
 {
     GLOBAL_STATE_CODE();
-    QEMU_LOCK_GUARD(&aio_context_list_lock);
     assert(qatomic_read(&has_writer));
 
+    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
+        /*
+         * No need for memory barriers, this works in pair with
+         * the slow path of rdlock() and both take the lock.
+         */
+        qatomic_store_release(&has_writer, 0);
+
+        /* Wake up all coroutine that are waiting to read the graph */
+        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
+    }
+
     /*
-     * No need for memory barriers, this works in pair with
-     * the slow path of rdlock() and both take the lock.
+     * Run any BHs that were scheduled during the wrlock section and that
+     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
+     * only after restarting coroutines so that nested event loops in BHs don't
+     * deadlock if their condition relies on the coroutine making progress.
      */
-    qatomic_store_release(&has_writer, 0);
-
-    /* Wake up all coroutine that are waiting to read the graph */
-    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
+    aio_bh_poll(qemu_get_aio_context());
 }
 
 void coroutine_fn bdrv_graph_co_rdlock(void)
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 4d4af5a486..7e10c5fa1b 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
@@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
2.41.0




* [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (4 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 19:03   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
                   ` (15 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Add a new wrapper type for GRAPH_WRLOCK functions that should be called
from coroutine context.
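
With this, a header declaration modeled on the existing no_co_wrapper
ones (the function name here is hypothetical) makes the generator emit
a wrapper that schedules a BH in the main loop, takes the graph write
lock around the wrapped call and reenters the coroutine when it
returns:

    void coroutine_fn no_co_wrapper_bdrv_wrlock
    bdrv_co_do_something(BlockDriverState *bs);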

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-common.h       |  4 ++++
 scripts/block-coroutine-wrapper.py | 11 +++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..267ac6b583 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -66,10 +66,14 @@
  * function. The coroutine yields after scheduling the BH and is reentered when
  * the wrapped function returns.
  *
+ * A no_co_wrapper_bdrv_wrlock function is a no_co_wrapper function that
+ * automatically takes the graph wrlock when calling the wrapped function.
+ *
  * If the first parameter of the function is a BlockDriverState, BdrvChild or
  * BlockBackend pointer, the AioContext lock for it is taken in the wrapper.
  */
 #define no_co_wrapper
+#define no_co_wrapper_bdrv_wrlock
 
 #include "block/blockjob.h"
 
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index d4a183db61..fa01c06567 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -71,10 +71,13 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
         self.args = [ParamDecl(arg.strip()) for arg in args.split(',')]
         self.create_only_co = 'mixed' not in variant
         self.graph_rdlock = 'bdrv_rdlock' in variant
+        self.graph_wrlock = 'bdrv_wrlock' in variant
 
         self.wrapper_type = wrapper_type
 
         if wrapper_type == 'co':
+            if self.graph_wrlock:
+                raise ValueError(f"co function can't be wrlock: {self.name}")
             subsystem, subname = self.name.split('_', 1)
             self.target_name = f'{subsystem}_co_{subname}'
         else:
@@ -250,6 +253,12 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
     name = func.target_name
     struct_name = func.struct_name
 
+    graph_lock=''
+    graph_unlock=''
+    if func.graph_wrlock:
+        graph_lock='    bdrv_graph_wrlock(NULL);'
+        graph_unlock='    bdrv_graph_wrunlock();'
+
     return f"""\
 /*
  * Wrappers for {name}
@@ -266,9 +275,11 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
     {struct_name} *s = opaque;
     AioContext *ctx = {func.gen_ctx('s->')};
 
+{graph_lock}
     aio_context_acquire(ctx);
     {func.get_result}{name}({ func.gen_list('s->{name}') });
     aio_context_release(ctx);
+{graph_unlock}
 
     aio_co_wake(s->co);
 }}
-- 
2.41.0




* [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (5 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:03   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Don't assume specific parameter names like 'bs' or 'blk' in the
generated code, but use the actual name.
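
For example, for a wrapped function whose first parameter is declared
as 'BlockDriverState *child_bs' (a hypothetical name), the generated
AioContext lookup changes like this:

    AioContext *ctx = bdrv_get_aio_context(s->bs);        /* before: hardcoded 'bs' */
    AioContext *ctx = bdrv_get_aio_context(s->child_bs);  /* after: actual name */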

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 scripts/block-coroutine-wrapper.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index fa01c06567..685d0b4ed4 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -105,12 +105,13 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
 
     def gen_ctx(self, prefix: str = '') -> str:
         t = self.args[0].type
+        name = self.args[0].name
         if t == 'BlockDriverState *':
-            return f'bdrv_get_aio_context({prefix}bs)'
+            return f'bdrv_get_aio_context({prefix}{name})'
         elif t == 'BdrvChild *':
-            return f'bdrv_get_aio_context({prefix}child->bs)'
+            return f'bdrv_get_aio_context({prefix}{name}->bs)'
         elif t == 'BlockBackend *':
-            return f'blk_get_aio_context({prefix}blk)'
+            return f'blk_get_aio_context({prefix}{name})'
         else:
             return 'qemu_get_aio_context()'
 
-- 
2.41.0




* [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (6 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:05   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
                   ` (13 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_noperm(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 9c4f24f4b9..500a08b26c 100644
--- a/block.c
+++ b/block.c
@@ -91,8 +91,9 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
                                    BlockDriverState *child);
 
-static void bdrv_replace_child_noperm(BdrvChild *child,
-                                      BlockDriverState *new_bs);
+static void GRAPH_WRLOCK
+bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs);
+
 static void bdrv_remove_child(BdrvChild *child, Transaction *tran);
 
 static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
@@ -2385,6 +2386,8 @@ static void bdrv_replace_child_abort(void *opaque)
     BlockDriverState *new_bs = s->child->bs;
 
     GLOBAL_STATE_CODE();
+    bdrv_graph_wrlock(s->old_bs);
+
     /* old_bs reference is transparently moved from @s to @s->child */
     if (!s->child->bs) {
         /*
@@ -2401,6 +2404,8 @@ static void bdrv_replace_child_abort(void *opaque)
     }
     assert(s->child->quiesced_parent);
     bdrv_replace_child_noperm(s->child, s->old_bs);
+
+    bdrv_graph_wrunlock();
     bdrv_unref(new_bs);
 }
 
@@ -2437,7 +2442,10 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
     if (new_bs) {
         bdrv_ref(new_bs);
     }
+
+    bdrv_graph_wrlock(new_bs);
     bdrv_replace_child_noperm(child, new_bs);
+    bdrv_graph_wrunlock();
     /* old_bs reference is transparently moved from @child to @s */
 }
 
@@ -2856,8 +2864,8 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
  * If @new_bs is non-NULL, the parent of @child must already be drained through
  * @child and the caller must hold the AioContext lock for @new_bs.
  */
-static void bdrv_replace_child_noperm(BdrvChild *child,
-                                      BlockDriverState *new_bs)
+static void GRAPH_WRLOCK
+bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs)
 {
     BlockDriverState *old_bs = child->bs;
     int new_bs_quiesce_counter;
@@ -2892,8 +2900,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
     }
 
-    /* TODO Pull this up into the callers to avoid polling here */
-    bdrv_graph_wrlock(new_bs);
     if (old_bs) {
         if (child->klass->detach) {
             child->klass->detach(child);
@@ -2909,7 +2915,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
             child->klass->attach(child);
         }
     }
-    bdrv_graph_wrunlock();
 
     /*
      * If the parent was drained through this BdrvChild previously, but new_bs
@@ -2950,7 +2955,10 @@ static void bdrv_attach_child_common_abort(void *opaque)
     BlockDriverState *bs = s->child->bs;
 
     GLOBAL_STATE_CODE();
+
+    bdrv_graph_wrlock(NULL);
     bdrv_replace_child_noperm(s->child, NULL);
+    bdrv_graph_wrunlock();
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
@@ -3078,8 +3086,10 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
      * a problem, we already did this), but it will still poll until the parent
      * is fully quiesced, so it will not be negatively affected either.
      */
+    bdrv_graph_wrlock(child_bs);
     bdrv_parent_drained_begin_single(new_child);
     bdrv_replace_child_noperm(new_child, child_bs);
+    bdrv_graph_wrunlock();
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
     *s = (BdrvAttachChildCommonState) {
@@ -3223,7 +3233,9 @@ void bdrv_root_unref_child(BdrvChild *child)
     BlockDriverState *child_bs = child->bs;
 
     GLOBAL_STATE_CODE();
+    bdrv_graph_wrlock(NULL);
     bdrv_replace_child_noperm(child, NULL);
+    bdrv_graph_wrunlock();
     bdrv_child_free(child);
 
     if (child_bs) {
-- 
2.41.0




* [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (7 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:14   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK Kevin Wolf
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_tran(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

While a graph lock is held, polling is not allowed. Therefore, draining
the necessary nodes can no longer be done in bdrv_remove_child() and
bdrv_replace_node_noperm(); instead, their callers must make sure that
the nodes are already drained.

Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_attach_child_noperm(), which currently
has to be called unlocked. Once that changes, the transaction
callbacks can be changed, too.
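
A typical caller after this patch therefore looks like this (condensed
from bdrv_replace_node_common() below):

    bdrv_drained_begin(from);
    bdrv_drained_begin(to);
    bdrv_graph_wrlock(to);
    ret = bdrv_replace_node_noperm(from, to, auto_skip, tran, errp);
    bdrv_graph_wrunlock();
    tran_finalize(tran, ret);    /* callbacks still take the lock internally */
    bdrv_drained_end(to);
    bdrv_drained_end(from);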

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 78 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index 500a08b26c..c06853f36e 100644
--- a/block.c
+++ b/block.c
@@ -94,7 +94,8 @@ static bool bdrv_recurse_has_child(BlockDriverState *bs,
 static void GRAPH_WRLOCK
 bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs);
 
-static void bdrv_remove_child(BdrvChild *child, Transaction *tran);
+static void GRAPH_WRLOCK
+bdrv_remove_child(BdrvChild *child, Transaction *tran);
 
 static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue,
@@ -2425,8 +2426,9 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  */
-static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
-                                    Transaction *tran)
+static void GRAPH_WRLOCK
+bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
+                        Transaction *tran)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
 
@@ -2443,9 +2445,7 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
         bdrv_ref(new_bs);
     }
 
-    bdrv_graph_wrlock(new_bs);
     bdrv_replace_child_noperm(child, new_bs);
-    bdrv_graph_wrunlock();
     /* old_bs reference is transparently moved from @child to @s */
 }
 
@@ -3437,8 +3437,14 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
     }
 
     if (child) {
+        bdrv_drained_begin(child->bs);
+        bdrv_graph_wrlock(NULL);
+
         bdrv_unset_inherits_from(parent_bs, child, tran);
         bdrv_remove_child(child, tran);
+
+        bdrv_graph_wrunlock();
+        bdrv_drained_end(child->bs);
     }
 
     if (!child_bs) {
@@ -5131,7 +5137,7 @@ void bdrv_close_all(void)
     assert(QTAILQ_EMPTY(&all_bdrv_states));
 }
 
-static bool should_update_child(BdrvChild *c, BlockDriverState *to)
+static bool GRAPH_RDLOCK should_update_child(BdrvChild *c, BlockDriverState *to)
 {
     GQueue *queue;
     GHashTable *found;
@@ -5220,45 +5226,41 @@ static TransactionActionDrv bdrv_remove_child_drv = {
     .commit = bdrv_remove_child_commit,
 };
 
-/* Function doesn't update permissions, caller is responsible for this. */
-static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
+/*
+ * Function doesn't update permissions, caller is responsible for this.
+ *
+ * @child->bs (if non-NULL) must be drained.
+ */
+static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
 {
     if (!child) {
         return;
     }
 
     if (child->bs) {
-        BlockDriverState *bs = child->bs;
-        bdrv_drained_begin(bs);
+        assert(child->quiesced_parent);
         bdrv_replace_child_tran(child, NULL, tran);
-        bdrv_drained_end(bs);
     }
 
     tran_add(tran, &bdrv_remove_child_drv, child);
 }
 
-static void undrain_on_clean_cb(void *opaque)
-{
-    bdrv_drained_end(opaque);
-}
-
-static TransactionActionDrv undrain_on_clean = {
-    .clean = undrain_on_clean_cb,
-};
-
-static int bdrv_replace_node_noperm(BlockDriverState *from,
-                                    BlockDriverState *to,
-                                    bool auto_skip, Transaction *tran,
-                                    Error **errp)
+/*
+ * Both @from and @to (if non-NULL) must be drained. @to must be kept drained
+ * until the transaction is completed.
+ */
+static int GRAPH_WRLOCK
+bdrv_replace_node_noperm(BlockDriverState *from,
+                         BlockDriverState *to,
+                         bool auto_skip, Transaction *tran,
+                         Error **errp)
 {
     BdrvChild *c, *next;
 
     GLOBAL_STATE_CODE();
 
-    bdrv_drained_begin(from);
-    bdrv_drained_begin(to);
-    tran_add(tran, &undrain_on_clean, from);
-    tran_add(tran, &undrain_on_clean, to);
+    assert(from->quiesce_counter);
+    assert(to->quiesce_counter);
 
     QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
         assert(c->bs == from);
@@ -5321,6 +5323,9 @@ static int bdrv_replace_node_common(BlockDriverState *from,
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
     assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
     bdrv_drained_begin(from);
+    bdrv_drained_begin(to);
+
+    bdrv_graph_wrlock(to);
 
     /*
      * Do the replacement without permission update.
@@ -5334,6 +5339,7 @@ static int bdrv_replace_node_common(BlockDriverState *from,
     }
 
     if (detach_subchain) {
+        /* to_cow_parent is already drained because from is drained */
         bdrv_remove_child(bdrv_filter_or_cow_child(to_cow_parent), tran);
     }
 
@@ -5348,8 +5354,10 @@ static int bdrv_replace_node_common(BlockDriverState *from,
     ret = 0;
 
 out:
+    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
 
+    bdrv_drained_end(to);
     bdrv_drained_end(from);
     bdrv_unref(from);
 
@@ -5393,6 +5401,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     BdrvChild *child;
     Transaction *tran = tran_new();
     AioContext *old_context, *new_context = NULL;
+    bool drained = false;
 
     GLOBAL_STATE_CODE();
 
@@ -5421,7 +5430,13 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
         aio_context_acquire(new_context);
     }
 
+    bdrv_drained_begin(bs_new);
+    bdrv_drained_begin(bs_top);
+    drained = true;
+
+    bdrv_graph_wrlock(bs_new);
     ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
+    bdrv_graph_wrunlock();
     if (ret < 0) {
         goto out;
     }
@@ -5434,6 +5449,11 @@ out:
     bdrv_refresh_limits(bs_top, NULL, NULL);
     bdrv_graph_rdunlock_main_loop();
 
+    if (drained) {
+        bdrv_drained_end(bs_top);
+        bdrv_drained_end(bs_new);
+    }
+
     if (new_context && old_context != new_context) {
         aio_context_release(new_context);
         aio_context_acquire(old_context);
@@ -5456,6 +5476,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     bdrv_ref(old_bs);
     bdrv_drained_begin(old_bs);
     bdrv_drained_begin(new_bs);
+    bdrv_graph_wrlock(new_bs);
 
     bdrv_replace_child_tran(child, new_bs, tran);
 
@@ -5463,6 +5484,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     refresh_list = g_slist_prepend(refresh_list, new_bs);
 
     ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
+    bdrv_graph_wrunlock();
 
     tran_finalize(tran, ret);
 
-- 
2.41.0




* [PATCH 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (8 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-22 19:16   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 11/21] block: Call transaction callbacks with lock held Kevin Wolf
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_replace_node_noperm(), which currently
requires the transaction callbacks to be called unlocked. In the next
step, both of them can be switched to locked tran_finalize() calls
together.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c        | 133 +++++++++++++++++++++++++++++++------------------
 block/stream.c |  20 ++++++--
 2 files changed, 100 insertions(+), 53 deletions(-)

diff --git a/block.c b/block.c
index c06853f36e..780d129007 100644
--- a/block.c
+++ b/block.c
@@ -3002,13 +3002,14 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
  * @child_bs can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
  */
-static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
-                                           const char *child_name,
-                                           const BdrvChildClass *child_class,
-                                           BdrvChildRole child_role,
-                                           uint64_t perm, uint64_t shared_perm,
-                                           void *opaque,
-                                           Transaction *tran, Error **errp)
+static BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child_common(BlockDriverState *child_bs,
+                         const char *child_name,
+                         const BdrvChildClass *child_class,
+                         BdrvChildRole child_role,
+                         uint64_t perm, uint64_t shared_perm,
+                         void *opaque,
+                         Transaction *tran, Error **errp)
 {
     BdrvChild *new_child;
     AioContext *parent_ctx, *new_child_ctx;
@@ -3086,10 +3087,8 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
      * a problem, we already did this), but it will still poll until the parent
      * is fully quiesced, so it will not be negatively affected either.
      */
-    bdrv_graph_wrlock(child_bs);
     bdrv_parent_drained_begin_single(new_child);
     bdrv_replace_child_noperm(new_child, child_bs);
-    bdrv_graph_wrunlock();
 
     BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
     *s = (BdrvAttachChildCommonState) {
@@ -3114,13 +3113,14 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
  * @child_bs can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
  */
-static BdrvChild *bdrv_attach_child_noperm(BlockDriverState *parent_bs,
-                                           BlockDriverState *child_bs,
-                                           const char *child_name,
-                                           const BdrvChildClass *child_class,
-                                           BdrvChildRole child_role,
-                                           Transaction *tran,
-                                           Error **errp)
+static BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child_noperm(BlockDriverState *parent_bs,
+                         BlockDriverState *child_bs,
+                         const char *child_name,
+                         const BdrvChildClass *child_class,
+                         BdrvChildRole child_role,
+                         Transaction *tran,
+                         Error **errp)
 {
     uint64_t perm, shared_perm;
 
@@ -3165,6 +3165,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_graph_wrlock(child_bs);
+
     child = bdrv_attach_child_common(child_bs, child_name, child_class,
                                    child_role, perm, shared_perm, opaque,
                                    tran, errp);
@@ -3176,6 +3178,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
     ret = bdrv_refresh_perms(child_bs, tran, errp);
 
 out:
+    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
 
     bdrv_unref(child_bs);
@@ -3207,6 +3210,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_graph_wrlock(child_bs);
+
     child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                      child_class, child_role, tran, errp);
     if (!child) {
@@ -3220,6 +3225,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     }
 
 out:
+    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
 
     bdrv_unref(child_bs);
@@ -3377,16 +3383,20 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
  * Sets the bs->backing or bs->file link of a BDS. A new reference is created;
  * callers which don't need their own reference any more must call bdrv_unref().
  *
+ * If the respective child is already present (i.e. we're detaching a node),
+ * that child node must be drained.
+ *
  * Function doesn't update permissions, caller is responsible for this.
  *
  * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
  * @child_bs can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
  */
-static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
-                                           BlockDriverState *child_bs,
-                                           bool is_backing,
-                                           Transaction *tran, Error **errp)
+static int GRAPH_WRLOCK
+bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
+                                BlockDriverState *child_bs,
+                                bool is_backing,
+                                Transaction *tran, Error **errp)
 {
     bool update_inherits_from =
         bdrv_inherits_from_recursive(child_bs, parent_bs);
@@ -3437,14 +3447,9 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
     }
 
     if (child) {
-        bdrv_drained_begin(child->bs);
-        bdrv_graph_wrlock(NULL);
-
+        assert(child->bs->quiesce_counter);
         bdrv_unset_inherits_from(parent_bs, child, tran);
         bdrv_remove_child(child, tran);
-
-        bdrv_graph_wrunlock();
-        bdrv_drained_end(child->bs);
     }
 
     if (!child_bs) {
@@ -3469,9 +3474,7 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
     }
 
 out:
-    bdrv_graph_rdlock_main_loop();
     bdrv_refresh_limits(parent_bs, tran, NULL);
-    bdrv_graph_rdunlock_main_loop();
 
     return 0;
 }
@@ -3480,10 +3483,14 @@ out:
  * The caller must hold the AioContext lock for @backing_hd. Both @bs and
  * @backing_hd can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
+ *
+ * If a backing child is already present (i.e. we're detaching a node), that
+ * child node must be drained.
  */
-static int bdrv_set_backing_noperm(BlockDriverState *bs,
-                                   BlockDriverState *backing_hd,
-                                   Transaction *tran, Error **errp)
+static int GRAPH_WRLOCK
+bdrv_set_backing_noperm(BlockDriverState *bs,
+                        BlockDriverState *backing_hd,
+                        Transaction *tran, Error **errp)
 {
     GLOBAL_STATE_CODE();
     return bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
@@ -3498,6 +3505,10 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
 
     GLOBAL_STATE_CODE();
     assert(bs->quiesce_counter > 0);
+    if (bs->backing) {
+        assert(bs->backing->bs->quiesce_counter > 0);
+    }
+    bdrv_graph_wrlock(backing_hd);
 
     ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
     if (ret < 0) {
@@ -3506,6 +3517,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
 
     ret = bdrv_refresh_perms(bs, tran, errp);
 out:
+    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
     return ret;
 }
@@ -3513,12 +3525,15 @@ out:
 int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
                         Error **errp)
 {
+    BlockDriverState *drain_bs = bs->backing ? bs->backing->bs : bs;
     int ret;
     GLOBAL_STATE_CODE();
 
-    bdrv_drained_begin(bs);
+    bdrv_ref(drain_bs);
+    bdrv_drained_begin(drain_bs);
     ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
-    bdrv_drained_end(bs);
+    bdrv_drained_end(drain_bs);
+    bdrv_unref(drain_bs);
 
     return ret;
 }
@@ -4595,6 +4610,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 
 abort:
     tran_abort(tran);
+
     QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
         if (bs_entry->prepared) {
             ctx = bdrv_get_aio_context(bs_entry->state.bs);
@@ -4744,6 +4760,11 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
         reopen_state->old_file_bs = old_child_bs;
     }
 
+    if (old_child_bs) {
+        bdrv_ref(old_child_bs);
+        bdrv_drained_begin(old_child_bs);
+    }
+
     old_ctx = bdrv_get_aio_context(bs);
     ctx = bdrv_get_aio_context(new_child_bs);
     if (old_ctx != ctx) {
@@ -4751,14 +4772,23 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
         aio_context_acquire(ctx);
     }
 
+    bdrv_graph_wrlock(new_child_bs);
+
     ret = bdrv_set_file_or_backing_noperm(bs, new_child_bs, is_backing,
                                           tran, errp);
 
+    bdrv_graph_wrunlock();
+
     if (old_ctx != ctx) {
         aio_context_release(ctx);
         aio_context_acquire(old_ctx);
     }
 
+    if (old_child_bs) {
+        bdrv_drained_end(old_child_bs);
+        bdrv_unref(old_child_bs);
+    }
+
     return ret;
 }
 
@@ -5401,13 +5431,28 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     BdrvChild *child;
     Transaction *tran = tran_new();
     AioContext *old_context, *new_context = NULL;
-    bool drained = false;
 
     GLOBAL_STATE_CODE();
 
     assert(!bs_new->backing);
 
     old_context = bdrv_get_aio_context(bs_top);
+    bdrv_drained_begin(bs_top);
+
+    /*
+     * bdrv_drained_begin() requires that only the AioContext of the drained
+     * node is locked, and at this point it can still differ from the AioContext
+     * of bs_top.
+     */
+    new_context = bdrv_get_aio_context(bs_new);
+    aio_context_release(old_context);
+    aio_context_acquire(new_context);
+    bdrv_drained_begin(bs_new);
+    aio_context_release(new_context);
+    aio_context_acquire(old_context);
+    new_context = NULL;
+
+    bdrv_graph_wrlock(bs_top);
 
     child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
                                      &child_of_bds, bdrv_backing_role(bs_new),
@@ -5418,10 +5463,9 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     }
 
     /*
-     * bdrv_attach_child_noperm could change the AioContext of bs_top.
-     * bdrv_replace_node_noperm calls bdrv_drained_begin, so let's temporarily
-     * hold the new AioContext, since bdrv_drained_begin calls BDRV_POLL_WHILE
-     * that assumes the new lock is taken.
+     * bdrv_attach_child_noperm could change the AioContext of bs_top and
+     * bs_new, but at least they are in the same AioContext now. This is the
+     * AioContext that we need to lock for the rest of the function.
      */
     new_context = bdrv_get_aio_context(bs_top);
 
@@ -5430,29 +5474,22 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
         aio_context_acquire(new_context);
     }
 
-    bdrv_drained_begin(bs_new);
-    bdrv_drained_begin(bs_top);
-    drained = true;
-
-    bdrv_graph_wrlock(bs_new);
     ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
-    bdrv_graph_wrunlock();
     if (ret < 0) {
         goto out;
     }
 
     ret = bdrv_refresh_perms(bs_new, tran, errp);
 out:
+    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
 
     bdrv_graph_rdlock_main_loop();
     bdrv_refresh_limits(bs_top, NULL, NULL);
     bdrv_graph_rdunlock_main_loop();
 
-    if (drained) {
-        bdrv_drained_end(bs_top);
-        bdrv_drained_end(bs_new);
-    }
+    bdrv_drained_end(bs_top);
+    bdrv_drained_end(bs_new);
 
     if (new_context && old_context != new_context) {
         aio_context_release(new_context);
diff --git a/block/stream.c b/block/stream.c
index e522bbdec5..e4da214f1f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,6 +54,7 @@ static int stream_prepare(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
+    BlockDriverState *unfiltered_bs_cow = bdrv_cow_bs(unfiltered_bs);
     BlockDriverState *base;
     BlockDriverState *unfiltered_base;
     Error *local_err = NULL;
@@ -64,13 +65,18 @@ static int stream_prepare(Job *job)
     s->cor_filter_bs = NULL;
 
     /*
-     * bdrv_set_backing_hd() requires that unfiltered_bs is drained. Drain
-     * already here and use bdrv_set_backing_hd_drained() instead because
-     * the polling during drained_begin() might change the graph, and if we do
-     * this only later, we may end up working with the wrong base node (or it
-     * might even have gone away by the time we want to use it).
+     * bdrv_set_backing_hd() requires that unfiltered_bs and the COW child
+     * of unfiltered_bs are drained. Drain already here and use
+     * bdrv_set_backing_hd_drained() instead because the polling during
+     * drained_begin() might change the graph, and if we do this only later, we
+     * may end up working with the wrong base node (or it might even have gone
+     * away by the time we want to use it).
      */
     bdrv_drained_begin(unfiltered_bs);
+    if (unfiltered_bs_cow) {
+        bdrv_ref(unfiltered_bs_cow);
+        bdrv_drained_begin(unfiltered_bs_cow);
+    }
 
     base = bdrv_filter_or_cow_bs(s->above_base);
     unfiltered_base = bdrv_skip_filters(base);
@@ -100,6 +106,10 @@ static int stream_prepare(Job *job)
     }
 
 out:
+    if (unfiltered_bs_cow) {
+        bdrv_drained_end(unfiltered_bs_cow);
+        bdrv_unref(unfiltered_bs_cow);
+    }
     bdrv_drained_end(unfiltered_bs);
     return ret;
 }
-- 
2.41.0




* [PATCH 11/21] block: Call transaction callbacks with lock held
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (9 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-22 19:19   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

In previous patches, we changed some transactionable functions to be
marked as GRAPH_WRLOCK, but still required that tran_finalize() be
called without the lock. This was because all callbacks that can be
part of the same transaction need to follow the same convention.

Now that we no longer have conflicting requirements, we can declare all
of the transaction callbacks GRAPH_WRLOCK, too, and call tran_finalize()
with the lock held.

Document for each of these transactionable functions that the lock needs
to be held when completing the transaction, and make sure that all
callers down to the place where the transaction is finalized actually
hold the writer lock.
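
As an illustration of the resulting convention (an editor's sketch
distilled from the hunks below, not additional patch content;
do_graph_change() stands in for a real graph operation such as
bdrv_attach_child_noperm()):

    /* Caller side: the writer lock now covers transaction completion */
    bdrv_graph_wrlock(NULL);
    ret = do_graph_change(tran, errp);
    tran_finalize(tran, ret);    /* .commit()/.abort() run GRAPH_WRLOCK */
    bdrv_graph_wrunlock();

    /* Callback side: it may assert the lock but must not take it, and it
     * defers unrefs that could delete a node while the lock is held */
    static void GRAPH_WRLOCK example_commit(void *opaque)
    {
        BdrvReplaceChildState *s = opaque;
        assert_bdrv_graph_writable();
        bdrv_schedule_unref(s->old_bs);
    }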

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 61 +++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 780d129007..064851e0e1 100644
--- a/block.c
+++ b/block.c
@@ -2373,21 +2373,21 @@ typedef struct BdrvReplaceChildState {
     BlockDriverState *old_bs;
 } BdrvReplaceChildState;
 
-static void bdrv_replace_child_commit(void *opaque)
+static void GRAPH_WRLOCK bdrv_replace_child_commit(void *opaque)
 {
     BdrvReplaceChildState *s = opaque;
     GLOBAL_STATE_CODE();
 
-    bdrv_unref(s->old_bs);
+    bdrv_schedule_unref(s->old_bs);
 }
 
-static void bdrv_replace_child_abort(void *opaque)
+static void GRAPH_WRLOCK bdrv_replace_child_abort(void *opaque)
 {
     BdrvReplaceChildState *s = opaque;
     BlockDriverState *new_bs = s->child->bs;
 
     GLOBAL_STATE_CODE();
-    bdrv_graph_wrlock(s->old_bs);
+    assert_bdrv_graph_writable();
 
     /* old_bs reference is transparently moved from @s to @s->child */
     if (!s->child->bs) {
@@ -2406,7 +2406,6 @@ static void bdrv_replace_child_abort(void *opaque)
     assert(s->child->quiesced_parent);
     bdrv_replace_child_noperm(s->child, s->old_bs);
 
-    bdrv_graph_wrunlock();
     bdrv_unref(new_bs);
 }
 
@@ -2424,6 +2423,9 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  * Both @child->bs and @new_bs (if non-NULL) must be drained. @new_bs must be
  * kept drained until the transaction is completed.
  *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
  * The function doesn't update permissions, caller is responsible for this.
  */
 static void GRAPH_WRLOCK
@@ -2949,16 +2951,15 @@ typedef struct BdrvAttachChildCommonState {
     AioContext *old_child_ctx;
 } BdrvAttachChildCommonState;
 
-static void bdrv_attach_child_common_abort(void *opaque)
+static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
 {
     BdrvAttachChildCommonState *s = opaque;
     BlockDriverState *bs = s->child->bs;
 
     GLOBAL_STATE_CODE();
+    assert_bdrv_graph_writable();
 
-    bdrv_graph_wrlock(NULL);
     bdrv_replace_child_noperm(s->child, NULL);
-    bdrv_graph_wrunlock();
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
@@ -2982,7 +2983,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
         tran_commit(tran);
     }
 
-    bdrv_unref(bs);
+    bdrv_schedule_unref(bs);
     bdrv_child_free(s->child);
 }
 
@@ -2996,6 +2997,9 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
  *
  * Function doesn't update permissions, caller is responsible for this.
  *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
  * Returns new created child.
  *
  * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
@@ -3112,6 +3116,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
  * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
  * @child_bs can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
  */
 static BdrvChild * GRAPH_WRLOCK
 bdrv_attach_child_noperm(BlockDriverState *parent_bs,
@@ -3178,8 +3185,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
     ret = bdrv_refresh_perms(child_bs, tran, errp);
 
 out:
-    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
+    bdrv_graph_wrunlock();
 
     bdrv_unref(child_bs);
 
@@ -3225,8 +3232,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     }
 
 out:
-    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
+    bdrv_graph_wrunlock();
 
     bdrv_unref(child_bs);
 
@@ -3391,6 +3398,9 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
  * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
  * @child_bs can move to a different AioContext in this function. Callers must
  * make sure that their AioContext locking is still correct after this.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
  */
 static int GRAPH_WRLOCK
 bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
@@ -3486,6 +3496,9 @@ out:
  *
  * If a backing child is already present (i.e. we're detaching a node), that
  * child node must be drained.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
  */
 static int GRAPH_WRLOCK
 bdrv_set_backing_noperm(BlockDriverState *bs,
@@ -3517,8 +3530,8 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
 
     ret = bdrv_refresh_perms(bs, tran, errp);
 out:
-    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
+    bdrv_graph_wrunlock();
     return ret;
 }
 
@@ -4592,7 +4605,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
         aio_context_release(ctx);
     }
 
+    bdrv_graph_wrlock(NULL);
     tran_commit(tran);
+    bdrv_graph_wrunlock();
 
     QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
         BlockDriverState *bs = bs_entry->state.bs;
@@ -4609,7 +4624,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
     goto cleanup;
 
 abort:
+    bdrv_graph_wrlock(NULL);
     tran_abort(tran);
+    bdrv_graph_wrunlock();
 
     QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
         if (bs_entry->prepared) {
@@ -4676,6 +4693,9 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
  * true and reopen_state->new_backing_bs contains a pointer to the new
  * backing BlockDriverState (or NULL).
  *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
  * Return 0 on success, otherwise return < 0 and set @errp.
  *
  * The caller must hold the AioContext lock of @reopen_state->bs.
@@ -4809,6 +4829,9 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
  * commit() for any other BDS that have been left in a prepare() state
  *
  * The caller must hold the AioContext lock of @reopen_state->bs.
+ *
+ * After calling this function, the transaction @change_child_tran may only be
+ * completed while holding a writer lock for the graph.
  */
 static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue,
@@ -5260,6 +5283,9 @@ static TransactionActionDrv bdrv_remove_child_drv = {
  * Function doesn't update permissions, caller is responsible for this.
  *
  * @child->bs (if non-NULL) must be drained.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
  */
 static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
 {
@@ -5278,6 +5304,9 @@ static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
 /*
  * Both @from and @to (if non-NULL) must be drained. @to must be kept drained
  * until the transaction is completed.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
  */
 static int GRAPH_WRLOCK
 bdrv_replace_node_noperm(BlockDriverState *from,
@@ -5384,8 +5413,8 @@ static int bdrv_replace_node_common(BlockDriverState *from,
     ret = 0;
 
 out:
-    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
+    bdrv_graph_wrunlock();
 
     bdrv_drained_end(to);
     bdrv_drained_end(from);
@@ -5481,12 +5510,10 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
 
     ret = bdrv_refresh_perms(bs_new, tran, errp);
 out:
-    bdrv_graph_wrunlock();
     tran_finalize(tran, ret);
 
-    bdrv_graph_rdlock_main_loop();
     bdrv_refresh_limits(bs_top, NULL, NULL);
-    bdrv_graph_rdunlock_main_loop();
+    bdrv_graph_wrunlock();
 
     bdrv_drained_end(bs_top);
     bdrv_drained_end(bs_new);
@@ -5521,10 +5548,10 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     refresh_list = g_slist_prepend(refresh_list, new_bs);
 
     ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
-    bdrv_graph_wrunlock();
 
     tran_finalize(tran, ret);
 
+    bdrv_graph_wrunlock();
     bdrv_drained_end(old_bs);
     bdrv_drained_end(new_bs);
     bdrv_unref(old_bs);
-- 
2.41.0




* [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (10 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 11/21] block: Call transaction callbacks with lock held Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:21   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
                   ` (9 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.
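
For illustration, the updated call sites follow this shape (editor's
sketch mirroring the hunks below; parent, bs and errp are placeholders):

    bdrv_graph_wrlock(NULL);
    child = bdrv_attach_child(parent, bs, "child", &child_of_bds,
                              BDRV_CHILD_DATA, errp);
    bdrv_graph_wrunlock();

Note that bdrv_attach_child() itself now drops its child_bs reference
with bdrv_schedule_unref() instead of bdrv_unref(), since the caller is
still holding the writer lock at that point.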

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h | 14 ++++++++------
 block.c                            |  7 +++----
 block/quorum.c                     |  2 ++
 block/replication.c                |  6 ++++++
 tests/unit/test-bdrv-drain.c       | 14 ++++++++++++++
 tests/unit/test-bdrv-graph-mod.c   | 10 ++++++++++
 6 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index e570799f85..eb12a35439 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -226,12 +226,14 @@ void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
 void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
 void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
-BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-                             BlockDriverState *child_bs,
-                             const char *child_name,
-                             const BdrvChildClass *child_class,
-                             BdrvChildRole child_role,
-                             Error **errp);
+
+BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child(BlockDriverState *parent_bs,
+                  BlockDriverState *child_bs,
+                  const char *child_name,
+                  const BdrvChildClass *child_class,
+                  BdrvChildRole child_role,
+                  Error **errp);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
diff --git a/block.c b/block.c
index 064851e0e1..67ffa3518c 100644
--- a/block.c
+++ b/block.c
@@ -3217,8 +3217,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     GLOBAL_STATE_CODE();
 
-    bdrv_graph_wrlock(child_bs);
-
     child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                      child_class, child_role, tran, errp);
     if (!child) {
@@ -3233,9 +3231,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 out:
     tran_finalize(tran, ret);
-    bdrv_graph_wrunlock();
 
-    bdrv_unref(child_bs);
+    bdrv_schedule_unref(child_bs);
 
     return ret < 0 ? NULL : child;
 }
@@ -3756,11 +3753,13 @@ BdrvChild *bdrv_open_child(const char *filename,
         return NULL;
     }
 
+    bdrv_graph_wrlock(NULL);
     ctx = bdrv_get_aio_context(bs);
     aio_context_acquire(ctx);
     child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
                               errp);
     aio_context_release(ctx);
+    bdrv_graph_wrunlock();
 
     return child;
 }
diff --git a/block/quorum.c b/block/quorum.c
index f28758cf2b..def0539fda 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1094,8 +1094,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
     /* We can safely add the child now */
     bdrv_ref(child_bs);
 
+    bdrv_graph_wrlock(child_bs);
     child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
                               BDRV_CHILD_DATA, errp);
+    bdrv_graph_wrunlock();
     if (child == NULL) {
         s->next_child_index--;
         goto out;
diff --git a/block/replication.c b/block/replication.c
index ea4bf1aa80..eec9819625 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -542,12 +542,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }
 
+        bdrv_graph_wrlock(bs);
+
         bdrv_ref(hidden_disk->bs);
         s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
                                            &child_of_bds, BDRV_CHILD_DATA,
                                            &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
+            bdrv_graph_wrunlock();
             aio_context_release(aio_context);
             return;
         }
@@ -558,10 +561,13 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
                                               BDRV_CHILD_DATA, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
+            bdrv_graph_wrunlock();
             aio_context_release(aio_context);
             return;
         }
 
+        bdrv_graph_wrunlock();
+
         /* start backup job now */
         error_setg(&s->blocker,
                    "Block device is in use by internal backup job");
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 89c8fa6780..6d33249656 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1055,8 +1055,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
 
     null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
                         &error_abort);
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
+    bdrv_graph_wrunlock();
 
     /* This child will be the one to pass to requests through to, and
      * it will stall until a drain occurs */
@@ -1064,17 +1066,21 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
                                     &error_abort);
     child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
     /* Takes our reference to child_bs */
+    bdrv_graph_wrlock(NULL);
     tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
                                         &child_of_bds,
                                         BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
                                         &error_abort);
+    bdrv_graph_wrunlock();
 
     /* This child is just there to be deleted
      * (for detach_instead_of_delete == true) */
     null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
                         &error_abort);
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
                       &error_abort);
+    bdrv_graph_wrunlock();
 
     blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk, bs, &error_abort);
@@ -1159,9 +1165,11 @@ static void detach_indirect_bh(void *opaque)
     bdrv_unref_child(data->parent_b, data->child_b);
 
     bdrv_ref(data->c);
+    bdrv_graph_wrlock(NULL);
     data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
                                       &child_of_bds, BDRV_CHILD_DATA,
                                       &error_abort);
+    bdrv_graph_wrunlock();
 }
 
 static void detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1258,6 +1266,7 @@ static void test_detach_indirect(bool by_parent_cb)
     /* Set child relationships */
     bdrv_ref(b);
     bdrv_ref(a);
+    bdrv_graph_wrlock(NULL);
     child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
                                 BDRV_CHILD_DATA, &error_abort);
     child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_of_bds,
@@ -1267,6 +1276,7 @@ static void test_detach_indirect(bool by_parent_cb)
     bdrv_attach_child(parent_a, a, "PA-A",
                       by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
                       BDRV_CHILD_DATA, &error_abort);
+    bdrv_graph_wrunlock();
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1685,6 +1695,7 @@ static void test_drop_intermediate_poll(void)
      * Establish the chain last, so the chain links are the first
      * elements in the BDS.parents lists
      */
+    bdrv_graph_wrlock(NULL);
     for (i = 0; i < 3; i++) {
         if (i) {
             /* Takes the reference to chain[i - 1] */
@@ -1692,6 +1703,7 @@ static void test_drop_intermediate_poll(void)
                               &chain_child_class, BDRV_CHILD_COW, &error_abort);
         }
     }
+    bdrv_graph_wrunlock();
 
     job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
                            0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1936,8 +1948,10 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
     new_child_bs->total_sectors = 1;
 
     bdrv_ref(old_child_bs);
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
                       BDRV_CHILD_COW, &error_abort);
+    bdrv_graph_wrunlock();
     parent_s->setup_completed = true;
 
     for (i = 0; i < old_drain_count; i++) {
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index dc80ee8f85..65811e5520 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,8 +137,10 @@ static void test_update_perm_tree(void)
 
     blk_insert_bs(root, bs, &error_abort);
 
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(filter, bs, "child", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
+    bdrv_graph_wrunlock();
 
     aio_context_acquire(qemu_get_aio_context());
     ret = bdrv_append(filter, bs, NULL);
@@ -205,8 +207,10 @@ static void test_should_update_child(void)
     bdrv_set_backing_hd(target, bs, &error_abort);
 
     g_assert(target->backing->bs == bs);
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(filter, target, "target", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
+    bdrv_graph_wrunlock();
     aio_context_acquire(qemu_get_aio_context());
     bdrv_append(filter, bs, &error_abort);
     aio_context_release(qemu_get_aio_context());
@@ -236,6 +240,7 @@ static void test_parallel_exclusive_write(void)
      */
     bdrv_ref(base);
 
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(top, fl1, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
@@ -245,6 +250,7 @@ static void test_parallel_exclusive_write(void)
     bdrv_attach_child(fl2, base, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
+    bdrv_graph_wrunlock();
 
     bdrv_replace_node(fl1, fl2, &error_abort);
 
@@ -349,6 +355,7 @@ static void test_parallel_perm_update(void)
      */
     bdrv_ref(base);
 
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
                       &error_abort);
     c_fl1 = bdrv_attach_child(ws, fl1, "first", &child_of_bds,
@@ -361,6 +368,7 @@ static void test_parallel_perm_update(void)
     bdrv_attach_child(fl2, base, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
+    bdrv_graph_wrunlock();
 
     /* Select fl1 as first child to be active */
     s->selected = c_fl1;
@@ -410,9 +418,11 @@ static void test_append_greedy_filter(void)
     BlockDriverState *base = no_perm_node("base");
     BlockDriverState *fl = exclusive_writer_node("fl1");
 
+    bdrv_graph_wrlock(NULL);
     bdrv_attach_child(top, base, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
+    bdrv_graph_wrunlock();
 
     aio_context_acquire(qemu_get_aio_context());
     bdrv_append(fl, base, &error_abort);
-- 
2.41.0




* [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (11 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:23   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The function reads the parents list, so it needs to hold the graph lock.
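
A recurring pattern in this patch (editor's note, condensed from the
block-backend.c hunks below): the public entry point keeps its unlocked
signature and delegates to a new *_locked variant that carries the
GRAPH_RDLOCK annotation, so callers that already hold the reader lock
(such as blk_root_activate()) can call the _locked variant directly:

    int blk_set_perm(BlockBackend *blk, uint64_t perm,
                     uint64_t shared_perm, Error **errp)
    {
        GLOBAL_STATE_CODE();
        GRAPH_RDLOCK_GUARD_MAINLOOP();
        return blk_set_perm_locked(blk, perm, shared_perm, errp);
    }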

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h            |  6 ++---
 include/block/block_int-global-state.h      |  8 +++---
 include/sysemu/block-backend-global-state.h |  4 +--
 block.c                                     | 28 +++++++++++++--------
 block/block-backend.c                       | 26 ++++++++++++++-----
 block/crypto.c                              |  6 +++--
 block/mirror.c                              |  8 ++++++
 block/vmdk.c                                |  2 ++
 tests/unit/test-bdrv-graph-mod.c            |  4 +++
 9 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 74195c3004..5738451a0a 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -311,7 +311,7 @@ struct BlockDriver {
      */
     void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
 
-    int (*bdrv_inactivate)(BlockDriverState *bs);
+    int GRAPH_RDLOCK_PTR (*bdrv_inactivate)(BlockDriverState *bs);
 
     int (*bdrv_snapshot_create)(BlockDriverState *bs,
                                 QEMUSnapshotInfo *sn_info);
@@ -944,8 +944,8 @@ struct BdrvChildClass {
      * when migration is completing) and it can start/stop requesting
      * permissions and doing I/O on it.
      */
-    void (*activate)(BdrvChild *child, Error **errp);
-    int (*inactivate)(BdrvChild *child);
+    void GRAPH_RDLOCK_PTR (*activate)(BdrvChild *child, Error **errp);
+    int GRAPH_RDLOCK_PTR (*inactivate)(BdrvChild *child);
 
     void GRAPH_WRLOCK_PTR (*attach)(BdrvChild *child);
     void GRAPH_WRLOCK_PTR (*detach)(BdrvChild *child);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index da5fb31089..bebcc08bce 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -212,8 +212,9 @@ void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
  * bdrv_child_refresh_perms() instead and make the parent's
  * .bdrv_child_perm() implementation return the correct values.
  */
-int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
-                            Error **errp);
+int GRAPH_RDLOCK
+bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
+                        Error **errp);
 
 /**
  * Calls bs->drv->bdrv_child_perm() and updates the child's permission
@@ -223,7 +224,8 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
  * values than before, but which will not result in the block layer
  * automatically refreshing the permissions.
  */
-int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
+int GRAPH_RDLOCK
+bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
 
 bool GRAPH_RDLOCK bdrv_recurse_can_replace(BlockDriverState *bs,
                                            BlockDriverState *to_replace);
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 184e667ebd..d5f675493a 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -61,8 +61,8 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp);
 int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp);
 bool bdrv_has_blk(BlockDriverState *bs);
 bool bdrv_is_root_node(BlockDriverState *bs);
-int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
-                 Error **errp);
+int GRAPH_UNLOCKED blk_set_perm(BlockBackend *blk, uint64_t perm,
+                                uint64_t shared_perm, Error **errp);
 void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
 
 void blk_iostatus_enable(BlockBackend *blk);
diff --git a/block.c b/block.c
index 67ffa3518c..66de57aa9b 100644
--- a/block.c
+++ b/block.c
@@ -2200,7 +2200,8 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
     return false;
 }
 
-static bool bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
 {
     BdrvChild *a, *b;
     GLOBAL_STATE_CODE();
@@ -2253,8 +2254,8 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
  * simplest way to satisfy this criteria: use only result of
  * bdrv_topological_dfs() or NULL as @list parameter.
  */
-static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
-                                    BlockDriverState *bs)
+static GSList * GRAPH_RDLOCK
+bdrv_topological_dfs(GSList *list, GHashTable *found, BlockDriverState *bs)
 {
     BdrvChild *child;
     g_autoptr(GHashTable) local_found = NULL;
@@ -2530,8 +2531,9 @@ static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
  * @list is a product of bdrv_topological_dfs() (may be called several times) -
  * a topologically sorted subgraph.
  */
-static int bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q,
-                                 Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
+                      Error **errp)
 {
     int ret;
     BlockDriverState *bs;
@@ -2558,8 +2560,9 @@ static int bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q,
  * topologically sorted. It's not a problem if some node occurs in the @list
  * several times.
  */
-static int bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q,
-                                   Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
+                        Error **errp)
 {
     g_autoptr(GHashTable) found = g_hash_table_new(NULL, NULL);
     g_autoptr(GSList) refresh_list = NULL;
@@ -2619,8 +2622,8 @@ char *bdrv_perm_names(uint64_t perm)
 
 
 /* @tran is allowed to be NULL. In this case no rollback is possible */
-static int bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran,
-                              Error **errp)
+static int GRAPH_RDLOCK
+bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran, Error **errp)
 {
     int ret;
     Transaction *local_tran = NULL;
@@ -3245,7 +3248,6 @@ void bdrv_root_unref_child(BdrvChild *child)
     GLOBAL_STATE_CODE();
     bdrv_graph_wrlock(NULL);
     bdrv_replace_child_noperm(child, NULL);
-    bdrv_graph_wrunlock();
     bdrv_child_free(child);
 
     if (child_bs) {
@@ -3264,6 +3266,7 @@ void bdrv_root_unref_child(BdrvChild *child)
                                     NULL);
     }
 
+    bdrv_graph_wrunlock();
     bdrv_unref(child_bs);
 }
 
@@ -4583,7 +4586,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
      * reconfiguring the fd and that's why it does it in raw_check_perm(), not
      * in raw_reopen_prepare() which is called with "old" permissions.
      */
+    bdrv_graph_rdlock_main_loop();
     ret = bdrv_list_refresh_perms(refresh_list, bs_queue, tran, errp);
+    bdrv_graph_rdunlock_main_loop();
+
     if (ret < 0) {
         goto abort;
     }
@@ -6822,6 +6828,7 @@ int bdrv_activate(BlockDriverState *bs, Error **errp)
     BdrvDirtyBitmap *bm;
 
     GLOBAL_STATE_CODE();
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     if (!bs->drv)  {
         return -ENOMEDIUM;
@@ -6952,6 +6959,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
     uint64_t cumulative_perms, cumulative_shared_perms;
 
     GLOBAL_STATE_CODE();
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     if (!bs->drv) {
         return -ENOMEDIUM;
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..8d0282a5d9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -121,6 +121,10 @@ static QTAILQ_HEAD(, BlockBackend) block_backends =
 static QTAILQ_HEAD(, BlockBackend) monitor_block_backends =
     QTAILQ_HEAD_INITIALIZER(monitor_block_backends);
 
+static int coroutine_mixed_fn GRAPH_RDLOCK
+blk_set_perm_locked(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+                    Error **errp);
+
 static void blk_root_inherit_options(BdrvChildRole role, bool parent_is_format,
                                      int *child_flags, QDict *child_options,
                                      int parent_flags, QDict *parent_options)
@@ -186,7 +190,7 @@ static void blk_vm_state_changed(void *opaque, bool running, RunState state)
  *
  * If an error is returned, the VM cannot be allowed to be resumed.
  */
-static void blk_root_activate(BdrvChild *child, Error **errp)
+static void GRAPH_RDLOCK blk_root_activate(BdrvChild *child, Error **errp)
 {
     BlockBackend *blk = child->opaque;
     Error *local_err = NULL;
@@ -207,7 +211,7 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
      */
     saved_shared_perm = blk->shared_perm;
 
-    blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
+    blk_set_perm_locked(blk, blk->perm, BLK_PERM_ALL, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         blk->disable_perm = true;
@@ -226,7 +230,7 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
         return;
     }
 
-    blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
+    blk_set_perm_locked(blk, blk->perm, blk->shared_perm, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         blk->disable_perm = true;
@@ -259,7 +263,7 @@ static bool blk_can_inactivate(BlockBackend *blk)
     return blk->force_allow_inactivate;
 }
 
-static int blk_root_inactivate(BdrvChild *child)
+static int GRAPH_RDLOCK blk_root_inactivate(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
 
@@ -953,8 +957,9 @@ int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp)
 /*
  * Sets the permission bitmasks that the user of the BlockBackend needs.
  */
-int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
-                 Error **errp)
+static int coroutine_mixed_fn GRAPH_RDLOCK
+blk_set_perm_locked(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+                    Error **errp)
 {
     int ret;
     GLOBAL_STATE_CODE();
@@ -972,6 +977,15 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
     return 0;
 }
 
+int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+                 Error **errp)
+{
+    GLOBAL_STATE_CODE();
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    return blk_set_perm_locked(blk, perm, shared_perm, errp);
+}
+
 void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
 {
     GLOBAL_STATE_CODE();
diff --git a/block/crypto.c b/block/crypto.c
index 6ee8d46d30..c9c9a39fa3 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -777,7 +777,7 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, Error **errp)
     return spec_info;
 }
 
-static int
+static int GRAPH_RDLOCK
 block_crypto_amend_prepare(BlockDriverState *bs, Error **errp)
 {
     BlockCrypto *crypto = bs->opaque;
@@ -793,7 +793,7 @@ block_crypto_amend_prepare(BlockDriverState *bs, Error **errp)
     return ret;
 }
 
-static void
+static void GRAPH_RDLOCK
 block_crypto_amend_cleanup(BlockDriverState *bs)
 {
     BlockCrypto *crypto = bs->opaque;
@@ -841,6 +841,8 @@ block_crypto_amend_options_luks(BlockDriverState *bs,
     QCryptoBlockAmendOptions *amend_options = NULL;
     int ret = -EINVAL;
 
+    assume_graph_lock(); /* FIXME */
+
     assert(crypto);
     assert(crypto->block);
 
diff --git a/block/mirror.c b/block/mirror.c
index d3cacd1708..13c7340ac2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -702,8 +702,12 @@ static int mirror_exit_common(Job *job)
      * mirror_top_bs from now on, so keep it drained. */
     bdrv_drained_begin(mirror_top_bs);
     bs_opaque->stop = true;
+
+    bdrv_graph_rdlock_main_loop();
     bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
                              &error_abort);
+    bdrv_graph_rdunlock_main_loop();
+
     if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
         BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
@@ -1670,6 +1674,8 @@ static BlockJob *mirror_start_job(
     uint64_t target_perms, target_shared_perms;
     int ret;
 
+    GLOBAL_STATE_CODE();
+
     if (granularity == 0) {
         granularity = bdrv_get_default_bitmap_granularity(target);
     }
@@ -1906,8 +1912,10 @@ fail:
     }
 
     bs_opaque->stop = true;
+    bdrv_graph_rdlock_main_loop();
     bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
                              &error_abort);
+    bdrv_graph_rdunlock_main_loop();
     bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     bdrv_unref(mirror_top_bs);
diff --git a/block/vmdk.c b/block/vmdk.c
index 70066c2b01..e4f8acdf25 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1309,6 +1309,8 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVVmdkState *s = bs->opaque;
     uint32_t magic;
 
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
     ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
     if (ret < 0) {
         return ret;
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 65811e5520..7007fe9a59 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -372,6 +372,9 @@ static void test_parallel_perm_update(void)
 
     /* Select fl1 as first child to be active */
     s->selected = c_fl1;
+
+    bdrv_graph_rdlock_main_loop();
+
     bdrv_child_refresh_perms(top, top->children.lh_first, &error_abort);
 
     assert(c_fl1->perm & BLK_PERM_WRITE);
@@ -391,6 +394,7 @@ static void test_parallel_perm_update(void)
     assert(c_fl1->perm & BLK_PERM_WRITE);
     assert(!(c_fl2->perm & BLK_PERM_WRITE));
 
+    bdrv_graph_rdunlock_main_loop();
     bdrv_unref(top);
 }
 
-- 
2.41.0




* [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (12 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:28   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
                   ` (7 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The function reads the parents list, so it needs to hold the graph lock.

This happens to result in BlockDriver.bdrv_set_perm() being called with
the graph lock held. For consistency, make it the same for all of the
BlockDriver callbacks that update permissions, and annotate the function
pointers with GRAPH_RDLOCK_PTR.
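
For reference, the annotation pairing this series relies on (editor's
sketch; mydrv_set_perm() is a hypothetical driver implementation):
GRAPH_RDLOCK_PTR marks the function pointer in struct BlockDriver, and
the implementation carries the matching GRAPH_RDLOCK annotation so the
static thread-safety analysis lines up:

    /* declaration side, in struct BlockDriver */
    void GRAPH_RDLOCK_PTR (*bdrv_set_perm)(
        BlockDriverState *bs, uint64_t perm, uint64_t shared);

    /* definition side, in a block driver */
    static void GRAPH_RDLOCK mydrv_set_perm(BlockDriverState *bs,
                                            uint64_t perm, uint64_t shared)
    {
        /* may walk bs->children here: the reader lock is guaranteed */
    }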

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h       |  9 ++++---
 include/block/block_int-global-state.h |  4 +--
 block.c                                | 35 ++++++++++++++++++++------
 blockdev.c                             |  6 +++++
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 5738451a0a..f183982e67 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -413,8 +413,8 @@ struct BlockDriver {
      * If both conditions are met, 0 is returned. Otherwise, -errno is returned
      * and errp is set to an error describing the conflict.
      */
-    int (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
-                           uint64_t shared, Error **errp);
+    int GRAPH_RDLOCK_PTR (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
+                                            uint64_t shared, Error **errp);
 
     /**
      * Called to inform the driver that the set of cumulative set of used
@@ -426,7 +426,8 @@ struct BlockDriver {
      * This function is only invoked after bdrv_check_perm(), so block drivers
      * may rely on preparations made in their .bdrv_check_perm implementation.
      */
-    void (*bdrv_set_perm)(BlockDriverState *bs, uint64_t perm, uint64_t shared);
+    void GRAPH_RDLOCK_PTR (*bdrv_set_perm)(
+        BlockDriverState *bs, uint64_t perm, uint64_t shared);
 
     /*
      * Called to inform the driver that after a previous bdrv_check_perm()
@@ -436,7 +437,7 @@ struct BlockDriver {
      * This function can be called even for nodes that never saw a
      * bdrv_check_perm() call. It is a no-op then.
      */
-    void (*bdrv_abort_perm_update)(BlockDriverState *bs);
+    void GRAPH_RDLOCK_PTR (*bdrv_abort_perm_update)(BlockDriverState *bs);
 
     /**
      * Returns in @nperm and @nshared the permissions that the driver for @bs
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index bebcc08bce..e2304db58b 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -204,8 +204,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
                                   void *opaque, Error **errp);
 void bdrv_root_unref_child(BdrvChild *child);
 
-void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
-                              uint64_t *shared_perm);
+void GRAPH_RDLOCK bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
+                                           uint64_t *shared_perm);
 
 /**
  * Sets a BdrvChild's permissions.  Avoid if the parent is a BDS; use
diff --git a/block.c b/block.c
index 66de57aa9b..e20a4710da 100644
--- a/block.c
+++ b/block.c
@@ -2318,7 +2318,7 @@ static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm,
     tran_add(tran, &bdrv_child_set_pem_drv, s);
 }
 
-static void bdrv_drv_set_perm_commit(void *opaque)
+static void GRAPH_RDLOCK bdrv_drv_set_perm_commit(void *opaque)
 {
     BlockDriverState *bs = opaque;
     uint64_t cumulative_perms, cumulative_shared_perms;
@@ -2331,7 +2331,7 @@ static void bdrv_drv_set_perm_commit(void *opaque)
     }
 }
 
-static void bdrv_drv_set_perm_abort(void *opaque)
+static void GRAPH_RDLOCK bdrv_drv_set_perm_abort(void *opaque)
 {
     BlockDriverState *bs = opaque;
     GLOBAL_STATE_CODE();
@@ -2346,9 +2346,13 @@ TransactionActionDrv bdrv_drv_set_perm_drv = {
     .commit = bdrv_drv_set_perm_commit,
 };
 
-static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
-                             uint64_t shared_perm, Transaction *tran,
-                             Error **errp)
+/*
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
+ */
+static int GRAPH_RDLOCK
+bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared_perm,
+                  Transaction *tran, Error **errp)
 {
     GLOBAL_STATE_CODE();
     if (!bs->drv) {
@@ -2455,9 +2459,13 @@ bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
 /*
  * Refresh permissions in @bs subtree. The function is intended to be called
  * after some graph modification that was done without permission update.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
  */
-static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
-                                  Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
+                       Transaction *tran, Error **errp)
 {
     BlockDriver *drv = bs->drv;
     BdrvChild *c;
@@ -2530,6 +2538,9 @@ static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
 /*
  * @list is a product of bdrv_topological_dfs() (may be called several times) -
  * a topologically sorted subgraph.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
  */
 static int GRAPH_RDLOCK
 bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
@@ -2559,6 +2570,9 @@ bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
  * @list is any list of nodes. List is completed by all subtrees and
  * topologically sorted. It's not a problem if some node occurs in the @list
  * several times.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
  */
 static int GRAPH_RDLOCK
 bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
@@ -2621,7 +2635,12 @@ char *bdrv_perm_names(uint64_t perm)
 }
 
 
-/* @tran is allowed to be NULL. In this case no rollback is possible */
+/*
+ * @tran is allowed to be NULL. In this case no rollback is possible.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
+ */
 static int GRAPH_RDLOCK
 bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran, Error **errp)
 {
diff --git a/blockdev.c b/blockdev.c
index e6eba61484..372eaf198c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1378,6 +1378,9 @@ static void external_snapshot_action(TransactionAction *action,
     AioContext *aio_context;
     uint64_t perm, shared;
 
+    /* TODO We'll eventually have to take a writer lock in this function */
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
     tran_add(tran, &external_snapshot_drv, state);
 
     /* 'blockdev-snapshot' and 'blockdev-snapshot-sync' have similar
@@ -2521,6 +2524,9 @@ void qmp_block_commit(const char *job_id, const char *device,
     int job_flags = JOB_DEFAULT;
     uint64_t top_perm, top_shared;
 
+    /* TODO We'll eventually have to take a writer lock in this function */
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
     if (!has_speed) {
         speed = 0;
     }
-- 
2.41.0




* [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (13 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:29   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
                   ` (6 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_child_perm() need to hold a reader lock for the graph because
some implementations access the children list of a node.

The callers of bdrv_child_perm() conveniently already hold the lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h | 10 +++++-----
 block.c                          | 11 ++++++-----
 block/copy-before-write.c        | 10 +++++-----
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index f183982e67..9d5002df5a 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -451,11 +451,11 @@ struct BlockDriver {
      * permissions, but those that will be needed after applying the
      * @reopen_queue.
      */
-     void (*bdrv_child_perm)(BlockDriverState *bs, BdrvChild *c,
-                             BdrvChildRole role,
-                             BlockReopenQueue *reopen_queue,
-                             uint64_t parent_perm, uint64_t parent_shared,
-                             uint64_t *nperm, uint64_t *nshared);
+     void GRAPH_RDLOCK_PTR (*bdrv_child_perm)(
+        BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+        BlockReopenQueue *reopen_queue,
+        uint64_t parent_perm, uint64_t parent_shared,
+        uint64_t *nperm, uint64_t *nshared);
 
     /**
      * Register/unregister a buffer for I/O. For example, when the driver is
diff --git a/block.c b/block.c
index e20a4710da..0a28bfc3c4 100644
--- a/block.c
+++ b/block.c
@@ -2226,11 +2226,12 @@ bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
     return false;
 }
 
-static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
-                            BdrvChild *c, BdrvChildRole role,
-                            BlockReopenQueue *reopen_queue,
-                            uint64_t parent_perm, uint64_t parent_shared,
-                            uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
+                BdrvChild *c, BdrvChildRole role,
+                BlockReopenQueue *reopen_queue,
+                uint64_t parent_perm, uint64_t parent_shared,
+                uint64_t *nperm, uint64_t *nshared)
 {
     assert(bs->drv && bs->drv->bdrv_child_perm);
     GLOBAL_STATE_CODE();
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271..6d9c165127 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -341,11 +341,11 @@ static void cbw_refresh_filename(BlockDriverState *bs)
             bs->file->bs->filename);
 }
 
-static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
-                           BdrvChildRole role,
-                           BlockReopenQueue *reopen_queue,
-                           uint64_t perm, uint64_t shared,
-                           uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+               BlockReopenQueue *reopen_queue,
+               uint64_t perm, uint64_t shared,
+               uint64_t *nperm, uint64_t *nshared)
 {
     if (!(role & BDRV_CHILD_FILTERED)) {
         /*
-- 
2.41.0




* [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (14 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:29   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
                   ` (5 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The function reads the parents list, so it needs to hold the graph lock.
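As an illustration of the pattern (a minimal, hypothetical sketch that is
not part of this patch; the example_* names are made up, while the
annotations and the guard macro are the real ones from
include/block/graph-lock.h used throughout this series):

    /* Annotated function: may only be called with the reader lock held */
    static void GRAPH_RDLOCK example_walk_parents(BlockDriverState *bs)
    {
        BdrvChild *c;

        QLIST_FOREACH(c, &bs->parents, next_parent) {
            /* the parents list cannot change while the lock is held */
        }
    }

    /* Main loop caller: the guard keeps the lock until end of scope */
    static void example_caller(BlockDriverState *bs)
    {
        GRAPH_RDLOCK_GUARD_MAINLOOP();
        example_walk_parents(bs);
    }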

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0a28bfc3c4..7df8780d6e 100644
--- a/block.c
+++ b/block.c
@@ -3369,7 +3369,8 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
 }
 
 
-static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
+static void GRAPH_RDLOCK
+bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
 {
     BdrvChild *c;
     GLOBAL_STATE_CODE();
@@ -3967,6 +3968,9 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
     GLOBAL_STATE_CODE();
     assert(!qemu_in_coroutine());
 
+    /* TODO We'll eventually have to take a writer lock in this function */
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
     if (reference) {
         bool options_non_empty = options ? qdict_size(options) : false;
         qobject_unref(options);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (15 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:35   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
                   ` (4 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The function reads the parents list, so it needs to hold the graph lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 7df8780d6e..a82389f742 100644
--- a/block.c
+++ b/block.c
@@ -5934,9 +5934,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
         backing_file_str = base->filename;
     }
 
+    bdrv_graph_rdlock_main_loop();
     QLIST_FOREACH(c, &top->parents, next_parent) {
         updated_children = g_slist_prepend(updated_children, c);
     }
+    bdrv_graph_rdunlock_main_loop();
 
     /*
      * It seems correct to pass detach_subchain=true here, but it triggers
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 18/21] block: Take graph rdlock in bdrv_change_aio_context()
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (16 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-17 12:50 ` [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The function reads the parents list, so it needs to hold the graph lock.
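With explicit lock/unlock calls, every early return out of the locked
section needs its own unlock, as the hunk below shows. A single-exit
shape would avoid the duplication (hypothetical sketch covering only the
parents loop, not what this patch does):

    bool ok = true;

    bdrv_graph_rdlock_main_loop();
    QLIST_FOREACH(c, &bs->parents, next_parent) {
        if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
            ok = false;
            break;
        }
    }
    bdrv_graph_rdunlock_main_loop();
    if (!ok) {
        return false;
    }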

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block.c b/block.c
index a82389f742..c0a8460434 100644
--- a/block.c
+++ b/block.c
@@ -7669,17 +7669,21 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
         return true;
     }
 
+    bdrv_graph_rdlock_main_loop();
     QLIST_FOREACH(c, &bs->parents, next_parent) {
         if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
+            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }
 
     QLIST_FOREACH(c, &bs->children, next) {
         if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
+            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }
+    bdrv_graph_rdunlock_main_loop();
 
     state = g_new(BdrvStateSetAioContext, 1);
     *state = (BdrvStateSetAioContext) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (17 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:38   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
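
The resulting caller pattern is a straight bracket around the call
(minimal sketch based on the blk_remove_bs() hunk below;
bdrv_schedule_unref() was introduced earlier in this series):

    bdrv_graph_wrlock(NULL);
    bdrv_root_unref_child(root);   /* defers the unref via bdrv_schedule_unref() */
    bdrv_graph_wrunlock();         /* deferred bdrv_unref() calls run here */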

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-global-state.h | 2 +-
 block.c                                | 6 +++---
 block/block-backend.c                  | 3 +++
 blockjob.c                             | 2 ++
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index e2304db58b..074b677838 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -202,7 +202,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
                                   BdrvChildRole child_role,
                                   uint64_t perm, uint64_t shared_perm,
                                   void *opaque, Error **errp);
-void bdrv_root_unref_child(BdrvChild *child);
+void GRAPH_WRLOCK bdrv_root_unref_child(BdrvChild *child);
 
 void GRAPH_RDLOCK bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
                                            uint64_t *shared_perm);
diff --git a/block.c b/block.c
index c0a8460434..0bb0c84c0e 100644
--- a/block.c
+++ b/block.c
@@ -3266,7 +3266,6 @@ void bdrv_root_unref_child(BdrvChild *child)
     BlockDriverState *child_bs = child->bs;
 
     GLOBAL_STATE_CODE();
-    bdrv_graph_wrlock(NULL);
     bdrv_replace_child_noperm(child, NULL);
     bdrv_child_free(child);
 
@@ -3286,8 +3285,7 @@ void bdrv_root_unref_child(BdrvChild *child)
                                     NULL);
     }
 
-    bdrv_graph_wrunlock();
-    bdrv_unref(child_bs);
+    bdrv_schedule_unref(child_bs);
 }
 
 typedef struct BdrvSetInheritsFrom {
@@ -3364,8 +3362,10 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
         return;
     }
 
+    bdrv_graph_wrlock(NULL);
     bdrv_unset_inherits_from(parent, child, NULL);
     bdrv_root_unref_child(child);
+    bdrv_graph_wrunlock();
 }
 
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 8d0282a5d9..c2636f4351 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -915,7 +915,10 @@ void blk_remove_bs(BlockBackend *blk)
     blk_drain(blk);
     root = blk->root;
     blk->root = NULL;
+
+    bdrv_graph_wrlock(NULL);
     bdrv_root_unref_child(root);
+    bdrv_graph_wrunlock();
 }
 
 /*
diff --git a/blockjob.c b/blockjob.c
index 25fe8e625d..58c5d64539 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
      * one to make sure that such a concurrent access does not attempt
      * to process an already freed BdrvChild.
      */
+    bdrv_graph_wrlock(NULL);
     while (job->nodes) {
         GSList *l = job->nodes;
         BdrvChild *c = l->data;
@@ -209,6 +210,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
 
         g_slist_free_1(l);
     }
+    bdrv_graph_wrunlock();
 }
 
 bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (18 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:39   ` Stefan Hajnoczi
  2023-08-17 12:50 ` [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
  2023-08-22 18:20 ` [PATCH 00/21] Graph locking part 4 (node management) Stefan Hajnoczi
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_unref_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.
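
For coroutine callers, the generated bdrv_co_unref_child() wrapper takes
the writer lock internally, so a reader lock held by the coroutine has to
be dropped first (sketch based on the qcow2_do_open() hunk below):

    bdrv_graph_co_rdunlock();
    bdrv_co_unref_child(bs, s->data_file);   /* wrapper takes the wrlock */
    bdrv_graph_co_rdlock();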

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h |  7 ++++++-
 block.c                            | 11 +++++++----
 block/blklogwrites.c               |  4 ++++
 block/blkverify.c                  |  2 ++
 block/qcow2.c                      |  4 +++-
 block/quorum.c                     |  6 ++++++
 block/replication.c                |  3 +++
 block/snapshot.c                   |  2 ++
 block/vmdk.c                       | 11 +++++++++++
 tests/unit/test-bdrv-drain.c       |  8 ++++++--
 10 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index eb12a35439..0f6df8f1a2 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -225,7 +225,12 @@ void bdrv_ref(BlockDriverState *bs);
 void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
 void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
 void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
-void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+
+void GRAPH_WRLOCK
+bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+
+void coroutine_fn no_co_wrapper_bdrv_wrlock
+bdrv_co_unref_child(BlockDriverState *parent, BdrvChild *child);
 
 BdrvChild * GRAPH_WRLOCK
 bdrv_attach_child(BlockDriverState *parent_bs,
diff --git a/block.c b/block.c
index 0bb0c84c0e..a3e016f054 100644
--- a/block.c
+++ b/block.c
@@ -1699,7 +1699,9 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
 open_failed:
     bs->drv = NULL;
     if (bs->file != NULL) {
+        bdrv_graph_wrlock(NULL);
         bdrv_unref_child(bs, bs->file);
+        bdrv_graph_wrunlock();
         assert(!bs->file);
     }
     g_free(bs->opaque);
@@ -3329,8 +3331,9 @@ static void bdrv_set_inherits_from(BlockDriverState *bs,
  * @root that point to @root, where necessary.
  * @tran is allowed to be NULL. In this case no rollback is possible
  */
-static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
-                                     Transaction *tran)
+static void GRAPH_WRLOCK
+bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
+                         Transaction *tran)
 {
     BdrvChild *c;
 
@@ -3362,10 +3365,8 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
         return;
     }
 
-    bdrv_graph_wrlock(NULL);
     bdrv_unset_inherits_from(parent, child, NULL);
     bdrv_root_unref_child(child);
-    bdrv_graph_wrunlock();
 }
 
 
@@ -5162,9 +5163,11 @@ static void bdrv_close(BlockDriverState *bs)
         bs->drv = NULL;
     }
 
+    bdrv_graph_wrlock(NULL);
     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
         bdrv_unref_child(bs, child);
     }
+    bdrv_graph_wrunlock();
 
     assert(!bs->backing);
     assert(!bs->file);
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3ea7141cb5..a0d70729bb 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -251,7 +251,9 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
     ret = 0;
 fail_log:
     if (ret < 0) {
+        bdrv_graph_wrlock(NULL);
         bdrv_unref_child(bs, s->log_file);
+        bdrv_graph_wrunlock();
         s->log_file = NULL;
     }
 fail:
@@ -263,8 +265,10 @@ static void blk_log_writes_close(BlockDriverState *bs)
 {
     BDRVBlkLogWritesState *s = bs->opaque;
 
+    bdrv_graph_wrlock(NULL);
     bdrv_unref_child(bs, s->log_file);
     s->log_file = NULL;
+    bdrv_graph_wrunlock();
 }
 
 static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/blkverify.c b/block/blkverify.c
index 7326461f30..dae9716a26 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,8 +151,10 @@ static void blkverify_close(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
 
+    bdrv_graph_wrlock(NULL);
     bdrv_unref_child(bs, s->test_file);
     s->test_file = NULL;
+    bdrv_graph_wrunlock();
 }
 
 static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..5b9c4526e3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1880,7 +1880,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
     g_free(s->image_data_file);
     if (open_data_file && has_data_file(bs)) {
         bdrv_graph_co_rdunlock();
-        bdrv_unref_child(bs, s->data_file);
+        bdrv_co_unref_child(bs, s->data_file);
         bdrv_graph_co_rdlock();
         s->data_file = NULL;
     }
@@ -2790,7 +2790,9 @@ static void qcow2_do_close(BlockDriverState *bs, bool close_data_file)
     g_free(s->image_backing_format);
 
     if (close_data_file && has_data_file(bs)) {
+        bdrv_graph_wrlock(NULL);
         bdrv_unref_child(bs, s->data_file);
+        bdrv_graph_wrunlock();
         s->data_file = NULL;
     }
 
diff --git a/block/quorum.c b/block/quorum.c
index def0539fda..620a50ba2c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,12 +1037,14 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
 close_exit:
     /* cleanup on error */
+    bdrv_graph_wrlock(NULL);
     for (i = 0; i < s->num_children; i++) {
         if (!opened[i]) {
             continue;
         }
         bdrv_unref_child(bs, s->children[i]);
     }
+    bdrv_graph_wrunlock();
     g_free(s->children);
     g_free(opened);
 exit:
@@ -1055,9 +1057,11 @@ static void quorum_close(BlockDriverState *bs)
     BDRVQuorumState *s = bs->opaque;
     int i;
 
+    bdrv_graph_wrlock(NULL);
     for (i = 0; i < s->num_children; i++) {
         bdrv_unref_child(bs, s->children[i]);
     }
+    bdrv_graph_wrunlock();
 
     g_free(s->children);
 }
@@ -1147,7 +1151,9 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     memmove(&s->children[i], &s->children[i + 1],
             (s->num_children - i - 1) * sizeof(BdrvChild *));
     s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+    bdrv_graph_wrlock(NULL);
     bdrv_unref_child(bs, child);
+    bdrv_graph_wrunlock();
 
     quorum_refresh_flags(bs);
     bdrv_drained_end(bs);
diff --git a/block/replication.c b/block/replication.c
index eec9819625..dd166d2d82 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -672,10 +672,13 @@ static void replication_done(void *opaque, int ret)
     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
 
+        bdrv_graph_wrlock(NULL);
         bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
         bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
+        bdrv_graph_wrunlock();
+
         s->error = 0;
     } else {
         s->stage = BLOCK_REPLICATION_FAILOVER_FAILED;
diff --git a/block/snapshot.c b/block/snapshot.c
index e22ac3eac6..b86b5b24ad 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -281,7 +281,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         }
 
         /* .bdrv_open() will re-attach it */
+        bdrv_graph_wrlock(NULL);
         bdrv_unref_child(bs, fallback);
+        bdrv_graph_wrunlock();
 
         ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
         open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
diff --git a/block/vmdk.c b/block/vmdk.c
index e4f8acdf25..95227afbab 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -272,6 +272,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
     BDRVVmdkState *s = bs->opaque;
     VmdkExtent *e;
 
+    bdrv_graph_wrlock(NULL);
     for (i = 0; i < s->num_extents; i++) {
         e = &s->extents[i];
         g_free(e->l1_table);
@@ -282,6 +283,8 @@ static void vmdk_free_extents(BlockDriverState *bs)
             bdrv_unref_child(bs, e->file);
         }
     }
+    bdrv_graph_wrunlock();
+
     g_free(s->extents);
 }
 
@@ -1220,7 +1223,9 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
             ret = vmdk_add_extent(bs, extent_file, true, sectors,
                             0, 0, 0, 0, 0, &extent, errp);
             if (ret < 0) {
+                bdrv_graph_wrlock(NULL);
                 bdrv_unref_child(bs, extent_file);
+                bdrv_graph_wrunlock();
                 goto out;
             }
             extent->flat_start_offset = flat_offset << 9;
@@ -1235,20 +1240,26 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
             }
             g_free(buf);
             if (ret) {
+                bdrv_graph_wrlock(NULL);
                 bdrv_unref_child(bs, extent_file);
+                bdrv_graph_wrunlock();
                 goto out;
             }
             extent = &s->extents[s->num_extents - 1];
         } else if (!strcmp(type, "SESPARSE")) {
             ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
             if (ret) {
+                bdrv_graph_wrlock(NULL);
                 bdrv_unref_child(bs, extent_file);
+                bdrv_graph_wrunlock();
                 goto out;
             }
             extent = &s->extents[s->num_extents - 1];
         } else {
             error_setg(errp, "Unsupported extent type '%s'", type);
+            bdrv_graph_wrlock(NULL);
             bdrv_unref_child(bs, extent_file);
+            bdrv_graph_wrunlock();
             ret = -ENOTSUP;
             goto out;
         }
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 6d33249656..b040a73bb9 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -967,9 +967,12 @@ typedef struct BDRVTestTopState {
 static void bdrv_test_top_close(BlockDriverState *bs)
 {
     BdrvChild *c, *next_c;
+
+    bdrv_graph_wrlock(NULL);
     QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
         bdrv_unref_child(bs, c);
     }
+    bdrv_graph_wrunlock();
 }
 
 static int coroutine_fn GRAPH_RDLOCK
@@ -1024,7 +1027,7 @@ static void coroutine_fn test_co_delete_by_drain(void *opaque)
     } else {
         BdrvChild *c, *next_c;
         QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
-            bdrv_unref_child(bs, c);
+            bdrv_co_unref_child(bs, c);
         }
     }
 
@@ -1162,10 +1165,11 @@ static void detach_indirect_bh(void *opaque)
     struct detach_by_parent_data *data = opaque;
 
     bdrv_dec_in_flight(data->child_b->bs);
+
+    bdrv_graph_wrlock(NULL);
     bdrv_unref_child(data->parent_b, data->child_b);
 
     bdrv_ref(data->c);
-    bdrv_graph_wrlock(NULL);
     data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
                                       &child_of_bds, BDRV_CHILD_DATA,
                                       &error_abort);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (19 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-17 12:50 ` Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:40   ` Stefan Hajnoczi
  2023-08-22 18:20 ` [PATCH 00/21] Graph locking part 4 (node management) Stefan Hajnoczi
  21 siblings, 2 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-17 12:50 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov, qemu-devel

The functions read the parents list in the generic block layer, so we
need to hold the graph lock already there. The BlockDriver
implementations actually modify the graph, so it has to be a writer
lock.
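
With GRAPH_WRLOCK_PTR on the callback declaration, an implementation may
modify the graph without taking the lock again (hypothetical driver
sketch; mydrv_del_child is a made-up name):

    static void GRAPH_WRLOCK
    mydrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp)
    {
        /* the caller already holds the writer lock, like quorum_del_child() */
        bdrv_unref_child(parent, child);
    }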

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h |  8 +++++---
 include/block/block_int-common.h   |  9 +++++----
 block/quorum.c                     | 23 ++++++-----------------
 blockdev.c                         | 17 +++++++++++------
 4 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 0f6df8f1a2..f31660c7b1 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -276,9 +276,11 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
-void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
-                    Error **errp);
-void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
+void GRAPH_WRLOCK
+bdrv_add_child(BlockDriverState *parent, BlockDriverState *child, Error **errp);
+
+void GRAPH_WRLOCK
+bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
 /**
  *
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 9d5002df5a..277168b973 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -393,10 +393,11 @@ struct BlockDriver {
      */
     int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
-    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
-                           Error **errp);
-    void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
-                           Error **errp);
+    void GRAPH_WRLOCK_PTR (*bdrv_add_child)(
+        BlockDriverState *parent, BlockDriverState *child, Error **errp);
+
+    void GRAPH_WRLOCK_PTR (*bdrv_del_child)(
+        BlockDriverState *parent, BdrvChild *child, Error **errp);
 
     /**
      * Informs the block driver that a permission change is intended. The
diff --git a/block/quorum.c b/block/quorum.c
index 620a50ba2c..05220cab7f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1066,8 +1066,8 @@ static void quorum_close(BlockDriverState *bs)
     g_free(s->children);
 }
 
-static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
-                             Error **errp)
+static void GRAPH_WRLOCK
+quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
 {
     BDRVQuorumState *s = bs->opaque;
     BdrvChild *child;
@@ -1093,29 +1093,22 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
     }
     s->next_child_index++;
 
-    bdrv_drained_begin(bs);
-
     /* We can safely add the child now */
     bdrv_ref(child_bs);
 
-    bdrv_graph_wrlock(child_bs);
     child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
                               BDRV_CHILD_DATA, errp);
-    bdrv_graph_wrunlock();
     if (child == NULL) {
         s->next_child_index--;
-        goto out;
+        return;
     }
     s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
     s->children[s->num_children++] = child;
     quorum_refresh_flags(bs);
-
-out:
-    bdrv_drained_end(bs);
 }
 
-static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
-                             Error **errp)
+static void GRAPH_WRLOCK
+quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
 {
     BDRVQuorumState *s = bs->opaque;
     char indexstr[INDEXSTR_LEN];
@@ -1145,18 +1138,14 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
         s->next_child_index--;
     }
 
-    bdrv_drained_begin(bs);
-
     /* We can safely remove this child now */
     memmove(&s->children[i], &s->children[i + 1],
             (s->num_children - i - 1) * sizeof(BdrvChild *));
     s->children = g_renew(BdrvChild *, s->children, --s->num_children);
-    bdrv_graph_wrlock(NULL);
+
     bdrv_unref_child(bs, child);
-    bdrv_graph_wrunlock();
 
     quorum_refresh_flags(bs);
-    bdrv_drained_end(bs);
 }
 
 static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
diff --git a/blockdev.c b/blockdev.c
index 372eaf198c..325b7a3bef 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3545,8 +3545,8 @@ out:
     aio_context_release(aio_context);
 }
 
-static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
-                                  const char *child_name)
+static BdrvChild * GRAPH_RDLOCK
+bdrv_find_child(BlockDriverState *parent_bs, const char *child_name)
 {
     BdrvChild *child;
 
@@ -3565,9 +3565,11 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
     BlockDriverState *parent_bs, *new_bs = NULL;
     BdrvChild *p_child;
 
+    bdrv_graph_wrlock(NULL);
+
     parent_bs = bdrv_lookup_bs(parent, parent, errp);
     if (!parent_bs) {
-        return;
+        goto out;
     }
 
     if (!child == !node) {
@@ -3576,7 +3578,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
         } else {
             error_setg(errp, "Either child or node must be specified");
         }
-        return;
+        goto out;
     }
 
     if (child) {
@@ -3584,7 +3586,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
         if (!p_child) {
             error_setg(errp, "Node '%s' does not have child '%s'",
                        parent, child);
-            return;
+            goto out;
         }
         bdrv_del_child(parent_bs, p_child, errp);
     }
@@ -3593,10 +3595,13 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
         new_bs = bdrv_find_node(node);
         if (!new_bs) {
             error_setg(errp, "Node '%s' not found", node);
-            return;
+            goto out;
         }
         bdrv_add_child(parent_bs, new_bs, errp);
     }
+
+out:
+    bdrv_graph_wrunlock();
 }
 
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
@ 2023-08-17 19:18   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:23   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Eric Blake @ 2023-08-17 19:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Thu, Aug 17, 2023 at 02:50:00PM +0200, Kevin Wolf wrote:
> This field has been unused since commit 72373e40fbc ('block:
> bdrv_reopen_multiple: refresh permissions on updated graph').
> Remove it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size()
  2023-08-17 12:50 ` [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
@ 2023-08-18 15:23   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:26   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Eric Blake @ 2023-08-18 15:23 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Thu, Aug 17, 2023 at 02:50:01PM +0200, Kevin Wolf wrote:
> It's essentially the same code in preallocate_check_perm() and
> preallocate_close(), except that the latter ignores errors.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/preallocate.c | 48 +++++++++++++++++++++------------------------
>  1 file changed, 22 insertions(+), 26 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 03/21] preallocate: Don't poll during permission updates
  2023-08-17 12:50 ` [PATCH 03/21] preallocate: Don't poll during permission updates Kevin Wolf
@ 2023-08-18 15:34   ` Eric Blake
  2023-08-22 18:41   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Eric Blake @ 2023-08-18 15:34 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Thu, Aug 17, 2023 at 02:50:02PM +0200, Kevin Wolf wrote:
> When the permission related BlockDriver callbacks are called, we are in
> the middle of an operation traversing the block graph. Polling in such a
> place is a very bad idea because the graph could change in unexpected
> ways. In the future, callers will also hold the graph lock, which is
> likely to turn polling into a deadlock.

One I'm sure you encountered before writing this patch ;)

> 
> So we need to get rid of calls to functions like bdrv_getlength() or
> bdrv_truncate() there as these functions poll internally. They are
> currently used so that when no parent has write/resize permissions on
> the image any more, the preallocate filter drops the extra preallocated
> area in the image file and gives up write/resize permissions itself.

Sounds like a sane plan, and the patch matches the implementation
spelled out here.

> 
> In order to achieve this without polling in .bdrv_check_perm, don't
> immediately truncate the image, but only schedule a BH to do so. The
> filter keeps the write/resize permissions a bit longer now until the BH
> has executed.
> 
> There is one case in which delaying doesn't work: Reopening the image
> read-only. In this case, bs->file will likely be reopened read-only,
> too, so keeping write permissions a bit longer on it doesn't work. But
> we can already cover this case in preallocate_reopen_prepare() and not
> rely on the permission updates for it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/preallocate.c | 89 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 69 insertions(+), 20 deletions(-)
>  

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently
  2023-08-17 12:50 ` [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
@ 2023-08-18 16:07   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:49   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Eric Blake @ 2023-08-18 16:07 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Thu, Aug 17, 2023 at 02:50:03PM +0200, Kevin Wolf wrote:
> The documentation for bdrv_append() says that the caller must hold the
> AioContext lock for bs_top. Change all callers to actually adhere to the
> contract.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/unit/test-bdrv-drain.c     | 3 +++
>  tests/unit/test-bdrv-graph-mod.c | 6 ++++++
>  tests/unit/test-block-iothread.c | 3 +++
>  3 files changed, 12 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-17 12:50 ` [PATCH 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
@ 2023-08-18 16:23   ` Eric Blake
  2023-08-18 16:26     ` Eric Blake
  2023-08-22 19:01   ` Stefan Hajnoczi
  1 sibling, 1 reply; 70+ messages in thread
From: Eric Blake @ 2023-08-18 16:23 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
> 
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
> 
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
> 
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
> 
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> 
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---

> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>  
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */
> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);

So if I understand coroutines correctly, this says all pending
coroutines are now scheduled to run (or maybe they do try to run here,
but then immediately return control back to this coroutine to await
the right lock conditions since we are still in the block guarded by
list_lock)...

> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>       */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());

...and as long as the other coroutines sharing this thread don't
actually get to make progress until the next point at which the
current coroutine yields, and as long as our aio_bh_poll() doesn't
also include a yield point, then we are assured that the BH has
completed before the next yield point in our caller.

There are times (like today) when I'm still trying to wrap my mind
around the subtle differences between true multi-threading and
cooperative coroutines sharing a single thread via the use of
yield points.  Coroutines are cool when they can get rid of some of
the races that you have to worry about in true multi-threading.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-18 16:23   ` Eric Blake
@ 2023-08-18 16:26     ` Eric Blake
  2023-08-21 16:59       ` Kevin Wolf
  0 siblings, 1 reply; 70+ messages in thread
From: Eric Blake @ 2023-08-18 16:26 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On Fri, Aug 18, 2023 at 11:24:00AM -0500, Eric Blake wrote:
> > +++ b/block/graph-lock.c
> > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> >  void bdrv_graph_wrunlock(void)
> >  {
> >      GLOBAL_STATE_CODE();
> > -    QEMU_LOCK_GUARD(&aio_context_list_lock);
> >      assert(qatomic_read(&has_writer));
> >  
> > +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > +        /*
> > +         * No need for memory barriers, this works in pair with
> > +         * the slow path of rdlock() and both take the lock.
> > +         */
> > +        qatomic_store_release(&has_writer, 0);
> > +
> > +        /* Wake up all coroutine that are waiting to read the graph */
> > +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> 
> So if I understand coroutines correctly, this says all pending
> coroutines are now scheduled to run (or maybe they do try to run here,
> but then immediately return control back to this coroutine to await
> the right lock conditions since we are still in the block guarded by
> list_lock)...
> 
> > +    }
> > +
> >      /*
> > -     * No need for memory barriers, this works in pair with
> > -     * the slow path of rdlock() and both take the lock.
> > +     * Run any BHs that were scheduled during the wrlock section and that
> > +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> > +     * only after restarting coroutines so that nested event loops in BHs don't
> > +     * deadlock if their condition relies on the coroutine making progress.
> >       */
> > -    qatomic_store_release(&has_writer, 0);
> > -
> > -    /* Wake up all coroutine that are waiting to read the graph */
> > -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +    aio_bh_poll(qemu_get_aio_context());
> 
> ...and as long as the other coroutines sharing this thread don't
> actually get to make progress until the next point at which the
> current coroutine yields, and as long as our aio_bh_poll() doesn't
> also include a yield point, then we are ensured that the BH has
> completed before the next yield point in our caller.
> 
> There are times (like today) where I'm still trying to wrap my mind
> about the subtle differences between true multi-threading
> vs. cooperative coroutines sharing a single thread via the use of
> yield points.  coroutines are cool when they can get rid of some of
> the races that you have to worry about in true multi-threading.

That said, once we introduce multi-queue, can we ever have a scenario
where a different iothread might be trying to access the graph and
perform a reopen while this thread has not yet completed the
BH close?  Or is that protected by some other mutual exclusion (where
the only one we have to worry about is reopen by a coroutine in the
same thread, because all other threads are locked out of graph
modifications)?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
  2023-08-17 19:18   ` Eric Blake
@ 2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:23   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:34 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> This field has been unused since commit 72373e40fbc ('block:
> bdrv_reopen_multiple: refresh permissions on updated graph').
> Remove it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size()
  2023-08-17 12:50 ` [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
  2023-08-18 15:23   ` Eric Blake
@ 2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:26   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:34 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> It's essentially the same code in preallocate_check_perm() and
> preallocate_close(), except that the latter ignores errors.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently
  2023-08-17 12:50 ` [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
  2023-08-18 16:07   ` Eric Blake
@ 2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 18:49   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:34 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The documentation for bdrv_append() says that the caller must hold the
> AioContext lock for bs_top. Change all callers to actually adhere to the
> contract.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
  2023-08-17 12:50 ` [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
@ 2023-08-21  7:34   ` Emanuele Giuseppe Esposito
  2023-08-22 19:03   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:34 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Add a new wrapper type for GRAPH_WRLOCK functions that should be called
> from coroutine context.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names
  2023-08-17 12:50 ` [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:03   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Don't assume specific parameter names like 'bs' or 'blk' in the
> generated code, but use the actual name.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:05   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_replace_child_noperm(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:14   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_replace_child_tran(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> While a graph lock is held, polling is not allowed. Therefore draining
> the necessary nodes can no longer be done in bdrv_remove_child() and
> bdrv_replace_node_noperm(), but the callers must already make sure that
> they are drained.
> 
> Note that the transaction callbacks still take the lock internally, so
> tran_finalize() must be called without the lock held. This is because
> bdrv_append() also calls bdrv_attach_child_noperm(), which currently
> requires to be called unlocked. Once it changes, the transaction
> callbacks can be changed, too.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:21   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_attach_child_common(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:23   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:28   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> This happens to result in BlockDriver.bdrv_set_perm() to be called with
> the graph lock held. For consistency, make it the same for all of the
> BlockDriver callbacks for updating permissions and annotate the function
> pointers with GRAPH_RDLOCK_PTR.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:29   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_child_perm() need to hold a reader lock for the graph because
> some implementations access the children list of a node.
> 
> The callers of bdrv_child_perm() conveniently already hold the lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:29   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
  2023-08-17 12:50 ` [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:35   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 18/21] block: Take graph rdlock in bdrv_change_aio_context()
  2023-08-17 12:50 ` [PATCH 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:38   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_root_unref_child(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:39   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_unref_child(). These callers will typically
> already hold the graph lock once the locking work is completed, which
> means that they can't call functions that take it internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
@ 2023-08-21  7:35   ` Emanuele Giuseppe Esposito
  2023-08-22 19:40   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Emanuele Giuseppe Esposito @ 2023-08-21  7:35 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, pbonzini, vsementsov, qemu-devel



On 17/08/2023 at 14:50, Kevin Wolf wrote:
> The functions read the parents list in the generic block layer, so we
> need to hold the graph lock already there. The BlockDriver
> implementations actually modify the graph, so it has to be a writer
> lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-18 16:26     ` Eric Blake
@ 2023-08-21 16:59       ` Kevin Wolf
  0 siblings, 0 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-08-21 16:59 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-block, stefanha, eesposit, pbonzini, vsementsov, qemu-devel

On 18.08.2023 at 18:26, Eric Blake wrote:
> On Fri, Aug 18, 2023 at 11:24:00AM -0500, Eric Blake wrote:
> > > +++ b/block/graph-lock.c
> > > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> > >  void bdrv_graph_wrunlock(void)
> > >  {
> > >      GLOBAL_STATE_CODE();
> > > -    QEMU_LOCK_GUARD(&aio_context_list_lock);
> > >      assert(qatomic_read(&has_writer));
> > >  
> > > +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > > +        /*
> > > +         * No need for memory barriers, this works in pair with
> > > +         * the slow path of rdlock() and both take the lock.
> > > +         */
> > > +        qatomic_store_release(&has_writer, 0);
> > > +
> > > +        /* Wake up all coroutine that are waiting to read the graph */
> > > +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > 
> > So if I understand coroutines correctly, this says all pending
> > coroutines are now scheduled to run (or maybe they do try to run here,
> > but then immediately return control back to this coroutine to await
> > the right lock conditions since we are still in the block guarded by
> > list_lock)...
> > 
> > > +    }
> > > +
> > >      /*
> > > -     * No need for memory barriers, this works in pair with
> > > -     * the slow path of rdlock() and both take the lock.
> > > +     * Run any BHs that were scheduled during the wrlock section and that
> > > +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> > > +     * only after restarting coroutines so that nested event loops in BHs don't
> > > +     * deadlock if their condition relies on the coroutine making progress.
> > >       */
> > > -    qatomic_store_release(&has_writer, 0);
> > > -
> > > -    /* Wake up all coroutine that are waiting to read the graph */
> > > -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > > +    aio_bh_poll(qemu_get_aio_context());
> > 
> > ...and as long as the other coroutines sharing this thread don't
> > actually get to make progress until the next point at which the
> > current coroutine yields, and as long as our aio_bh_poll() doesn't
> > also include a yield point, then we are assured that the BH has
> > completed before the next yield point in our caller.
> > 
> > There are times (like today) when I'm still trying to wrap my mind
> > around the subtle differences between true multi-threading and
> > cooperative coroutines sharing a single thread via the use of yield
> > points.  Coroutines are cool when they can get rid of some of the
> > races that you have to worry about in true multi-threading.
> 
> That said, once we introduce multi-queue, can we ever have a scenario
> where a different iothread might be trying to access the graph and
> perform a reopen while this thread has not yet completed the BH
> close?  Or is that protected by some other mutual exclusion (where
> the only one we have to worry about is reopen by a coroutine in the
> same thread, because all other threads are locked out of graph
> modifications)?

We don't have to worry about that one because reopen (and taking the
writer lock in general) only happens in the main thread, which is
exactly the thread that this code runs in.

The thing that we need to take into consideration is that aio_bh_poll()
could call something that wants to take the writer lock and modify the
graph again. It's not really a problem, though, because we're already
done at that point. Any readers that we resumed above will just
synchronise with the writer in the usual way and one of them will have
to wait. But there is nothing that is still waiting to be resumed and
could deadlock.

Kevin
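
To make the ordering constraint concrete: a BH scheduled under the
writer lock may itself run a nested event loop, as in this sketch (the
function nested_loop_bh is hypothetical, not code from the series):

    /* Scheduled while the writer lock was held; bdrv_graph_wrunlock()
     * runs it via aio_bh_poll(). */
    static void nested_loop_bh(void *opaque)
    {
        BlockDriverState *bs = opaque;

        /* bdrv_drained_begin() polls in a nested event loop until
         * in-flight requests finish, i.e. until reader coroutines make
         * progress. If those readers were still parked in reader_queue,
         * this could never terminate, which is why qemu_co_enter_all()
         * has to run before aio_bh_poll(). */
        bdrv_drained_begin(bs);
        bdrv_drained_end(bs);
    }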




* Re: [PATCH 00/21] Graph locking part 4 (node management)
  2023-08-17 12:49 [PATCH 00/21] Graph locking part 4 (node management) Kevin Wolf
                   ` (20 preceding siblings ...)
  2023-08-17 12:50 ` [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
@ 2023-08-22 18:20 ` Stefan Hajnoczi
  21 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 18:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:49:59PM +0200, Kevin Wolf wrote:
> The previous parts of the graph locking changes focussed mostly on the
> BlockDriver side and taking reader locks while performing I/O. This
> series focusses more on the functions managing the graph structure, i.e.
> adding, removing and replacing nodes and updating their permissions.
> 
> Many of these places actually need to take the writer lock to avoid
> readers seeing an inconsistent half-updated graph state. Therefore
> taking the writer lock is now pushed down from the very low-level
> function bdrv_replace_child_noperm() into its more high level callers.

The word "down" confuses me. It seems like the direction is up instead
of down and the scope of the lock is increasing rather than decreasing.
But I haven't read the actual patches yet...

> 
> Kevin Wolf (21):
>   block: Remove unused BlockReopenQueueEntry.perms_checked
>   preallocate: Factor out preallocate_truncate_to_real_size()
>   preallocate: Don't poll during permission updates
>   block: Take AioContext lock for bdrv_append() more consistently
>   block: Introduce bdrv_schedule_unref()
>   block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
>   block-coroutine-wrapper: Allow arbitrary parameter names
>   block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
>   block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
>   block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
>   block: Call transaction callbacks with lock held
>   block: Mark bdrv_attach_child() GRAPH_WRLOCK
>   block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
>   block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
>   block: Mark bdrv_child_perm() GRAPH_RDLOCK
>   block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
>   block: Take graph rdlock in bdrv_drop_intermediate()
>   block: Take graph rdlock in bdrv_change_aio_context()
>   block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
>   block: Mark bdrv_unref_child() GRAPH_WRLOCK
>   block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
> 
>  include/block/block-common.h                |   4 +
>  include/block/block-global-state.h          |  30 +-
>  include/block/block_int-common.h            |  34 +-
>  include/block/block_int-global-state.h      |  14 +-
>  include/sysemu/block-backend-global-state.h |   4 +-
>  block.c                                     | 340 ++++++++++++++------
>  block/blklogwrites.c                        |   4 +
>  block/blkverify.c                           |   2 +
>  block/block-backend.c                       |  29 +-
>  block/copy-before-write.c                   |  10 +-
>  block/crypto.c                              |   6 +-
>  block/graph-lock.c                          |  23 +-
>  block/mirror.c                              |   8 +
>  block/preallocate.c                         | 133 +++++---
>  block/qcow2.c                               |   4 +-
>  block/quorum.c                              |  23 +-
>  block/replication.c                         |   9 +
>  block/snapshot.c                            |   2 +
>  block/stream.c                              |  20 +-
>  block/vmdk.c                                |  13 +
>  blockdev.c                                  |  23 +-
>  blockjob.c                                  |   2 +
>  tests/unit/test-bdrv-drain.c                |  23 +-
>  tests/unit/test-bdrv-graph-mod.c            |  20 ++
>  tests/unit/test-block-iothread.c            |   3 +
>  scripts/block-coroutine-wrapper.py          |  18 +-
>  tests/qemu-iotests/051.pc.out               |   6 +-
>  27 files changed, 580 insertions(+), 227 deletions(-)
> 
> -- 
> 2.41.0
> 


* Re: [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked
  2023-08-17 12:50 ` [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
  2023-08-17 19:18   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
@ 2023-08-22 18:23   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 18:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:00PM +0200, Kevin Wolf wrote:
> This field has been unused since commit 72373e40fbc ('block:
> bdrv_reopen_multiple: refresh permissions on updated graph').
> Remove it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size()
  2023-08-17 12:50 ` [PATCH 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
  2023-08-18 15:23   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
@ 2023-08-22 18:26   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 18:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:01PM +0200, Kevin Wolf wrote:
> It's essentially the same code in preallocate_check_perm() and
> preallocate_close(), except that the latter ignores errors.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/preallocate.c | 48 +++++++++++++++++++++------------------------
>  1 file changed, 22 insertions(+), 26 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 03/21] preallocate: Don't poll during permission updates
  2023-08-17 12:50 ` [PATCH 03/21] preallocate: Don't poll during permission updates Kevin Wolf
  2023-08-18 15:34   ` Eric Blake
@ 2023-08-22 18:41   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 18:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:02PM +0200, Kevin Wolf wrote:
> When the permission related BlockDriver callbacks are called, we are in
> the middle of an operation traversing the block graph. Polling in such a
> place is a very bad idea because the graph could change in unexpected
> ways. In the future, callers will also hold the graph lock, which is
> likely to turn polling into a deadlock.
> 
> So we need to get rid of calls to functions like bdrv_getlength() or
> bdrv_truncate() there as these functions poll internally. They are
> currently used so that when no parent has write/resize permissions on
> the image any more, the preallocate filter drops the extra preallocated
> area in the image file and gives up write/resize permissions itself.
> 
> In order to achieve this without polling in .bdrv_check_perm, don't
> immediately truncate the image, but only schedule a BH to do so. The
> filter keeps the write/resize permissions a bit longer now until the BH
> has executed.
> 
> There is one case in which delaying doesn't work: Reopening the image
> read-only. In this case, bs->file will likely be reopened read-only,
> too, so keeping write permissions a bit longer on it doesn't work. But
> we can already cover this case in preallocate_reopen_prepare() and not
> rely on the permission updates for it.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/preallocate.c | 89 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 69 insertions(+), 20 deletions(-)

I don't know the permissions code well enough, but the patch matches the
commit description:

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
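
A rough sketch of the deferral that the commit message describes; the
name preallocate_drop_resize_bh and the exact calls are illustrative
guesses, not the actual patch code:

    static void preallocate_drop_resize_bh(void *opaque)
    {
        BlockDriverState *bs = opaque;

        /* Outside the permission-update traversal polling is safe
         * again: drop the preallocated tail, then give up the
         * WRITE/RESIZE permissions that were kept alive for this BH. */
        preallocate_truncate_to_real_size(bs, NULL);
        bdrv_child_refresh_perms(bs, bs->file, NULL);
    }

    /* In .bdrv_check_perm, where polling is forbidden, only schedule: */
    aio_bh_schedule_oneshot(qemu_get_aio_context(),
                            preallocate_drop_resize_bh, bs);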


* Re: [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently
  2023-08-17 12:50 ` [PATCH 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
  2023-08-18 16:07   ` Eric Blake
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
@ 2023-08-22 18:49   ` Stefan Hajnoczi
  2 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 18:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:03PM +0200, Kevin Wolf wrote:
> diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
> index d727a5fee8..9155547313 100644
> --- a/tests/unit/test-block-iothread.c
> +++ b/tests/unit/test-block-iothread.c
> @@ -756,11 +756,14 @@ static void test_propagate_mirror(void)
>                                    &error_abort);
>  
>      /* Start a mirror job */
> +    aio_context_acquire(main_ctx);
>      mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
>                   MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
>                   BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
>                   false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
>                   &error_abort);
> +    aio_context_release(main_ctx);

For other reviewers: mirror_start() isn't mentioned in the commit
description, but it calls bdrv_append() internally.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
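
The pattern this patch adds around graph-changing calls, shown in
isolation (sketch; bs_new and bs_top stand for the nodes involved):

    AioContext *ctx = bdrv_get_aio_context(bs_top);

    aio_context_acquire(ctx);
    /* bdrv_append() itself, or a function like mirror_start() that
     * calls it internally */
    bdrv_append(bs_new, bs_top, &error_abort);
    aio_context_release(ctx);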


* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-17 12:50 ` [PATCH 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
  2023-08-18 16:23   ` Eric Blake
@ 2023-08-22 19:01   ` Stefan Hajnoczi
  2023-09-05 16:39     ` Kevin Wolf
  1 sibling, 1 reply; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
> 
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
> 
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
> 
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
> 
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> 
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-global-state.h |  1 +
>  block.c                            |  9 +++++++++
>  block/graph-lock.c                 | 23 ++++++++++++++++-------
>  tests/qemu-iotests/051.pc.out      |  6 +++---
>  4 files changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> index f347199bff..e570799f85 100644
> --- a/include/block/block-global-state.h
> +++ b/include/block/block-global-state.h
> @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
>  void bdrv_ref(BlockDriverState *bs);
>  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
>  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
>  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
>  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>                               BlockDriverState *child_bs,
> diff --git a/block.c b/block.c
> index 6376452768..9c4f24f4b9 100644
> --- a/block.c
> +++ b/block.c
> @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
>      }
>  }
>  
> +void bdrv_schedule_unref(BlockDriverState *bs)

Please add a doc comment explaining when and why this should be used.

> +{
> +    if (!bs) {
> +        return;
> +    }
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                            (QEMUBHFunc *) bdrv_unref, bs);
> +}
> +
>  struct BdrvOpBlocker {
>      Error *reason;
>      QLIST_ENTRY(BdrvOpBlocker) list;
> diff --git a/block/graph-lock.c b/block/graph-lock.c
> index 5e66f01ae8..395d387651 100644
> --- a/block/graph-lock.c
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>  
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */

s/coroutine/coroutines/

> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this

Referring directly to bdrv_schedule_unref() would help make it clearer
what you mean.

> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>       */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());

Keeping a dedicated list of BDSes pending unref seems cleaner than using
the aio_bh_poll(). There's a lot of stuff happening in the event loop
and I wonder if those other BHs might cause issues if they run at the
end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
an example of this, but other things could happen too (e.g. monitor
command processing).

>  }
>  
>  void coroutine_fn bdrv_graph_co_rdlock(void)
> diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
> index 4d4af5a486..7e10c5fa1b 100644
> --- a/tests/qemu-iotests/051.pc.out
> +++ b/tests/qemu-iotests/051.pc.out
> @@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> @@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -- 
> 2.41.0
> 


* Re: [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
  2023-08-17 12:50 ` [PATCH 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
  2023-08-21  7:34   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:03   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:05PM +0200, Kevin Wolf wrote:
> Add a new wrapper type for GRAPH_WRLOCK functions that should be called
> from coroutine context.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-common.h       |  4 ++++
>  scripts/block-coroutine-wrapper.py | 11 +++++++++++
>  2 files changed, 15 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
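
Following the existing no_co_wrapper convention, a declaration using the
new wrapper type could look like this (bdrv_co_example is hypothetical;
only the annotation itself comes from the patch):

    /* The generated wrapper is callable from coroutine context; it
     * leaves the coroutine and takes the graph writer lock around the
     * wrapped function. */
    void coroutine_fn no_co_wrapper_bdrv_wrlock
    bdrv_co_example(BlockDriverState *bs);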


* Re: [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names
  2023-08-17 12:50 ` [PATCH 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:03   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:06PM +0200, Kevin Wolf wrote:
> Don't assume specific parameter names like 'bs' or 'blk' in the
> generated code, but use the actual name.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  scripts/block-coroutine-wrapper.py | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:05   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:07PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_replace_child_noperm(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
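
The conversion that this and the following patches repeat, reduced to
its core (sketch):

    /* Before: the function took the writer lock itself and therefore
     * could not be called with the graph already locked. */
    bdrv_replace_child_noperm(child, new_bs);

    /* After: the caller owns the locking. */
    bdrv_graph_wrlock(new_bs);
    bdrv_replace_child_noperm(child, new_bs);
    bdrv_graph_wrunlock();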


* Re: [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:14   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:08PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_replace_child_tran(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> While a graph lock is held, polling is not allowed. Therefore draining
> the necessary nodes can no longer be done in bdrv_remove_child() and
> bdrv_replace_node_noperm(), but the callers must already make sure that
> they are drained.
> 
> Note that the transaction callbacks still take the lock internally, so
> tran_finalize() must be called without the lock held. This is because
> bdrv_append() also calls bdrv_attach_child_noperm(), which currently
> requires to be called unlocked. Once it changes, the transaction
> callbacks can be changed, too.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 78 ++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 50 insertions(+), 28 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
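
A sketch of the resulting shape of such an update (simplified; child,
old_bs and new_bs are assumed to exist, and the exact signatures may
differ from the real code):

    Transaction *tran = tran_new();
    int ret = 0;   /* result of the overall operation */

    bdrv_drained_begin(old_bs);     /* drain before taking the lock */
    bdrv_graph_wrlock(new_bs);
    bdrv_replace_child_tran(child, new_bs, tran);
    bdrv_graph_wrunlock();

    /* The transaction callbacks still lock internally at this point in
     * the series, so finalize without the lock held. */
    tran_finalize(tran, ret);
    bdrv_drained_end(old_bs);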


* Re: [PATCH 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK Kevin Wolf
@ 2023-08-22 19:16   ` Stefan Hajnoczi
  0 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:09PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_attach_child_common(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Note that the transaction callbacks still take the lock internally, so
> tran_finalize() must be called without the lock held. This is because
> bdrv_append() also calls bdrv_replace_node_noperm(), which currently
> requires the transaction callbacks to be called unlocked. In the next
> step, both of them can be switched to locked tran_finalize() calls
> together.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c        | 133 +++++++++++++++++++++++++++++++------------------
>  block/stream.c |  20 ++++++--
>  2 files changed, 100 insertions(+), 53 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 11/21] block: Call transaction callbacks with lock held
  2023-08-17 12:50 ` [PATCH 11/21] block: Call transaction callbacks with lock held Kevin Wolf
@ 2023-08-22 19:19   ` Stefan Hajnoczi
  0 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:19 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:10PM +0200, Kevin Wolf wrote:
> In previous patches, we changed some transactionable functions to be
> marked as GRAPH_WRLOCK, but required that tran_finalize() is still
> called without the lock. This was because all callbacks that can be in
> the same transaction need to follow the same convention.
> 
> Now that we don't have conflicting requirements any more, we can switch
> all of the transaction callbacks to be declared GRAPH_WRLOCK, too, and
> call tran_finalize() with the lock held.
> 
> Document for each of these transactionable functions that the lock needs
> to be held when completing the transaction, and make sure that all
> callers down to the place where the transaction is finalised actually
> have the writer lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 61 +++++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
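
Schematically, the convention after this patch (TransactionActionDrv is
the real callback table type; the example callback is illustrative):

    static void GRAPH_WRLOCK example_commit(void *opaque)
    {
        /* May modify the graph: the writer lock is now held by
         * whoever calls tran_finalize(). */
    }

    static TransactionActionDrv example_drv = {
        .commit = example_commit,
    };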


* Re: [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:21   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:11PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_attach_child(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-global-state.h | 14 ++++++++------
>  block.c                            |  7 +++----
>  block/quorum.c                     |  2 ++
>  block/replication.c                |  6 ++++++
>  tests/unit/test-bdrv-drain.c       | 14 ++++++++++++++
>  tests/unit/test-bdrv-graph-mod.c   | 10 ++++++++++
>  6 files changed, 43 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:23   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:12PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block_int-common.h            |  6 ++---
>  include/block/block_int-global-state.h      |  8 +++---
>  include/sysemu/block-backend-global-state.h |  4 +--
>  block.c                                     | 28 +++++++++++++--------
>  block/block-backend.c                       | 26 ++++++++++++++-----
>  block/crypto.c                              |  6 +++--
>  block/mirror.c                              |  8 ++++++
>  block/vmdk.c                                |  2 ++
>  tests/unit/test-bdrv-graph-mod.c            |  4 +++
>  9 files changed, 66 insertions(+), 26 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
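
The caller-side pattern for the functions annotated here, from main-loop
code (sketch):

    bool conflict;

    bdrv_graph_rdlock_main_loop();
    conflict = bdrv_parent_perms_conflict(bs);
    bdrv_graph_rdunlock_main_loop();

In coroutine context, bdrv_graph_co_rdlock()/bdrv_graph_co_rdunlock()
play the same role.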


* Re: [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:28   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:13PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> This happens to result in BlockDriver.bdrv_set_perm() to be called with
> the graph lock held. For consistency, make it the same for all of the
> BlockDriver callbacks for updating permissions and annotate the function
> pointers with GRAPH_RDLOCK_PTR.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block_int-common.h       |  9 ++++---
>  include/block/block_int-global-state.h |  4 +--
>  block.c                                | 35 ++++++++++++++++++++------
>  blockdev.c                             |  6 +++++
>  4 files changed, 40 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
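
For reference, annotating a BlockDriver callback pointer as described
looks roughly like this (sketch of the field in block_int-common.h,
with parameters abbreviated):

    struct BlockDriver {
        /* ... */
        void GRAPH_RDLOCK_PTR (*bdrv_set_perm)(
            BlockDriverState *bs, uint64_t perm, uint64_t shared);
        /* ... */
    };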


* Re: [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:29   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:14PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_child_perm() need to hold a reader lock for the graph because
> some implementations access the children list of a node.
> 
> The callers of bdrv_child_perm() conveniently already hold the lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block_int-common.h | 10 +++++-----
>  block.c                          | 11 ++++++-----
>  block/copy-before-write.c        | 10 +++++-----
>  3 files changed, 16 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
  2023-08-17 12:50 ` [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:29   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:15PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
  2023-08-17 12:50 ` [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:35   ` Stefan Hajnoczi
  2023-09-05 15:26     ` Kevin Wolf
  1 sibling, 1 reply; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:16PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 7df8780d6e..a82389f742 100644
> --- a/block.c
> +++ b/block.c
> @@ -5934,9 +5934,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
>          backing_file_str = base->filename;
>      }
>  
> +    bdrv_graph_rdlock_main_loop();
>      QLIST_FOREACH(c, &top->parents, next_parent) {
>          updated_children = g_slist_prepend(updated_children, c);
>      }
> +    bdrv_graph_rdunlock_main_loop();

This is GLOBAL_STATE_CODE, so why take the read lock? I thought nothing
can modify the graph here. If it could, then stashing the parents in the
updated_children probably wouldn't be safe anyway.

>  
>      /*
>       * It seems correct to pass detach_subchain=true here, but it triggers
> -- 
> 2.41.0
> 


* Re: [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:38   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:18PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_root_unref_child(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block_int-global-state.h | 2 +-
>  block.c                                | 6 +++---
>  block/block-backend.c                  | 3 +++
>  blockjob.c                             | 2 ++
>  4 files changed, 9 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:39   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:19PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_unref_child(). These callers will typically
> already hold the graph lock once the locking work is completed, which
> means that they can't call functions that take it internally.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-global-state.h |  7 ++++++-
>  block.c                            | 11 +++++++----
>  block/blklogwrites.c               |  4 ++++
>  block/blkverify.c                  |  2 ++
>  block/qcow2.c                      |  4 +++-
>  block/quorum.c                     |  6 ++++++
>  block/replication.c                |  3 +++
>  block/snapshot.c                   |  2 ++
>  block/vmdk.c                       | 11 +++++++++++
>  tests/unit/test-bdrv-drain.c       |  8 ++++++--
>  10 files changed, 50 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
  2023-08-17 12:50 ` [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
  2023-08-21  7:35   ` Emanuele Giuseppe Esposito
@ 2023-08-22 19:40   ` Stefan Hajnoczi
  1 sibling, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-08-22 19:40 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On Thu, Aug 17, 2023 at 02:50:20PM +0200, Kevin Wolf wrote:
> The functions read the parents list in the generic block layer, so we
> need to hold the graph lock already there. The BlockDriver
> implementations actually modify the graph, so it has to be a writer
> lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-global-state.h |  8 +++++---
>  include/block/block_int-common.h   |  9 +++++----
>  block/quorum.c                     | 23 ++++++-----------------
>  blockdev.c                         | 17 +++++++++++------
>  4 files changed, 27 insertions(+), 30 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
  2023-08-22 19:35   ` Stefan Hajnoczi
@ 2023-09-05 15:26     ` Kevin Wolf
  0 siblings, 0 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-09-05 15:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On 22.08.2023 at 21:35, Stefan Hajnoczi wrote:
> On Thu, Aug 17, 2023 at 02:50:16PM +0200, Kevin Wolf wrote:
> > The function reads the parents list, so it needs to hold the graph lock.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index 7df8780d6e..a82389f742 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5934,9 +5934,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
> >          backing_file_str = base->filename;
> >      }
> >  
> > +    bdrv_graph_rdlock_main_loop();
> >      QLIST_FOREACH(c, &top->parents, next_parent) {
> >          updated_children = g_slist_prepend(updated_children, c);
> >      }
> > +    bdrv_graph_rdunlock_main_loop();
> 
> This is GLOBAL_STATE_CODE, so why take the read lock? I thought nothing
> can modify the graph here. If it could, then stashing the parents in
> the updated_children probably wouldn't be safe anyway.

The only thing bdrv_graph_rdlock_main_loop() does is asserting that the
conditions for doing nothing are met (GLOBAL_STATE_CODE + non-coroutine
context) and providing the right TSA attributes to make the compiler
happy.

Kevin
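
In other words, a sketch of the behaviour described above (not the
literal source, which additionally carries the TSA annotations):

    void bdrv_graph_rdlock_main_loop(void)
    {
        GLOBAL_STATE_CODE();
        assert(!qemu_in_coroutine());
        /* Nothing else happens: no lock is taken, because the main
         * loop thread cannot race with a writer here. */
    }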


* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-08-22 19:01   ` Stefan Hajnoczi
@ 2023-09-05 16:39     ` Kevin Wolf
  2023-09-06  9:17       ` Kevin Wolf
  0 siblings, 1 reply; 70+ messages in thread
From: Kevin Wolf @ 2023-09-05 16:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On 22.08.2023 at 21:01, Stefan Hajnoczi wrote:
> On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> > bdrv_unref() is called by a lot of places that need to hold the graph
> > lock (it naturally happens in the context of operations that change the
> > graph). However, bdrv_unref() takes the graph writer lock internally, so
> > it can't actually be called while already holding a graph lock without
> > causing a deadlock.
> > 
> > bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> > node before closing it, and draining requires that the graph is
> > unlocked.
> > 
> > The solution is to defer deleting the node until we don't hold the lock
> > any more and draining is possible again.
> > 
> > Note that keeping images open for longer than necessary can create
> > problems, too: You can't open an image again before it is really closed
> > (if image locking didn't prevent it, it would cause corruption).
> > Reopening an image immediately happens at least during bdrv_open() and
> > bdrv_co_create().
> > 
> > In order to solve this problem, make sure to run the deferred unref in
> > bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> > again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> > 
> > The output of iotest 051 is updated because the additional polling
> > changes the order of HMP output, resulting in a new "(qemu)" prompt in
> > the test output that was previously on a separate line and filtered out.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  include/block/block-global-state.h |  1 +
> >  block.c                            |  9 +++++++++
> >  block/graph-lock.c                 | 23 ++++++++++++++++-------
> >  tests/qemu-iotests/051.pc.out      |  6 +++---
> >  4 files changed, 29 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> > index f347199bff..e570799f85 100644
> > --- a/include/block/block-global-state.h
> > +++ b/include/block/block-global-state.h
> > @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
> >  void bdrv_ref(BlockDriverState *bs);
> >  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
> >  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> > +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
> >  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> >  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> >                               BlockDriverState *child_bs,
> > diff --git a/block.c b/block.c
> > index 6376452768..9c4f24f4b9 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
> >      }
> >  }
> >  
> > +void bdrv_schedule_unref(BlockDriverState *bs)
> 
> Please add a doc comment explaining when and why this should be used.

Ok.

> > +{
> > +    if (!bs) {
> > +        return;
> > +    }
> > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > +                            (QEMUBHFunc *) bdrv_unref, bs);
> > +}
> > +
> >  struct BdrvOpBlocker {
> >      Error *reason;
> >      QLIST_ENTRY(BdrvOpBlocker) list;
> > diff --git a/block/graph-lock.c b/block/graph-lock.c
> > index 5e66f01ae8..395d387651 100644
> > --- a/block/graph-lock.c
> > +++ b/block/graph-lock.c
> > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> >  void bdrv_graph_wrunlock(void)
> >  {
> >      GLOBAL_STATE_CODE();
> > -    QEMU_LOCK_GUARD(&aio_context_list_lock);
> >      assert(qatomic_read(&has_writer));
> >  
> > +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > +        /*
> > +         * No need for memory barriers, this works in pair with
> > +         * the slow path of rdlock() and both take the lock.
> > +         */
> > +        qatomic_store_release(&has_writer, 0);
> > +
> > +        /* Wake up all coroutine that are waiting to read the graph */
> 
> s/coroutine/coroutines/

I only changed the indentation, but I guess I can just fix it while I
touch it.

> > +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +    }
> > +
> >      /*
> > -     * No need for memory barriers, this works in pair with
> > -     * the slow path of rdlock() and both take the lock.
> > +     * Run any BHs that were scheduled during the wrlock section and that
> > +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> 
> Referring directly to bdrv_schedule_unref() would help make it clearer
> what you mean.
> 
> > +     * only after restarting coroutines so that nested event loops in BHs don't
> > +     * deadlock if their condition relies on the coroutine making progress.
> >       */
> > -    qatomic_store_release(&has_writer, 0);
> > -
> > -    /* Wake up all coroutine that are waiting to read the graph */
> > -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +    aio_bh_poll(qemu_get_aio_context());
> 
> Keeping a dedicated list of BDSes pending unref seems cleaner than using
> the aio_bh_poll(). There's a lot of stuff happening in the event loop
> and I wonder if those other BHs might cause issues if they run at the
> end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
> an example of this, but other things could happen too (e.g. monitor
> command processing).

I would argue that it's no worse than the old code: The bdrv_unref()
that we're replacing would already run a nested event loop if it was the
last reference. We only moved this a bit later, making the part of the
code that doesn't run an event loop a bit larger. The difference is that
we're running it unconditionally now, not only in a special case, but I
think that's actually an improvement (in terms of test coverage etc.).

We run nested event loops in all kinds of different places without
thinking much about it. If we're worried about it here, what do we do
about all these other places?

Anyway, if you think that we should process only bdrv_schedule_unref()
here, how would you approach this? Restrict it to bdrv_schedule_unref()
in the hope that this is the only operation we'll need to delay, or
implement another BH mechanism from scratch in graph-lock.c? In theory
we could also add another AioContext where actual BHs can be queued and
that is only polled here, but the iohandler context is already painful
enough that I don't necessarily want to create another thing like it.

Kevin
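
For concreteness, the dedicated-list alternative that Stefan suggests
above might look something like this (entirely hypothetical, and argued
against in the follow-up below):

    /* Collect nodes under the writer lock instead of scheduling one BH
     * per node ... */
    static GSList *nodes_pending_unref;

    void bdrv_schedule_unref(BlockDriverState *bs)
    {
        nodes_pending_unref = g_slist_prepend(nodes_pending_unref, bs);
    }

    /* ... and in bdrv_graph_wrunlock(), after waking the readers,
     * replace aio_bh_poll() with: */
    g_slist_free_full(g_steal_pointer(&nodes_pending_unref),
                      (GDestroyNotify) bdrv_unref);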


* Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
  2023-09-05 16:39     ` Kevin Wolf
@ 2023-09-06  9:17       ` Kevin Wolf
  0 siblings, 0 replies; 70+ messages in thread
From: Kevin Wolf @ 2023-09-06  9:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel


On 05.09.2023 at 18:39, Kevin Wolf wrote:
> On 22.08.2023 at 21:01, Stefan Hajnoczi wrote:
> > On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> > > bdrv_unref() is called by a lot of places that need to hold the graph
> > > lock (it naturally happens in the context of operations that change the
> > > graph). However, bdrv_unref() takes the graph writer lock internally, so
> > > it can't actually be called while already holding a graph lock without
> > > causing a deadlock.
> > > 
> > > bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> > > node before closing it, and draining requires that the graph is
> > > unlocked.
> > > 
> > > The solution is to defer deleting the node until we don't hold the lock
> > > any more and draining is possible again.
> > > 
> > > Note that keeping images open for longer than necessary can create
> > > problems, too: You can't open an image again before it is really closed
> > > (if image locking didn't prevent it, it would cause corruption).
> > > Reopening an image immediately happens at least during bdrv_open() and
> > > bdrv_co_create().
> > > 
> > > In order to solve this problem, make sure to run the deferred unref in
> > > bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> > > again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> > > 
> > > The output of iotest 051 is updated because the additional polling
> > > changes the order of HMP output, resulting in a new "(qemu)" prompt in
> > > the test output that was previously on a separate line and filtered out.
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  include/block/block-global-state.h |  1 +
> > >  block.c                            |  9 +++++++++
> > >  block/graph-lock.c                 | 23 ++++++++++++++++-------
> > >  tests/qemu-iotests/051.pc.out      |  6 +++---
> > >  4 files changed, 29 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> > > index f347199bff..e570799f85 100644
> > > --- a/include/block/block-global-state.h
> > > +++ b/include/block/block-global-state.h
> > > @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
> > >  void bdrv_ref(BlockDriverState *bs);
> > >  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
> > >  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> > > +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
> > >  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> > >  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> > >                               BlockDriverState *child_bs,
> > > diff --git a/block.c b/block.c
> > > index 6376452768..9c4f24f4b9 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
> > >      }
> > >  }
> > >  
> > > +void bdrv_schedule_unref(BlockDriverState *bs)
> > 
> > Please add a doc comment explaining when and why this should be used.
> 
> Ok.
> 
> > > +{
> > > +    if (!bs) {
> > > +        return;
> > > +    }
> > > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > > +                            (QEMUBHFunc *) bdrv_unref, bs);
> > > +}
> > > +
> > >  struct BdrvOpBlocker {
> > >      Error *reason;
> > >      QLIST_ENTRY(BdrvOpBlocker) list;
> > > diff --git a/block/graph-lock.c b/block/graph-lock.c
> > > index 5e66f01ae8..395d387651 100644
> > > --- a/block/graph-lock.c
> > > +++ b/block/graph-lock.c
> > > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> > >  void bdrv_graph_wrunlock(void)
> > >  {
> > >      GLOBAL_STATE_CODE();
> > > -    QEMU_LOCK_GUARD(&aio_context_list_lock);
> > >      assert(qatomic_read(&has_writer));
> > >  
> > > +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > > +        /*
> > > +         * No need for memory barriers, this works in pair with
> > > +         * the slow path of rdlock() and both take the lock.
> > > +         */
> > > +        qatomic_store_release(&has_writer, 0);
> > > +
> > > +        /* Wake up all coroutine that are waiting to read the graph */
> > 
> > s/coroutine/coroutines/
> 
> I only changed the indentation, but I guess I can just fix it while
> I'm at it.
> 
> > > +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > > +    }
> > > +
> > >      /*
> > > -     * No need for memory barriers, this works in pair with
> > > -     * the slow path of rdlock() and both take the lock.
> > > +     * Run any BHs that were scheduled during the wrlock section and that
> > > +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> > 
> > Referring directly to bdrv_schedule_unref() would help make it clearer
> > what you mean.
> > 
> > > +     * only after restarting coroutines so that nested event loops in BHs don't
> > > +     * deadlock if their condition relies on the coroutine making progress.
> > >       */
> > > -    qatomic_store_release(&has_writer, 0);
> > > -
> > > -    /* Wake up all coroutine that are waiting to read the graph */
> > > -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > > +    aio_bh_poll(qemu_get_aio_context());
> > 
> > Keeping a dedicated list of BDSes pending unref seems cleaner than using
> > the aio_bh_poll(). There's a lot of stuff happening in the event loop
> > and I wonder if those other BHs might cause issues if they run at the
> > end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
> > an example of this, but other things could happen too (e.g. monitor
> > command processing).
> 
> I would argue that it's no worse than the old code: the bdrv_unref()
> that we're replacing would already run a nested event loop if it
> dropped the last reference. We've only moved this polling a bit later,
> so the stretch of code that runs without a nested event loop becomes a
> bit larger. The difference is that we're running it unconditionally
> now, not only in a special case, but I think that's actually an
> improvement (in terms of test coverage etc.).
> 
> We run nested event loops in all kinds of different places without
> thinking much about it. If we're worried about it here, what do we do
> about all these other places?
> 
> Anyway, if you think that we should process only bdrv_schedule_unref()
> here, how would you approach this? Restrict it to bdrv_schedule_unref()
> in the hope that this is the only operation we'll need to delay, or
> implement another BH mechanism from scratch in graph-lock.c? In theory
> we could also add another AioContext where actual BHs can be queued and
> that is only polled here, but the iohandler context is already painful
> enough that I don't necessarily want to create another thing like it.
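
For concreteness, the dedicated-list alternative suggested above might
look something like this (a hypothetical sketch, not code from this
series; the queue entry field in BlockDriverState would have to be
added):

    static QTAILQ_HEAD(, BlockDriverState) pending_unrefs =
        QTAILQ_HEAD_INITIALIZER(pending_unrefs);

    void bdrv_schedule_unref(BlockDriverState *bs)
    {
        if (!bs) {
            return;
        }
        QTAILQ_INSERT_TAIL(&pending_unrefs, bs, pending_unref_entry);
    }

    void bdrv_graph_wrunlock(void)
    {
        BlockDriverState *bs;

        /* ... release the lock and wake readers as before ... */

        /* Process only the deferred unrefs, not arbitrary BHs */
        while ((bs = QTAILQ_FIRST(&pending_unrefs)) != NULL) {
            QTAILQ_REMOVE(&pending_unrefs, bs, pending_unref_entry);
            bdrv_unref(bs);
        }
    }

Note that the bdrv_unref() calls in this loop can still end up in
bdrv_close() and thus in a nested event loop, which is exactly the
point made below.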

Actually, come to think of it, even processing only scheduled unrefs is
effectively a nested event loop (via bdrv_close()) that could run for
any caller. The only thing we would achieve is making it conditional
again so that it triggers only sometimes. But callers still have to make
sure that polling is safe because it could be the last reference.

So even the little use I saw in this yesterday, I'm not sure about it
any more now.

Kevin
