All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/21] block: Fix check_to_replace_node()
@ 2020-01-30 21:44 Max Reitz
  2020-01-30 21:44 ` [PATCH v3 01/21] blockdev: Allow external snapshots everywhere Max Reitz
                   ` (20 more replies)
  0 siblings, 21 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Branch: https://github.com/XanClic/qemu.git fix-can-replace-v3
Branch: https://git.xanclic.moe/XanClic/qemu.git fix-can-replace-v3

v1: https://lists.nongnu.org/archive/html/qemu-block/2019-09/msg01027.html
v2: https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00370.html


Hi,

For what this series does, see the cover letter of v1 (linked above).


In v3 I’ve addressed Vladimir’s further comments:
- Patch 9: Skip the loop if c == NULL (and also renamed @i to @child_i)

- Patch 10: Drop local_err, check bdrv_child_refresh_perm()’s return
            value instead

- Patch 11: Amended commit message and comment

- Patch 12: Extended to cover the “new” compress filter driver

- Patch 15: Dropped; considering qemu never actually creates loops,
            there is never an unsafe configuration.  Sometimes the user
            may not get what they want, but then maybe this was actually
            what they wanted.  I think the discussion I had with
            Vladimir shows that this issue isn’t as clear-cut as I
            thought, so let’s wait with a “fix” until someone actually
            has a problem with the current behavior.

- Patch 17: Dropped; was superseded by a patch from Thomas.

- Patch 16 (was 18):
  - Moved doc string under def
  - Pointed out that @expected_node may be None, but this only means
    that the leaf must not exist.
  - Contracted asserts
  - %s/path_node/child_name/
  - Kept the prefix /, because, well, why not
  - Kept % for formatting, because I don’t see the advantage of
    .format() – I would have liked to use f strings, but those were only
    introduced in 3.6, and our minimum version is 3.5.

- Patch 17: Added

- Patch 18 (was 19): Dropped removal of self.vm.shutdown(), because
                     that’s done by patch 17 now

- Patch 20 (was 21): Moved doc strings under defs (kept R-b, because
                     that seemed right to do)

- Patch 21 (was 22): Moved doc string under def

- Patch 23: Dropped, because it only tested things changed by the old
            patch 17 (which I dropped, too)


git backport-diff against v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/21:[----] [--] 'blockdev: Allow external snapshots everywhere'
002/21:[----] [--] 'blockdev: Allow resizing everywhere'
003/21:[----] [--] 'block: Drop bdrv_is_first_non_filter()'
004/21:[----] [--] 'iotests: Let 041 use -blockdev for quorum children'
005/21:[----] [--] 'quorum: Fix child permissions'
006/21:[----] [--] 'block: Add bdrv_recurse_can_replace()'
007/21:[----] [--] 'blkverify: Implement .bdrv_recurse_can_replace()'
008/21:[----] [--] 'quorum: Store children in own structure'
009/21:[0014] [FC] 'quorum: Add QuorumChild.to_be_replaced'
010/21:[0011] [FC] 'quorum: Implement .bdrv_recurse_can_replace()'
011/21:[0002] [FC] 'block: Use bdrv_recurse_can_replace()'
012/21:[0009] [FC] 'block: Remove bdrv_recurse_is_first_non_filter()'
013/21:[----] [--] 'mirror: Double-check immediately before replacing'
014/21:[----] [--] 'quorum: Stop marking it as a filter'
015/21:[----] [--] 'iotests: Use complete_and_wait() in 155'
016/21:[0053] [FC] 'iotests: Add VM.assert_block_path()'
017/21:[down] 'iotests/041: Drop superfluous shutdowns'
018/21:[0001] [FC] 'iotests: Resolve TODOs in 041'
019/21:[----] [--] 'iotests: Use self.image_len in TestRepairQuorum'
020/21:[0016] [FC] 'iotests: Add tests for invalid Quorum @replaces'
021/21:[0006] [FC] 'iotests: Check that @replaces can replace filters'


Max Reitz (21):
  blockdev: Allow external snapshots everywhere
  blockdev: Allow resizing everywhere
  block: Drop bdrv_is_first_non_filter()
  iotests: Let 041 use -blockdev for quorum children
  quorum: Fix child permissions
  block: Add bdrv_recurse_can_replace()
  blkverify: Implement .bdrv_recurse_can_replace()
  quorum: Store children in own structure
  quorum: Add QuorumChild.to_be_replaced
  quorum: Implement .bdrv_recurse_can_replace()
  block: Use bdrv_recurse_can_replace()
  block: Remove bdrv_recurse_is_first_non_filter()
  mirror: Double-check immediately before replacing
  quorum: Stop marking it as a filter
  iotests: Use complete_and_wait() in 155
  iotests: Add VM.assert_block_path()
  iotests/041: Drop superfluous shutdowns
  iotests: Resolve TODOs in 041
  iotests: Use self.image_len in TestRepairQuorum
  iotests: Add tests for invalid Quorum @replaces
  iotests: Check that @replaces can replace filters

 block.c                       |  85 +++++++++---------
 block/blkverify.c             |  20 ++---
 block/copy-on-read.c          |   9 --
 block/filter-compress.c       |   9 --
 block/mirror.c                |  14 ++-
 block/quorum.c                | 160 ++++++++++++++++++++++++++--------
 block/replication.c           |   7 --
 block/throttle.c              |   8 --
 blockdev.c                    |  10 ---
 include/block/block.h         |   5 --
 include/block/block_int.h     |  16 ++--
 tests/qemu-iotests/041        | 138 +++++++++++++++++++++++++----
 tests/qemu-iotests/041.out    |   4 +-
 tests/qemu-iotests/155        |   7 +-
 tests/qemu-iotests/iotests.py |  56 ++++++++++++
 15 files changed, 375 insertions(+), 173 deletions(-)

-- 
2.24.1



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 01/21] blockdev: Allow external snapshots everywhere
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 02/21] blockdev: Allow resizing everywhere Max Reitz
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

There is no good reason why we would allow external snapshots only on
the first non-filter node in a chain.  Parent BDSs should not care
whether their child is replaced by a snapshot.  (If they do care, they
should announce that via freezing the chain, which is checked in
bdrv_append() through bdrv_set_backing_hd().)

Before we had bdrv_is_first_non_filter() here (since 212a5a8f095), there
was a special function bdrv_check_ext_snapshot() that allowed snapshots
by default, but block drivers could override this.  Only blkverify did
so, however.

It is not clear to me why blkverify would do so; maybe just so that the
testee block driver would not be replaced.  The introducing commit
f6186f49e2c does not explain why.  Maybe because 08b24cfe376 would have
been the correct solution?  (Which adds a .supports_backing check.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 4cd9a58d36..d47ed8e569 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1596,11 +1596,6 @@ static void external_snapshot_prepare(BlkActionState *common,
         }
     }
 
-    if (!bdrv_is_first_non_filter(state->old_bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "snapshot");
-        goto out;
-    }
-
     if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
         BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
         const char *format = s->has_format ? s->format : "qcow2";
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 02/21] blockdev: Allow resizing everywhere
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
  2020-01-30 21:44 ` [PATCH v3 01/21] blockdev: Allow external snapshots everywhere Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 03/21] block: Drop bdrv_is_first_non_filter() Max Reitz
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Block nodes that do not allow resizing should not share BLK_PERM_RESIZE.
It does not matter whether they are the first non-filter in their chain
or not.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d47ed8e569..bdd66d6c48 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3335,11 +3335,6 @@ void qmp_block_resize(bool has_device, const char *device,
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
-    if (!bdrv_is_first_non_filter(bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
-        goto out;
-    }
-
     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
         goto out;
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 03/21] block: Drop bdrv_is_first_non_filter()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
  2020-01-30 21:44 ` [PATCH v3 01/21] blockdev: Allow external snapshots everywhere Max Reitz
  2020-01-30 21:44 ` [PATCH v3 02/21] blockdev: Allow resizing everywhere Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 04/21] iotests: Let 041 use -blockdev for quorum children Max Reitz
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

It is unused now.  (And it was ugly because it needed to explore all BDS
chains from the top.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c               | 26 --------------------------
 include/block/block.h |  1 -
 2 files changed, 27 deletions(-)

diff --git a/block.c b/block.c
index 99ce26d64d..6f58a4900f 100644
--- a/block.c
+++ b/block.c
@@ -6212,32 +6212,6 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
-/* This function checks if the candidate is the first non filter bs down it's
- * bs chain. Since we don't have pointers to parents it explore all bs chains
- * from the top. Some filters can choose not to pass down the recursion.
- */
-bool bdrv_is_first_non_filter(BlockDriverState *candidate)
-{
-    BlockDriverState *bs;
-    BdrvNextIterator it;
-
-    /* walk down the bs forest recursively */
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-        bool perm;
-
-        /* try to recurse in this top level bs */
-        perm = bdrv_recurse_is_first_non_filter(bs, candidate);
-
-        /* candidate is the first non filter */
-        if (perm) {
-            bdrv_next_cleanup(&it);
-            return true;
-        }
-    }
-
-    return false;
-}
-
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
diff --git a/include/block/block.h b/include/block/block.h
index e9dcfef7fa..8f6a0cad9c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -404,7 +404,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
 /* external snapshots */
 bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
                                       BlockDriverState *candidate);
-bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 04/21] iotests: Let 041 use -blockdev for quorum children
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (2 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 03/21] block: Drop bdrv_is_first_non_filter() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 05/21] quorum: Fix child permissions Max Reitz
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Using -drive with default options means that a virtio-blk drive will be
created that has write access to the to-be quorum children.  Quorum
should have exclusive write access to them, so we should use -blockdev
instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/041 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 0181f7a9b6..a429281f61 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -885,7 +885,10 @@ class TestRepairQuorum(iotests.QMPTestCase):
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-            self.vm = self.vm.add_drive(i, opts)
+            opts += ',driver=%s' % iotests.imgfmt
+            opts += ',file.driver=file'
+            opts += ',file.filename=%s' % i
+            self.vm = self.vm.add_blockdev(opts)
 
         self.vm.launch()
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 05/21] quorum: Fix child permissions
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (3 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 04/21] iotests: Let 041 use -blockdev for quorum children Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 06/21] block: Add bdrv_recurse_can_replace() Max Reitz
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Quorum cannot share WRITE or RESIZE on its children.  Presumably, it
only does so because as a filter, it seemed intuitively correct to point
its .bdrv_child_perm to bdrv_filter_default_perm().

However, it is not really a filter, and bdrv_filter_default_perm() does
not work for it, so we have to provide a custom .bdrv_child_perm
implementation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/quorum.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index df68adcfaa..17b439056f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1114,6 +1114,23 @@ static char *quorum_dirname(BlockDriverState *bs, Error **errp)
     return NULL;
 }
 
+static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
+                              const BdrvChildRole *role,
+                              BlockReopenQueue *reopen_queue,
+                              uint64_t perm, uint64_t shared,
+                              uint64_t *nperm, uint64_t *nshared)
+{
+    *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+
+    /*
+     * We cannot share RESIZE or WRITE, as this would make the
+     * children differ from each other.
+     */
+    *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
+                          BLK_PERM_WRITE_UNCHANGED))
+             | DEFAULT_PERM_UNCHANGED;
+}
+
 static const char *const quorum_strong_runtime_opts[] = {
     QUORUM_OPT_VOTE_THRESHOLD,
     QUORUM_OPT_BLKVERIFY,
@@ -1143,7 +1160,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
 
-    .bdrv_child_perm                    = bdrv_filter_default_perms,
+    .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 06/21] block: Add bdrv_recurse_can_replace()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (4 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 05/21] quorum: Fix child permissions Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 07/21] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

After a couple of follow-up patches, this function will replace
bdrv_recurse_is_first_non_filter() in check_to_replace_node().

bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
check_to_replace_node() (it allows cases that should not be allowed,
like replacing child nodes of quorum with dissenting data that have more
parents than just quorum), and it is too restrictive (it is perfectly
fine to replace filters).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h | 10 ++++++++++
 2 files changed, 48 insertions(+)

diff --git a/block.c b/block.c
index 6f58a4900f..6623248443 100644
--- a/block.c
+++ b/block.c
@@ -6212,6 +6212,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * This function checks whether the given @to_replace is allowed to be
+ * replaced by a node that always shows the same data as @bs.  This is
+ * used for example to verify whether the mirror job can replace
+ * @to_replace by the target mirrored from @bs.
+ * To be replaceable, @bs and @to_replace may either be guaranteed to
+ * always show the same data (because they are only connected through
+ * filters), or some driver may allow replacing one of its children
+ * because it can guarantee that this child's data is not visible at
+ * all (for example, for dissenting quorum children that have no other
+ * parents).
+ */
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace)
+{
+    if (!bs || !bs->drv) {
+        return false;
+    }
+
+    if (bs == to_replace) {
+        return true;
+    }
+
+    /* See what the driver can do */
+    if (bs->drv->bdrv_recurse_can_replace) {
+        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
+    }
+
+    /* For filters without an own implementation, we can recurse on our own */
+    if (bs->drv->is_filter) {
+        BdrvChild *child = bs->file ?: bs->backing;
+        return bdrv_recurse_can_replace(child->bs, to_replace);
+    }
+
+    /* Safe default */
+    return false;
+}
+
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dd033d0b37..75f03dcc38 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -102,6 +102,13 @@ struct BlockDriver {
      */
     bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
                                              BlockDriverState *candidate);
+    /*
+     * Return true if @to_replace can be replaced by a BDS with the
+     * same data as @bs without it affecting @bs's behavior (that is,
+     * without it being visible to @bs's parents).
+     */
+    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
+                                     BlockDriverState *to_replace);
 
     int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
     int (*bdrv_probe_device)(const char *filename);
@@ -1264,6 +1271,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared);
 
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 07/21] blkverify: Implement .bdrv_recurse_can_replace()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (5 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 06/21] block: Add bdrv_recurse_can_replace() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 08/21] quorum: Store children in own structure Max Reitz
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/blkverify.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 304b0a1368..0add3ab483 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -282,6 +282,20 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
     return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
 }
 
+static bool blkverify_recurse_can_replace(BlockDriverState *bs,
+                                          BlockDriverState *to_replace)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+
+    /*
+     * blkverify quits the whole qemu process if there is a mismatch
+     * between bs->file->bs and s->test_file->bs.  Therefore, we know
+     * know that both must match bs and we can recurse down to either.
+     */
+    return bdrv_recurse_can_replace(bs->file->bs, to_replace) ||
+           bdrv_recurse_can_replace(s->test_file->bs, to_replace);
+}
+
 static void blkverify_refresh_filename(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
@@ -328,6 +342,7 @@ static BlockDriver bdrv_blkverify = {
 
     .is_filter                        = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
 static void bdrv_blkverify_init(void)
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 08/21] quorum: Store children in own structure
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (6 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 07/21] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This will be useful when we want to store additional attributes for each
child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/quorum.c | 64 ++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 28 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 17b439056f..59cd524502 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -65,9 +65,13 @@ typedef struct QuorumVotes {
     bool (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
 } QuorumVotes;
 
+typedef struct QuorumChild {
+    BdrvChild *child;
+} QuorumChild;
+
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
-    BdrvChild **children;  /* children BlockDriverStates */
+    QuorumChild *children;
     int num_children;      /* children count */
     unsigned next_child_index;  /* the index of the next child that should
                                  * be added
@@ -264,7 +268,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
         }
         QLIST_FOREACH(item, &version->items, next) {
             quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
-                              s->children[item->index]->bs->node_name, 0);
+                              s->children[item->index].child->bs->node_name, 0);
         }
     }
 }
@@ -279,7 +283,7 @@ static void quorum_rewrite_entry(void *opaque)
      * corrupted data.
      * Mask out BDRV_REQ_WRITE_UNCHANGED because this overwrites the
      * area with different data from the other children. */
-    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
+    bdrv_co_pwritev(s->children[co->idx].child, acb->offset, acb->bytes,
                     acb->qiov, acb->flags & ~BDRV_REQ_WRITE_UNCHANGED);
 
     /* Wake up the caller after the last rewrite */
@@ -578,8 +582,8 @@ static void read_quorum_children_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_preadv(s->children[i].child, acb->offset, acb->bytes,
                                &acb->qcrs[i].qiov, 0);
 
     if (sacb->ret == 0) {
@@ -605,7 +609,8 @@ static int read_quorum_children(QuorumAIOCB *acb)
 
     acb->children_read = s->num_children;
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
+        acb->qcrs[i].buf = qemu_blockalign(s->children[i].child->bs,
+                                           acb->qiov->size);
         qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
         qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
     }
@@ -647,8 +652,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
     /* We try to read the next child in FIFO order if we failed to read */
     do {
         n = acb->children_read++;
-        acb->qcrs[n].bs = s->children[n]->bs;
-        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
+        acb->qcrs[n].bs = s->children[n].child->bs;
+        ret = bdrv_co_preadv(s->children[n].child, acb->offset, acb->bytes,
                              acb->qiov, 0);
         if (ret < 0) {
             quorum_report_bad_acb(&acb->qcrs[n], ret);
@@ -688,8 +693,8 @@ static void write_quorum_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_pwritev(s->children[i].child, acb->offset, acb->bytes,
                                 acb->qiov, acb->flags);
     if (sacb->ret == 0) {
         acb->success_count++;
@@ -743,12 +748,12 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->children[0]->bs);
+    result = bdrv_getlength(s->children[0].child->bs);
     if (result < 0) {
         return result;
     }
     for (i = 1; i < s->num_children; i++) {
-        int64_t value = bdrv_getlength(s->children[i]->bs);
+        int64_t value = bdrv_getlength(s->children[i].child->bs);
         if (value < 0) {
             return value;
         }
@@ -774,10 +779,10 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     error_votes.compare = quorum_64bits_compare;
 
     for (i = 0; i < s->num_children; i++) {
-        result = bdrv_co_flush(s->children[i]->bs);
+        result = bdrv_co_flush(s->children[i].child->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, 0,
-                              s->children[i]->bs->node_name, result);
+                              s->children[i].child->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
         } else {
@@ -803,7 +808,7 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i]->bs,
+        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
                                                      candidate);
         if (perm) {
             return true;
@@ -932,7 +937,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* allocate the children array */
-    s->children = g_new0(BdrvChild *, s->num_children);
+    s->children = g_new0(QuorumChild, s->num_children);
     opened = g_new0(bool, s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
@@ -940,8 +945,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = snprintf(indexstr, 32, "children.%d", i);
         assert(ret < 32);
 
-        s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
-                                         &child_format, false, &local_err);
+        s->children[i].child = bdrv_open_child(NULL, options, indexstr, bs,
+                                               &child_format, false,
+                                               &local_err);
         if (local_err) {
             ret = -EINVAL;
             goto close_exit;
@@ -962,7 +968,7 @@ close_exit:
         if (!opened[i]) {
             continue;
         }
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
     g_free(s->children);
     g_free(opened);
@@ -979,7 +985,7 @@ static void quorum_close(BlockDriverState *bs)
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
 
     g_free(s->children);
@@ -998,8 +1004,8 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
         return;
     }
 
-    assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
-    if (s->num_children == INT_MAX / sizeof(BdrvChild *) ||
+    assert(s->num_children <= INT_MAX / sizeof(QuorumChild));
+    if (s->num_children == INT_MAX / sizeof(QuorumChild) ||
         s->next_child_index == UINT_MAX) {
         error_setg(errp, "Too many children");
         return;
@@ -1022,8 +1028,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
         s->next_child_index--;
         goto out;
     }
-    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
-    s->children[s->num_children++] = child;
+    s->children = g_renew(QuorumChild, s->children, s->num_children + 1);
+    s->children[s->num_children++] = (QuorumChild){
+        .child = child,
+    };
 
 out:
     bdrv_drained_end(bs);
@@ -1036,7 +1044,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        if (s->children[i] == child) {
+        if (s->children[i].child == child) {
             break;
         }
     }
@@ -1058,8 +1066,8 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
 
     /* We can safely remove this child now */
     memmove(&s->children[i], &s->children[i + 1],
-            (s->num_children - i - 1) * sizeof(BdrvChild *));
-    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+            (s->num_children - i - 1) * sizeof(QuorumChild));
+    s->children = g_renew(QuorumChild, s->children, --s->num_children);
     bdrv_unref_child(bs, child);
 
     bdrv_drained_end(bs);
@@ -1100,7 +1108,7 @@ static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
 
     for (i = 0; i < s->num_children; i++) {
         qlist_append(children_list,
-                     qobject_ref(s->children[i]->bs->full_open_options));
+                     qobject_ref(s->children[i].child->bs->full_open_options));
     }
 }
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (7 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 08/21] quorum: Store children in own structure Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04  9:33   ` Vladimir Sementsov-Ogievskiy
  2020-02-05 15:38   ` Kevin Wolf
  2020-01-30 21:44 ` [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

We will need this to verify that Quorum can let one of its children be
replaced without breaking anything else.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 59cd524502..6a7224c9e4 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,13 @@ typedef struct QuorumVotes {
 
 typedef struct QuorumChild {
     BdrvChild *child;
+
+    /*
+     * If set, check whether this node can be replaced without any
+     * other parent noticing: Unshare CONSISTENT_READ, and take the
+     * WRITE permission.
+     */
+    bool to_be_replaced;
 } QuorumChild;
 
 /* the following structure holds the state of one quorum instance */
@@ -1128,6 +1135,18 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
                               uint64_t perm, uint64_t shared,
                               uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVQuorumState *s = bs->opaque;
+    int child_i = -1;
+
+    if (c) {
+        for (child_i = 0; child_i < s->num_children; child_i++) {
+            if (s->children[child_i].child == c) {
+                break;
+            }
+        }
+        assert(child_i < s->num_children);
+    }
+
     *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 
     /*
@@ -1137,6 +1156,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
     *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
                           BLK_PERM_WRITE_UNCHANGED))
              | DEFAULT_PERM_UNCHANGED;
+
+    if (child_i >= 0 && s->children[child_i].to_be_replaced) {
+        /* Prepare for sudden data changes */
+        *nperm |= BLK_PERM_WRITE;
+        *nshared &= ~BLK_PERM_CONSISTENT_READ;
+    }
 }
 
 static const char *const quorum_strong_runtime_opts[] = {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (8 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04  9:37   ` Vladimir Sementsov-Ogievskiy
  2020-01-30 21:44 ` [PATCH v3 11/21] block: Use bdrv_recurse_can_replace() Max Reitz
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 6a7224c9e4..22c1060b42 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -825,6 +825,62 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static bool quorum_recurse_can_replace(BlockDriverState *bs,
+                                       BlockDriverState *to_replace)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        /*
+         * We have no idea whether our children show the same data as
+         * this node (@bs).  It is actually highly likely that
+         * @to_replace does not, because replacing a broken child is
+         * one of the main use cases here.
+         *
+         * We do know that the new BDS will match @bs, so replacing
+         * any of our children by it will be safe.  It cannot change
+         * the data this quorum node presents to its parents.
+         *
+         * However, replacing @to_replace by @bs in any of our
+         * children's chains may change visible data somewhere in
+         * there.  We therefore cannot recurse down those chains with
+         * bdrv_recurse_can_replace().
+         * (More formally, bdrv_recurse_can_replace() requires that
+         * @to_replace will be replaced by something matching the @bs
+         * passed to it.  We cannot guarantee that.)
+         *
+         * Thus, we can only check whether any of our immediate
+         * children matches @to_replace.
+         *
+         * (In the future, we might add a function to recurse down a
+         * chain that checks that nothing there cares about a change
+         * in data from the respective child in question.  For
+         * example, most filters do not care when their child's data
+         * suddenly changes, as long as their parents do not care.)
+         */
+        if (s->children[i].child->bs == to_replace) {
+            int ret;
+
+            /*
+             * We now have to ensure that there is no other parent
+             * that cares about replacing this child by a node with
+             * potentially different data.
+             */
+            s->children[i].to_be_replaced = true;
+            ret = bdrv_child_refresh_perms(bs, s->children[i].child, NULL);
+
+            /* Revert permissions */
+            s->children[i].to_be_replaced = false;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
+
+            return ret == 0;
+        }
+    }
+
+    return false;
+}
+
 static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
 {
 
@@ -1197,6 +1253,7 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
 };
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 11/21] block: Use bdrv_recurse_can_replace()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (9 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Let check_to_replace_node() use the more specialized
bdrv_recurse_can_replace() instead of
bdrv_recurse_is_first_non_filter(), which is too restrictive (or, in the
case of quorum, sometimes not restrictive enough).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6623248443..67844ad5ac 100644
--- a/block.c
+++ b/block.c
@@ -6250,6 +6250,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * Check whether the given @node_name can be replaced by a node that
+ * has the same data as @parent_bs.  If so, return @node_name's BDS;
+ * NULL otherwise.
+ *
+ * @node_name must be a (recursive) *child of @parent_bs (or this
+ * function will return NULL).
+ *
+ * The result (whether the node can be replaced or not) is only valid
+ * for as long as no graph or permission changes occur.
+ */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
@@ -6274,8 +6285,11 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
      * Another benefit is that this tests exclude backing files which are
      * blocked by the backing blockers.
      */
-    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-        error_setg(errp, "Only top most non filter can be replaced");
+    if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
+        error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
+                   "because it cannot be guaranteed that doing so would not "
+                   "lead to an abrupt change of visible data",
+                   node_name, parent_bs->node_name);
         to_replace_bs = NULL;
         goto out;
     }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (10 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 11/21] block: Use bdrv_recurse_can_replace() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04  9:45   ` Vladimir Sementsov-Ogievskiy
  2020-01-30 21:44 ` [PATCH v3 13/21] mirror: Double-check immediately before replacing Max Reitz
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

It no longer has any users.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 33 ---------------------------------
 block/blkverify.c         | 15 ---------------
 block/copy-on-read.c      |  9 ---------
 block/filter-compress.c   |  9 ---------
 block/quorum.c            | 18 ------------------
 block/replication.c       |  7 -------
 block/throttle.c          |  8 --------
 include/block/block.h     |  4 ----
 include/block/block_int.h |  8 --------
 9 files changed, 111 deletions(-)

diff --git a/block.c b/block.c
index 67844ad5ac..1a27205252 100644
--- a/block.c
+++ b/block.c
@@ -6179,39 +6179,6 @@ int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
     return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque, errp);
 }
 
-/* This function will be called by the bdrv_recurse_is_first_non_filter method
- * of block filter and by bdrv_is_first_non_filter.
- * It is used to test if the given bs is the candidate or recurse more in the
- * node graph.
- */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate)
-{
-    /* return false if basic checks fails */
-    if (!bs || !bs->drv) {
-        return false;
-    }
-
-    /* the code reached a non block filter driver -> check if the bs is
-     * the same as the candidate. It's the recursion termination condition.
-     */
-    if (!bs->drv->is_filter) {
-        return bs == candidate;
-    }
-    /* Down this path the driver is a block filter driver */
-
-    /* If the block filter recursion method is defined use it to recurse down
-     * the node graph.
-     */
-    if (bs->drv->bdrv_recurse_is_first_non_filter) {
-        return bs->drv->bdrv_recurse_is_first_non_filter(bs, candidate);
-    }
-
-    /* the driver is a block filter but don't allow to recurse -> return false
-     */
-    return false;
-}
-
 /*
  * This function checks whether the given @to_replace is allowed to be
  * replaced by a node that always shows the same data as @bs.  This is
diff --git a/block/blkverify.c b/block/blkverify.c
index 0add3ab483..ba6b1853ae 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -268,20 +268,6 @@ static int blkverify_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(s->test_file->bs);
 }
 
-static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                  BlockDriverState *candidate)
-{
-    BDRVBlkverifyState *s = bs->opaque;
-
-    bool perm = bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-
-    if (perm) {
-        return true;
-    }
-
-    return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
-}
-
 static bool blkverify_recurse_can_replace(BlockDriverState *bs,
                                           BlockDriverState *to_replace)
 {
@@ -341,7 +327,6 @@ static BlockDriver bdrv_blkverify = {
     .bdrv_co_flush                    = blkverify_co_flush,
 
     .is_filter                        = true,
-    .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index e95223d3cb..242d3ff055 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -118,13 +118,6 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 }
 
 
-static bool cor_recurse_is_first_non_filter(BlockDriverState *bs,
-                                            BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
-
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
 
@@ -143,8 +136,6 @@ static BlockDriver bdrv_copy_on_read = {
 
     .bdrv_co_block_status               = bdrv_co_block_status_from_file,
 
-    .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/filter-compress.c b/block/filter-compress.c
index 60137fb680..82c315b298 100644
--- a/block/filter-compress.c
+++ b/block/filter-compress.c
@@ -128,13 +128,6 @@ static void compress_lock_medium(BlockDriverState *bs, bool locked)
 }
 
 
-static bool compress_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                 BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
-
 static BlockDriver bdrv_compress = {
     .format_name                        = "compress",
 
@@ -154,8 +147,6 @@ static BlockDriver bdrv_compress = {
 
     .bdrv_co_block_status               = bdrv_co_block_status_from_file,
 
-    .bdrv_recurse_is_first_non_filter   = compress_recurse_is_first_non_filter,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/quorum.c b/block/quorum.c
index 22c1060b42..476def878d 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -808,23 +808,6 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     return result;
 }
 
-static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
-                                               BlockDriverState *candidate)
-{
-    BDRVQuorumState *s = bs->opaque;
-    int i;
-
-    for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
-                                                     candidate);
-        if (perm) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
 static bool quorum_recurse_can_replace(BlockDriverState *bs,
                                        BlockDriverState *to_replace)
 {
@@ -1252,7 +1235,6 @@ static BlockDriver bdrv_quorum = {
     .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
-    .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
diff --git a/block/replication.c b/block/replication.c
index 99532ce521..d6681b6c84 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -306,12 +306,6 @@ out:
     return ret;
 }
 
-static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                    BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
     Error *local_err = NULL;
@@ -699,7 +693,6 @@ static BlockDriver bdrv_replication = {
     .bdrv_co_writev             = replication_co_writev,
 
     .is_filter                  = true,
-    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
     .has_variable_length        = true,
     .strong_runtime_opts        = replication_strong_runtime_opts,
diff --git a/block/throttle.c b/block/throttle.c
index 0349f42257..71f4bb0ad1 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -207,12 +207,6 @@ static void throttle_reopen_abort(BDRVReopenState *reopen_state)
     reopen_state->opaque = NULL;
 }
 
-static bool throttle_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                 BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
 {
     ThrottleGroupMember *tgm = bs->opaque;
@@ -252,8 +246,6 @@ static BlockDriver bdrv_throttle = {
     .bdrv_co_pwrite_zeroes              =   throttle_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   =   throttle_co_pdiscard,
 
-    .bdrv_recurse_is_first_non_filter   =   throttle_recurse_is_first_non_filter,
-
     .bdrv_attach_aio_context            =   throttle_attach_aio_context,
     .bdrv_detach_aio_context            =   throttle_detach_aio_context,
 
diff --git a/include/block/block.h b/include/block/block.h
index 8f6a0cad9c..764a217de6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -401,10 +401,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
                        BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
                        Error **errp);
 
-/* external snapshots */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate);
-
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 75f03dcc38..589a797fab 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -94,14 +94,6 @@ struct BlockDriver {
      * must implement them and return -ENOTSUP.
      */
     bool is_filter;
-    /* for snapshots block filter like Quorum can implement the
-     * following recursive callback.
-     * It's purpose is to recurse on the filter children while calling
-     * bdrv_recurse_is_first_non_filter on them.
-     * For a sample implementation look in the future Quorum block filter.
-     */
-    bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
-                                             BlockDriverState *candidate);
     /*
      * Return true if @to_replace can be replaced by a BDS with the
      * same data as @bs without it affecting @bs's behavior (that is,
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 13/21] mirror: Double-check immediately before replacing
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (11 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 14/21] quorum: Stop marking it as a filter Max Reitz
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

There is no guarantee that we can still replace the node we want to
replace at the end of the mirror job.  Double-check by calling
bdrv_recurse_can_replace().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index cacbc70014..447051dbc6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -700,7 +700,19 @@ static int mirror_exit_common(Job *job)
          * drain potential other users of the BDS before changing the graph. */
         assert(s->in_drain);
         bdrv_drained_begin(target_bs);
-        bdrv_replace_node(to_replace, target_bs, &local_err);
+        /*
+         * Cannot use check_to_replace_node() here, because that would
+         * check for an op blocker on @to_replace, and we have our own
+         * there.
+         */
+        if (bdrv_recurse_can_replace(src, to_replace)) {
+            bdrv_replace_node(to_replace, target_bs, &local_err);
+        } else {
+            error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                       "because it can no longer be guaranteed that doing so "
+                       "would not lead to an abrupt change of visible data",
+                       to_replace->node_name, target_bs->node_name);
+        }
         bdrv_drained_end(target_bs);
         if (local_err) {
             error_report_err(local_err);
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 14/21] quorum: Stop marking it as a filter
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (12 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 13/21] mirror: Double-check immediately before replacing Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 15/21] iotests: Use complete_and_wait() in 155 Max Reitz
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Quorum is not a filter, for example because it cannot guarantee which of
its children will serve the next request.  Thus, any of its children may
differ from the data visible to quorum's parents.

We have other filters with multiple children, but they differ in this
aspect:

- blkverify quits the whole qemu process if its children differ.  As
  such, we can always skip it when we want to skip it (as a filter node)
  by going to any of its children.  Both have the same data.

- replication generally serves requests from bs->file, so this is its
  only actually filtered child.

- Block job filters currently only have one child, but they will
  probably get more children in the future.  Still, they will always
  have only one actually filtered child.

Having "filters" as a dedicated node category only makes sense if you
can skip them by going to a one fixed child that always shows the same
data as the filter node.  Quorum cannot fulfill this, so it is not a
filter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/quorum.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 476def878d..d85db88dbb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1234,7 +1234,6 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_child_perm                    = quorum_child_perm,
 
-    .is_filter                          = true,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 15/21] iotests: Use complete_and_wait() in 155
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (13 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 14/21] quorum: Stop marking it as a filter Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This way, we get to see errors during the completion phase.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/155 | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index e19485911c..d7ef2579d3 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -163,12 +163,7 @@ class MirrorBaseClass(BaseClass):
 
         self.assert_qmp(result, 'return', {})
 
-        self.vm.event_wait('BLOCK_JOB_READY')
-
-        result = self.vm.qmp('block-job-complete', device='mirror-job')
-        self.assert_qmp(result, 'return', {})
-
-        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+        self.complete_and_wait('mirror-job')
 
     def testFull(self):
         self.runMirror('full')
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 16/21] iotests: Add VM.assert_block_path()
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (14 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 15/21] iotests: Use complete_and_wait() in 155 Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04 10:33   ` Vladimir Sementsov-Ogievskiy
  2020-02-04 14:09   ` Vladimir Sementsov-Ogievskiy
  2020-01-30 21:44 ` [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns Max Reitz
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 56 +++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 01b58dcb50..69861cf05e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -713,6 +713,62 @@ class VM(qtest.QEMUQtestMachine):
 
         return fields.items() <= ret.items()
 
+    def assert_block_path(self, root, path, expected_node, graph=None):
+        """
+        Check whether the node under the given path in the block graph
+        is @expected_node.
+
+        @root is the node name of the node where the @path is rooted.
+
+        @path is a string that consists of child names separated by
+        slashes.  It must begin with a slash.
+
+        Examples for @root + @path:
+          - root="qcow2-node", path="/backing/file"
+          - root="quorum-node", path="/children.2/file"
+
+        Hypothetically, @path could be empty, in which case it would
+        point to @root.  However, in practice this case is not useful
+        and hence not allowed.
+
+        @expected_node may be None.  (All elements of the path but the
+        leaf must still exist.)
+
+        @graph may be None or the result of an x-debug-query-block-graph
+        call that has already been performed.
+        """
+        if graph is None:
+            graph = self.qmp('x-debug-query-block-graph')['return']
+
+        iter_path = iter(path.split('/'))
+
+        # Must start with a /
+        assert next(iter_path) == ''
+
+        node = next((node for node in graph['nodes'] if node['name'] == root),
+                    None)
+
+        for child_name in iter_path:
+            assert node is not None, 'Cannot follow path %s' % path
+
+            try:
+                node_id = next(edge['child'] for edge in graph['edges'] \
+                                             if edge['parent'] == node['id'] and
+                                                edge['name'] == child_name)
+
+                node = next(node for node in graph['nodes'] \
+                                 if node['id'] == node_id)
+            except StopIteration:
+                node = None
+
+        if node is None:
+            assert expected_node is None, \
+                   'No node found under %s (but expected %s)' % \
+                   (path, expected_node)
+        else:
+            assert node['name'] == expected_node, \
+                   'Found node %s under %s (but expected %s)' % \
+                   (node['name'], path, expected_node)
 
 index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (15 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04 11:40   ` Vladimir Sementsov-Ogievskiy
  2020-01-30 21:44 ` [PATCH v3 18/21] iotests: Resolve TODOs in 041 Max Reitz
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

All tearDowns in 041 shutdown the VM.  Thus, test cases do not need to
do it themselves (unless they need the VM to be down for some
post-operation check).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index a429281f61..20cfad1d2c 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -80,7 +80,6 @@ class TestSingleDrive(iotests.QMPTestCase):
         self.cancel_and_wait(force=True)
         result = self.vm.qmp('query-block')
         self.assert_qmp(result, 'return[0]/inserted/file', test_img)
-        self.vm.shutdown()
 
     def test_cancel_after_ready(self):
         self.assert_no_active_block_jobs()
@@ -201,8 +200,6 @@ class TestSingleDrive(iotests.QMPTestCase):
         self.assert_qmp(result, 'return[0]/node-name', 'top')
         self.assert_qmp(result, 'return[0]/backing/node-name', 'base')
 
-        self.vm.shutdown()
-
     def test_medium_not_found(self):
         if iotests.qemu_default_machine != 'pc':
             return
@@ -455,7 +452,6 @@ new_state = "1"
                     self.assert_qmp(event, 'data/id', 'drive0')
 
         self.assert_no_active_block_jobs()
-        self.vm.shutdown()
 
     def test_ignore_read(self):
         self.assert_no_active_block_jobs()
@@ -475,7 +471,6 @@ new_state = "1"
         result = self.vm.qmp('query-block-jobs')
         self.assert_qmp(result, 'return[0]/paused', False)
         self.complete_and_wait()
-        self.vm.shutdown()
 
     def test_large_cluster(self):
         self.assert_no_active_block_jobs()
@@ -540,7 +535,6 @@ new_state = "1"
 
         self.complete_and_wait(wait_ready=False)
         self.assert_no_active_block_jobs()
-        self.vm.shutdown()
 
 class TestWriteErrors(iotests.QMPTestCase):
     image_len = 2 * 1024 * 1024 # MB
@@ -614,7 +608,6 @@ new_state = "1"
                     completed = True
 
         self.assert_no_active_block_jobs()
-        self.vm.shutdown()
 
     def test_ignore_write(self):
         self.assert_no_active_block_jobs()
@@ -631,7 +624,6 @@ new_state = "1"
         result = self.vm.qmp('query-block-jobs')
         self.assert_qmp(result, 'return[0]/paused', False)
         self.complete_and_wait()
-        self.vm.shutdown()
 
     def test_stop_write(self):
         self.assert_no_active_block_jobs()
@@ -667,7 +659,6 @@ new_state = "1"
 
         self.complete_and_wait(wait_ready=False)
         self.assert_no_active_block_jobs()
-        self.vm.shutdown()
 
 class TestSetSpeed(iotests.QMPTestCase):
     image_len = 80 * 1024 * 1024 # MB
@@ -936,7 +927,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         # here we check that the last registered quorum file has not been
         # swapped out and unref
         self.assert_has_block_node(None, quorum_img3)
-        self.vm.shutdown()
 
     def test_cancel_after_ready(self):
         self.assert_no_active_block_jobs()
@@ -1043,7 +1033,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_has_block_node("repair0", quorum_repair_img)
         # TODO: a better test requiring some QEMU infrastructure will be added
         #       to check that this file is really driven by quorum
-        self.vm.shutdown()
 
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 18/21] iotests: Resolve TODOs in 041
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (16 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04  9:54   ` Vladimir Sementsov-Ogievskiy
  2020-01-30 21:44 ` [PATCH v3 19/21] iotests: Use self.image_len in TestRepairQuorum Max Reitz
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 20cfad1d2c..41db2c222a 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -909,8 +909,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait(drive="job0")
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
+        self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
         self.vm.shutdown()
         self.assertTrue(iotests.compare_images(quorum_img2, quorum_repair_img),
                         'target image does not match source after mirroring')
@@ -1031,8 +1030,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait('job0')
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
+        self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
 
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 19/21] iotests: Use self.image_len in TestRepairQuorum
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (17 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 18/21] iotests: Resolve TODOs in 041 Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 20/21] iotests: Add tests for invalid Quorum @replaces Max Reitz
  2020-01-30 21:44 ` [PATCH v3 21/21] iotests: Check that @replaces can replace filters Max Reitz
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

041's TestRepairQuorum has its own image_len, no need to refer to
TestSingleDrive.  (This patch allows commenting out TestSingleDrive to
speed up 041 during test testing.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/041 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 41db2c222a..1295a92021 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -872,7 +872,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
         # Add each individual quorum images
         for i in self.IMAGES:
             qemu_img('create', '-f', iotests.imgfmt, i,
-                     str(TestSingleDrive.image_len))
+                     str(self.image_len))
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 20/21] iotests: Add tests for invalid Quorum @replaces
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (18 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 19/21] iotests: Use self.image_len in TestRepairQuorum Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-01-30 21:44 ` [PATCH v3 21/21] iotests: Check that @replaces can replace filters Max Reitz
  20 siblings, 0 replies; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Add two tests to see that you cannot replace a Quorum child with the
mirror job while the child is in use by a different parent.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/041     | 70 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |  4 +--
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 1295a92021..12149c7786 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -20,6 +20,7 @@
 
 import time
 import os
+import re
 import iotests
 from iotests import qemu_img, qemu_io
 
@@ -34,6 +35,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
 quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
 quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
 
+nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
+
 class TestSingleDrive(iotests.QMPTestCase):
     image_len = 1 * 1024 * 1024 # MB
     qmp_cmd = 'drive-mirror'
@@ -892,7 +895,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
     def tearDown(self):
         self.vm.shutdown()
-        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file ]:
+        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file,
+                                 nbd_sock_path ]:
             # Do a try/except because the test may have deleted some images
             try:
                 os.remove(i)
@@ -1032,6 +1036,70 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_has_block_node("repair0", quorum_repair_img)
         self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
 
+    def test_with_other_parent(self):
+        """
+        Check that we cannot replace a Quorum child when it has other
+        parents.
+        """
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'error/desc',
+                        "Cannot replace 'img1' by a node mirrored from "
+                        "'quorum0', because it cannot be guaranteed that doing "
+                        "so would not lead to an abrupt change of visible data")
+
+    def test_with_other_parents_after_mirror_start(self):
+        """
+        The same as test_with_other_parent(), but add the NBD server
+        only when the mirror job is already running.
+        """
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        # The full error message goes to stderr, we will check it later
+        self.complete_and_wait('mirror',
+                               completion_error='Operation not permitted')
+
+        # Should not have been replaced
+        self.vm.assert_block_path('quorum0', '/children.1', 'img1')
+
+        # Check the full error message now
+        self.vm.shutdown()
+        log = self.vm.get_log()
+        log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
+        log = re.sub(r'^Formatting.*\n', '', log)
+        log = re.sub(r'\n\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
+        log = re.sub(r'^qemu-system-[^:]*: ', '', log)
+
+        self.assertEqual(log,
+                         "Can no longer replace 'img1' by 'repair0', because " +
+                         "it can no longer be guaranteed that doing so would " +
+                         "not lead to an abrupt change of visible data")
+
+
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
 class TestOrphanedSource(iotests.QMPTestCase):
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index f496be9197..ffc779b4d1 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-...........................................................................................
+.............................................................................................
 ----------------------------------------------------------------------
-Ran 91 tests
+Ran 93 tests
 
 OK
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 21/21] iotests: Check that @replaces can replace filters
  2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
                   ` (19 preceding siblings ...)
  2020-01-30 21:44 ` [PATCH v3 20/21] iotests: Add tests for invalid Quorum @replaces Max Reitz
@ 2020-01-30 21:44 ` Max Reitz
  2020-02-04  9:56   ` Vladimir Sementsov-Ogievskiy
  20 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-01-30 21:44 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 46 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |  4 ++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 12149c7786..5ff995dbe2 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1190,6 +1190,52 @@ class TestOrphanedSource(iotests.QMPTestCase):
         self.assertFalse('mirror-filter' in nodes,
                          'Mirror filter node did not disappear')
 
+# Test cases for @replaces that do not necessarily involve Quorum
+class TestReplaces(iotests.QMPTestCase):
+    # Each of these test cases needs their own block graph, so do not
+    # create any nodes here
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        for img in (test_img, target_img):
+            try:
+                os.remove(img)
+            except OSError:
+                pass
+
+    @iotests.skip_if_unsupported(['copy-on-read'])
+    def test_replace_filter(self):
+        """
+        Check that we can replace filter nodes.
+        """
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'copy-on-read',
+                                 'node-name': 'filter0',
+                                 'file': {
+                                     'driver': 'copy-on-read',
+                                     'node-name': 'filter1',
+                                     'file': {
+                                         'driver': 'null-co'
+                                     }
+                                 }
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add',
+                             node_name='target', driver='null-co')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror', job_id='mirror', device='filter0',
+                             target='target', sync='full', replaces='filter1')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.assert_block_path('filter0', '/file', 'target')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'],
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index ffc779b4d1..877b76fd31 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.............................................................................................
+..............................................................................................
 ----------------------------------------------------------------------
-Ran 93 tests
+Ran 94 tests
 
 OK
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2020-02-04  9:33   ` Vladimir Sementsov-Ogievskiy
  2020-02-05 15:38   ` Kevin Wolf
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04  9:33 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace()
  2020-01-30 21:44 ` [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2020-02-04  9:37   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04  9:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter()
  2020-01-30 21:44 ` [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
@ 2020-02-04  9:45   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04  9:45 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> It no longer has any users.
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 18/21] iotests: Resolve TODOs in 041
  2020-01-30 21:44 ` [PATCH v3 18/21] iotests: Resolve TODOs in 041 Max Reitz
@ 2020-02-04  9:54   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04  9:54 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> Signed-off-by: Max Reitz<mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 21/21] iotests: Check that @replaces can replace filters
  2020-01-30 21:44 ` [PATCH v3 21/21] iotests: Check that @replaces can replace filters Max Reitz
@ 2020-02-04  9:56   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04  9:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> Signed-off-by: Max Reitz<mreitz@redhat.com>

You forget my r-b:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 16/21] iotests: Add VM.assert_block_path()
  2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
@ 2020-02-04 10:33   ` Vladimir Sementsov-Ogievskiy
  2020-02-04 14:09   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04 10:33 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> Signed-off-by: Max Reitz<mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns
  2020-01-30 21:44 ` [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns Max Reitz
@ 2020-02-04 11:40   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04 11:40 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> All tearDowns in 041 shutdown the VM.  Thus, test cases do not need to
> do it themselves (unless they need the VM to be down for some
> post-operation check).
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 16/21] iotests: Add VM.assert_block_path()
  2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
  2020-02-04 10:33   ` Vladimir Sementsov-Ogievskiy
@ 2020-02-04 14:09   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-02-04 14:09 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

31.01.2020 0:44, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 56 +++++++++++++++++++++++++++++++++++
>   1 file changed, 56 insertions(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 01b58dcb50..69861cf05e 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -713,6 +713,62 @@ class VM(qtest.QEMUQtestMachine):
>   
>           return fields.items() <= ret.items()
>   
> +    def assert_block_path(self, root, path, expected_node, graph=None):
> +        """
> +        Check whether the node under the given path in the block graph
> +        is @expected_node.
> +
> +        @root is the node name of the node where the @path is rooted.
> +
> +        @path is a string that consists of child names separated by
> +        slashes.  It must begin with a slash.
> +
> +        Examples for @root + @path:
> +          - root="qcow2-node", path="/backing/file"
> +          - root="quorum-node", path="/children.2/file"
> +
> +        Hypothetically, @path could be empty, in which case it would
> +        point to @root.  However, in practice this case is not useful
> +        and hence not allowed.
> +
> +        @expected_node may be None.  (All elements of the path but the
> +        leaf must still exist.)
> +
> +        @graph may be None or the result of an x-debug-query-block-graph
> +        call that has already been performed.
> +        """
> +        if graph is None:
> +            graph = self.qmp('x-debug-query-block-graph')['return']
> +
> +        iter_path = iter(path.split('/'))
> +
> +        # Must start with a /
> +        assert next(iter_path) == ''
> +
> +        node = next((node for node in graph['nodes'] if node['name'] == root),
> +                    None)
> +
> +        for child_name in iter_path:
> +            assert node is not None, 'Cannot follow path %s' % path

this error message will be a bit misleading if we failed to find root node. So,
may be add additional assert for root node, or at least
'Cannot follow path %s%s'.format(root, path)


It doesn't worth to resend only for this, apply it if you want and keep my r-b anyway.

> +
> +            try:
> +                node_id = next(edge['child'] for edge in graph['edges'] \
> +                                             if edge['parent'] == node['id'] and
> +                                                edge['name'] == child_name)
> +
> +                node = next(node for node in graph['nodes'] \
> +                                 if node['id'] == node_id)
> +            except StopIteration:
> +                node = None
> +
> +        if node is None:
> +            assert expected_node is None, \
> +                   'No node found under %s (but expected %s)' % \
> +                   (path, expected_node)
> +        else:
> +            assert node['name'] == expected_node, \
> +                   'Found node %s under %s (but expected %s)' % \
> +                   (node['name'], path, expected_node)
>   
>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>   
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
  2020-02-04  9:33   ` Vladimir Sementsov-Ogievskiy
@ 2020-02-05 15:38   ` Kevin Wolf
  2020-02-06 10:11     ` Max Reitz
  1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2020-02-05 15:38 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/quorum.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 59cd524502..6a7224c9e4 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>  
>  typedef struct QuorumChild {
>      BdrvChild *child;
> +
> +    /*
> +     * If set, check whether this node can be replaced without any
> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> +     * WRITE permission.
> +     */
> +    bool to_be_replaced;

I don't understand these permission changes. How does (preparing for)
detaching a node from quorum make its content invalid? And why do we
suddenly need WRITE permissions even if the quorum node is only used
read-only?

The comment is a bit unclear, too. "check whether" implies that both
outcomes could be true, but it doesn't say what happens in either case.
Is this really "make sure that"?

Kevin



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-05 15:38   ` Kevin Wolf
@ 2020-02-06 10:11     ` Max Reitz
  2020-02-06 14:58       ` Kevin Wolf
  0 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-02-06 10:11 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 3513 bytes --]

On 05.02.20 16:38, Kevin Wolf wrote:
> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>> We will need this to verify that Quorum can let one of its children be
>> replaced without breaking anything else.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 59cd524502..6a7224c9e4 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>  
>>  typedef struct QuorumChild {
>>      BdrvChild *child;
>> +
>> +    /*
>> +     * If set, check whether this node can be replaced without any
>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>> +     * WRITE permission.
>> +     */
>> +    bool to_be_replaced;
> 
> I don't understand these permission changes. How does (preparing for)
> detaching a node from quorum make its content invalid?

It doesn’t, of course.  What we are preparing for is to replace it by
some other node with some other content.

> And why do we
> suddenly need WRITE permissions even if the quorum node is only used
> read-only?
> 
> The comment is a bit unclear, too. "check whether" implies that both
> outcomes could be true, but it doesn't say what happens in either case.
> Is this really "make sure that"?

I think the comment is not only unclear, it is the problem.  (Well,
maybe the code is also.)

This series is about fixing at least some things about replacing nodes
by mirroring.  The original use cases this was introduced for was to fix
broken quorum children: The other children are still intact, so you read
from the quorum node and replace the broken child (which maybe shows
invalid data, or maybe just EIO) by the fixed mirror result.

Replacing that broken node by the fixed one changes the data that’s
visible on that node.

That’s fine with quorum because that one child never influenced its
result anyway.  In fact, we know that the new child agrees with the
quorum, so it actually reduces conflict.

But if there are other parents on the node, they may disagree.  So we
need to warn them that we will disrupt consistency by replacing the node
with a potentially completely different one.  If those other parents
don’t care about consistency (CONSISTENT_READ) and don’t mind data
changes (other WRITE users), then we can freely replace the node still.


Now (assuming that this reasoning makes sense) it may seem as if the
general block layer should handle such cases; like it should unshare
CONSISTENT_READ and take WRITE.  But that isn’t true, because it calls
bdrv_recurse_can_replace() specifically to check that the some node can
be replaced by the new node without changing any visible data.  So
usually there are no such sudden data changes.

Quorum is the only node that can tolerate such data changes on its
children without changing its own visible data.  Hence it’s responsible
for ensuring that there’s noone else that minds if one of its child
nodes is replaced by something else.

(Note that this isn’t about replacing a BdrvChild, but about replacing a
BDS.  It isn’t like only quorum’s BdrvChild will be made to point
somewhere else; all BdrvChild pointers to the old node will be made to
point to the new one.)


Again assuming this makes sense, I wonder how we could condense that
into a reasonable comment.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 10:11     ` Max Reitz
@ 2020-02-06 14:58       ` Kevin Wolf
  2020-02-06 15:21         ` Max Reitz
  0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2020-02-06 14:58 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> On 05.02.20 16:38, Kevin Wolf wrote:
> > Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >> We will need this to verify that Quorum can let one of its children be
> >> replaced without breaking anything else.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>  1 file changed, 25 insertions(+)
> >>
> >> diff --git a/block/quorum.c b/block/quorum.c
> >> index 59cd524502..6a7224c9e4 100644
> >> --- a/block/quorum.c
> >> +++ b/block/quorum.c
> >> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>  
> >>  typedef struct QuorumChild {
> >>      BdrvChild *child;
> >> +
> >> +    /*
> >> +     * If set, check whether this node can be replaced without any
> >> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >> +     * WRITE permission.
> >> +     */
> >> +    bool to_be_replaced;
> > 
> > I don't understand these permission changes. How does (preparing for)
> > detaching a node from quorum make its content invalid?
> 
> It doesn’t, of course.  What we are preparing for is to replace it by
> some other node with some other content.
> 
> > And why do we
> > suddenly need WRITE permissions even if the quorum node is only used
> > read-only?
> > 
> > The comment is a bit unclear, too. "check whether" implies that both
> > outcomes could be true, but it doesn't say what happens in either case.
> > Is this really "make sure that"?
> 
> I think the comment is not only unclear, it is the problem.  (Well,
> maybe the code is also.)
> 
> This series is about fixing at least some things about replacing nodes
> by mirroring.  The original use cases this was introduced for was to fix
> broken quorum children: The other children are still intact, so you read
> from the quorum node and replace the broken child (which maybe shows
> invalid data, or maybe just EIO) by the fixed mirror result.
> 
> Replacing that broken node by the fixed one changes the data that’s
> visible on that node.

Hm, yes, that's true. But I wonder if this is really something that the
permission system must catch. Like other graph manipulations, it's
essentially the user saying "trust me, I know what I'm doing, this node
makes sense in this place".

Because if you assume that the user could add a node with unsuitable
content and you want to prevent this, where do we stop?
blockdev-snapshot can insert a non-empty overlay, which would result in
visible data change. Should we therefore only allow snapshots when
shared writes are allowed? This doesn't work obviously.

So I'm inclined to say that this is the user's responsibility and we
don't have to jump through hoops to prevent every possible way that the
user could mess up. (Which often also result in preventing legitimate
cases like here a quorum of read-only nodes.)

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 14:58       ` Kevin Wolf
@ 2020-02-06 15:21         ` Max Reitz
  2020-02-06 15:51           ` Kevin Wolf
  0 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-02-06 15:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 3496 bytes --]

On 06.02.20 15:58, Kevin Wolf wrote:
> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>> On 05.02.20 16:38, Kevin Wolf wrote:
>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>> We will need this to verify that Quorum can let one of its children be
>>>> replaced without breaking anything else.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>  1 file changed, 25 insertions(+)
>>>>
>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>> index 59cd524502..6a7224c9e4 100644
>>>> --- a/block/quorum.c
>>>> +++ b/block/quorum.c
>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>  
>>>>  typedef struct QuorumChild {
>>>>      BdrvChild *child;
>>>> +
>>>> +    /*
>>>> +     * If set, check whether this node can be replaced without any
>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>> +     * WRITE permission.
>>>> +     */
>>>> +    bool to_be_replaced;
>>>
>>> I don't understand these permission changes. How does (preparing for)
>>> detaching a node from quorum make its content invalid?
>>
>> It doesn’t, of course.  What we are preparing for is to replace it by
>> some other node with some other content.
>>
>>> And why do we
>>> suddenly need WRITE permissions even if the quorum node is only used
>>> read-only?
>>>
>>> The comment is a bit unclear, too. "check whether" implies that both
>>> outcomes could be true, but it doesn't say what happens in either case.
>>> Is this really "make sure that"?
>>
>> I think the comment is not only unclear, it is the problem.  (Well,
>> maybe the code is also.)
>>
>> This series is about fixing at least some things about replacing nodes
>> by mirroring.  The original use cases this was introduced for was to fix
>> broken quorum children: The other children are still intact, so you read
>> from the quorum node and replace the broken child (which maybe shows
>> invalid data, or maybe just EIO) by the fixed mirror result.
>>
>> Replacing that broken node by the fixed one changes the data that’s
>> visible on that node.
> 
> Hm, yes, that's true. But I wonder if this is really something that the
> permission system must catch. Like other graph manipulations, it's
> essentially the user saying "trust me, I know what I'm doing, this node
> makes sense in this place".
> 
> Because if you assume that the user could add a node with unsuitable
> content and you want to prevent this, where do we stop?
> blockdev-snapshot can insert a non-empty overlay, which would result in
> visible data change. Should we therefore only allow snapshots when
> shared writes are allowed? This doesn't work obviously.
> 
> So I'm inclined to say that this is the user's responsibility and we
> don't have to jump through hoops to prevent every possible way that the
> user could mess up. (Which often also result in preventing legitimate
> cases like here a quorum of read-only nodes.)

Well, if you ask the question “where do we stop”, we also have to ask
the question “where do we start”.  If we say the user knows what they’re
doing, we might as well drop the whole can_replace infrastructure
altogether and just assume that you can replace any node by anything.

If the WRITE permission is the problem, then I suppose we can drop that.
 Unsharing CONSISTENT_READ is bad enough that it effectively deters all
other parents anyway.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 15:21         ` Max Reitz
@ 2020-02-06 15:51           ` Kevin Wolf
  2020-02-06 16:43             ` Max Reitz
  0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2020-02-06 15:51 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5188 bytes --]

Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> On 06.02.20 15:58, Kevin Wolf wrote:
> > Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >> On 05.02.20 16:38, Kevin Wolf wrote:
> >>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>> We will need this to verify that Quorum can let one of its children be
> >>>> replaced without breaking anything else.
> >>>>
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>  1 file changed, 25 insertions(+)
> >>>>
> >>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>> index 59cd524502..6a7224c9e4 100644
> >>>> --- a/block/quorum.c
> >>>> +++ b/block/quorum.c
> >>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>  
> >>>>  typedef struct QuorumChild {
> >>>>      BdrvChild *child;
> >>>> +
> >>>> +    /*
> >>>> +     * If set, check whether this node can be replaced without any
> >>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>> +     * WRITE permission.
> >>>> +     */
> >>>> +    bool to_be_replaced;
> >>>
> >>> I don't understand these permission changes. How does (preparing for)
> >>> detaching a node from quorum make its content invalid?
> >>
> >> It doesn’t, of course.  What we are preparing for is to replace it by
> >> some other node with some other content.
> >>
> >>> And why do we
> >>> suddenly need WRITE permissions even if the quorum node is only used
> >>> read-only?
> >>>
> >>> The comment is a bit unclear, too. "check whether" implies that both
> >>> outcomes could be true, but it doesn't say what happens in either case.
> >>> Is this really "make sure that"?
> >>
> >> I think the comment is not only unclear, it is the problem.  (Well,
> >> maybe the code is also.)
> >>
> >> This series is about fixing at least some things about replacing nodes
> >> by mirroring.  The original use cases this was introduced for was to fix
> >> broken quorum children: The other children are still intact, so you read
> >> from the quorum node and replace the broken child (which maybe shows
> >> invalid data, or maybe just EIO) by the fixed mirror result.
> >>
> >> Replacing that broken node by the fixed one changes the data that’s
> >> visible on that node.
> > 
> > Hm, yes, that's true. But I wonder if this is really something that the
> > permission system must catch. Like other graph manipulations, it's
> > essentially the user saying "trust me, I know what I'm doing, this node
> > makes sense in this place".
> > 
> > Because if you assume that the user could add a node with unsuitable
> > content and you want to prevent this, where do we stop?
> > blockdev-snapshot can insert a non-empty overlay, which would result in
> > visible data change. Should we therefore only allow snapshots when
> > shared writes are allowed? This doesn't work obviously.
> > 
> > So I'm inclined to say that this is the user's responsibility and we
> > don't have to jump through hoops to prevent every possible way that the
> > user could mess up. (Which often also result in preventing legitimate
> > cases like here a quorum of read-only nodes.)
> 
> Well, if you ask the question “where do we stop”, we also have to ask
> the question “where do we start”.  If we say the user knows what they’re
> doing, we might as well drop the whole can_replace infrastructure
> altogether and just assume that you can replace any node by anything.

Well, I don't actually know if that would be completely unreasonable.
The idea was obviously to keep graph changes restricted to very specific
cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
have quite a few more operations that allow changing the graph.

So if preventing some cases gives us headaches and is probably more work
than dealing with any bugs they might reveal, maybe preventing them is
wrong.

I'm just afraid that we might be overengineering this and waste time on
things that we don't actually get much use from.

> If the WRITE permission is the problem, then I suppose we can drop that.
>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> other parents anyway.

WRITE is probably the more practical problem, though it's technically
the correct one to take.

CONSISTENT_READ is already a problem in theory because replacing a child
node with different content doesn't even match its definition:

    /**
     * A user that has the "permission" of consistent reads is guaranteed that
     * their view of the contents of the block device is complete and
     * self-consistent, representing the contents of a disk at a specific
     * point.
     *
     * For most block devices (including their backing files) this is true, but
     * the property cannot be maintained in a few situations like for
     * intermediate nodes of a commit block job.
     */
    BLK_PERM_CONSISTENT_READ    = 0x01,

Replacing an image with a different image means that the node represents
the content of a different disk now, but it's probably still complete
and self-consistent.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 15:51           ` Kevin Wolf
@ 2020-02-06 16:43             ` Max Reitz
  2020-02-06 16:47               ` Max Reitz
  2020-02-06 16:57               ` Kevin Wolf
  0 siblings, 2 replies; 41+ messages in thread
From: Max Reitz @ 2020-02-06 16:43 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 5893 bytes --]

On 06.02.20 16:51, Kevin Wolf wrote:
> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>> On 06.02.20 15:58, Kevin Wolf wrote:
>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>> replaced without breaking anything else.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>  1 file changed, 25 insertions(+)
>>>>>>
>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>> --- a/block/quorum.c
>>>>>> +++ b/block/quorum.c
>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>  
>>>>>>  typedef struct QuorumChild {
>>>>>>      BdrvChild *child;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>> +     * WRITE permission.
>>>>>> +     */
>>>>>> +    bool to_be_replaced;
>>>>>
>>>>> I don't understand these permission changes. How does (preparing for)
>>>>> detaching a node from quorum make its content invalid?
>>>>
>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>> some other node with some other content.
>>>>
>>>>> And why do we
>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>> read-only?
>>>>>
>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>> Is this really "make sure that"?
>>>>
>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>> maybe the code is also.)
>>>>
>>>> This series is about fixing at least some things about replacing nodes
>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>> broken quorum children: The other children are still intact, so you read
>>>> from the quorum node and replace the broken child (which maybe shows
>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>
>>>> Replacing that broken node by the fixed one changes the data that’s
>>>> visible on that node.
>>>
>>> Hm, yes, that's true. But I wonder if this is really something that the
>>> permission system must catch. Like other graph manipulations, it's
>>> essentially the user saying "trust me, I know what I'm doing, this node
>>> makes sense in this place".
>>>
>>> Because if you assume that the user could add a node with unsuitable
>>> content and you want to prevent this, where do we stop?
>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>> visible data change. Should we therefore only allow snapshots when
>>> shared writes are allowed? This doesn't work obviously.
>>>
>>> So I'm inclined to say that this is the user's responsibility and we
>>> don't have to jump through hoops to prevent every possible way that the
>>> user could mess up. (Which often also result in preventing legitimate
>>> cases like here a quorum of read-only nodes.)
>>
>> Well, if you ask the question “where do we stop”, we also have to ask
>> the question “where do we start”.  If we say the user knows what they’re
>> doing, we might as well drop the whole can_replace infrastructure
>> altogether and just assume that you can replace any node by anything.
> 
> Well, I don't actually know if that would be completely unreasonable.
> The idea was obviously to keep graph changes restricted to very specific
> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> have quite a few more operations that allow changing the graph.
> 
> So if preventing some cases gives us headaches and is probably more work
> than dealing with any bugs they might reveal, maybe preventing them is
> wrong.
> 
> I'm just afraid that we might be overengineering this and waste time on
> things that we don't actually get much use from.

That’s why I’m asking.

>> If the WRITE permission is the problem, then I suppose we can drop that.
>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>> other parents anyway.
> 
> WRITE is probably the more practical problem, though it's technically
> the correct one to take.
> 
> CONSISTENT_READ is already a problem in theory because replacing a child
> node with different content doesn't even match its definition:
> 
>     /**
>      * A user that has the "permission" of consistent reads is guaranteed that
>      * their view of the contents of the block device is complete and
>      * self-consistent, representing the contents of a disk at a specific
>      * point.
>      *
>      * For most block devices (including their backing files) this is true, but
>      * the property cannot be maintained in a few situations like for
>      * intermediate nodes of a commit block job.
>      */
>     BLK_PERM_CONSISTENT_READ    = 0x01,
> 
> Replacing an image with a different image means that the node represents
> the content of a different disk now, but it's probably still complete
> and self-consistent.

At any point in time yes, but not over the time span of the change.  The
definition doesn’t say that the node represents the contents of a disk
at a specific point, but the view from the parent.

I argue that that view is always over some period of time, so if you
suddenly switch out the whole disk, then it isn’t a self-consistent view.

Alternatively, we could of course also just forego the permission system
here altogether and just check that there are no other parents at all.
(Which is effectively the same as unsharing CONSISTENT_READ.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 16:43             ` Max Reitz
@ 2020-02-06 16:47               ` Max Reitz
  2020-02-06 16:58                 ` Kevin Wolf
  2020-02-06 16:57               ` Kevin Wolf
  1 sibling, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-02-06 16:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 4608 bytes --]

On 06.02.20 17:43, Max Reitz wrote:
> On 06.02.20 16:51, Kevin Wolf wrote:
>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>> replaced without breaking anything else.
>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>> ---
>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>> --- a/block/quorum.c
>>>>>>> +++ b/block/quorum.c
>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>  
>>>>>>>  typedef struct QuorumChild {
>>>>>>>      BdrvChild *child;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>> +     * WRITE permission.
>>>>>>> +     */
>>>>>>> +    bool to_be_replaced;
>>>>>>
>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>> detaching a node from quorum make its content invalid?
>>>>>
>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>>> some other node with some other content.
>>>>>
>>>>>> And why do we
>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>> read-only?
>>>>>>
>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>> Is this really "make sure that"?
>>>>>
>>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>>> maybe the code is also.)
>>>>>
>>>>> This series is about fixing at least some things about replacing nodes
>>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>>> broken quorum children: The other children are still intact, so you read
>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>
>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>> visible on that node.
>>>>
>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>> permission system must catch. Like other graph manipulations, it's
>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>> makes sense in this place".
>>>>
>>>> Because if you assume that the user could add a node with unsuitable
>>>> content and you want to prevent this, where do we stop?
>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>> visible data change. Should we therefore only allow snapshots when
>>>> shared writes are allowed? This doesn't work obviously.
>>>>
>>>> So I'm inclined to say that this is the user's responsibility and we
>>>> don't have to jump through hoops to prevent every possible way that the
>>>> user could mess up. (Which often also result in preventing legitimate
>>>> cases like here a quorum of read-only nodes.)
>>>
>>> Well, if you ask the question “where do we stop”, we also have to ask
>>> the question “where do we start”.  If we say the user knows what they’re
>>> doing, we might as well drop the whole can_replace infrastructure
>>> altogether and just assume that you can replace any node by anything.
>>
>> Well, I don't actually know if that would be completely unreasonable.
>> The idea was obviously to keep graph changes restricted to very specific
>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>> have quite a few more operations that allow changing the graph.
>>
>> So if preventing some cases gives us headaches and is probably more work
>> than dealing with any bugs they might reveal, maybe preventing them is
>> wrong.
>>
>> I'm just afraid that we might be overengineering this and waste time on
>> things that we don't actually get much use from.
> 
> That’s why I’m asking.

(One thing to consider here, though, is that this series exists and has
been reviewed by Vladimir in full, so most of the engineering effort has
already been done.  In contrast, writing a new series to drop the whole
can_replace infrastructure with no replacement may actually cost more.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 16:43             ` Max Reitz
  2020-02-06 16:47               ` Max Reitz
@ 2020-02-06 16:57               ` Kevin Wolf
  2020-02-06 17:06                 ` Max Reitz
  1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2020-02-06 16:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 6740 bytes --]

Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
> On 06.02.20 16:51, Kevin Wolf wrote:
> > Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >> On 06.02.20 15:58, Kevin Wolf wrote:
> >>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>> replaced without breaking anything else.
> >>>>>>
> >>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>> ---
> >>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>  1 file changed, 25 insertions(+)
> >>>>>>
> >>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>> --- a/block/quorum.c
> >>>>>> +++ b/block/quorum.c
> >>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>  
> >>>>>>  typedef struct QuorumChild {
> >>>>>>      BdrvChild *child;
> >>>>>> +
> >>>>>> +    /*
> >>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>> +     * WRITE permission.
> >>>>>> +     */
> >>>>>> +    bool to_be_replaced;
> >>>>>
> >>>>> I don't understand these permission changes. How does (preparing for)
> >>>>> detaching a node from quorum make its content invalid?
> >>>>
> >>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>> some other node with some other content.
> >>>>
> >>>>> And why do we
> >>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>> read-only?
> >>>>>
> >>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>> Is this really "make sure that"?
> >>>>
> >>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>> maybe the code is also.)
> >>>>
> >>>> This series is about fixing at least some things about replacing nodes
> >>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>> broken quorum children: The other children are still intact, so you read
> >>>> from the quorum node and replace the broken child (which maybe shows
> >>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>
> >>>> Replacing that broken node by the fixed one changes the data that’s
> >>>> visible on that node.
> >>>
> >>> Hm, yes, that's true. But I wonder if this is really something that the
> >>> permission system must catch. Like other graph manipulations, it's
> >>> essentially the user saying "trust me, I know what I'm doing, this node
> >>> makes sense in this place".
> >>>
> >>> Because if you assume that the user could add a node with unsuitable
> >>> content and you want to prevent this, where do we stop?
> >>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>> visible data change. Should we therefore only allow snapshots when
> >>> shared writes are allowed? This doesn't work obviously.
> >>>
> >>> So I'm inclined to say that this is the user's responsibility and we
> >>> don't have to jump through hoops to prevent every possible way that the
> >>> user could mess up. (Which often also result in preventing legitimate
> >>> cases like here a quorum of read-only nodes.)
> >>
> >> Well, if you ask the question “where do we stop”, we also have to ask
> >> the question “where do we start”.  If we say the user knows what they’re
> >> doing, we might as well drop the whole can_replace infrastructure
> >> altogether and just assume that you can replace any node by anything.
> > 
> > Well, I don't actually know if that would be completely unreasonable.
> > The idea was obviously to keep graph changes restricted to very specific
> > cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> > have quite a few more operations that allow changing the graph.
> > 
> > So if preventing some cases gives us headaches and is probably more work
> > than dealing with any bugs they might reveal, maybe preventing them is
> > wrong.
> > 
> > I'm just afraid that we might be overengineering this and waste time on
> > things that we don't actually get much use from.
> 
> That’s why I’m asking.

Did I answer your question sufficiently then?

> >> If the WRITE permission is the problem, then I suppose we can drop that.
> >>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> >> other parents anyway.
> > 
> > WRITE is probably the more practical problem, though it's technically
> > the correct one to take.
> > 
> > CONSISTENT_READ is already a problem in theory because replacing a child
> > node with different content doesn't even match its definition:
> > 
> >     /**
> >      * A user that has the "permission" of consistent reads is guaranteed that
> >      * their view of the contents of the block device is complete and
> >      * self-consistent, representing the contents of a disk at a specific
> >      * point.
> >      *
> >      * For most block devices (including their backing files) this is true, but
> >      * the property cannot be maintained in a few situations like for
> >      * intermediate nodes of a commit block job.
> >      */
> >     BLK_PERM_CONSISTENT_READ    = 0x01,
> > 
> > Replacing an image with a different image means that the node represents
> > the content of a different disk now, but it's probably still complete
> > and self-consistent.
> 
> At any point in time yes, but not over the time span of the change.  The
> definition doesn’t say that the node represents the contents of a disk
> at a specific point, but the view from the parent.
> 
> I argue that that view is always over some period of time, so if you
> suddenly switch out the whole disk, then it isn’t a self-consistent view.

I think your theory that it's over some period of time conflicts with
the documentation that says "at a specific point".

> Alternatively, we could of course also just forego the permission system
> here altogether and just check that there are no other parents at all.
> (Which is effectively the same as unsharing CONSISTENT_READ.)

This would sidestep all of the artificial permission twiddling, which
sounds good.

It would probably also needlessly restrict the allowed use cases, but
then, who cares about nodes with multiple parents, one of which is a
quorum node?

So I guess I would be fine with either checking that there are no
parents or maybe even just dropping the check completely.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 16:47               ` Max Reitz
@ 2020-02-06 16:58                 ` Kevin Wolf
  0 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2020-02-06 16:58 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5035 bytes --]

Am 06.02.2020 um 17:47 hat Max Reitz geschrieben:
> On 06.02.20 17:43, Max Reitz wrote:
> > On 06.02.20 16:51, Kevin Wolf wrote:
> >> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >>> On 06.02.20 15:58, Kevin Wolf wrote:
> >>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>>> replaced without breaking anything else.
> >>>>>>>
> >>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>>> ---
> >>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>>  1 file changed, 25 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>>> --- a/block/quorum.c
> >>>>>>> +++ b/block/quorum.c
> >>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>>  
> >>>>>>>  typedef struct QuorumChild {
> >>>>>>>      BdrvChild *child;
> >>>>>>> +
> >>>>>>> +    /*
> >>>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>>> +     * WRITE permission.
> >>>>>>> +     */
> >>>>>>> +    bool to_be_replaced;
> >>>>>>
> >>>>>> I don't understand these permission changes. How does (preparing for)
> >>>>>> detaching a node from quorum make its content invalid?
> >>>>>
> >>>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>>> some other node with some other content.
> >>>>>
> >>>>>> And why do we
> >>>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>>> read-only?
> >>>>>>
> >>>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>>> Is this really "make sure that"?
> >>>>>
> >>>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>>> maybe the code is also.)
> >>>>>
> >>>>> This series is about fixing at least some things about replacing nodes
> >>>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>>> broken quorum children: The other children are still intact, so you read
> >>>>> from the quorum node and replace the broken child (which maybe shows
> >>>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>>
> >>>>> Replacing that broken node by the fixed one changes the data that’s
> >>>>> visible on that node.
> >>>>
> >>>> Hm, yes, that's true. But I wonder if this is really something that the
> >>>> permission system must catch. Like other graph manipulations, it's
> >>>> essentially the user saying "trust me, I know what I'm doing, this node
> >>>> makes sense in this place".
> >>>>
> >>>> Because if you assume that the user could add a node with unsuitable
> >>>> content and you want to prevent this, where do we stop?
> >>>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>>> visible data change. Should we therefore only allow snapshots when
> >>>> shared writes are allowed? This doesn't work obviously.
> >>>>
> >>>> So I'm inclined to say that this is the user's responsibility and we
> >>>> don't have to jump through hoops to prevent every possible way that the
> >>>> user could mess up. (Which often also result in preventing legitimate
> >>>> cases like here a quorum of read-only nodes.)
> >>>
> >>> Well, if you ask the question “where do we stop”, we also have to ask
> >>> the question “where do we start”.  If we say the user knows what they’re
> >>> doing, we might as well drop the whole can_replace infrastructure
> >>> altogether and just assume that you can replace any node by anything.
> >>
> >> Well, I don't actually know if that would be completely unreasonable.
> >> The idea was obviously to keep graph changes restricted to very specific
> >> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> >> have quite a few more operations that allow changing the graph.
> >>
> >> So if preventing some cases gives us headaches and is probably more work
> >> than dealing with any bugs they might reveal, maybe preventing them is
> >> wrong.
> >>
> >> I'm just afraid that we might be overengineering this and waste time on
> >> things that we don't actually get much use from.
> > 
> > That’s why I’m asking.
> 
> (One thing to consider here, though, is that this series exists and has
> been reviewed by Vladimir in full, so most of the engineering effort has
> already been done.  In contrast, writing a new series to drop the whole
> can_replace infrastructure with no replacement may actually cost more.)

Fair enough. If the artificial permission changes didn't feel so wrong,
I think I would just say "let's merge it and forget about it". But they
do feel wrong, so I'm not as sure.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 16:57               ` Kevin Wolf
@ 2020-02-06 17:06                 ` Max Reitz
  2020-02-06 17:41                   ` Kevin Wolf
  0 siblings, 1 reply; 41+ messages in thread
From: Max Reitz @ 2020-02-06 17:06 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 7610 bytes --]

On 06.02.20 17:57, Kevin Wolf wrote:
> Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
>> On 06.02.20 16:51, Kevin Wolf wrote:
>>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>>> replaced without breaking anything else.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> ---
>>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>>> --- a/block/quorum.c
>>>>>>>> +++ b/block/quorum.c
>>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>>  
>>>>>>>>  typedef struct QuorumChild {
>>>>>>>>      BdrvChild *child;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>>> +     * WRITE permission.
>>>>>>>> +     */
>>>>>>>> +    bool to_be_replaced;
>>>>>>>
>>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>>> detaching a node from quorum make its content invalid?
>>>>>>
>>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>>>> some other node with some other content.
>>>>>>
>>>>>>> And why do we
>>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>>> read-only?
>>>>>>>
>>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>>> Is this really "make sure that"?
>>>>>>
>>>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>>>> maybe the code is also.)
>>>>>>
>>>>>> This series is about fixing at least some things about replacing nodes
>>>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>>>> broken quorum children: The other children are still intact, so you read
>>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>>
>>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>>> visible on that node.
>>>>>
>>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>>> permission system must catch. Like other graph manipulations, it's
>>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>>> makes sense in this place".
>>>>>
>>>>> Because if you assume that the user could add a node with unsuitable
>>>>> content and you want to prevent this, where do we stop?
>>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>>> visible data change. Should we therefore only allow snapshots when
>>>>> shared writes are allowed? This doesn't work obviously.
>>>>>
>>>>> So I'm inclined to say that this is the user's responsibility and we
>>>>> don't have to jump through hoops to prevent every possible way that the
>>>>> user could mess up. (Which often also result in preventing legitimate
>>>>> cases like here a quorum of read-only nodes.)
>>>>
>>>> Well, if you ask the question “where do we stop”, we also have to ask
>>>> the question “where do we start”.  If we say the user knows what they’re
>>>> doing, we might as well drop the whole can_replace infrastructure
>>>> altogether and just assume that you can replace any node by anything.
>>>
>>> Well, I don't actually know if that would be completely unreasonable.
>>> The idea was obviously to keep graph changes restricted to very specific
>>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>>> have quite a few more operations that allow changing the graph.
>>>
>>> So if preventing some cases gives us headaches and is probably more work
>>> than dealing with any bugs they might reveal, maybe preventing them is
>>> wrong.
>>>
>>> I'm just afraid that we might be overengineering this and waste time on
>>> things that we don't actually get much use from.
>>
>> That’s why I’m asking.
> 
> Did I answer your question sufficiently then?

No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
answer the question whether we are indeed overengineering this or not. :-)

I suppose my stance now is “This is probably overengineered, but now we
might as well roll with it”.

>>>> If the WRITE permission is the problem, then I suppose we can drop that.
>>>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>>>> other parents anyway.
>>>
>>> WRITE is probably the more practical problem, though it's technically
>>> the correct one to take.
>>>
>>> CONSISTENT_READ is already a problem in theory because replacing a child
>>> node with different content doesn't even match its definition:
>>>
>>>     /**
>>>      * A user that has the "permission" of consistent reads is guaranteed that
>>>      * their view of the contents of the block device is complete and
>>>      * self-consistent, representing the contents of a disk at a specific
>>>      * point.
>>>      *
>>>      * For most block devices (including their backing files) this is true, but
>>>      * the property cannot be maintained in a few situations like for
>>>      * intermediate nodes of a commit block job.
>>>      */
>>>     BLK_PERM_CONSISTENT_READ    = 0x01,
>>>
>>> Replacing an image with a different image means that the node represents
>>> the content of a different disk now, but it's probably still complete
>>> and self-consistent.
>>
>> At any point in time yes, but not over the time span of the change.  The
>> definition doesn’t say that the node represents the contents of a disk
>> at a specific point, but the view from the parent.
>>
>> I argue that that view is always over some period of time, so if you
>> suddenly switch out the whole disk, then it isn’t a self-consistent view.
> 
> I think your theory that it's over some period of time conflicts with
> the documentation that says "at a specific point".

I’d rather not get into a deeper discussion on what CONSISTENT_READ
means again... :-/

I always feel like if you really take only a single point in time, then
anything could be some hypothetical disk.

So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
this, you don’t want to”.

>> Alternatively, we could of course also just forego the permission system
>> here altogether and just check that there are no other parents at all.
>> (Which is effectively the same as unsharing CONSISTENT_READ.)
> 
> This would sidestep all of the artificial permission twiddling, which
> sounds good.
> 
> It would probably also needlessly restrict the allowed use cases,

Only in theory, though, because in practice basically everything useful
takes CONSISTENT_READ anyway.

> but
> then, who cares about nodes with multiple parents, one of which is a
> quorum node?
> 
> So I guess I would be fine with either checking that there are no
> parents or maybe even just dropping the check completely.

OK, I’ll check the parent list then.  (Except it must be exactly one
parent, namely Quorum.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
  2020-02-06 17:06                 ` Max Reitz
@ 2020-02-06 17:41                   ` Kevin Wolf
  0 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2020-02-06 17:41 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 8754 bytes --]

Am 06.02.2020 um 18:06 hat Max Reitz geschrieben:
> On 06.02.20 17:57, Kevin Wolf wrote:
> > Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
> >> On 06.02.20 16:51, Kevin Wolf wrote:
> >>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >>>> On 06.02.20 15:58, Kevin Wolf wrote:
> >>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>>>> replaced without breaking anything else.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>>>> ---
> >>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>>>  1 file changed, 25 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>>>> --- a/block/quorum.c
> >>>>>>>> +++ b/block/quorum.c
> >>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>>>  
> >>>>>>>>  typedef struct QuorumChild {
> >>>>>>>>      BdrvChild *child;
> >>>>>>>> +
> >>>>>>>> +    /*
> >>>>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>>>> +     * WRITE permission.
> >>>>>>>> +     */
> >>>>>>>> +    bool to_be_replaced;
> >>>>>>>
> >>>>>>> I don't understand these permission changes. How does (preparing for)
> >>>>>>> detaching a node from quorum make its content invalid?
> >>>>>>
> >>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>>>> some other node with some other content.
> >>>>>>
> >>>>>>> And why do we
> >>>>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>>>> read-only?
> >>>>>>>
> >>>>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>>>> Is this really "make sure that"?
> >>>>>>
> >>>>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>>>> maybe the code is also.)
> >>>>>>
> >>>>>> This series is about fixing at least some things about replacing nodes
> >>>>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>>>> broken quorum children: The other children are still intact, so you read
> >>>>>> from the quorum node and replace the broken child (which maybe shows
> >>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>>>
> >>>>>> Replacing that broken node by the fixed one changes the data that’s
> >>>>>> visible on that node.
> >>>>>
> >>>>> Hm, yes, that's true. But I wonder if this is really something that the
> >>>>> permission system must catch. Like other graph manipulations, it's
> >>>>> essentially the user saying "trust me, I know what I'm doing, this node
> >>>>> makes sense in this place".
> >>>>>
> >>>>> Because if you assume that the user could add a node with unsuitable
> >>>>> content and you want to prevent this, where do we stop?
> >>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>>>> visible data change. Should we therefore only allow snapshots when
> >>>>> shared writes are allowed? This doesn't work obviously.
> >>>>>
> >>>>> So I'm inclined to say that this is the user's responsibility and we
> >>>>> don't have to jump through hoops to prevent every possible way that the
> >>>>> user could mess up. (Which often also result in preventing legitimate
> >>>>> cases like here a quorum of read-only nodes.)
> >>>>
> >>>> Well, if you ask the question “where do we stop”, we also have to ask
> >>>> the question “where do we start”.  If we say the user knows what they’re
> >>>> doing, we might as well drop the whole can_replace infrastructure
> >>>> altogether and just assume that you can replace any node by anything.
> >>>
> >>> Well, I don't actually know if that would be completely unreasonable.
> >>> The idea was obviously to keep graph changes restricted to very specific
> >>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> >>> have quite a few more operations that allow changing the graph.
> >>>
> >>> So if preventing some cases gives us headaches and is probably more work
> >>> than dealing with any bugs they might reveal, maybe preventing them is
> >>> wrong.
> >>>
> >>> I'm just afraid that we might be overengineering this and waste time on
> >>> things that we don't actually get much use from.
> >>
> >> That’s why I’m asking.
> > 
> > Did I answer your question sufficiently then?
> 
> No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
> answer the question whether we are indeed overengineering this or not. :-)

Well, I guess I can only answer this after seeing the bug reports we
would get after removing the check. :-)

> I suppose my stance now is “This is probably overengineered, but now we
> might as well roll with it”.

Your choice. I'm not opposed to anything that feels like it makes sense.

> >>>> If the WRITE permission is the problem, then I suppose we can drop that.
> >>>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> >>>> other parents anyway.
> >>>
> >>> WRITE is probably the more practical problem, though it's technically
> >>> the correct one to take.
> >>>
> >>> CONSISTENT_READ is already a problem in theory because replacing a child
> >>> node with different content doesn't even match its definition:
> >>>
> >>>     /**
> >>>      * A user that has the "permission" of consistent reads is guaranteed that
> >>>      * their view of the contents of the block device is complete and
> >>>      * self-consistent, representing the contents of a disk at a specific
> >>>      * point.
> >>>      *
> >>>      * For most block devices (including their backing files) this is true, but
> >>>      * the property cannot be maintained in a few situations like for
> >>>      * intermediate nodes of a commit block job.
> >>>      */
> >>>     BLK_PERM_CONSISTENT_READ    = 0x01,
> >>>
> >>> Replacing an image with a different image means that the node represents
> >>> the content of a different disk now, but it's probably still complete
> >>> and self-consistent.
> >>
> >> At any point in time yes, but not over the time span of the change.  The
> >> definition doesn’t say that the node represents the contents of a disk
> >> at a specific point, but the view from the parent.
> >>
> >> I argue that that view is always over some period of time, so if you
> >> suddenly switch out the whole disk, then it isn’t a self-consistent view.
> > 
> > I think your theory that it's over some period of time conflicts with
> > the documentation that says "at a specific point".
> 
> I’d rather not get into a deeper discussion on what CONSISTENT_READ
> means again... :-/
> 
> I always feel like if you really take only a single point in time, then
> anything could be some hypothetical disk.
> 
> So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
> this, you don’t want to”.

The difference is that with the replace operation we aren't talking
about hypothetical corrupted disks (like we would get when accessing
intermediate nodes of the commit job), but about two actual disk images
that are both valid, though different.

But yes, maybe we should avoid this discussion...

(I mean, what it *really* means is "this is not an intermediate node
of commit". ;-))

> >> Alternatively, we could of course also just forego the permission system
> >> here altogether and just check that there are no other parents at all.
> >> (Which is effectively the same as unsharing CONSISTENT_READ.)
> > 
> > This would sidestep all of the artificial permission twiddling, which
> > sounds good.
> > 
> > It would probably also needlessly restrict the allowed use cases,
> 
> Only in theory, though, because in practice basically everything useful
> takes CONSISTENT_READ anyway.

Oh, compared to taking WRITE and unsharing CONSISTENT_READ it's probably
not more restrictice. I was comparing with the case that drops the
checks altogether.

> > but
> > then, who cares about nodes with multiple parents, one of which is a
> > quorum node?
> > 
> > So I guess I would be fine with either checking that there are no
> > parents or maybe even just dropping the check completely.
> 
> OK, I’ll check the parent list then.  (Except it must be exactly one
> parent, namely Quorum.)

Fine with me.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-02-06 17:52 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
2020-01-30 21:44 ` [PATCH v3 01/21] blockdev: Allow external snapshots everywhere Max Reitz
2020-01-30 21:44 ` [PATCH v3 02/21] blockdev: Allow resizing everywhere Max Reitz
2020-01-30 21:44 ` [PATCH v3 03/21] block: Drop bdrv_is_first_non_filter() Max Reitz
2020-01-30 21:44 ` [PATCH v3 04/21] iotests: Let 041 use -blockdev for quorum children Max Reitz
2020-01-30 21:44 ` [PATCH v3 05/21] quorum: Fix child permissions Max Reitz
2020-01-30 21:44 ` [PATCH v3 06/21] block: Add bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 07/21] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 08/21] quorum: Store children in own structure Max Reitz
2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
2020-02-04  9:33   ` Vladimir Sementsov-Ogievskiy
2020-02-05 15:38   ` Kevin Wolf
2020-02-06 10:11     ` Max Reitz
2020-02-06 14:58       ` Kevin Wolf
2020-02-06 15:21         ` Max Reitz
2020-02-06 15:51           ` Kevin Wolf
2020-02-06 16:43             ` Max Reitz
2020-02-06 16:47               ` Max Reitz
2020-02-06 16:58                 ` Kevin Wolf
2020-02-06 16:57               ` Kevin Wolf
2020-02-06 17:06                 ` Max Reitz
2020-02-06 17:41                   ` Kevin Wolf
2020-01-30 21:44 ` [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
2020-02-04  9:37   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 11/21] block: Use bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
2020-02-04  9:45   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 13/21] mirror: Double-check immediately before replacing Max Reitz
2020-01-30 21:44 ` [PATCH v3 14/21] quorum: Stop marking it as a filter Max Reitz
2020-01-30 21:44 ` [PATCH v3 15/21] iotests: Use complete_and_wait() in 155 Max Reitz
2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
2020-02-04 10:33   ` Vladimir Sementsov-Ogievskiy
2020-02-04 14:09   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns Max Reitz
2020-02-04 11:40   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 18/21] iotests: Resolve TODOs in 041 Max Reitz
2020-02-04  9:54   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 19/21] iotests: Use self.image_len in TestRepairQuorum Max Reitz
2020-01-30 21:44 ` [PATCH v3 20/21] iotests: Add tests for invalid Quorum @replaces Max Reitz
2020-01-30 21:44 ` [PATCH v3 21/21] iotests: Check that @replaces can replace filters Max Reitz
2020-02-04  9:56   ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.