qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/22] block: Fix check_to_replace_node()
@ 2019-09-20 15:27 Max Reitz
  2019-09-20 15:27 ` [PATCH 01/22] blockdev: Allow external snapshots everywhere Max Reitz
                   ` (21 more replies)
  0 siblings, 22 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Based-on: <20190912135632.13925-1-mreitz@redhat.com>
(“mirror: Do not dereference invalid pointers”)


Hi,

We have this function bdrv_is_first_non_filter(), and I don’t know what
we have it for exactly.  It’s used in three places:
1. To allow snapshots only on such nodes,
2. To allow resizing only on such nodes,
3. For check_to_replace_node().

In none of these places it’s clear why we’d want it.

For (1), snapshots should just work for every node that supports backing
nodes.  We have such checks in place, so we don’t need to call
bdrv_is_first_non_filter(); and it should be fine to snapshot nodes
somewhere down a backing chain, too.

For (2), bdrv_truncate() should work on any node.  There is no reason
why we’d prevent the user from invoking block_resize only on the first
non-filter in a chain.  We now have the RESIZE permission, and as long
as that can be taken, any node can be resized.

For (3), it is just wrong.  The main reason it was introduced here was
to replace broken Quorum children.  But Quorum isn’t actually a filter,
and that is evidenced precisely here: The user wants to replace a child
that *does not* match the overall quorum.  We need something else
entirely here, namely a special bdrv_recurse_can_replace() function.

That the current approach doesn’t actually work can be seen by attaching
some other parent to a Quorum child, and then trying to replace that
child; Quorum will happily agree, but nobody asked that other parent
what they think of suddenly changing the data on their child.

It isn’t wrong to let a node be replaced when there are only filters
between it and the source node (because then they must have the same
data).  What’s wrong is that Quorum calls itself a filter.

In the new .bdrv_recurse_can_replace(), Quorum can be more aware of all
of this.  So it verifies that there is noone else who might care about
sudden data change by unsharing CONSISTENT_READ and taking the WRITE
permission.


The second problem is that mirror never checked whether the combination
of mirror command (drive/blockdev), sync mode, and @replaces asks for a
loop.  While bdrv_replace_node() was fixed in commit
ec9f10fe064f2abb9dc60a9fa580d8d0933f2acf to never create loops, mirror
should still reject such a configuration because it will probably not
achieve what the user expects.


Other things this series does:
- Fix Quorum’s permissions: It cannot share WRITE or RESIZE permissions
  because that would break the Quorum
- Drop .is_filter = true from Quorum
- Add some tests


(In case you’re wondering, yes, this 22-patch series does replace a
single patch from my 42-patch series “Deal with filters”.)


Max Reitz (22):
  blockdev: Allow external snapshots everywhere
  blockdev: Allow resizing everywhere
  block: Drop bdrv_is_first_non_filter()
  iotests: Let 041 use -blockdev for quorum children
  quorum: Fix child permissions
  block: Add bdrv_recurse_can_replace()
  blkverify: Implement .bdrv_recurse_can_replace()
  quorum: Store children in own structure
  quorum: Add QuorumChild.to_be_replaced
  quorum: Implement .bdrv_recurse_can_replace()
  block: Use bdrv_recurse_can_replace()
  block: Remove bdrv_recurse_is_first_non_filter()
  mirror: Double-check immediately before replacing
  quorum: Stop marking it as a filter
  mirror: Prevent loops
  iotests: Use complete_and_wait() in 155
  iotests: Add VM.assert_block_path()
  iotests: Resolve TODOs in 041
  iotests: Use self.image_len in TestRepairQuorum
  iotests: Add tests for invalid Quorum @replaces
  iotests: Check that @replaces can replace filters
  iotests: Mirror must not attempt to create loops

 include/block/block.h         |   5 -
 include/block/block_int.h     |  19 ++-
 block.c                       | 115 +++++++++------
 block/blkverify.c             |  20 +--
 block/copy-on-read.c          |   9 --
 block/mirror.c                |  31 +++-
 block/quorum.c                | 155 ++++++++++++++++----
 block/replication.c           |   7 -
 block/throttle.c              |   8 -
 blockdev.c                    |  58 ++++++--
 tests/qemu-iotests/041        | 268 +++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out    |   4 +-
 tests/qemu-iotests/155        |   7 +-
 tests/qemu-iotests/iotests.py |  48 ++++++
 14 files changed, 602 insertions(+), 152 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 01/22] blockdev: Allow external snapshots everywhere
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 11:45   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 02/22] blockdev: Allow resizing everywhere Max Reitz
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

There is no good reason why we would allow external snapshots only on
the first non-filter node in a chain.  Parent BDSs should not care
whether their child is replaced by a snapshot.  (If they do care, they
should announce that via freezing the chain, which is checked in
bdrv_append() through bdrv_set_backing_hd().)

Before we had bdrv_is_first_non_filter() here (since 212a5a8f095), there
was a special function bdrv_check_ext_snapshot() that allowed snapshots
by default, but block drivers could override this.  Only blkverify did
so, however.

It is not clear to me why blkverify would do so; maybe just so that the
testee block driver would not be replaced.  The introducing commit
f6186f49e2c does not explain why.  Maybe because 08b24cfe376 would have
been the correct solution?  (Which adds a .supports_backing check.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index f89e48fc79..b62b33dc03 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1596,11 +1596,6 @@ static void external_snapshot_prepare(BlkActionState *common,
         }
     }
 
-    if (!bdrv_is_first_non_filter(state->old_bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "snapshot");
-        goto out;
-    }
-
     if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
         BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
         const char *format = s->has_format ? s->format : "qcow2";
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 02/22] blockdev: Allow resizing everywhere
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
  2019-09-20 15:27 ` [PATCH 01/22] blockdev: Allow external snapshots everywhere Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 03/22] block: Drop bdrv_is_first_non_filter() Max Reitz
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Block nodes that do not allow resizing should not share BLK_PERM_RESIZE.
It does not matter whether they are the first non-filter in their chain
or not.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b62b33dc03..0420bc29be 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3198,11 +3198,6 @@ void qmp_block_resize(bool has_device, const char *device,
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
-    if (!bdrv_is_first_non_filter(bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
-        goto out;
-    }
-
     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
         goto out;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 03/22] block: Drop bdrv_is_first_non_filter()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
  2019-09-20 15:27 ` [PATCH 01/22] blockdev: Allow external snapshots everywhere Max Reitz
  2019-09-20 15:27 ` [PATCH 02/22] blockdev: Allow resizing everywhere Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children Max Reitz
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

It is unused now.  (And it was ugly because it needed to explore all BDS
chains from the top.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h |  1 -
 block.c               | 26 --------------------------
 2 files changed, 27 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 37c9de7446..d3ccab4722 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,7 +403,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
 /* external snapshots */
 bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
                                       BlockDriverState *candidate);
-bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
diff --git a/block.c b/block.c
index 1c7c199849..7d99ca692c 100644
--- a/block.c
+++ b/block.c
@@ -6206,32 +6206,6 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
-/* This function checks if the candidate is the first non filter bs down it's
- * bs chain. Since we don't have pointers to parents it explore all bs chains
- * from the top. Some filters can choose not to pass down the recursion.
- */
-bool bdrv_is_first_non_filter(BlockDriverState *candidate)
-{
-    BlockDriverState *bs;
-    BdrvNextIterator it;
-
-    /* walk down the bs forest recursively */
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-        bool perm;
-
-        /* try to recurse in this top level bs */
-        perm = bdrv_recurse_is_first_non_filter(bs, candidate);
-
-        /* candidate is the first non filter */
-        if (perm) {
-            bdrv_next_cleanup(&it);
-            return true;
-        }
-    }
-
-    return false;
-}
-
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (2 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 03/22] block: Drop bdrv_is_first_non_filter() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 11:51   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 05/22] quorum: Fix child permissions Max Reitz
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Using -drive with default options means that a virtio-blk drive will be
created that has write access to the to-be quorum children.  Quorum
should have exclusive write access to them, so we should use -blockdev
instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 84bc6d6581..d91f538276 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -884,7 +884,10 @@ class TestRepairQuorum(iotests.QMPTestCase):
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-            self.vm = self.vm.add_drive(i, opts)
+            opts += ',driver=%s' % iotests.imgfmt
+            opts += ',file.driver=file'
+            opts += ',file.filename=%s' % i
+            self.vm = self.vm.add_blockdev(opts)
 
         self.vm.launch()
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 05/22] quorum: Fix child permissions
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (3 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 11:56   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 06/22] block: Add bdrv_recurse_can_replace() Max Reitz
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Quorum is not actually a filter.  It cannot share WRITE or RESIZE on its
children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index df68adcfaa..17b439056f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1114,6 +1114,23 @@ static char *quorum_dirname(BlockDriverState *bs, Error **errp)
     return NULL;
 }
 
+static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
+                              const BdrvChildRole *role,
+                              BlockReopenQueue *reopen_queue,
+                              uint64_t perm, uint64_t shared,
+                              uint64_t *nperm, uint64_t *nshared)
+{
+    *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+
+    /*
+     * We cannot share RESIZE or WRITE, as this would make the
+     * children differ from each other.
+     */
+    *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
+                          BLK_PERM_WRITE_UNCHANGED))
+             | DEFAULT_PERM_UNCHANGED;
+}
+
 static const char *const quorum_strong_runtime_opts[] = {
     QUORUM_OPT_VOTE_THRESHOLD,
     QUORUM_OPT_BLKVERIFY,
@@ -1143,7 +1160,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
 
-    .bdrv_child_perm                    = bdrv_filter_default_perms,
+    .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 06/22] block: Add bdrv_recurse_can_replace()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (4 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 05/22] quorum: Fix child permissions Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 12:39   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

After a couple of follow-up patches, this function will replace
bdrv_recurse_is_first_non_filter() in check_to_replace_node().

bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
check_to_replace_node() (it allows cases that should not be allowed,
like replacing child nodes of quorum with dissenting data that have more
parents than just quorum), and it is too restrictive (it is perfectly
fine to replace filters).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h | 10 ++++++++++
 block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5fd4f17d93..0be7d12f04 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -103,6 +103,13 @@ struct BlockDriver {
      */
     bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
                                              BlockDriverState *candidate);
+    /*
+     * Return true if @to_replace can be replaced by a BDS with the
+     * same data as @bs without it affecting @bs's behavior (that is,
+     * without it being visible to @bs's parents).
+     */
+    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
+                                     BlockDriverState *to_replace);
 
     int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
     int (*bdrv_probe_device)(const char *filename);
@@ -1254,6 +1261,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared);
 
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
diff --git a/block.c b/block.c
index 7d99ca692c..a2deca4ac9 100644
--- a/block.c
+++ b/block.c
@@ -6206,6 +6206,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * This function checks whether the given @to_replace is allowed to be
+ * replaced by a node that always shows the same data as @bs.  This is
+ * used for example to verify whether the mirror job can replace
+ * @to_replace by the target mirrored from @bs.
+ * To be replaceable, @bs and @to_replace may either be guaranteed to
+ * always show the same data (because they are only connected through
+ * filters), or some driver may allow replacing one of its children
+ * because it can guarantee that this child's data is not visible at
+ * all (for example, for dissenting quorum children that have no other
+ * parents).
+ */
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace)
+{
+    if (!bs || !bs->drv) {
+        return false;
+    }
+
+    if (bs == to_replace) {
+        return true;
+    }
+
+    /* For filters, we can recurse on our own */
+    if (bs->drv->is_filter) {
+        BdrvChild *child = bs->file ?: bs->backing;
+        return bdrv_recurse_can_replace(child->bs, to_replace);
+    }
+
+    /* See what the driver can do */
+    if (bs->drv->bdrv_recurse_can_replace) {
+        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
+    }
+
+    /* Safe default */
+    return false;
+}
+
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (5 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 06/22] block: Add bdrv_recurse_can_replace() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 12:56   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 08/22] quorum: Store children in own structure Max Reitz
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkverify.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 304b0a1368..0add3ab483 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -282,6 +282,20 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
     return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
 }
 
+static bool blkverify_recurse_can_replace(BlockDriverState *bs,
+                                          BlockDriverState *to_replace)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+
+    /*
+     * blkverify quits the whole qemu process if there is a mismatch
+     * between bs->file->bs and s->test_file->bs.  Therefore, we know
+     * know that both must match bs and we can recurse down to either.
+     */
+    return bdrv_recurse_can_replace(bs->file->bs, to_replace) ||
+           bdrv_recurse_can_replace(s->test_file->bs, to_replace);
+}
+
 static void blkverify_refresh_filename(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
@@ -328,6 +342,7 @@ static BlockDriver bdrv_blkverify = {
 
     .is_filter                        = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
 static void bdrv_blkverify_init(void)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 08/22] quorum: Store children in own structure
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (6 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 13:21   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced Max Reitz
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

This will be useful when we want to store additional attributes for each
child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 58 ++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 17b439056f..cf2171cc74 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -65,9 +65,13 @@ typedef struct QuorumVotes {
     bool (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
 } QuorumVotes;
 
+typedef struct QuorumChild {
+    BdrvChild *child;
+} QuorumChild;
+
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
-    BdrvChild **children;  /* children BlockDriverStates */
+    QuorumChild *children;
     int num_children;      /* children count */
     unsigned next_child_index;  /* the index of the next child that should
                                  * be added
@@ -264,7 +268,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
         }
         QLIST_FOREACH(item, &version->items, next) {
             quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
-                              s->children[item->index]->bs->node_name, 0);
+                              s->children[item->index].child->bs->node_name, 0);
         }
     }
 }
@@ -279,7 +283,7 @@ static void quorum_rewrite_entry(void *opaque)
      * corrupted data.
      * Mask out BDRV_REQ_WRITE_UNCHANGED because this overwrites the
      * area with different data from the other children. */
-    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
+    bdrv_co_pwritev(s->children[co->idx].child, acb->offset, acb->bytes,
                     acb->qiov, acb->flags & ~BDRV_REQ_WRITE_UNCHANGED);
 
     /* Wake up the caller after the last rewrite */
@@ -578,8 +582,8 @@ static void read_quorum_children_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_preadv(s->children[i].child, acb->offset, acb->bytes,
                                &acb->qcrs[i].qiov, 0);
 
     if (sacb->ret == 0) {
@@ -605,7 +609,8 @@ static int read_quorum_children(QuorumAIOCB *acb)
 
     acb->children_read = s->num_children;
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
+        acb->qcrs[i].buf = qemu_blockalign(s->children[i].child->bs,
+                                           acb->qiov->size);
         qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
         qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
     }
@@ -647,8 +652,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
     /* We try to read the next child in FIFO order if we failed to read */
     do {
         n = acb->children_read++;
-        acb->qcrs[n].bs = s->children[n]->bs;
-        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
+        acb->qcrs[n].bs = s->children[n].child->bs;
+        ret = bdrv_co_preadv(s->children[n].child, acb->offset, acb->bytes,
                              acb->qiov, 0);
         if (ret < 0) {
             quorum_report_bad_acb(&acb->qcrs[n], ret);
@@ -688,8 +693,8 @@ static void write_quorum_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_pwritev(s->children[i].child, acb->offset, acb->bytes,
                                 acb->qiov, acb->flags);
     if (sacb->ret == 0) {
         acb->success_count++;
@@ -743,12 +748,12 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->children[0]->bs);
+    result = bdrv_getlength(s->children[0].child->bs);
     if (result < 0) {
         return result;
     }
     for (i = 1; i < s->num_children; i++) {
-        int64_t value = bdrv_getlength(s->children[i]->bs);
+        int64_t value = bdrv_getlength(s->children[i].child->bs);
         if (value < 0) {
             return value;
         }
@@ -774,10 +779,10 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     error_votes.compare = quorum_64bits_compare;
 
     for (i = 0; i < s->num_children; i++) {
-        result = bdrv_co_flush(s->children[i]->bs);
+        result = bdrv_co_flush(s->children[i].child->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, 0,
-                              s->children[i]->bs->node_name, result);
+                              s->children[i].child->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
         } else {
@@ -803,7 +808,7 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i]->bs,
+        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
                                                      candidate);
         if (perm) {
             return true;
@@ -932,7 +937,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* allocate the children array */
-    s->children = g_new0(BdrvChild *, s->num_children);
+    s->children = g_new0(QuorumChild, s->num_children);
     opened = g_new0(bool, s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
@@ -940,8 +945,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = snprintf(indexstr, 32, "children.%d", i);
         assert(ret < 32);
 
-        s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
-                                         &child_format, false, &local_err);
+        s->children[i].child = bdrv_open_child(NULL, options, indexstr, bs,
+                                               &child_format, false,
+                                               &local_err);
         if (local_err) {
             ret = -EINVAL;
             goto close_exit;
@@ -962,7 +968,7 @@ close_exit:
         if (!opened[i]) {
             continue;
         }
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
     g_free(s->children);
     g_free(opened);
@@ -979,7 +985,7 @@ static void quorum_close(BlockDriverState *bs)
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
 
     g_free(s->children);
@@ -1022,8 +1028,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
         s->next_child_index--;
         goto out;
     }
-    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
-    s->children[s->num_children++] = child;
+    s->children = g_renew(QuorumChild, s->children, s->num_children + 1);
+    s->children[s->num_children++] = (QuorumChild){
+        .child = child,
+    };
 
 out:
     bdrv_drained_end(bs);
@@ -1036,7 +1044,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        if (s->children[i] == child) {
+        if (s->children[i].child == child) {
             break;
         }
     }
@@ -1059,7 +1067,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     /* We can safely remove this child now */
     memmove(&s->children[i], &s->children[i + 1],
             (s->num_children - i - 1) * sizeof(BdrvChild *));
-    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+    s->children = g_renew(QuorumChild, s->children, --s->num_children);
     bdrv_unref_child(bs, child);
 
     bdrv_drained_end(bs);
@@ -1100,7 +1108,7 @@ static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
 
     for (i = 0; i < s->num_children; i++) {
         qlist_append(children_list,
-                     qobject_ref(s->children[i]->bs->full_open_options));
+                     qobject_ref(s->children[i].child->bs->full_open_options));
     }
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (7 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 08/22] quorum: Store children in own structure Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 13:45   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

We will need this to verify that Quorum can let one of its children be
replaced without breaking anything else.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index cf2171cc74..207054a64e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,13 @@ typedef struct QuorumVotes {
 
 typedef struct QuorumChild {
     BdrvChild *child;
+
+    /*
+     * If set, check whether this node can be replaced without any
+     * other parent noticing: Unshare CONSISTENT_READ, and take the
+     * WRITE permission.
+     */
+    bool to_be_replaced;
 } QuorumChild;
 
 /* the following structure holds the state of one quorum instance */
@@ -1128,6 +1135,16 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
                               uint64_t perm, uint64_t shared,
                               uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        if (s->children[i].child == c) {
+            break;
+        }
+    }
+    assert(!c || i < s->num_children);
+
     *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 
     /*
@@ -1137,6 +1154,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
     *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
                           BLK_PERM_WRITE_UNCHANGED))
              | DEFAULT_PERM_UNCHANGED;
+
+    if (c && s->children[i].to_be_replaced) {
+        /* Prepare for sudden data changes */
+        *nperm |= BLK_PERM_WRITE;
+        *nshared &= ~BLK_PERM_CONSISTENT_READ;
+    }
 }
 
 static const char *const quorum_strong_runtime_opts[] = {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (8 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
                     ` (2 more replies)
  2019-09-20 15:27 ` [PATCH 11/22] block: Use bdrv_recurse_can_replace() Max Reitz
                   ` (11 subsequent siblings)
  21 siblings, 3 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 207054a64e..81b57dbae2 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static bool quorum_recurse_can_replace(BlockDriverState *bs,
+                                       BlockDriverState *to_replace)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        /*
+         * We have no idea whether our children show the same data as
+         * this node (@bs).  It is actually highly likely that
+         * @to_replace does not, because replacing a broken child is
+         * one of the main use cases here.
+         *
+         * We do know that the new BDS will match @bs, so replacing
+         * any of our children by it will be safe.  It cannot change
+         * the data this quorum node presents to its parents.
+         *
+         * However, replacing @to_replace by @bs in any of our
+         * children's chains may change visible data somewhere in
+         * there.  We therefore cannot recurse down those chains with
+         * bdrv_recurse_can_replace().
+         * (More formally, bdrv_recurse_can_replace() requires that
+         * @to_replace will be replaced by something matching the @bs
+         * passed to it.  We cannot guarantee that.)
+         *
+         * Thus, we can only check whether any of our immediate
+         * children matches @to_replace.
+         *
+         * (In the future, we might add a function to recurse down a
+         * chain that checks that nothing there cares about a change
+         * in data from the respective child in question.  For
+         * example, most filters do not care when their child's data
+         * suddenly changes, as long as their parents do not care.)
+         */
+        if (s->children[i].child->bs == to_replace) {
+            Error *local_err = NULL;
+
+            /*
+             * We now have to ensure that there is no other parent
+             * that cares about replacing this child by a node with
+             * potentially different data.
+             */
+            s->children[i].to_be_replaced = true;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
+
+            /* Revert permissions */
+            s->children[i].to_be_replaced = false;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
+
+            if (local_err) {
+                error_free(local_err);
+                return false;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
 {
 
@@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
 };
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 11/22] block: Use bdrv_recurse_can_replace()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (9 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-20 15:27 ` [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Let check_to_replace_node() use the more specialized
bdrv_recurse_can_replace() instead of
bdrv_recurse_is_first_non_filter(), which is too restrictive.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index a2deca4ac9..02177bde9a 100644
--- a/block.c
+++ b/block.c
@@ -6244,6 +6244,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * Check whether the given @node_name can be replaced by a node that
+ * has the same data as @parent_bs.  If so, return @node_name's BDS;
+ * NULL otherwise.
+ *
+ * @node_name must be a (recursive) *child of @parent_bs (or this
+ * function will return NULL).
+ *
+ * The result (whether the node can be replaced or not) is only valid
+ * for as long as no graph changes occur.
+ */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
@@ -6268,8 +6279,11 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
      * Another benefit is that this tests exclude backing files which are
      * blocked by the backing blockers.
      */
-    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-        error_setg(errp, "Only top most non filter can be replaced");
+    if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
+        error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
+                   "because it cannot be guaranteed that doing so would not "
+                   "lead to an abrupt change of visible data",
+                   node_name, parent_bs->node_name);
         to_replace_bs = NULL;
         goto out;
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (10 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 11/22] block: Use bdrv_recurse_can_replace() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-25 14:16   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 13/22] mirror: Double-check immediately before replacing Max Reitz
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

It no longer has any users.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h     |  4 ----
 include/block/block_int.h |  8 --------
 block.c                   | 33 ---------------------------------
 block/blkverify.c         | 15 ---------------
 block/copy-on-read.c      |  9 ---------
 block/quorum.c            | 18 ------------------
 block/replication.c       |  7 -------
 block/throttle.c          |  8 --------
 8 files changed, 102 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index d3ccab4722..2944980c03 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -400,10 +400,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
                        BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
                        Error **errp);
 
-/* external snapshots */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate);
-
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0be7d12f04..70f26530c9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -95,14 +95,6 @@ struct BlockDriver {
      * must implement them and return -ENOTSUP.
      */
     bool is_filter;
-    /* for snapshots block filter like Quorum can implement the
-     * following recursive callback.
-     * It's purpose is to recurse on the filter children while calling
-     * bdrv_recurse_is_first_non_filter on them.
-     * For a sample implementation look in the future Quorum block filter.
-     */
-    bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
-                                             BlockDriverState *candidate);
     /*
      * Return true if @to_replace can be replaced by a BDS with the
      * same data as @bs without it affecting @bs's behavior (that is,
diff --git a/block.c b/block.c
index 02177bde9a..0d9b3de98f 100644
--- a/block.c
+++ b/block.c
@@ -6173,39 +6173,6 @@ int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
     return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque, errp);
 }
 
-/* This function will be called by the bdrv_recurse_is_first_non_filter method
- * of block filter and by bdrv_is_first_non_filter.
- * It is used to test if the given bs is the candidate or recurse more in the
- * node graph.
- */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate)
-{
-    /* return false if basic checks fails */
-    if (!bs || !bs->drv) {
-        return false;
-    }
-
-    /* the code reached a non block filter driver -> check if the bs is
-     * the same as the candidate. It's the recursion termination condition.
-     */
-    if (!bs->drv->is_filter) {
-        return bs == candidate;
-    }
-    /* Down this path the driver is a block filter driver */
-
-    /* If the block filter recursion method is defined use it to recurse down
-     * the node graph.
-     */
-    if (bs->drv->bdrv_recurse_is_first_non_filter) {
-        return bs->drv->bdrv_recurse_is_first_non_filter(bs, candidate);
-    }
-
-    /* the driver is a block filter but don't allow to recurse -> return false
-     */
-    return false;
-}
-
 /*
  * This function checks whether the given @to_replace is allowed to be
  * replaced by a node that always shows the same data as @bs.  This is
diff --git a/block/blkverify.c b/block/blkverify.c
index 0add3ab483..ba6b1853ae 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -268,20 +268,6 @@ static int blkverify_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(s->test_file->bs);
 }
 
-static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                  BlockDriverState *candidate)
-{
-    BDRVBlkverifyState *s = bs->opaque;
-
-    bool perm = bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-
-    if (perm) {
-        return true;
-    }
-
-    return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
-}
-
 static bool blkverify_recurse_can_replace(BlockDriverState *bs,
                                           BlockDriverState *to_replace)
 {
@@ -341,7 +327,6 @@ static BlockDriver bdrv_blkverify = {
     .bdrv_co_flush                    = blkverify_co_flush,
 
     .is_filter                        = true,
-    .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 6631f30205..3204259513 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -125,13 +125,6 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 }
 
 
-static bool cor_recurse_is_first_non_filter(BlockDriverState *bs,
-                                            BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
-
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
 
@@ -151,8 +144,6 @@ static BlockDriver bdrv_copy_on_read = {
 
     .bdrv_co_block_status               = bdrv_co_block_status_from_file,
 
-    .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/quorum.c b/block/quorum.c
index 81b57dbae2..7a8f8b5475 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -808,23 +808,6 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     return result;
 }
 
-static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
-                                               BlockDriverState *candidate)
-{
-    BDRVQuorumState *s = bs->opaque;
-    int i;
-
-    for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
-                                                     candidate);
-        if (perm) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
 static bool quorum_recurse_can_replace(BlockDriverState *bs,
                                        BlockDriverState *to_replace)
 {
@@ -1255,7 +1238,6 @@ static BlockDriver bdrv_quorum = {
     .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
-    .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
diff --git a/block/replication.c b/block/replication.c
index 99532ce521..d6681b6c84 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -306,12 +306,6 @@ out:
     return ret;
 }
 
-static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                    BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
     Error *local_err = NULL;
@@ -699,7 +693,6 @@ static BlockDriver bdrv_replication = {
     .bdrv_co_writev             = replication_co_writev,
 
     .is_filter                  = true,
-    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
     .has_variable_length        = true,
     .strong_runtime_opts        = replication_strong_runtime_opts,
diff --git a/block/throttle.c b/block/throttle.c
index 0349f42257..71f4bb0ad1 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -207,12 +207,6 @@ static void throttle_reopen_abort(BDRVReopenState *reopen_state)
     reopen_state->opaque = NULL;
 }
 
-static bool throttle_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                 BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
 {
     ThrottleGroupMember *tgm = bs->opaque;
@@ -252,8 +246,6 @@ static BlockDriver bdrv_throttle = {
     .bdrv_co_pwrite_zeroes              =   throttle_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   =   throttle_co_pdiscard,
 
-    .bdrv_recurse_is_first_non_filter   =   throttle_recurse_is_first_non_filter,
-
     .bdrv_attach_aio_context            =   throttle_attach_aio_context,
     .bdrv_detach_aio_context            =   throttle_detach_aio_context,
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 13/22] mirror: Double-check immediately before replacing
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (11 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-20 15:27 ` [PATCH 14/22] quorum: Stop marking it as a filter Max Reitz
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

There is no guarantee that we can still replace the node we want to
replace at the end of the mirror job.  Double-check by calling
bdrv_recurse_can_replace().

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 706d80fced..d877637e1e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -695,7 +695,19 @@ static int mirror_exit_common(Job *job)
          * drain potential other users of the BDS before changing the graph. */
         assert(s->in_drain);
         bdrv_drained_begin(target_bs);
-        bdrv_replace_node(to_replace, target_bs, &local_err);
+        /*
+         * Cannot use check_to_replace_node() here, because that would
+         * check for an op blocker on @to_replace, and we have our own
+         * there.
+         */
+        if (bdrv_recurse_can_replace(src, to_replace)) {
+            bdrv_replace_node(to_replace, target_bs, &local_err);
+        } else {
+            error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                       "because it can no longer be guaranteed that doing so "
+                       "would not lead to an abrupt change of visible data",
+                       to_replace->node_name, target_bs->node_name);
+        }
         bdrv_drained_end(target_bs);
         if (local_err) {
             error_report_err(local_err);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 14/22] quorum: Stop marking it as a filter
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (12 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 13/22] mirror: Double-check immediately before replacing Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-26 12:43   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 15/22] mirror: Prevent loops Max Reitz
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Quorum is not a filter, for example because it cannot guarantee which of
its children will serve the next request.  Thus, any of its children may
differ from the data visible to quorum's parents.

We have other filters with multiple children, but they differ in this
aspect:

- blkverify quits the whole qemu process if its children differ.  As
  such, we can always skip it when we want to skip it (as a filter node)
  by going to any of its children.  Both have the same data.

- replication generally serves requests from bs->file, so this is its
  only actually filtered child.

- Block job filters currently only have one child, but they will
  probably get more children in the future.  Still, they will always
  have only one actually filtered child.

Having "filters" as a dedicated node category only makes sense if you
can skip them by going to a one fixed child that always shows the same
data as the filter node.  Quorum cannot fulfill this, so it is not a
filter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 7a8f8b5475..7f56b4df7c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1237,7 +1237,6 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_child_perm                    = quorum_child_perm,
 
-    .is_filter                          = true,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 15/22] mirror: Prevent loops
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (13 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 14/22] quorum: Stop marking it as a filter Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-26 13:08   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 16/22] iotests: Use complete_and_wait() in 155 Max Reitz
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

While bdrv_replace_node() will not follow through with it, a specific
@replaces asks the mirror job to create a loop.

For example, say both the source and the target share a child where the
source is a filter; by letting @replaces point to the common child, you
ask for a loop.

Or if you use @replaces in drive-mirror with sync=none and
mode=absolute-paths, you generally ask for a loop (@replaces must point
to a child of the source, and sync=none makes the source the backing
file of the target after the job).

bdrv_replace_node() will not create those loops, but it by doing so, it
ignores the user-requested configuration, which is not ideally either.
(In the first example above, the target's child will remain what it was,
which may still be reasonable.  But in the second example, the target
will just not become a child of the source, which is precisely what was
requested with @replaces.)

So prevent such configurations, both before the job, and before it
actually completes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  3 +++
 block.c                   | 30 ++++++++++++++++++++++++
 block/mirror.c            | 19 +++++++++++++++-
 blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 70f26530c9..8a26a0d88a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1256,6 +1256,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 bool bdrv_recurse_can_replace(BlockDriverState *bs,
                               BlockDriverState *to_replace);
 
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
diff --git a/block.c b/block.c
index 0d9b3de98f..332191fb47 100644
--- a/block.c
+++ b/block.c
@@ -6260,6 +6260,36 @@ out:
     return to_replace_bs;
 }
 
+/*
+ * Return true iff @child is a (recursive) child of @parent, with at
+ * least @min_level edges between them.
+ *
+ * (If @min_level == 0, return true if @child == @parent.  For
+ * @min_level == 1, @child needs to be at least a real child; for
+ * @min_level == 2, it needs to be at least a grand-child; and so on.)
+ */
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level)
+{
+    BdrvChild *c;
+
+    if (child == parent && min_level <= 0) {
+        return true;
+    }
+
+    if (!parent) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &parent->children, next) {
+        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 /**
  * Iterates through the list of runtime option keys that are said to
  * be "strong" for a BDS.  An option is called "strong" if it changes
diff --git a/block/mirror.c b/block/mirror.c
index d877637e1e..f30a6933d8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
          * there.
          */
         if (bdrv_recurse_can_replace(src, to_replace)) {
-            bdrv_replace_node(to_replace, target_bs, &local_err);
+            /*
+             * It is OK for @to_replace to be an immediate child of
+             * @target_bs, because that is what happens with
+             * drive-mirror sync=none mode=absolute-paths: target_bs's
+             * backing file will be the source node, which is also
+             * to_replace (by default).
+             * bdrv_replace_node() handles this case by not letting
+             * target_bs->backing point to itself, but to the source
+             * still.
+             */
+            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
+                bdrv_replace_node(to_replace, target_bs, &local_err);
+            } else {
+                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                           "because the former is now a child of the latter, "
+                           "and doing so would thus create a loop",
+                           to_replace->node_name, target_bs->node_name);
+            }
         } else {
             error_setg(&local_err, "Can no longer replace '%s' by '%s', "
                        "because it can no longer be guaranteed that doing so "
diff --git a/blockdev.c b/blockdev.c
index 0420bc29be..27344247d5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3845,7 +3845,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     }
 
     if (has_replaces) {
-        BlockDriverState *to_replace_bs;
+        BlockDriverState *to_replace_bs, *target_backing_bs;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -3860,6 +3860,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
+        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
+            error_setg(errp, "Replacing %s by %s would result in a loop, "
+                       "because the former is a child of the latter",
+                       to_replace_bs->node_name, target->node_name);
+            return;
+        }
+
+        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
+            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
+        {
+            /*
+             * While we do not quite know what OPEN_BACKING_CHAIN
+             * (used for mode=existing) will yield, it is probably
+             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
+             * because that is our best guess.
+             */
+            switch (sync) {
+            case MIRROR_SYNC_MODE_FULL:
+                target_backing_bs = NULL;
+                break;
+
+            case MIRROR_SYNC_MODE_TOP:
+                target_backing_bs = backing_bs(bs);
+                break;
+
+            case MIRROR_SYNC_MODE_NONE:
+                target_backing_bs = bs;
+                break;
+
+            default:
+                abort();
+            }
+        } else {
+            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
+            target_backing_bs = backing_bs(target);
+        }
+
+        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
+            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
+                       "result in a loop, because the former would be a child "
+                       "of the latter's backing file ('%s') after the mirror "
+                       "job", to_replace_bs->node_name, target->node_name,
+                       target_backing_bs->node_name);
+            return;
+        }
+
         replace_aio_context = bdrv_get_aio_context(to_replace_bs);
         aio_context_acquire(replace_aio_context);
         replace_size = bdrv_getlength(to_replace_bs);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 16/22] iotests: Use complete_and_wait() in 155
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (14 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 15/22] mirror: Prevent loops Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-26 13:11   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:27 ` [PATCH 17/22] iotests: Add VM.assert_block_path() Max Reitz
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

This way, we get to see errors during the completion phase.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/155 | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index e19485911c..d7ef2579d3 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -163,12 +163,7 @@ class MirrorBaseClass(BaseClass):
 
         self.assert_qmp(result, 'return', {})
 
-        self.vm.event_wait('BLOCK_JOB_READY')
-
-        result = self.vm.qmp('block-job-complete', device='mirror-job')
-        self.assert_qmp(result, 'return', {})
-
-        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+        self.complete_and_wait('mirror-job')
 
     def testFull(self):
         self.runMirror('full')
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 17/22] iotests: Add VM.assert_block_path()
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (15 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 16/22] iotests: Use complete_and_wait() in 155 Max Reitz
@ 2019-09-20 15:27 ` Max Reitz
  2019-09-26 14:07   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:28 ` [PATCH 18/22] iotests: Resolve TODOs in 041 Max Reitz
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:27 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/iotests.py | 48 +++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index daed4ee013..e6fb46287d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -670,6 +670,54 @@ class VM(qtest.QEMUQtestMachine):
 
         return fields.items() <= ret.items()
 
+    '''
+    @path is a string whose components are separated by slashes.
+    The first component is a node name, the rest are child names.
+    Examples:
+      - "qcow2-node/backing/file"
+      - "quorum-node/children.2/file"
+
+    @expected_node may be None.
+
+    @graph may be None or the result of an x-debug-query-block-graph
+    call that has already been performed.
+    '''
+    def assert_block_path(self, path, expected_node, graph=None):
+        if graph is None:
+            graph = self.qmp('x-debug-query-block-graph')['return']
+
+        iter_path = iter(path.split('/'))
+        root = next(iter_path)
+        try:
+            node = next(node for node in graph['nodes'] if node['name'] == root)
+        except StopIteration:
+            node = None
+
+        for path_node in iter_path:
+            assert node is not None, 'Cannot follow path %s' % path
+
+            try:
+                node_id = next(edge['child'] for edge in graph['edges'] \
+                                             if edge['parent'] == node['id'] and
+                                                edge['name'] == path_node)
+
+                node = next(node for node in graph['nodes'] \
+                                 if node['id'] == node_id)
+            except StopIteration:
+                node = None
+
+        assert node is not None or expected_node is None, \
+               'No node found under %s (but expected %s)' % \
+               (path, expected_node)
+
+        assert expected_node is not None or node is None, \
+               'Found node %s under %s (but expected none)' % \
+               (node['name'], path)
+
+        if node is not None and expected_node is not None:
+            assert node['name'] == expected_node, \
+                   'Found node %s under %s (but expected %s)' % \
+                   (node['name'], path, expected_node)
 
 index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 18/22] iotests: Resolve TODOs in 041
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (16 preceding siblings ...)
  2019-09-20 15:27 ` [PATCH 17/22] iotests: Add VM.assert_block_path() Max Reitz
@ 2019-09-20 15:28 ` Max Reitz
  2019-09-26 14:09   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:28 ` [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum Max Reitz
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:28 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index d91f538276..ca126de3ff 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -921,8 +921,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait(drive="job0")
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
+        self.vm.assert_block_path('quorum0/children.1', 'repair0')
         self.vm.shutdown()
         self.assertTrue(iotests.compare_images(quorum_img2, quorum_repair_img),
                         'target image does not match source after mirroring')
@@ -1074,9 +1073,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait('job0')
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
-        self.vm.shutdown()
+        self.vm.assert_block_path('quorum0/children.1', 'repair0')
 
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (17 preceding siblings ...)
  2019-09-20 15:28 ` [PATCH 18/22] iotests: Resolve TODOs in 041 Max Reitz
@ 2019-09-20 15:28 ` Max Reitz
  2019-09-26 14:13   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:28 ` [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces Max Reitz
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:28 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

041's TestRepairQuorum has its own image_len, no need to refer to
TestSingleDrive.  (This patch allows uncommenting TestSingleDrive to
speed up 041 during test testing.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index ca126de3ff..20ae9750b7 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -880,7 +880,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
         # Add each individual quorum images
         for i in self.IMAGES:
             qemu_img('create', '-f', iotests.imgfmt, i,
-                     str(TestSingleDrive.image_len))
+                     str(self.image_len))
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (18 preceding siblings ...)
  2019-09-20 15:28 ` [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum Max Reitz
@ 2019-09-20 15:28 ` Max Reitz
  2019-09-26 14:30   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:28 ` [PATCH 21/22] iotests: Check that @replaces can replace filters Max Reitz
  2019-09-20 15:28 ` [PATCH 22/22] iotests: Mirror must not attempt to create loops Max Reitz
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:28 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Add two tests to see that you cannot replace a Quorum child with the
mirror job while the child is in use by a different parent.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 57 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |  4 +--
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 20ae9750b7..148dc47ce4 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -34,6 +34,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
 quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
 quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
 
+nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
+
 class TestSingleDrive(iotests.QMPTestCase):
     image_len = 1 * 1024 * 1024 # MB
     qmp_cmd = 'drive-mirror'
@@ -901,7 +903,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
     def tearDown(self):
         self.vm.shutdown()
-        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file ]:
+        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file,
+                                 nbd_sock_path ]:
             # Do a try/except because the test may have deleted some images
             try:
                 os.remove(i)
@@ -1075,6 +1078,58 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_has_block_node("repair0", quorum_repair_img)
         self.vm.assert_block_path('quorum0/children.1', 'repair0')
 
+    '''
+    Check that we cannot replace a Quorum child when it has other
+    parents.
+    '''
+    def test_with_other_parent(self):
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'error/desc',
+                        "Cannot replace 'img1' by a node mirrored from "
+                        "'quorum0', because it cannot be guaranteed that doing "
+                        "so would not lead to an abrupt change of visible data")
+
+    '''
+    The same as test_with_other_parent(), but add the NBD server only
+    when the mirror job is already running.
+    '''
+    def test_with_other_parents_after_mirror_start(self):
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        # The full error message goes to stderr, so we unfortunately
+        # cannot check it here
+        self.complete_and_wait('mirror',
+                               completion_error='Operation not permitted')
+
+        # Should not have been replaced
+        self.vm.assert_block_path('quorum0/children.1', 'img1')
+
+
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
 class TestOrphanedSource(iotests.QMPTestCase):
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index f496be9197..ffc779b4d1 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-...........................................................................................
+.............................................................................................
 ----------------------------------------------------------------------
-Ran 91 tests
+Ran 93 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 21/22] iotests: Check that @replaces can replace filters
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (19 preceding siblings ...)
  2019-09-20 15:28 ` [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces Max Reitz
@ 2019-09-20 15:28 ` Max Reitz
  2019-09-26 14:42   ` Vladimir Sementsov-Ogievskiy
  2019-09-20 15:28 ` [PATCH 22/22] iotests: Mirror must not attempt to create loops Max Reitz
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:28 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 45 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |  4 ++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 148dc47ce4..e4cc829fe2 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1220,6 +1220,51 @@ class TestOrphanedSource(iotests.QMPTestCase):
         self.assertFalse('mirror-filter' in nodes,
                          'Mirror filter node did not disappear')
 
+# Test cases for @replaces that do not necessarily involve Quorum
+class TestReplaces(iotests.QMPTestCase):
+    # Each of these test cases needs their own block graph, so do not
+    # create any nodes here
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        for img in (test_img, target_img):
+            try:
+                os.remove(img)
+            except OSError:
+                pass
+
+    '''
+    Check that we can replace filter nodes.
+    '''
+    def test_replace_filter(self):
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'copy-on-read',
+                                 'node-name': 'filter0',
+                                 'file': {
+                                     'driver': 'copy-on-read',
+                                     'node-name': 'filter1',
+                                     'file': {
+                                         'driver': 'null-co'
+                                     }
+                                 }
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add',
+                             node_name='target', driver='null-co')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror', job_id='mirror', device='filter0',
+                             target='target', sync='full', replaces='filter1')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.assert_block_path('filter0/file', 'target')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index ffc779b4d1..877b76fd31 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.............................................................................................
+..............................................................................................
 ----------------------------------------------------------------------
-Ran 93 tests
+Ran 94 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 22/22] iotests: Mirror must not attempt to create loops
  2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
                   ` (20 preceding siblings ...)
  2019-09-20 15:28 ` [PATCH 21/22] iotests: Check that @replaces can replace filters Max Reitz
@ 2019-09-20 15:28 ` Max Reitz
  2019-09-26 15:03   ` Vladimir Sementsov-Ogievskiy
  21 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-09-20 15:28 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 152 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index e4cc829fe2..6ea4764ae8 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1265,6 +1265,158 @@ class TestReplaces(iotests.QMPTestCase):
 
         self.vm.assert_block_path('filter0/file', 'target')
 
+    '''
+    See what happens when the @sync/@replaces configuration dictates
+    creating a loop.
+    '''
+    def test_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
+
+        # Dummy group so we can create a NOP filter
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'throttle',
+                                 'node-name': 'source',
+                                 'throttle-group': 'tg0',
+                                 'file': {
+                                     'driver': iotests.imgfmt,
+                                     'node-name': 'filtered',
+                                     'file': {
+                                         'driver': 'file',
+                                         'filename': test_img
+                                     }
+                                 }
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        # Block graph is now:
+        #   source[throttle] --file--> filtered[qcow2] --file--> ...
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='source',
+                             target=target_img, format=iotests.imgfmt,
+                             node_name='target', sync='none',
+                             replaces='filtered')
+
+        '''
+        Block graph before mirror exits would be (ignoring mirror_top):
+          source[throttle] --file--> filtered[qcow2] --file--> ...
+          target[qcow2] --file--> ...
+
+        Then, because of sync=none and drive-mirror in absolute-paths mode,
+        the source is attached to the target:
+          source[throttle] --file--> filtered[qcow2] --file--> ...
+                 ^
+              backing
+                 |
+            target[qcow2] --file--> ...
+
+        Replacing filtered by target would yield:
+          source[throttle] --file--> target[qcow2] --file--> ...
+                 ^                        |
+                 +------- backing --------+
+
+        I.e., a loop.  bdrv_replace_node() detects this and simply
+        does not let source's file link point to target.  However,
+        that means that target cannot really replace source.
+
+        drive-mirror should detect this and not allow this case.
+        '''
+
+        self.assert_qmp(result, 'error/desc',
+                        "Replacing 'filtered' by 'target' with this sync " + \
+                        "mode would result in a loop, because the former " + \
+                        "would be a child of the latter's backing file " + \
+                        "('source') after the mirror job")
+
+    '''
+    Test what happens when there would be no loop with the pre-mirror
+    configuration, but something changes during the mirror job that asks
+    for a loop to be created during completion.
+    '''
+    def test_loop_during_mirror(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'null-co',
+                                 'node-name': 'common-base',
+                                 'read-zeroes': True,
+                                 'size': 1 * 1024 * 1024
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'copy-on-read',
+                                 'node-name': 'source',
+                                 'file': 'common-base'
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        '''
+        x-blockdev-change can only add children to a quorum node that
+        have no parent yet, so we need an intermediate node between
+        target and common-base that has no parents other than target.
+        We cannot use any parent that would forward the RESIZE
+        permission (because the job takes it, but unshares it on the
+        source), so we make it a backing child of a qcow2 node.
+        Unfortunately, we cannot add backing files to Quorum nodes
+        (because of an op blocker), so we put another raw node between
+        the qcow2 node and common-base.
+        '''
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'qcow2',
+                                 'node-name': 'base-parent',
+                                 'file': {
+                                     'driver': 'file',
+                                     'filename': test_img
+                                 },
+                                 'backing': {
+                                     'driver': 'raw',
+                                     'file': 'common-base'
+                                 }
+                             })
+
+        # Add a quorum node with a single child, we will add
+        # base-parent to prepare a loop later
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'quorum',
+                                 'node-name': 'target',
+                                 'vote-threshold': 1,
+                                 'children': [
+                                     {
+                                         'driver': 'null-co',
+                                         'read-zeroes': True
+                                     }
+                                 ]
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror', job_id='mirror',
+                             device='source', target='target', sync='full',
+                             replaces='common-base')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('x-blockdev-change',
+                             parent='target', node='base-parent')
+        self.assert_qmp(result, 'return', {})
+
+        '''
+        This asks for this configuration post-mirror:
+
+        source -> target (replaced common-base) -> base-parent
+                                  ^                    |
+                                  |                    v
+                                  +----------------- [raw]
+
+        bdrv_replace_node() would not allow such a configuration, but
+        we should not pretend we can create it, so the mirror job
+        should fail during completion.
+        '''
+
+        self.complete_and_wait('mirror',
+                               completion_error='Operation not permitted')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index 877b76fd31..20a8158b99 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-..............................................................................................
+................................................................................................
 ----------------------------------------------------------------------
-Ran 94 tests
+Ran 96 tests
 
 OK
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/22] blockdev: Allow external snapshots everywhere
  2019-09-20 15:27 ` [PATCH 01/22] blockdev: Allow external snapshots everywhere Max Reitz
@ 2019-09-25 11:45   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 11:45 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> There is no good reason why we would allow external snapshots only on
> the first non-filter node in a chain.  Parent BDSs should not care
> whether their child is replaced by a snapshot.  (If they do care, they
> should announce that via freezing the chain, which is checked in
> bdrv_append() through bdrv_set_backing_hd().)
> 
> Before we had bdrv_is_first_non_filter() here (since 212a5a8f095), there
> was a special function bdrv_check_ext_snapshot() that allowed snapshots
> by default, but block drivers could override this.  Only blkverify did
> so, however.
> 
> It is not clear to me why blkverify would do so; maybe just so that the
> testee block driver would not be replaced.  The introducing commit
> f6186f49e2c does not explain why.  Maybe because 08b24cfe376 would have
> been the correct solution?  (Which adds a .supports_backing check.)
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 02/22] blockdev: Allow resizing everywhere
  2019-09-20 15:27 ` [PATCH 02/22] blockdev: Allow resizing everywhere Max Reitz
@ 2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 11:46 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Block nodes that do not allow resizing should not share BLK_PERM_RESIZE.
> It does not matter whether they are the first non-filter in their chain
> or not.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/22] block: Drop bdrv_is_first_non_filter()
  2019-09-20 15:27 ` [PATCH 03/22] block: Drop bdrv_is_first_non_filter() Max Reitz
@ 2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 11:46 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> It is unused now.  (And it was ugly because it needed to explore all BDS
> chains from the top.)
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children
  2019-09-20 15:27 ` [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children Max Reitz
@ 2019-09-25 11:51   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 11:51 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Using -drive with default options means that a virtio-blk drive will be
> created that has write access to the to-be quorum children.  Quorum
> should have exclusive write access to them, so we should use -blockdev
> instead.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/22] quorum: Fix child permissions
  2019-09-20 15:27 ` [PATCH 05/22] quorum: Fix child permissions Max Reitz
@ 2019-09-25 11:56   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:02     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 11:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Quorum is not actually a filter.  It cannot share WRITE or RESIZE on its
> children.

Hmm, backup-top don't want to share WRITE or RESIZE too, but it's a filter still..
May it be just "Quorum cannot share WRITE or RESIZE on its children." ?

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/22] block: Add bdrv_recurse_can_replace()
  2019-09-20 15:27 ` [PATCH 06/22] block: Add bdrv_recurse_can_replace() Max Reitz
@ 2019-09-25 12:39   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:03     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 12:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> After a couple of follow-up patches, this function will replace
> bdrv_recurse_is_first_non_filter() in check_to_replace_node().
> 
> bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
> check_to_replace_node() (it allows cases that should not be allowed,
> like replacing child nodes of quorum with dissenting data that have more
> parents than just quorum), and it is too restrictive (it is perfectly
> fine to replace filters).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h | 10 ++++++++++
>   block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 48 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5fd4f17d93..0be7d12f04 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -103,6 +103,13 @@ struct BlockDriver {
>        */
>       bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>                                                BlockDriverState *candidate);
> +    /*
> +     * Return true if @to_replace can be replaced by a BDS with the
> +     * same data as @bs without it affecting @bs's behavior (that is,
> +     * without it being visible to @bs's parents).
> +     */
> +    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
> +                                     BlockDriverState *to_replace);
>   
>       int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
>       int (*bdrv_probe_device)(const char *filename);
> @@ -1254,6 +1261,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>                                  uint64_t perm, uint64_t shared,
>                                  uint64_t *nperm, uint64_t *nshared);
>   
> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
> +                              BlockDriverState *to_replace);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> diff --git a/block.c b/block.c
> index 7d99ca692c..a2deca4ac9 100644
> --- a/block.c
> +++ b/block.c
> @@ -6206,6 +6206,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +/*
> + * This function checks whether the given @to_replace is allowed to be
> + * replaced by a node that always shows the same data as @bs.  This is
> + * used for example to verify whether the mirror job can replace
> + * @to_replace by the target mirrored from @bs.
> + * To be replaceable, @bs and @to_replace may either be guaranteed to
> + * always show the same data (because they are only connected through
> + * filters), or some driver may allow replacing one of its children
> + * because it can guarantee that this child's data is not visible at
> + * all (for example, for dissenting quorum children that have no other
> + * parents).
> + */
> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
> +                              BlockDriverState *to_replace)
> +{
> +    if (!bs || !bs->drv) {
> +        return false;
> +    }
> +
> +    if (bs == to_replace) {
> +        return true;
> +    }
> +
> +    /* For filters, we can recurse on our own */
> +    if (bs->drv->is_filter) {
> +        BdrvChild *child = bs->file ?: bs->backing;

then, maybe asset(!bs->drv->bdrv_recurse_can_replace)

> +        return bdrv_recurse_can_replace(child->bs, to_replace);
> +    }

or, this may be filter-skipping loop instead of recursion, like

while (bs && bs->drv && bs->drv->is_filter) {
   bs = (bs->file ?: bs->backing)->bs;
}

at function start.

either way:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> +
> +    /* See what the driver can do */
> +    if (bs->drv->bdrv_recurse_can_replace) {
> +        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
> +    }
> +
> +    /* Safe default */
> +    return false;
> +}
> +
>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>                                           const char *node_name, Error **errp)
>   {
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 ` [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-09-25 12:56   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:06     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 12:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/blkverify.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/block/blkverify.c b/block/blkverify.c
> index 304b0a1368..0add3ab483 100644
> --- a/block/blkverify.c
> +++ b/block/blkverify.c
> @@ -282,6 +282,20 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
>       return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
>   }
>   
> +static bool blkverify_recurse_can_replace(BlockDriverState *bs,
> +                                          BlockDriverState *to_replace)
> +{
> +    BDRVBlkverifyState *s = bs->opaque;
> +
> +    /*
> +     * blkverify quits the whole qemu process if there is a mismatch
> +     * between bs->file->bs and s->test_file->bs.  Therefore, we know
> +     * know that both must match bs and we can recurse down to either.
> +     */
> +    return bdrv_recurse_can_replace(bs->file->bs, to_replace) ||
> +           bdrv_recurse_can_replace(s->test_file->bs, to_replace);

Ok, now I understand, what bdrv_recurse_can_replace actually does:

It searches for to_replace in bs-rooted subtree, going only through equal
children..

So, we can replace @to_replace, by something equal to @bs, if @to_replace is
in equal-subtree of @bs.

I'll try to explain my misleading:

bdrv_recurse_can_replace declaration looks like bs and to_replace may be absolutely
unrelated nodes. So, why bs should decide, can we replace the unrelated to_replace
node by something..

So, it may be simpler to follow, if it was called bdrv_recurse_filtered_subtree, or
bdrv_recurse_transparent_subtree..

Still, now I understand, and don't care. It's better anyway than bdrv_recurse_is_first_non_filter

> +}
> +
>   static void blkverify_refresh_filename(BlockDriverState *bs)
>   {
>       BDRVBlkverifyState *s = bs->opaque;
> @@ -328,6 +342,7 @@ static BlockDriver bdrv_blkverify = {
>   
>       .is_filter                        = true,
>       .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,

it will be never called, as bdrv_recurse_can_replace handles filters by itself.

>   };
>   
>   static void bdrv_blkverify_init(void)
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/22] quorum: Store children in own structure
  2019-09-20 15:27 ` [PATCH 08/22] quorum: Store children in own structure Max Reitz
@ 2019-09-25 13:21   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:07     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 13:21 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> This will be useful when we want to store additional attributes for each
> child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 58 ++++++++++++++++++++++++++++----------------------
>   1 file changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 17b439056f..cf2171cc74 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -65,9 +65,13 @@ typedef struct QuorumVotes {
>       bool (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
>   } QuorumVotes;
>   
> +typedef struct QuorumChild {
> +    BdrvChild *child;
> +} QuorumChild;
> +
>   /* the following structure holds the state of one quorum instance */
>   typedef struct BDRVQuorumState {
> -    BdrvChild **children;  /* children BlockDriverStates */
> +    QuorumChild *children;
>       int num_children;      /* children count */
>       unsigned next_child_index;  /* the index of the next child that should
>                                    * be added
> @@ -264,7 +268,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
>           }
>           QLIST_FOREACH(item, &version->items, next) {
>               quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
> -                              s->children[item->index]->bs->node_name, 0);
> +                              s->children[item->index].child->bs->node_name, 0);
>           }
>       }
>   }
> @@ -279,7 +283,7 @@ static void quorum_rewrite_entry(void *opaque)
>        * corrupted data.
>        * Mask out BDRV_REQ_WRITE_UNCHANGED because this overwrites the
>        * area with different data from the other children. */
> -    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
> +    bdrv_co_pwritev(s->children[co->idx].child, acb->offset, acb->bytes,
>                       acb->qiov, acb->flags & ~BDRV_REQ_WRITE_UNCHANGED);
>   
>       /* Wake up the caller after the last rewrite */
> @@ -578,8 +582,8 @@ static void read_quorum_children_entry(void *opaque)
>       int i = co->idx;
>       QuorumChildRequest *sacb = &acb->qcrs[i];
>   
> -    sacb->bs = s->children[i]->bs;
> -    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
> +    sacb->bs = s->children[i].child->bs;
> +    sacb->ret = bdrv_co_preadv(s->children[i].child, acb->offset, acb->bytes,
>                                  &acb->qcrs[i].qiov, 0);
>   
>       if (sacb->ret == 0) {
> @@ -605,7 +609,8 @@ static int read_quorum_children(QuorumAIOCB *acb)
>   
>       acb->children_read = s->num_children;
>       for (i = 0; i < s->num_children; i++) {
> -        acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
> +        acb->qcrs[i].buf = qemu_blockalign(s->children[i].child->bs,
> +                                           acb->qiov->size);
>           qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
>           qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
>       }
> @@ -647,8 +652,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
>       /* We try to read the next child in FIFO order if we failed to read */
>       do {
>           n = acb->children_read++;
> -        acb->qcrs[n].bs = s->children[n]->bs;
> -        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
> +        acb->qcrs[n].bs = s->children[n].child->bs;
> +        ret = bdrv_co_preadv(s->children[n].child, acb->offset, acb->bytes,
>                                acb->qiov, 0);
>           if (ret < 0) {
>               quorum_report_bad_acb(&acb->qcrs[n], ret);
> @@ -688,8 +693,8 @@ static void write_quorum_entry(void *opaque)
>       int i = co->idx;
>       QuorumChildRequest *sacb = &acb->qcrs[i];
>   
> -    sacb->bs = s->children[i]->bs;
> -    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
> +    sacb->bs = s->children[i].child->bs;
> +    sacb->ret = bdrv_co_pwritev(s->children[i].child, acb->offset, acb->bytes,
>                                   acb->qiov, acb->flags);
>       if (sacb->ret == 0) {
>           acb->success_count++;
> @@ -743,12 +748,12 @@ static int64_t quorum_getlength(BlockDriverState *bs)
>       int i;
>   
>       /* check that all file have the same length */
> -    result = bdrv_getlength(s->children[0]->bs);
> +    result = bdrv_getlength(s->children[0].child->bs);
>       if (result < 0) {
>           return result;
>       }
>       for (i = 1; i < s->num_children; i++) {
> -        int64_t value = bdrv_getlength(s->children[i]->bs);
> +        int64_t value = bdrv_getlength(s->children[i].child->bs);
>           if (value < 0) {
>               return value;
>           }
> @@ -774,10 +779,10 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
>       error_votes.compare = quorum_64bits_compare;
>   
>       for (i = 0; i < s->num_children; i++) {
> -        result = bdrv_co_flush(s->children[i]->bs);
> +        result = bdrv_co_flush(s->children[i].child->bs);
>           if (result) {
>               quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, 0,
> -                              s->children[i]->bs->node_name, result);
> +                              s->children[i].child->bs->node_name, result);
>               result_value.l = result;
>               quorum_count_vote(&error_votes, &result_value, i);
>           } else {
> @@ -803,7 +808,7 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       int i;
>   
>       for (i = 0; i < s->num_children; i++) {
> -        bool perm = bdrv_recurse_is_first_non_filter(s->children[i]->bs,
> +        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
>                                                        candidate);
>           if (perm) {
>               return true;
> @@ -932,7 +937,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>       }
>   
>       /* allocate the children array */
> -    s->children = g_new0(BdrvChild *, s->num_children);
> +    s->children = g_new0(QuorumChild, s->num_children);
>       opened = g_new0(bool, s->num_children);
>   
>       for (i = 0; i < s->num_children; i++) {
> @@ -940,8 +945,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>           ret = snprintf(indexstr, 32, "children.%d", i);
>           assert(ret < 32);
>   
> -        s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
> -                                         &child_format, false, &local_err);
> +        s->children[i].child = bdrv_open_child(NULL, options, indexstr, bs,
> +                                               &child_format, false,
> +                                               &local_err);
>           if (local_err) {
>               ret = -EINVAL;
>               goto close_exit;
> @@ -962,7 +968,7 @@ close_exit:
>           if (!opened[i]) {
>               continue;
>           }
> -        bdrv_unref_child(bs, s->children[i]);
> +        bdrv_unref_child(bs, s->children[i].child);
>       }
>       g_free(s->children);
>       g_free(opened);
> @@ -979,7 +985,7 @@ static void quorum_close(BlockDriverState *bs)
>       int i;
>   
>       for (i = 0; i < s->num_children; i++) {
> -        bdrv_unref_child(bs, s->children[i]);
> +        bdrv_unref_child(bs, s->children[i].child);
>       }
>   
>       g_free(s->children);
> @@ -1022,8 +1028,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
>           s->next_child_index--;
>           goto out;
>       }

more context:
     assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
     if (s->num_children == INT_MAX / sizeof(BdrvChild *) ||
         s->next_child_index == UINT_MAX) {
         error_setg(errp, "Too many children");
         return;
     }

here: s/BdrvChild */QuorumChild


> -    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
> -    s->children[s->num_children++] = child;
> +    s->children = g_renew(QuorumChild, s->children, s->num_children + 1);
> +    s->children[s->num_children++] = (QuorumChild){
> +        .child = child,
> +    };
>   
>   out:
>       bdrv_drained_end(bs);
> @@ -1036,7 +1044,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
>       int i;
>   
>       for (i = 0; i < s->num_children; i++) {
> -        if (s->children[i] == child) {
> +        if (s->children[i].child == child) {
>               break;
>           }
>       }
> @@ -1059,7 +1067,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
>       /* We can safely remove this child now */
>       memmove(&s->children[i], &s->children[i + 1],
>               (s->num_children - i - 1) * sizeof(BdrvChild *));

s/BdrvChild */QuorumChild/

> -    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
> +    s->children = g_renew(QuorumChild, s->children, --s->num_children);
>       bdrv_unref_child(bs, child);
>   
>       bdrv_drained_end(bs);
> @@ -1100,7 +1108,7 @@ static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
>   
>       for (i = 0; i < s->num_children; i++) {
>           qlist_append(children_list,
> -                     qobject_ref(s->children[i]->bs->full_open_options));
> +                     qobject_ref(s->children[i].child->bs->full_open_options));
>       }
>   }
>   
> 

with my suggestions:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:12     ` Max Reitz
  2019-09-25 14:12   ` Vladimir Sementsov-Ogievskiy
  2019-09-25 14:14   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 13:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our

I'm a bit misleaded: by @bs, or by node equal to @bs (accordingly to
bdrv_recurse_can_replace spec)?

> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)

Aha, and you answered already :) OK.

> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {

Hmm, still, what is wrong if we just put bdrv_recurse_can_replace(s->children[i].child->bs, to_replace) into this if condition?
(or may be more correct to call it after taking permissions)

> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced
  2019-09-20 15:27 ` [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2019-09-25 13:45   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:13     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 13:45 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index cf2171cc74..207054a64e 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>   
>   typedef struct QuorumChild {
>       BdrvChild *child;
> +
> +    /*
> +     * If set, check whether this node can be replaced without any
> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> +     * WRITE permission.
> +     */
> +    bool to_be_replaced;
>   } QuorumChild;
>   
>   /* the following structure holds the state of one quorum instance */
> @@ -1128,6 +1135,16 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
>                                 uint64_t perm, uint64_t shared,
>                                 uint64_t *nperm, uint64_t *nshared)
>   {
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        if (s->children[i].child == c) {
> +            break;
> +        }
> +    }
> +    assert(!c || i < s->num_children);

seems, the loop is useless if c == NULL.

> +
>       *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
>   
>       /*
> @@ -1137,6 +1154,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
>       *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
>                             BLK_PERM_WRITE_UNCHANGED))
>                | DEFAULT_PERM_UNCHANGED;
> +
> +    if (c && s->children[i].to_be_replaced) {
> +        /* Prepare for sudden data changes */
> +        *nperm |= BLK_PERM_WRITE;
> +        *nshared &= ~BLK_PERM_CONSISTENT_READ;
> +    }
>   }
>   
>   static const char *const quorum_strong_runtime_opts[] = {
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
  2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-25 14:12   ` Vladimir Sementsov-Ogievskiy
  2019-09-26 11:17     ` Max Reitz
  2019-09-25 14:14   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 14:12 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +

So we are trying to answer on a question "is it ok to replace" it, by cheating on
permission system... Possibly, it's a problem of general design, and instead of
  examining one subtree, we should ask all parents of to_replace node, are they
OK with such replacement..

Another idea is that it's strange to check permissions somewhere else than in generic
permission check functions. But I've no idea how to handle it in permission system.

Or in other words: I don't like it all, but I at least follow how it works. And this
looks better than it was and fixes some bugs.

> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
  2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
  2019-09-25 14:12   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-25 14:14   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 14:14 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,

and here, it's useless until we set is_filter to false.

>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter()
  2019-09-20 15:27 ` [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
@ 2019-09-25 14:16   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-25 14:16 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> It no longer has any users.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/22] quorum: Fix child permissions
  2019-09-25 11:56   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:02     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 769 bytes --]

On 25.09.19 13:56, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Quorum is not actually a filter.  It cannot share WRITE or RESIZE on its
>> children.
> 
> Hmm, backup-top don't want to share WRITE or RESIZE too, but it's a filter still..
> May it be just "Quorum cannot share WRITE or RESIZE on its children." ?

I suppose the original reason for sharing it was just “It’s a filter, so
let’s make it use bdrv_filter_default_perms().”  So that’s what I was
trying to hint at.  I’ll try to be a bit more verbose.  Or less,
depending on what I find looks better. :-)

Max

>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/22] block: Add bdrv_recurse_can_replace()
  2019-09-25 12:39   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:03     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 4719 bytes --]

On 25.09.19 14:39, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> After a couple of follow-up patches, this function will replace
>> bdrv_recurse_is_first_non_filter() in check_to_replace_node().
>>
>> bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
>> check_to_replace_node() (it allows cases that should not be allowed,
>> like replacing child nodes of quorum with dissenting data that have more
>> parents than just quorum), and it is too restrictive (it is perfectly
>> fine to replace filters).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h | 10 ++++++++++
>>   block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 48 insertions(+)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 5fd4f17d93..0be7d12f04 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -103,6 +103,13 @@ struct BlockDriver {
>>        */
>>       bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>>                                                BlockDriverState *candidate);
>> +    /*
>> +     * Return true if @to_replace can be replaced by a BDS with the
>> +     * same data as @bs without it affecting @bs's behavior (that is,
>> +     * without it being visible to @bs's parents).
>> +     */
>> +    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
>> +                                     BlockDriverState *to_replace);
>>   
>>       int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
>>       int (*bdrv_probe_device)(const char *filename);
>> @@ -1254,6 +1261,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>                                  uint64_t perm, uint64_t shared,
>>                                  uint64_t *nperm, uint64_t *nshared);
>>   
>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>> +                              BlockDriverState *to_replace);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>> diff --git a/block.c b/block.c
>> index 7d99ca692c..a2deca4ac9 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6206,6 +6206,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +/*
>> + * This function checks whether the given @to_replace is allowed to be
>> + * replaced by a node that always shows the same data as @bs.  This is
>> + * used for example to verify whether the mirror job can replace
>> + * @to_replace by the target mirrored from @bs.
>> + * To be replaceable, @bs and @to_replace may either be guaranteed to
>> + * always show the same data (because they are only connected through
>> + * filters), or some driver may allow replacing one of its children
>> + * because it can guarantee that this child's data is not visible at
>> + * all (for example, for dissenting quorum children that have no other
>> + * parents).
>> + */
>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>> +                              BlockDriverState *to_replace)
>> +{
>> +    if (!bs || !bs->drv) {
>> +        return false;
>> +    }
>> +
>> +    if (bs == to_replace) {
>> +        return true;
>> +    }
>> +
>> +    /* For filters, we can recurse on our own */
>> +    if (bs->drv->is_filter) {
>> +        BdrvChild *child = bs->file ?: bs->backing;
> 
> then, maybe asset(!bs->drv->bdrv_recurse_can_replace)

It’s actually the other way around.  As you find yourself, blkverify is
a filter and has its own implementation.  That is entirely correct
because we cannot recurse to just bs->file in blkverify's case.  So we
should first invoke the driver-specific function, and then have the
generic filter code.

Max

>> +        return bdrv_recurse_can_replace(child->bs, to_replace);
>> +    }
> 
> or, this may be filter-skipping loop instead of recursion, like
> 
> while (bs && bs->drv && bs->drv->is_filter) {
>    bs = (bs->file ?: bs->backing)->bs;
> }
> 
> at function start.
> 
> either way:
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
>> +
>> +    /* See what the driver can do */
>> +    if (bs->drv->bdrv_recurse_can_replace) {
>> +        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
>> +    }
>> +
>> +    /* Safe default */
>> +    return false;
>> +}
>> +
>>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>                                           const char *node_name, Error **errp)
>>   {
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace()
  2019-09-25 12:56   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:06     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2986 bytes --]

On 25.09.19 14:56, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/blkverify.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/block/blkverify.c b/block/blkverify.c
>> index 304b0a1368..0add3ab483 100644
>> --- a/block/blkverify.c
>> +++ b/block/blkverify.c
>> @@ -282,6 +282,20 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
>>   }
>>   
>> +static bool blkverify_recurse_can_replace(BlockDriverState *bs,
>> +                                          BlockDriverState *to_replace)
>> +{
>> +    BDRVBlkverifyState *s = bs->opaque;
>> +
>> +    /*
>> +     * blkverify quits the whole qemu process if there is a mismatch
>> +     * between bs->file->bs and s->test_file->bs.  Therefore, we know
>> +     * know that both must match bs and we can recurse down to either.
>> +     */
>> +    return bdrv_recurse_can_replace(bs->file->bs, to_replace) ||
>> +           bdrv_recurse_can_replace(s->test_file->bs, to_replace);
> 
> Ok, now I understand, what bdrv_recurse_can_replace actually does:
> 
> It searches for to_replace in bs-rooted subtree, going only through equal
> children..
> 
> So, we can replace @to_replace, by something equal to @bs, if @to_replace is
> in equal-subtree of @bs.
> 
> I'll try to explain my misleading:
> 
> bdrv_recurse_can_replace declaration looks like bs and to_replace may be absolutely
> unrelated nodes. So, why bs should decide, can we replace the unrelated to_replace
> node by something..

I thought the comment above bdrv_recurse_can_replace() made that clear:
“To be replaceable, @bs and @to_replace may either be guaranteed to
always show the same data (because they are only connected through
filters), or some driver may allow replacing one of its children because
it can guarantee that this child's data is not visible at all (for
example, for dissenting quorum children that have no other parents).”

> So, it may be simpler to follow, if it was called bdrv_recurse_filtered_subtree, or
> bdrv_recurse_transparent_subtree..
> 
> Still, now I understand, and don't care. It's better anyway than bdrv_recurse_is_first_non_filter
> 
>> +}
>> +
>>   static void blkverify_refresh_filename(BlockDriverState *bs)
>>   {
>>       BDRVBlkverifyState *s = bs->opaque;
>> @@ -328,6 +342,7 @@ static BlockDriver bdrv_blkverify = {
>>   
>>       .is_filter                        = true,
>>       .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
>> +    .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
> 
> it will be never called, as bdrv_recurse_can_replace handles filters by itself.

(As I’ve just replied to the previous patch, I think
bdrv_recurse_can_replace() is wrong about that.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/22] quorum: Store children in own structure
  2019-09-25 13:21   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:07     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2114 bytes --]

On 25.09.19 15:21, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:

[...]

>> @@ -1022,8 +1028,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
>>           s->next_child_index--;
>>           goto out;
>>       }
> 
> more context:
>      assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
>      if (s->num_children == INT_MAX / sizeof(BdrvChild *) ||
>          s->next_child_index == UINT_MAX) {
>          error_setg(errp, "Too many children");
>          return;
>      }
> 
> here: s/BdrvChild */QuorumChild
> 
> 
>> -    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
>> -    s->children[s->num_children++] = child;
>> +    s->children = g_renew(QuorumChild, s->children, s->num_children + 1);
>> +    s->children[s->num_children++] = (QuorumChild){
>> +        .child = child,
>> +    };
>>   
>>   out:
>>       bdrv_drained_end(bs);

[...]

>> @@ -1059,7 +1067,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
>>       /* We can safely remove this child now */
>>       memmove(&s->children[i], &s->children[i + 1],
>>               (s->num_children - i - 1) * sizeof(BdrvChild *));
> 
> s/BdrvChild */QuorumChild/

Damn, yes to both.

I was really hoping I didn’t mess this patch up.

Thanks.

Max

>> -    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
>> +    s->children = g_renew(QuorumChild, s->children, --s->num_children);
>>       bdrv_unref_child(bs, child);
>>   
>>       bdrv_drained_end(bs);
>> @@ -1100,7 +1108,7 @@ static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
>>   
>>       for (i = 0; i < s->num_children; i++) {
>>           qlist_append(children_list,
>> -                     qobject_ref(s->children[i]->bs->full_open_options));
>> +                     qobject_ref(s->children[i].child->bs->full_open_options));
>>       }
>>   }
>>   
>>
> 
> with my suggestions:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:12     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:12 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 3706 bytes --]

On 25.09.19 15:39, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 207054a64e..81b57dbae2 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
> 
> I'm a bit misleaded: by @bs, or by node equal to @bs (accordingly to
> bdrv_recurse_can_replace spec)?
> 
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
> 
> Aha, and you answered already :) OK.
> 
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
> 
> Hmm, still, what is wrong if we just put bdrv_recurse_can_replace(s->children[i].child->bs, to_replace) into this if condition?
> (or may be more correct to call it after taking permissions)

It’s wrong because:

bs=quorum -> child=filter -> to_replace <- other_parent

Quorum now takes WRITE on the child and unshares CONSISTENT_READ.  That
doesn’t concern the @other_parent at all, however, it only concerns
parents immediately on its child (the filter node).

bdrv_recurse_can_replace() will happily acknowledge you replacing
@to_replace by something equal to @filter, because that shouldn’t
concern @other_parent.

The problem of course is that we don’t want to replace @to_replace by
something equal to @filter, but by something equal to @quorum.  And that
may be different from @to_replace, so @other_parent should have a say in it.

What we’d need to do is have a function that recurses down and checks
whether there are any parents that care; which is exactly what I’ve
described in the last paragraph of the long comment above.


For now, I think it’s fine to just ignore such cases and forbid them.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced
  2019-09-25 13:45   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:13     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 1591 bytes --]

On 25.09.19 15:45, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> We will need this to verify that Quorum can let one of its children be
>> replaced without breaking anything else.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index cf2171cc74..207054a64e 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>   
>>   typedef struct QuorumChild {
>>       BdrvChild *child;
>> +
>> +    /*
>> +     * If set, check whether this node can be replaced without any
>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>> +     * WRITE permission.
>> +     */
>> +    bool to_be_replaced;
>>   } QuorumChild;
>>   
>>   /* the following structure holds the state of one quorum instance */
>> @@ -1128,6 +1135,16 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
>>                                 uint64_t perm, uint64_t shared,
>>                                 uint64_t *nperm, uint64_t *nshared)
>>   {
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        if (s->children[i].child == c) {
>> +            break;
>> +        }
>> +    }
>> +    assert(!c || i < s->num_children);
> 
> seems, the loop is useless if c == NULL.

Not wrong.  I’ll put it all into an if (c) and initialize i to -1.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace()
  2019-09-25 14:12   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-26 11:17     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-09-26 11:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 4298 bytes --]

On 25.09.19 16:12, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 207054a64e..81b57dbae2 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
>> +            Error *local_err = NULL;
>> +
>> +            /*
>> +             * We now have to ensure that there is no other parent
>> +             * that cares about replacing this child by a node with
>> +             * potentially different data.
>> +             */
>> +            s->children[i].to_be_replaced = true;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>> +
> 
> So we are trying to answer on a question "is it ok to replace" it, by cheating on
> permission system... Possibly, it's a problem of general design, and instead of
>   examining one subtree, we should ask all parents of to_replace node, are they
> OK with such replacement..

I’m not sure whether it’s cheating.

We want to replace some node.  A parent should be A-OK with that as long
as it hasn’t frozen its child link, and as long as it doesn’t care about
data changes (it should not have taken CONSISTENT_READ, and it must have
shared WRITE).

The only actual problem we have is that currently basically everything
takes CONSISTENT_READ (which is completely fine), but the only thing
that doesn’t is the mirror_top_bs, and that has exactly the problem of
“I can only get away without CONSISTENT_READ if it was me who unshared it”.

But that’s a different problem.  I don’t think this is cheating.

> Another idea is that it's strange to check permissions somewhere else than in generic
> permission check functions. But I've no idea how to handle it in permission system.

I don’t check the permissions, though.  I let quorum take what it needs
to allow changing one of its children.

What is a problem is that I should keep the permissions tightened until
the node is actually replaced and only then release them.  But that
turned out to be a huge mess so I resorted to just double-checking
before mirror actually completes.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 14/22] quorum: Stop marking it as a filter
  2019-09-20 15:27 ` [PATCH 14/22] quorum: Stop marking it as a filter Max Reitz
@ 2019-09-26 12:43   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 12:43 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Quorum is not a filter, for example because it cannot guarantee which of
> its children will serve the next request.  Thus, any of its children may
> differ from the data visible to quorum's parents.
> 
> We have other filters with multiple children, but they differ in this
> aspect:
> 
> - blkverify quits the whole qemu process if its children differ.  As
>    such, we can always skip it when we want to skip it (as a filter node)
>    by going to any of its children.  Both have the same data.
> 
> - replication generally serves requests from bs->file, so this is its
>    only actually filtered child.
> 
> - Block job filters currently only have one child, but they will
>    probably get more children in the future.  Still, they will always
>    have only one actually filtered child.
> 
> Having "filters" as a dedicated node category only makes sense if you
> can skip them by going to a one fixed child that always shows the same
> data as the filter node.  Quorum cannot fulfill this, so it is not a
> filter.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   block/quorum.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 7a8f8b5475..7f56b4df7c 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -1237,7 +1237,6 @@ static BlockDriver bdrv_quorum = {
>   
>       .bdrv_child_perm                    = quorum_child_perm,
>   
> -    .is_filter                          = true,
>       .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 15/22] mirror: Prevent loops
  2019-09-20 15:27 ` [PATCH 15/22] mirror: Prevent loops Max Reitz
@ 2019-09-26 13:08   ` Vladimir Sementsov-Ogievskiy
  2019-10-02 12:36     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 13:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> While bdrv_replace_node() will not follow through with it, a specific
> @replaces asks the mirror job to create a loop.
> 
> For example, say both the source and the target share a child where the
> source is a filter; by letting @replaces point to the common child, you
> ask for a loop.


source[filter]
   |
   v
child  <----- target

and we want target to be a child if itself. OK, it's a loop.

> 
> Or if you use @replaces in drive-mirror with sync=none and
> mode=absolute-paths, you generally ask for a loop (@replaces must point
> to a child of the source, and sync=none makes the source the backing
> file of the target after the job).
> 
> bdrv_replace_node() will not create those loops, but it by doing so, it

s/but it/but ?

> ignores the user-requested configuration, which is not ideally either.
> (In the first example above, the target's child will remain what it was,
> which may still be reasonable.  But in the second example, the target
> will just not become a child of the source, which is precisely what was
> requested with @replaces.)

so you say that second example is bad [1]

> 
> So prevent such configurations, both before the job, and before it
> actually completes.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h |  3 +++
>   block.c                   | 30 ++++++++++++++++++++++++
>   block/mirror.c            | 19 +++++++++++++++-
>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>   4 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 70f26530c9..8a26a0d88a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1256,6 +1256,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>                                 BlockDriverState *to_replace);
>   
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> diff --git a/block.c b/block.c
> index 0d9b3de98f..332191fb47 100644
> --- a/block.c
> +++ b/block.c
> @@ -6260,6 +6260,36 @@ out:
>       return to_replace_bs;
>   }
>   
> +/*
> + * Return true iff @child is a (recursive) child of @parent, with at
> + * least @min_level edges between them.
> + *
> + * (If @min_level == 0, return true if @child == @parent.  For
> + * @min_level == 1, @child needs to be at least a real child; for
> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
> + */
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level)
> +{
> +    BdrvChild *c;
> +
> +    if (child == parent && min_level <= 0) {
> +        return true;
> +    }
> +
> +    if (!parent) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &parent->children, next) {
> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   /**
>    * Iterates through the list of runtime option keys that are said to
>    * be "strong" for a BDS.  An option is called "strong" if it changes
> diff --git a/block/mirror.c b/block/mirror.c
> index d877637e1e..f30a6933d8 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>            * there.
>            */
>           if (bdrv_recurse_can_replace(src, to_replace)) {
> -            bdrv_replace_node(to_replace, target_bs, &local_err);
> +            /*
> +             * It is OK for @to_replace to be an immediate child of
> +             * @target_bs, because that is what happens with
> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
> +             * backing file will be the source node, which is also
> +             * to_replace (by default).
> +             * bdrv_replace_node() handles this case by not letting
> +             * target_bs->backing point to itself, but to the source
> +             * still.

but here you said [1] is good

> +             */
> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
> +                bdrv_replace_node(to_replace, target_bs, &local_err);
> +            } else {


So, we allow bdrv_replace_node automatically handle case whith immediate child of target is
replaces.. But if consider several additional filters above target (so target is a filter),
we not allow it. Why filtered case is worse?

> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +                           "because the former is now a child of the latter, "
> +                           "and doing so would thus create a loop",
> +                           to_replace->node_name, target_bs->node_name);
> +            }
>           } else {
>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>                          "because it can no longer be guaranteed that doing so "
> diff --git a/blockdev.c b/blockdev.c
> index 0420bc29be..27344247d5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3845,7 +3845,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>       }
>   
>       if (has_replaces) {
> -        BlockDriverState *to_replace_bs;
> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>           AioContext *replace_aio_context;
>           int64_t bs_size, replace_size;
>   
> @@ -3860,6 +3860,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>               return;
>           }
>   
> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
> +                       "because the former is a child of the latter",
> +                       to_replace_bs->node_name, target->node_name);
> +            return;
> +        }
> +
> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
> +        {
> +            /*
> +             * While we do not quite know what OPEN_BACKING_CHAIN
> +             * (used for mode=existing) will yield, it is probably
> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
> +             * because that is our best guess.
> +             */
> +            switch (sync) {
> +            case MIRROR_SYNC_MODE_FULL:
> +                target_backing_bs = NULL;
> +                break;
> +
> +            case MIRROR_SYNC_MODE_TOP:
> +                target_backing_bs = backing_bs(bs);
> +                break;
> +
> +            case MIRROR_SYNC_MODE_NONE:
> +                target_backing_bs = bs;
> +                break;
> +
> +            default:
> +                abort();
> +            }
> +        } else {
> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
> +            target_backing_bs = backing_bs(target);
> +        }
> +
> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
> +                       "result in a loop, because the former would be a child "
> +                       "of the latter's backing file ('%s') after the mirror "
> +                       "job", to_replace_bs->node_name, target->node_name,
> +                       target_backing_bs->node_name);
> +            return;
> +        }
> +
>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>           aio_context_acquire(replace_aio_context);
>           replace_size = bdrv_getlength(to_replace_bs);
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 16/22] iotests: Use complete_and_wait() in 155
  2019-09-20 15:27 ` [PATCH 16/22] iotests: Use complete_and_wait() in 155 Max Reitz
@ 2019-09-26 13:11   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 13:11 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> This way, we get to see errors during the completion phase.
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 17/22] iotests: Add VM.assert_block_path()
  2019-09-20 15:27 ` [PATCH 17/22] iotests: Add VM.assert_block_path() Max Reitz
@ 2019-09-26 14:07   ` Vladimir Sementsov-Ogievskiy
  2019-10-02 12:40     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 14:07 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 48 +++++++++++++++++++++++++++++++++++
>   1 file changed, 48 insertions(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index daed4ee013..e6fb46287d 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -670,6 +670,54 @@ class VM(qtest.QEMUQtestMachine):
>   
>           return fields.items() <= ret.items()
>   
> +    '''
> +    @path is a string whose components are separated by slashes.
> +    The first component is a node name, the rest are child names.
> +    Examples:
> +      - "qcow2-node/backing/file"
> +      - "quorum-node/children.2/file"

Possibly, separting node-name to first parameter and keeping child-path as
a second will simplify code a bit, and be more useful for cases when on caller
part node-name is in variable.

> +
> +    @expected_node may be None.
> +
> +    @graph may be None or the result of an x-debug-query-block-graph
> +    call that has already been performed.
> +    '''
> +    def assert_block_path(self, path, expected_node, graph=None):
> +        if graph is None:
> +            graph = self.qmp('x-debug-query-block-graph')['return']

Yay! I'm happy to see that it's useful.

> +
> +        iter_path = iter(path.split('/'))
> +        root = next(iter_path)
> +        try:
> +            node = next(node for node in graph['nodes'] if node['name'] == root)
> +        except StopIteration:
> +            node = None

for such usage next has second optional argument: next(iterator[, default])

(don't think I teach you Python, actually you teach me, as before I didn't know
correct way to search first element with condition)


> +
> +        for path_node in iter_path:
> +            assert node is not None, 'Cannot follow path %s' % path
> +
> +            try:
> +                node_id = next(edge['child'] for edge in graph['edges'] \
> +                                             if edge['parent'] == node['id'] and
> +                                                edge['name'] == path_node)

Hmm here you allow default StopIteration exception [1]


> +
> +                node = next(node for node in graph['nodes'] \
> +                                 if node['id'] == node_id)
> +            except StopIteration:
> +                node = None

actually, I think this will never happen, so we may simplify code and allow it to
throw StopIteration exception in this impossible case..

> +
> +        assert node is not None or expected_node is None, \
> +               'No node found under %s (but expected %s)' % \
> +               (path, expected_node)

node may be None here only from last iteration, but it can't happen: if we have edge
with child, we'll for sure have node with such node-name in graph

> +
> +        assert expected_node is not None or node is None, \
> +               'Found node %s under %s (but expected none)' % \
> +               (node['name'], path)

hmm, so expected_node=None means we want to prove that there is no such node? It should
be mentioned in comment above the function. But this don't work due to [1]

> +
> +        if node is not None and expected_node is not None:
> +            assert node['name'] == expected_node, \
> +                   'Found node %s under %s (but expected %s)' % \
> +                   (node['name'], path, expected_node)
>   
>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>   
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 18/22] iotests: Resolve TODOs in 041
  2019-09-20 15:28 ` [PATCH 18/22] iotests: Resolve TODOs in 041 Max Reitz
@ 2019-09-26 14:09   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 14:09 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:28, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041 | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index d91f538276..ca126de3ff 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -921,8 +921,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
>   
>           self.complete_and_wait(drive="job0")
>           self.assert_has_block_node("repair0", quorum_repair_img)
> -        # TODO: a better test requiring some QEMU infrastructure will be added
> -        #       to check that this file is really driven by quorum
> +        self.vm.assert_block_path('quorum0/children.1', 'repair0')
>           self.vm.shutdown()
>           self.assertTrue(iotests.compare_images(quorum_img2, quorum_repair_img),
>                           'target image does not match source after mirroring')
> @@ -1074,9 +1073,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
>   
>           self.complete_and_wait('job0')
>           self.assert_has_block_node("repair0", quorum_repair_img)
> -        # TODO: a better test requiring some QEMU infrastructure will be added
> -        #       to check that this file is really driven by quorum
> -        self.vm.shutdown()
> +        self.vm.assert_block_path('quorum0/children.1', 'repair0')
>   
>   # Test mirroring with a source that does not have any parents (not even a
>   # BlockBackend)
> 


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum
  2019-09-20 15:28 ` [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum Max Reitz
@ 2019-09-26 14:13   ` Vladimir Sementsov-Ogievskiy
  2019-10-02 12:42     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 14:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:28, Max Reitz wrote:
> 041's TestRepairQuorum has its own image_len, no need to refer to
> TestSingleDrive.  (This patch allows uncommenting TestSingleDrive to

you mean commenting

> speed up 041 during test testing.)

we definitely want a way to run a subset of test cases.

I usually do s/def test/def ntest/, and then set needed test-case back to
'def test'

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041 | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index ca126de3ff..20ae9750b7 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -880,7 +880,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
>           # Add each individual quorum images
>           for i in self.IMAGES:
>               qemu_img('create', '-f', iotests.imgfmt, i,
> -                     str(TestSingleDrive.image_len))
> +                     str(self.image_len))

yes, seems TestSingleDrive.image_len is a copy-pasting mistake here..

>               # Assign a node name to each quorum image in order to manipulate
>               # them
>               opts = "node-name=img%i" % self.IMAGES.index(i)
> 


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces
  2019-09-20 15:28 ` [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces Max Reitz
@ 2019-09-26 14:30   ` Vladimir Sementsov-Ogievskiy
  2019-10-02 12:43     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 14:30 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:28, Max Reitz wrote:
> Add two tests to see that you cannot replace a Quorum child with the
> mirror job while the child is in use by a different parent.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 57 +++++++++++++++++++++++++++++++++++++-
>   tests/qemu-iotests/041.out |  4 +--
>   2 files changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index 20ae9750b7..148dc47ce4 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -34,6 +34,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
>   quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
>   quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
>   
> +nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
> +
>   class TestSingleDrive(iotests.QMPTestCase):
>       image_len = 1 * 1024 * 1024 # MB
>       qmp_cmd = 'drive-mirror'
> @@ -901,7 +903,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
>   
>       def tearDown(self):
>           self.vm.shutdown()
> -        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file ]:
> +        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file,
> +                                 nbd_sock_path ]:
>               # Do a try/except because the test may have deleted some images
>               try:
>                   os.remove(i)
> @@ -1075,6 +1078,58 @@ class TestRepairQuorum(iotests.QMPTestCase):
>           self.assert_has_block_node("repair0", quorum_repair_img)
>           self.vm.assert_block_path('quorum0/children.1', 'repair0')
>   
> +    '''
> +    Check that we cannot replace a Quorum child when it has other
> +    parents.
> +    '''

you constantly use ''', when PEP8 recommends """ for doc-strings. I can't
complain, as our python code is something not related to PEP8 unfortunately..

> +    def test_with_other_parent(self):

don't we need
         if not iotests.supports_quorum():
             return
like in neighbors?

> +        result = self.vm.qmp('nbd-server-start',
> +                             addr={
> +                                 'type': 'unix',
> +                                 'data': {'path': nbd_sock_path}
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('nbd-server-add', device='img1')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
> +                             sync='full', node_name='repair0', replaces='img1',
> +                             target=quorum_repair_img, format=iotests.imgfmt)
> +        self.assert_qmp(result, 'error/desc',
> +                        "Cannot replace 'img1' by a node mirrored from "
> +                        "'quorum0', because it cannot be guaranteed that doing "
> +                        "so would not lead to an abrupt change of visible data")
> +
> +    '''
> +    The same as test_with_other_parent(), but add the NBD server only
> +    when the mirror job is already running.
> +    '''
> +    def test_with_other_parents_after_mirror_start(self):
> +        result = self.vm.qmp('nbd-server-start',
> +                             addr={
> +                                 'type': 'unix',
> +                                 'data': {'path': nbd_sock_path}
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
> +                             sync='full', node_name='repair0', replaces='img1',
> +                             target=quorum_repair_img, format=iotests.imgfmt)
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('nbd-server-add', device='img1')
> +        self.assert_qmp(result, 'return', {})
> +
> +        # The full error message goes to stderr, so we unfortunately
> +        # cannot check it here

We can, iotests 169 and 245 do it with help of vm.get_log()

> +        self.complete_and_wait('mirror',
> +                               completion_error='Operation not permitted')
> +
> +        # Should not have been replaced
> +        self.vm.assert_block_path('quorum0/children.1', 'img1')
> +
> +
>   # Test mirroring with a source that does not have any parents (not even a
>   # BlockBackend)
>   class TestOrphanedSource(iotests.QMPTestCase):
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index f496be9197..ffc779b4d1 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -...........................................................................................
> +.............................................................................................
>   ----------------------------------------------------------------------
> -Ran 91 tests
> +Ran 93 tests
>   
>   OK
> 

With supports_quorum checked like in neighbor test-cases (or use @iotests.skip_if_unsupported):
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 21/22] iotests: Check that @replaces can replace filters
  2019-09-20 15:28 ` [PATCH 21/22] iotests: Check that @replaces can replace filters Max Reitz
@ 2019-09-26 14:42   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 14:42 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:28, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 45 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/041.out |  4 ++--
>   2 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index 148dc47ce4..e4cc829fe2 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1220,6 +1220,51 @@ class TestOrphanedSource(iotests.QMPTestCase):
>           self.assertFalse('mirror-filter' in nodes,
>                            'Mirror filter node did not disappear')
>   
> +# Test cases for @replaces that do not necessarily involve Quorum
> +class TestReplaces(iotests.QMPTestCase):
> +    # Each of these test cases needs their own block graph, so do not
> +    # create any nodes here
> +    def setUp(self):
> +        self.vm = iotests.VM()
> +        self.vm.launch()
> +
> +    def tearDown(self):
> +        self.vm.shutdown()
> +        for img in (test_img, target_img):
> +            try:
> +                os.remove(img)
> +            except OSError:
> +                pass

dead code, but may be used in future patch..

> +
> +    '''
> +    Check that we can replace filter nodes.
> +    '''
> +    def test_replace_filter(self):
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'copy-on-read',
> +                                 'node-name': 'filter0',
> +                                 'file': {
> +                                     'driver': 'copy-on-read',
> +                                     'node-name': 'filter1',
> +                                     'file': {
> +                                         'driver': 'null-co'
> +                                     }
> +                                 }
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add',
> +                             node_name='target', driver='null-co')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-mirror', job_id='mirror', device='filter0',
> +                             target='target', sync='full', replaces='filter1')
> +        self.assert_qmp(result, 'return', {})
> +
> +        self.complete_and_wait('mirror')
> +
> +        self.vm.assert_block_path('filter0/file', 'target')
> +
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2', 'qed'],
>                    supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index ffc779b4d1..877b76fd31 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -.............................................................................................
> +..............................................................................................
>   ----------------------------------------------------------------------
> -Ran 93 tests
> +Ran 94 tests
>   
>   OK
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 22/22] iotests: Mirror must not attempt to create loops
  2019-09-20 15:28 ` [PATCH 22/22] iotests: Mirror must not attempt to create loops Max Reitz
@ 2019-09-26 15:03   ` Vladimir Sementsov-Ogievskiy
  2019-10-02 12:46     ` Max Reitz
  0 siblings, 1 reply; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-26 15:03 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.09.2019 18:28, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 152 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/041.out |   4 +-
>   2 files changed, 154 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index e4cc829fe2..6ea4764ae8 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1265,6 +1265,158 @@ class TestReplaces(iotests.QMPTestCase):
>   
>           self.vm.assert_block_path('filter0/file', 'target')
>   
> +    '''
> +    See what happens when the @sync/@replaces configuration dictates
> +    creating a loop.
> +    '''
> +    def test_loop(self):
> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
> +
> +        # Dummy group so we can create a NOP filter
> +        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg0')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'throttle',
> +                                 'node-name': 'source',
> +                                 'throttle-group': 'tg0',
> +                                 'file': {
> +                                     'driver': iotests.imgfmt,
> +                                     'node-name': 'filtered',
> +                                     'file': {
> +                                         'driver': 'file',
> +                                         'filename': test_img
> +                                     }
> +                                 }
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        # Block graph is now:
> +        #   source[throttle] --file--> filtered[qcow2] --file--> ...

or qed, actually

> +
> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='source',
> +                             target=target_img, format=iotests.imgfmt,
> +                             node_name='target', sync='none',
> +                             replaces='filtered')
> +
> +        '''
> +        Block graph before mirror exits would be (ignoring mirror_top):
> +          source[throttle] --file--> filtered[qcow2] --file--> ...
> +          target[qcow2] --file--> ...
> +
> +        Then, because of sync=none and drive-mirror in absolute-paths mode,
> +        the source is attached to the target:
> +          source[throttle] --file--> filtered[qcow2] --file--> ...
> +                 ^
                     |
> +              backing
> +                 |
> +            target[qcow2] --file--> ...
> +
> +        Replacing filtered by target would yield:
> +          source[throttle] --file--> target[qcow2] --file--> ...
> +                 ^                        |
> +                 +------- backing --------+
> +
> +        I.e., a loop.  bdrv_replace_node() detects this and simply
> +        does not let source's file link point to target.  However,
> +        that means that target cannot really replace source.
> +
> +        drive-mirror should detect this and not allow this case.
> +        '''
> +
> +        self.assert_qmp(result, 'error/desc',
> +                        "Replacing 'filtered' by 'target' with this sync " + \
> +                        "mode would result in a loop, because the former " + \
> +                        "would be a child of the latter's backing file " + \
> +                        "('source') after the mirror job")
> +
> +    '''
> +    Test what happens when there would be no loop with the pre-mirror
> +    configuration, but something changes during the mirror job that asks
> +    for a loop to be created during completion.
> +    '''
> +    def test_loop_during_mirror(self):
> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'null-co',
> +                                 'node-name': 'common-base',
> +                                 'read-zeroes': True,

why do you need read-zeroes?

> +                                 'size': 1 * 1024 * 1024
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'copy-on-read',
> +                                 'node-name': 'source',
> +                                 'file': 'common-base'
> +                             })
> +        self.assert_qmp(result, 'return', {})

Hmm, why don't you create them both in one query?

> +
> +        '''

the following is hard to read without some hint like, "We are going to ..."

> +        x-blockdev-change can only add children to a quorum node that
> +        have no parent yet, so we need an intermediate node between
> +        target and common-base that has no parents other than target.
> +        We cannot use any parent that would forward the RESIZE
> +        permission (because the job takes it, but unshares it on the
> +        source), so we make it a backing child of a qcow2 node.
> +        Unfortunately, we cannot add backing files to Quorum nodes
> +        (because of an op blocker), so we put another raw node between
> +        the qcow2 node and common-base.
> +        '''
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'qcow2',
> +                                 'node-name': 'base-parent',
> +                                 'file': {
> +                                     'driver': 'file',
> +                                     'filename': test_img
> +                                 },
> +                                 'backing': {
> +                                     'driver': 'raw',
> +                                     'file': 'common-base'
> +                                 }
> +                             })
> +
> +        # Add a quorum node with a single child, we will add
> +        # base-parent to prepare a loop later
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'quorum',

It would be good to skip test-cases if quorum unsupported, like other test-cases
with quorum.

> +                                 'node-name': 'target',
> +                                 'vote-threshold': 1,
> +                                 'children': [
> +                                     {
> +                                         'driver': 'null-co',
> +                                         'read-zeroes': True
> +                                     }
> +                                 ]
> +                             })
> +        self.assert_qmp(result, 'return', {})

It would be nice to comment out current block graph here...

> +
> +        result = self.vm.qmp('blockdev-mirror', job_id='mirror',
> +                             device='source', target='target', sync='full',
> +                             replaces='common-base')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('x-blockdev-change',
> +                             parent='target', node='base-parent')
> +        self.assert_qmp(result, 'return', {})
> +
> +        '''

and here, like you do in previous test-case. And here it even nicer, as this test-case
is more complex.

> +        This asks for this configuration post-mirror:
> +
> +        source -> target (replaced common-base) -> base-parent
> +                                  ^                    |
> +                                  |                    v
> +                                  +----------------- [raw]
> +
> +        bdrv_replace_node() would not allow such a configuration, but
> +        we should not pretend we can create it, so the mirror job
> +        should fail during completion.
> +        '''
> +
> +        self.complete_and_wait('mirror',
> +                               completion_error='Operation not permitted')
> +
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2', 'qed'],
>                    supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index 877b76fd31..20a8158b99 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -..............................................................................................
> +................................................................................................
>   ----------------------------------------------------------------------
> -Ran 94 tests
> +Ran 96 tests
>   
>   OK
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 15/22] mirror: Prevent loops
  2019-09-26 13:08   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 12:36     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-10-02 12:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 7474 bytes --]

On 26.09.19 15:08, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> While bdrv_replace_node() will not follow through with it, a specific
>> @replaces asks the mirror job to create a loop.
>>
>> For example, say both the source and the target share a child where the
>> source is a filter; by letting @replaces point to the common child, you
>> ask for a loop.
> 
> 
> source[filter]
>    |
>    v
> child  <----- target
> 
> and we want target to be a child if itself. OK, it's a loop.
> 
>>
>> Or if you use @replaces in drive-mirror with sync=none and
>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>> to a child of the source, and sync=none makes the source the backing
>> file of the target after the job).
>>
>> bdrv_replace_node() will not create those loops, but it by doing so, it
> 
> s/but it/but ?

Yep.

>> ignores the user-requested configuration, which is not ideally either.
>> (In the first example above, the target's child will remain what it was,
>> which may still be reasonable.  But in the second example, the target
>> will just not become a child of the source, which is precisely what was
>> requested with @replaces.)
> 
> so you say that second example is bad [1]
> 
>>
>> So prevent such configurations, both before the job, and before it
>> actually completes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h |  3 +++
>>   block.c                   | 30 ++++++++++++++++++++++++
>>   block/mirror.c            | 19 +++++++++++++++-
>>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>   4 files changed, 98 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 70f26530c9..8a26a0d88a 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1256,6 +1256,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>                                 BlockDriverState *to_replace);
>>   
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>> diff --git a/block.c b/block.c
>> index 0d9b3de98f..332191fb47 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6260,6 +6260,36 @@ out:
>>       return to_replace_bs;
>>   }
>>   
>> +/*
>> + * Return true iff @child is a (recursive) child of @parent, with at
>> + * least @min_level edges between them.
>> + *
>> + * (If @min_level == 0, return true if @child == @parent.  For
>> + * @min_level == 1, @child needs to be at least a real child; for
>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>> + */
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (child == parent && min_level <= 0) {
>> +        return true;
>> +    }
>> +
>> +    if (!parent) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &parent->children, next) {
>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>   /**
>>    * Iterates through the list of runtime option keys that are said to
>>    * be "strong" for a BDS.  An option is called "strong" if it changes
>> diff --git a/block/mirror.c b/block/mirror.c
>> index d877637e1e..f30a6933d8 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>            * there.
>>            */
>>           if (bdrv_recurse_can_replace(src, to_replace)) {
>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            /*
>> +             * It is OK for @to_replace to be an immediate child of
>> +             * @target_bs, because that is what happens with
>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>> +             * backing file will be the source node, which is also
>> +             * to_replace (by default).
>> +             * bdrv_replace_node() handles this case by not letting
>> +             * target_bs->backing point to itself, but to the source
>> +             * still.
> 
> but here you said [1] is good

Well.  Not quite.

In [1] I say that sync=none mode=absolute-paths replaces=some-node will
not work as intended, because the target will not replace some-node.
(bdrv_replace_node() prevents that.)

Here, it’s about when there is no @replaces option.  Then, the target
will absolutely replace the source, and target->backing will point to
the source.

Here’s what happens (as far as I understand):

Without @replaces:

target->backing is first set to the source.

Then, target replaces the source node.  bdrv_replace_node() will not
create loops, so it keeps the target->backing link as it is.

With replaces=some-node (must be a (recursive) child of the source):

target->backing is first set to the source.

Then target replaces some-node.  bdrv_replace_node() will not create
loops, so it keeps all links that would connect the source and the
target as they are.  Thus, the target will actually not really replace
some-node.

Example:

quorum --children.0--> to-replace

Now we mirror from quorum to some target (target-node) with sync=none,
mode=absolute-paths, and replaces=to-replace.

What we’re effectively asking for is this:

quorum --children.0--> target-node
  ^                         |
  +---------backing---------+

Which of course doesn’t work.  It makes sense if you break up the loop
by imagining that quorum simply won’t read from children.0, but in
reality it is a loop, so it can’t work.

What happens though is this:

  quorum --children.0--> to-replace
     ^
  backing
     |
target-node

The target node does not replace to-replace, because changing the
children.0 link to point to target-node would create a loop.  Thus, the
user doesn’t get what they want.

(Note that if to-replace has other parents, target-node would absolutely
replace it for them.)

>> +             */
>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            } else {
> 
> 
> So, we allow bdrv_replace_node automatically handle case whith immediate child of target is
> replaces.. But if consider several additional filters above target (so target is a filter),
> we not allow it. Why filtered case is worse?

Because if it’s an immediate child, there are no other nodes involved.
Just one link stays as it is (e.g. target->backing simply still points
to the source node).

If it’s an indirect child, then there are other nodes involved and it’s
likely the user won’t get what they want (as in the example above, where
the replacement just won’t happen).


The pragmatic explanation is “We need to allow immediate children,
because that’s the no-@replaces sync=none mode=absolute-paths case.  We
need to forbid everything else because it’d probably lead to unexpected
results.”

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 17/22] iotests: Add VM.assert_block_path()
  2019-09-26 14:07   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 12:40     ` Max Reitz
  2019-10-02 13:51       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 58+ messages in thread
From: Max Reitz @ 2019-10-02 12:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 4274 bytes --]

On 26.09.19 16:07, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 48 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index daed4ee013..e6fb46287d 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -670,6 +670,54 @@ class VM(qtest.QEMUQtestMachine):
>>   
>>           return fields.items() <= ret.items()
>>   
>> +    '''
>> +    @path is a string whose components are separated by slashes.
>> +    The first component is a node name, the rest are child names.
>> +    Examples:
>> +      - "qcow2-node/backing/file"
>> +      - "quorum-node/children.2/file"
> 
> Possibly, separting node-name to first parameter and keeping child-path as
> a second will simplify code a bit, and be more useful for cases when on caller
> part node-name is in variable.

Sounds good.

>> +
>> +    @expected_node may be None.
>> +
>> +    @graph may be None or the result of an x-debug-query-block-graph
>> +    call that has already been performed.
>> +    '''
>> +    def assert_block_path(self, path, expected_node, graph=None):
>> +        if graph is None:
>> +            graph = self.qmp('x-debug-query-block-graph')['return']
> 
> Yay! I'm happy to see that it's useful.

:-)

It’s probably the best query function we have.

>> +
>> +        iter_path = iter(path.split('/'))
>> +        root = next(iter_path)
>> +        try:
>> +            node = next(node for node in graph['nodes'] if node['name'] == root)
>> +        except StopIteration:
>> +            node = None
> 
> for such usage next has second optional argument: next(iterator[, default])

Great!

> (don't think I teach you Python, actually you teach me, as before I didn't know
> correct way to search first element with condition)

We learn from one another, which is the best case.

>> +
>> +        for path_node in iter_path:
>> +            assert node is not None, 'Cannot follow path %s' % path
>> +
>> +            try:
>> +                node_id = next(edge['child'] for edge in graph['edges'] \
>> +                                             if edge['parent'] == node['id'] and
>> +                                                edge['name'] == path_node)
> 
> Hmm here you allow default StopIteration exception [1]
> 
> 
>> +
>> +                node = next(node for node in graph['nodes'] \
>> +                                 if node['id'] == node_id)
>> +            except StopIteration:
>> +                node = None
> 
> actually, I think this will never happen, so we may simplify code and allow it to
> throw StopIteration exception in this impossible case..

This is for a use case where the next child simply doesn’t exist, so you
can do:

assert_block_path('qcow2-node/backing', None)

To verify that the qcow2 node has no backing file.

>> +
>> +        assert node is not None or expected_node is None, \
>> +               'No node found under %s (but expected %s)' % \
>> +               (path, expected_node)
> 
> node may be None here only from last iteration, but it can't happen: if we have edge
> with child, we'll for sure have node with such node-name in graph

node will always be set by the try-except block, won’t it?

>> +
>> +        assert expected_node is not None or node is None, \
>> +               'Found node %s under %s (but expected none)' % \
>> +               (node['name'], path)
> 
> hmm, so expected_node=None means we want to prove that there is no such node? It should
> be mentioned in comment above the function. But this don't work due to [1]

Hm, I seem to remember I tested all cases locally and they all worked.

Max

>> +
>> +        if node is not None and expected_node is not None:
>> +            assert node['name'] == expected_node, \
>> +                   'Found node %s under %s (but expected %s)' % \
>> +                   (node['name'], path, expected_node)
>>   
>>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>>   
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum
  2019-09-26 14:13   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 12:42     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-10-02 12:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 1528 bytes --]

On 26.09.19 16:13, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:28, Max Reitz wrote:
>> 041's TestRepairQuorum has its own image_len, no need to refer to
>> TestSingleDrive.  (This patch allows uncommenting TestSingleDrive to
> 
> you mean commenting

Ah, yes, I mean commenting out. :-)

>> speed up 041 during test testing.)
> 
> we definitely want a way to run a subset of test cases.
> 
> I usually do s/def test/def ntest/, and then set needed test-case back to
> 'def test'

Thanks for that tip.

Max

>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/041 | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index ca126de3ff..20ae9750b7 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -880,7 +880,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
>>           # Add each individual quorum images
>>           for i in self.IMAGES:
>>               qemu_img('create', '-f', iotests.imgfmt, i,
>> -                     str(TestSingleDrive.image_len))
>> +                     str(self.image_len))
> 
> yes, seems TestSingleDrive.image_len is a copy-pasting mistake here..
> 
>>               # Assign a node name to each quorum image in order to manipulate
>>               # them
>>               opts = "node-name=img%i" % self.IMAGES.index(i)
>>
> 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces
  2019-09-26 14:30   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 12:43     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-10-02 12:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 5716 bytes --]

On 26.09.19 16:30, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:28, Max Reitz wrote:
>> Add two tests to see that you cannot replace a Quorum child with the
>> mirror job while the child is in use by a different parent.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/041     | 57 +++++++++++++++++++++++++++++++++++++-
>>   tests/qemu-iotests/041.out |  4 +--
>>   2 files changed, 58 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index 20ae9750b7..148dc47ce4 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -34,6 +34,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
>>   quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
>>   quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
>>   
>> +nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
>> +
>>   class TestSingleDrive(iotests.QMPTestCase):
>>       image_len = 1 * 1024 * 1024 # MB
>>       qmp_cmd = 'drive-mirror'
>> @@ -901,7 +903,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
>>   
>>       def tearDown(self):
>>           self.vm.shutdown()
>> -        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file ]:
>> +        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file,
>> +                                 nbd_sock_path ]:
>>               # Do a try/except because the test may have deleted some images
>>               try:
>>                   os.remove(i)
>> @@ -1075,6 +1078,58 @@ class TestRepairQuorum(iotests.QMPTestCase):
>>           self.assert_has_block_node("repair0", quorum_repair_img)
>>           self.vm.assert_block_path('quorum0/children.1', 'repair0')
>>   
>> +    '''
>> +    Check that we cannot replace a Quorum child when it has other
>> +    parents.
>> +    '''
> 
> you constantly use ''', when PEP8 recommends """ for doc-strings. I can't
> complain, as our python code is something not related to PEP8 unfortunately..

I just use what I see in existing code.

(And additionally, in scripting languages I tend to just use what works.)

>> +    def test_with_other_parent(self):
> 
> don't we need
>          if not iotests.supports_quorum():
>              return
> like in neighbors?

Good point.  Or a decorator, probably.

>> +        result = self.vm.qmp('nbd-server-start',
>> +                             addr={
>> +                                 'type': 'unix',
>> +                                 'data': {'path': nbd_sock_path}
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('nbd-server-add', device='img1')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
>> +                             sync='full', node_name='repair0', replaces='img1',
>> +                             target=quorum_repair_img, format=iotests.imgfmt)
>> +        self.assert_qmp(result, 'error/desc',
>> +                        "Cannot replace 'img1' by a node mirrored from "
>> +                        "'quorum0', because it cannot be guaranteed that doing "
>> +                        "so would not lead to an abrupt change of visible data")
>> +
>> +    '''
>> +    The same as test_with_other_parent(), but add the NBD server only
>> +    when the mirror job is already running.
>> +    '''
>> +    def test_with_other_parents_after_mirror_start(self):
>> +        result = self.vm.qmp('nbd-server-start',
>> +                             addr={
>> +                                 'type': 'unix',
>> +                                 'data': {'path': nbd_sock_path}
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
>> +                             sync='full', node_name='repair0', replaces='img1',
>> +                             target=quorum_repair_img, format=iotests.imgfmt)
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('nbd-server-add', device='img1')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        # The full error message goes to stderr, so we unfortunately
>> +        # cannot check it here
> 
> We can, iotests 169 and 245 do it with help of vm.get_log()

Uh, thanks, I’ll look into it.

Max

>> +        self.complete_and_wait('mirror',
>> +                               completion_error='Operation not permitted')
>> +
>> +        # Should not have been replaced
>> +        self.vm.assert_block_path('quorum0/children.1', 'img1')
>> +
>> +
>>   # Test mirroring with a source that does not have any parents (not even a
>>   # BlockBackend)
>>   class TestOrphanedSource(iotests.QMPTestCase):
>> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
>> index f496be9197..ffc779b4d1 100644
>> --- a/tests/qemu-iotests/041.out
>> +++ b/tests/qemu-iotests/041.out
>> @@ -1,5 +1,5 @@
>> -...........................................................................................
>> +.............................................................................................
>>   ----------------------------------------------------------------------
>> -Ran 91 tests
>> +Ran 93 tests
>>   
>>   OK
>>
> 
> With supports_quorum checked like in neighbor test-cases (or use @iotests.skip_if_unsupported):
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 22/22] iotests: Mirror must not attempt to create loops
  2019-09-26 15:03   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 12:46     ` Max Reitz
  0 siblings, 0 replies; 58+ messages in thread
From: Max Reitz @ 2019-10-02 12:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 9424 bytes --]

On 26.09.19 17:03, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:28, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/041     | 152 +++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/041.out |   4 +-
>>   2 files changed, 154 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index e4cc829fe2..6ea4764ae8 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -1265,6 +1265,158 @@ class TestReplaces(iotests.QMPTestCase):
>>   
>>           self.vm.assert_block_path('filter0/file', 'target')
>>   
>> +    '''
>> +    See what happens when the @sync/@replaces configuration dictates
>> +    creating a loop.
>> +    '''
>> +    def test_loop(self):
>> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
>> +
>> +        # Dummy group so we can create a NOP filter
>> +        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg0')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('blockdev-add', **{
>> +                                 'driver': 'throttle',
>> +                                 'node-name': 'source',
>> +                                 'throttle-group': 'tg0',
>> +                                 'file': {
>> +                                     'driver': iotests.imgfmt,
>> +                                     'node-name': 'filtered',
>> +                                     'file': {
>> +                                         'driver': 'file',
>> +                                         'filename': test_img
>> +                                     }
>> +                                 }
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        # Block graph is now:
>> +        #   source[throttle] --file--> filtered[qcow2] --file--> ...
> 
> or qed, actually

Yep.

>> +
>> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='source',
>> +                             target=target_img, format=iotests.imgfmt,
>> +                             node_name='target', sync='none',
>> +                             replaces='filtered')
>> +
>> +        '''
>> +        Block graph before mirror exits would be (ignoring mirror_top):
>> +          source[throttle] --file--> filtered[qcow2] --file--> ...
>> +          target[qcow2] --file--> ...
>> +
>> +        Then, because of sync=none and drive-mirror in absolute-paths mode,
>> +        the source is attached to the target:
>> +          source[throttle] --file--> filtered[qcow2] --file--> ...
>> +                 ^
>                      |
>> +              backing
>> +                 |
>> +            target[qcow2] --file--> ...
>> +
>> +        Replacing filtered by target would yield:
>> +          source[throttle] --file--> target[qcow2] --file--> ...
>> +                 ^                        |
>> +                 +------- backing --------+
>> +
>> +        I.e., a loop.  bdrv_replace_node() detects this and simply
>> +        does not let source's file link point to target.  However,
>> +        that means that target cannot really replace source.
>> +
>> +        drive-mirror should detect this and not allow this case.
>> +        '''
>> +
>> +        self.assert_qmp(result, 'error/desc',
>> +                        "Replacing 'filtered' by 'target' with this sync " + \
>> +                        "mode would result in a loop, because the former " + \
>> +                        "would be a child of the latter's backing file " + \
>> +                        "('source') after the mirror job")
>> +
>> +    '''
>> +    Test what happens when there would be no loop with the pre-mirror
>> +    configuration, but something changes during the mirror job that asks
>> +    for a loop to be created during completion.
>> +    '''
>> +    def test_loop_during_mirror(self):
>> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
>> +
>> +        result = self.vm.qmp('blockdev-add', **{
>> +                                 'driver': 'null-co',
>> +                                 'node-name': 'common-base',
>> +                                 'read-zeroes': True,
> 
> why do you need read-zeroes?

It’s my understanding that we’d better always set it to true.

>> +                                 'size': 1 * 1024 * 1024
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('blockdev-add', **{
>> +                                 'driver': 'copy-on-read',
>> +                                 'node-name': 'source',
>> +                                 'file': 'common-base'
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
> 
> Hmm, why don't you create them both in one query?

No good reason, I think.

>> +
>> +        '''
> 
> the following is hard to read without some hint like, "We are going to ..."

I’ll see what I can come up with.

>> +        x-blockdev-change can only add children to a quorum node that
>> +        have no parent yet, so we need an intermediate node between
>> +        target and common-base that has no parents other than target.
>> +        We cannot use any parent that would forward the RESIZE
>> +        permission (because the job takes it, but unshares it on the
>> +        source), so we make it a backing child of a qcow2 node.
>> +        Unfortunately, we cannot add backing files to Quorum nodes
>> +        (because of an op blocker), so we put another raw node between
>> +        the qcow2 node and common-base.
>> +        '''
>> +        result = self.vm.qmp('blockdev-add', **{
>> +                                 'driver': 'qcow2',
>> +                                 'node-name': 'base-parent',
>> +                                 'file': {
>> +                                     'driver': 'file',
>> +                                     'filename': test_img
>> +                                 },
>> +                                 'backing': {
>> +                                     'driver': 'raw',
>> +                                     'file': 'common-base'
>> +                                 }
>> +                             })
>> +
>> +        # Add a quorum node with a single child, we will add
>> +        # base-parent to prepare a loop later
>> +        result = self.vm.qmp('blockdev-add', **{
>> +                                 'driver': 'quorum',
> 
> It would be good to skip test-cases if quorum unsupported, like other test-cases
> with quorum.

Will do.

>> +                                 'node-name': 'target',
>> +                                 'vote-threshold': 1,
>> +                                 'children': [
>> +                                     {
>> +                                         'driver': 'null-co',
>> +                                         'read-zeroes': True
>> +                                     }
>> +                                 ]
>> +                             })
>> +        self.assert_qmp(result, 'return', {})
> 
> It would be nice to comment out current block graph here...

OK.

>> +
>> +        result = self.vm.qmp('blockdev-mirror', job_id='mirror',
>> +                             device='source', target='target', sync='full',
>> +                             replaces='common-base')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        result = self.vm.qmp('x-blockdev-change',
>> +                             parent='target', node='base-parent')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        '''
> 
> and here, like you do in previous test-case. And here it even nicer, as this test-case
> is more complex.

OK.

>> +        This asks for this configuration post-mirror:
>> +
>> +        source -> target (replaced common-base) -> base-parent
>> +                                  ^                    |
>> +                                  |                    v
>> +                                  +----------------- [raw]
>> +
>> +        bdrv_replace_node() would not allow such a configuration, but
>> +        we should not pretend we can create it, so the mirror job
>> +        should fail during completion.
>> +        '''
>> +
>> +        self.complete_and_wait('mirror',
>> +                               completion_error='Operation not permitted')
>> +
>>   if __name__ == '__main__':
>>       iotests.main(supported_fmts=['qcow2', 'qed'],
>>                    supported_protocols=['file'])
>> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
>> index 877b76fd31..20a8158b99 100644
>> --- a/tests/qemu-iotests/041.out
>> +++ b/tests/qemu-iotests/041.out
>> @@ -1,5 +1,5 @@
>> -..............................................................................................
>> +................................................................................................
>>   ----------------------------------------------------------------------
>> -Ran 94 tests
>> +Ran 96 tests
>>   
>>   OK
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 17/22] iotests: Add VM.assert_block_path()
  2019-10-02 12:40     ` Max Reitz
@ 2019-10-02 13:51       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 58+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-02 13:51 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

02.10.2019 15:40, Max Reitz wrote:
> On 26.09.19 16:07, Vladimir Sementsov-Ogievskiy wrote:
>> 20.09.2019 18:27, Max Reitz wrote:
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    tests/qemu-iotests/iotests.py | 48 +++++++++++++++++++++++++++++++++++
>>>    1 file changed, 48 insertions(+)
>>>
>>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>>> index daed4ee013..e6fb46287d 100644
>>> --- a/tests/qemu-iotests/iotests.py
>>> +++ b/tests/qemu-iotests/iotests.py
>>> @@ -670,6 +670,54 @@ class VM(qtest.QEMUQtestMachine):
>>>    
>>>            return fields.items() <= ret.items()
>>>    
>>> +    '''
>>> +    @path is a string whose components are separated by slashes.
>>> +    The first component is a node name, the rest are child names.
>>> +    Examples:
>>> +      - "qcow2-node/backing/file"
>>> +      - "quorum-node/children.2/file"
>>
>> Possibly, separting node-name to first parameter and keeping child-path as
>> a second will simplify code a bit, and be more useful for cases when on caller
>> part node-name is in variable.
> 
> Sounds good.
> 
>>> +
>>> +    @expected_node may be None.
>>> +
>>> +    @graph may be None or the result of an x-debug-query-block-graph
>>> +    call that has already been performed.
>>> +    '''
>>> +    def assert_block_path(self, path, expected_node, graph=None):
>>> +        if graph is None:
>>> +            graph = self.qmp('x-debug-query-block-graph')['return']
>>
>> Yay! I'm happy to see that it's useful.
> 
> :-)
> 
> It’s probably the best query function we have.
> 
>>> +
>>> +        iter_path = iter(path.split('/'))
>>> +        root = next(iter_path)
>>> +        try:
>>> +            node = next(node for node in graph['nodes'] if node['name'] == root)
>>> +        except StopIteration:
>>> +            node = None
>>
>> for such usage next has second optional argument: next(iterator[, default])
> 
> Great!
> 
>> (don't think I teach you Python, actually you teach me, as before I didn't know
>> correct way to search first element with condition)
> 
> We learn from one another, which is the best case.
> 
>>> +
>>> +        for path_node in iter_path:
>>> +            assert node is not None, 'Cannot follow path %s' % path
>>> +
>>> +            try:
>>> +                node_id = next(edge['child'] for edge in graph['edges'] \
>>> +                                             if edge['parent'] == node['id'] and
>>> +                                                edge['name'] == path_node)
>>
>> Hmm here you allow default StopIteration exception [1]

brr, I just mistaken here: we are in same try-except as the following line, and we'll
catch it of course

>>
>>
>>> +
>>> +                node = next(node for node in graph['nodes'] \
>>> +                                 if node['id'] == node_id)
>>> +            except StopIteration:
>>> +                node = None
>>
>> actually, I think this will never happen, so we may simplify code and allow it to
>> throw StopIteration exception in this impossible case..
> 
> This is for a use case where the next child simply doesn’t exist, so you
> can do:
> 
> assert_block_path('qcow2-node/backing', None)
> 
> To verify that the qcow2 node has no backing file.
> 
>>> +
>>> +        assert node is not None or expected_node is None, \
>>> +               'No node found under %s (but expected %s)' % \
>>> +               (path, expected_node)
>>
>> node may be None here only from last iteration, but it can't happen: if we have edge
>> with child, we'll for sure have node with such node-name in graph
> 
> node will always be set by the try-except block, won’t it?
> 
>>> +
>>> +        assert expected_node is not None or node is None, \
>>> +               'Found node %s under %s (but expected none)' % \
>>> +               (node['name'], path)
>>
>> hmm, so expected_node=None means we want to prove that there is no such node? It should
>> be mentioned in comment above the function. But this don't work due to [1]
> 
> Hm, I seem to remember I tested all cases locally and they all worked.
> 
> Max
> 
>>> +
>>> +        if node is not None and expected_node is not None:
>>> +            assert node['name'] == expected_node, \
>>> +                   'Found node %s under %s (but expected %s)' % \
>>> +                   (node['name'], path, expected_node)
>>>    
>>>    index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>>>    
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2019-10-02 13:51 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-20 15:27 [PATCH 00/22] block: Fix check_to_replace_node() Max Reitz
2019-09-20 15:27 ` [PATCH 01/22] blockdev: Allow external snapshots everywhere Max Reitz
2019-09-25 11:45   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 02/22] blockdev: Allow resizing everywhere Max Reitz
2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 03/22] block: Drop bdrv_is_first_non_filter() Max Reitz
2019-09-25 11:46   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 04/22] iotests: Let 041 use -blockdev for quorum children Max Reitz
2019-09-25 11:51   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 05/22] quorum: Fix child permissions Max Reitz
2019-09-25 11:56   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:02     ` Max Reitz
2019-09-20 15:27 ` [PATCH 06/22] block: Add bdrv_recurse_can_replace() Max Reitz
2019-09-25 12:39   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:03     ` Max Reitz
2019-09-20 15:27 ` [PATCH 07/22] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
2019-09-25 12:56   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:06     ` Max Reitz
2019-09-20 15:27 ` [PATCH 08/22] quorum: Store children in own structure Max Reitz
2019-09-25 13:21   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:07     ` Max Reitz
2019-09-20 15:27 ` [PATCH 09/22] quorum: Add QuorumChild.to_be_replaced Max Reitz
2019-09-25 13:45   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:13     ` Max Reitz
2019-09-20 15:27 ` [PATCH 10/22] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
2019-09-25 13:39   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:12     ` Max Reitz
2019-09-25 14:12   ` Vladimir Sementsov-Ogievskiy
2019-09-26 11:17     ` Max Reitz
2019-09-25 14:14   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 11/22] block: Use bdrv_recurse_can_replace() Max Reitz
2019-09-20 15:27 ` [PATCH 12/22] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
2019-09-25 14:16   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 13/22] mirror: Double-check immediately before replacing Max Reitz
2019-09-20 15:27 ` [PATCH 14/22] quorum: Stop marking it as a filter Max Reitz
2019-09-26 12:43   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 15/22] mirror: Prevent loops Max Reitz
2019-09-26 13:08   ` Vladimir Sementsov-Ogievskiy
2019-10-02 12:36     ` Max Reitz
2019-09-20 15:27 ` [PATCH 16/22] iotests: Use complete_and_wait() in 155 Max Reitz
2019-09-26 13:11   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:27 ` [PATCH 17/22] iotests: Add VM.assert_block_path() Max Reitz
2019-09-26 14:07   ` Vladimir Sementsov-Ogievskiy
2019-10-02 12:40     ` Max Reitz
2019-10-02 13:51       ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:28 ` [PATCH 18/22] iotests: Resolve TODOs in 041 Max Reitz
2019-09-26 14:09   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:28 ` [PATCH 19/22] iotests: Use self.image_len in TestRepairQuorum Max Reitz
2019-09-26 14:13   ` Vladimir Sementsov-Ogievskiy
2019-10-02 12:42     ` Max Reitz
2019-09-20 15:28 ` [PATCH 20/22] iotests: Add tests for invalid Quorum @replaces Max Reitz
2019-09-26 14:30   ` Vladimir Sementsov-Ogievskiy
2019-10-02 12:43     ` Max Reitz
2019-09-20 15:28 ` [PATCH 21/22] iotests: Check that @replaces can replace filters Max Reitz
2019-09-26 14:42   ` Vladimir Sementsov-Ogievskiy
2019-09-20 15:28 ` [PATCH 22/22] iotests: Mirror must not attempt to create loops Max Reitz
2019-09-26 15:03   ` Vladimir Sementsov-Ogievskiy
2019-10-02 12:46     ` Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).