qemu-devel.nongnu.org archive mirror
* [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node()
@ 2019-11-11 16:01 Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 01/23] blockdev: Allow external snapshots everywhere Max Reitz
                   ` (23 more replies)
  0 siblings, 24 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Based-on: <20191108123455.39445-1-mreitz@redhat.com>
(“iotests: Test failing mirror complete”)

(Because both add cases to 041.)


Hi,

For what this series does, see the cover letter of v1:

https://lists.nongnu.org/archive/html/qemu-block/2019-09/msg01027.html


Now, in v2 I’ve addressed Vladimir’s comments:
- Patch 5: Extend explanation in the commit message
- Patch 6: Prefer driver-specific .bdrv_recurse_can_replace()
           implementation before the generic one for filters
- Patch 8: Some more s/BdrvChild \*/QuorumChild/
- Patch 15: Fix typo in the commit message
- Patch 17: Added
- Patch 18:
  - Split @path into @root + @path
  - In one instance, use x = next(y, z) instead of
    try: x = next(y); except StopIteration: x = z;
  - %s/'''/"""/
- Patch 19: Fallout from the patch 18 changes
- Patch 20: Fix in the commit message (uncommenting -> commenting out)
- Patch 21:
  - Check full stderr message by inspecting the VM log
  - Fallout from the patch 18 changes
  - %s/'''/"""/
- Patch 22:
  - Skip case if COR is unsupported
  - Fallout from the patch 18 changes
  - %s/'''/"""/
- Patch 23:
  - Added more comments
  - Skip cases if throttle/COR/quorum (as appropriate) is unsupported
  - Use imgfmt instead of hard-coding qcow2
  - Fallout from the patch 18 changes
  - %s/'''/"""/


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/23:[----] [--] 'blockdev: Allow external snapshots everywhere'
002/23:[----] [--] 'blockdev: Allow resizing everywhere'
003/23:[----] [--] 'block: Drop bdrv_is_first_non_filter()'
004/23:[----] [--] 'iotests: Let 041 use -blockdev for quorum children'
005/23:[----] [--] 'quorum: Fix child permissions'
006/23:[0012] [FC] 'block: Add bdrv_recurse_can_replace()'
007/23:[----] [--] 'blkverify: Implement .bdrv_recurse_can_replace()'
008/23:[0006] [FC] 'quorum: Store children in own structure'
009/23:[----] [--] 'quorum: Add QuorumChild.to_be_replaced'
010/23:[----] [--] 'quorum: Implement .bdrv_recurse_can_replace()'
011/23:[----] [--] 'block: Use bdrv_recurse_can_replace()'
012/23:[----] [--] 'block: Remove bdrv_recurse_is_first_non_filter()'
013/23:[----] [--] 'mirror: Double-check immediately before replacing'
014/23:[----] [--] 'quorum: Stop marking it as a filter'
015/23:[----] [--] 'mirror: Prevent loops'
016/23:[----] [--] 'iotests: Use complete_and_wait() in 155'
017/23:[down] 'iotests: Use skip_if_unsupported decorator in 041'
018/23:[0037] [FC] 'iotests: Add VM.assert_block_path()'
019/23:[0004] [FC] 'iotests: Resolve TODOs in 041'
020/23:[----] [--] 'iotests: Use self.image_len in TestRepairQuorum'
021/23:[0027] [FC] 'iotests: Add tests for invalid Quorum @replaces'
022/23:[0007] [FC] 'iotests: Check that @replaces can replace filters'
023/23:[0141] [FC] 'iotests: Mirror must not attempt to create loops'


Max Reitz (23):
  blockdev: Allow external snapshots everywhere
  blockdev: Allow resizing everywhere
  block: Drop bdrv_is_first_non_filter()
  iotests: Let 041 use -blockdev for quorum children
  quorum: Fix child permissions
  block: Add bdrv_recurse_can_replace()
  blkverify: Implement .bdrv_recurse_can_replace()
  quorum: Store children in own structure
  quorum: Add QuorumChild.to_be_replaced
  quorum: Implement .bdrv_recurse_can_replace()
  block: Use bdrv_recurse_can_replace()
  block: Remove bdrv_recurse_is_first_non_filter()
  mirror: Double-check immediately before replacing
  quorum: Stop marking it as a filter
  mirror: Prevent loops
  iotests: Use complete_and_wait() in 155
  iotests: Use skip_if_unsupported decorator in 041
  iotests: Add VM.assert_block_path()
  iotests: Resolve TODOs in 041
  iotests: Use self.image_len in TestRepairQuorum
  iotests: Add tests for invalid Quorum @replaces
  iotests: Check that @replaces can replace filters
  iotests: Mirror must not attempt to create loops

 block.c                       | 115 ++++++----
 block/blkverify.c             |  20 +-
 block/copy-on-read.c          |   9 -
 block/mirror.c                |  31 ++-
 block/quorum.c                | 161 +++++++++++---
 block/replication.c           |   7 -
 block/throttle.c              |   8 -
 blockdev.c                    |  58 ++++-
 include/block/block.h         |   5 -
 include/block/block_int.h     |  19 +-
 tests/qemu-iotests/041        | 402 ++++++++++++++++++++++++++++++----
 tests/qemu-iotests/041.out    |   4 +-
 tests/qemu-iotests/155        |   7 +-
 tests/qemu-iotests/iotests.py |  59 +++++
 14 files changed, 715 insertions(+), 190 deletions(-)

-- 
2.23.0




* [PATCH for-5.0 v2 01/23] blockdev: Allow external snapshots everywhere
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere Max Reitz
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

There is no good reason why we would allow external snapshots only on
the first non-filter node in a chain.  Parent BDSs should not care
whether their child is replaced by a snapshot.  (If they do care, they
should announce that via freezing the chain, which is checked in
bdrv_append() through bdrv_set_backing_hd().)

Before we had bdrv_is_first_non_filter() here (since 212a5a8f095), there
was a special function bdrv_check_ext_snapshot() that allowed snapshots
by default, but block drivers could override this.  Only blkverify did
so, however.

It is not clear to me why blkverify would do so; maybe just so that the
testee block driver would not be replaced.  The introducing commit
f6186f49e2c does not explain why.  Maybe because 08b24cfe376 would have
been the correct solution?  (Which adds a .supports_backing check.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 8e029e9c01..ab78230d23 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1595,11 +1595,6 @@ static void external_snapshot_prepare(BlkActionState *common,
         }
     }
 
-    if (!bdrv_is_first_non_filter(state->old_bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "snapshot");
-        goto out;
-    }
-
     if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
         BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
         const char *format = s->has_format ? s->format : "qcow2";
-- 
2.23.0




* [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 01/23] blockdev: Allow external snapshots everywhere Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-12-06 14:04   ` Alberto Garcia
  2019-11-11 16:01 ` [PATCH for-5.0 v2 03/23] block: Drop bdrv_is_first_non_filter() Max Reitz
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Block nodes that do not allow resizing should not share BLK_PERM_RESIZE.
It does not matter whether they are the first non-filter in their chain
or not.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ab78230d23..9dc2238bf3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3177,11 +3177,6 @@ void qmp_block_resize(bool has_device, const char *device,
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
-    if (!bdrv_is_first_non_filter(bs)) {
-        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
-        goto out;
-    }
-
     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
         goto out;
-- 
2.23.0




* [PATCH for-5.0 v2 03/23] block: Drop bdrv_is_first_non_filter()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 01/23] blockdev: Allow external snapshots everywhere Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 04/23] iotests: Let 041 use -blockdev for quorum children Max Reitz
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

It is unused now.  (And it was ugly because it needed to explore all BDS
chains from the top.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c               | 26 --------------------------
 include/block/block.h |  1 -
 2 files changed, 27 deletions(-)

diff --git a/block.c b/block.c
index ae279ff21f..9b1049786a 100644
--- a/block.c
+++ b/block.c
@@ -6205,32 +6205,6 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
-/* This function checks if the candidate is the first non filter bs down it's
- * bs chain. Since we don't have pointers to parents it explore all bs chains
- * from the top. Some filters can choose not to pass down the recursion.
- */
-bool bdrv_is_first_non_filter(BlockDriverState *candidate)
-{
-    BlockDriverState *bs;
-    BdrvNextIterator it;
-
-    /* walk down the bs forest recursively */
-    for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-        bool perm;
-
-        /* try to recurse in this top level bs */
-        perm = bdrv_recurse_is_first_non_filter(bs, candidate);
-
-        /* candidate is the first non filter */
-        if (perm) {
-            bdrv_next_cleanup(&it);
-            return true;
-        }
-    }
-
-    return false;
-}
-
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
diff --git a/include/block/block.h b/include/block/block.h
index e9dcfef7fa..8f6a0cad9c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -404,7 +404,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
 /* external snapshots */
 bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
                                       BlockDriverState *candidate);
-bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
-- 
2.23.0




* [PATCH for-5.0 v2 04/23] iotests: Let 041 use -blockdev for quorum children
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (2 preceding siblings ...)
  2019-11-11 16:01 ` [PATCH for-5.0 v2 03/23] block: Drop bdrv_is_first_non_filter() Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-11-11 16:01 ` [PATCH for-5.0 v2 05/23] quorum: Fix child permissions Max Reitz
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Using -drive with default options means that a virtio-blk drive will be
created that has write access to the to-be quorum children.  Quorum
should have exclusive write access to them, so we should use -blockdev
instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/041 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
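
Note (illustration only, not part of the patch): the option string that
the test now hands to add_blockdev() is a plain comma-separated key=value
list.  The sketch below assembles the same string in C.  The function
name build_blockdev_opts() and its parameters are invented for this
example; only the keys (node-name, driver, file.driver, file.filename)
come from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the -blockdev option string for quorum
 * child number @index into @buf.  Returns 0 on success, -1 if the
 * string did not fit (or snprintf() failed).
 */
int build_blockdev_opts(char *buf, size_t size, int index,
                        const char *imgfmt, const char *filename)
{
    int n = snprintf(buf, size,
                     "node-name=img%d,driver=%s,"
                     "file.driver=file,file.filename=%s",
                     index, imgfmt, filename);

    /* snprintf() returns the length it wanted to write */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The sketch does no escaping; a file name containing a comma would need
its commas doubled before being placed in such an option string.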

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index d7be30b62b..3c60c07b01 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -884,7 +884,10 @@ class TestRepairQuorum(iotests.QMPTestCase):
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-            self.vm = self.vm.add_drive(i, opts)
+            opts += ',driver=%s' % iotests.imgfmt
+            opts += ',file.driver=file'
+            opts += ',file.filename=%s' % i
+            self.vm = self.vm.add_blockdev(opts)
 
         self.vm.launch()
 
-- 
2.23.0




* [PATCH for-5.0 v2 05/23] quorum: Fix child permissions
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (3 preceding siblings ...)
  2019-11-11 16:01 ` [PATCH for-5.0 v2 04/23] iotests: Let 041 use -blockdev for quorum children Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-11-29  9:14   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:01 ` [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace() Max Reitz
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Quorum cannot share WRITE or RESIZE on its children.  Presumably, it
only does so because as a filter, it seemed intuitively correct to point
its .bdrv_child_perm to bdrv_filter_default_perms().

However, it is not really a filter, and bdrv_filter_default_perms() does
not work for it, so we have to provide a custom .bdrv_child_perm
implementation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
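
Note (illustration only, not part of the patch): a standalone sketch of
the mask arithmetic that quorum_child_perm() performs.  The PERM_*
constants and the name sketch_quorum_child_perm() are stand-ins invented
for this example; QEMU's real BLK_PERM_* and DEFAULT_PERM_* definitions
live in its headers and may differ in value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in permission bits for this sketch; not QEMU's definitions. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
    PERM_GRAPH_MOD       = 1u << 4,
    PERM_ALL             = (1u << 5) - 1,
};

/* Bits that a parent's request may pass through to the child. */
#define PERM_PASSTHROUGH (PERM_CONSISTENT_READ | PERM_WRITE | \
                          PERM_WRITE_UNCHANGED | PERM_RESIZE)
/* Everything outside the pass-through set (here: PERM_GRAPH_MOD). */
#define PERM_UNCHANGED   (PERM_ALL & ~PERM_PASSTHROUGH)

void sketch_quorum_child_perm(uint64_t perm, uint64_t shared,
                              uint64_t *nperm, uint64_t *nshared)
{
    /* Required permissions: forward only the pass-through bits. */
    *nperm = perm & PERM_PASSTHROUGH;

    /*
     * Shared permissions: WRITE and RESIZE are masked out, so nobody
     * else can make one child differ from its siblings.
     */
    *nshared = (shared & (PERM_CONSISTENT_READ | PERM_WRITE_UNCHANGED))
             | PERM_UNCHANGED;
}
```

Even when the parent is willing to share every permission, the result
never contains the write or resize bit, which is the point of replacing
the filter default.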

diff --git a/block/quorum.c b/block/quorum.c
index df68adcfaa..17b439056f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1114,6 +1114,23 @@ static char *quorum_dirname(BlockDriverState *bs, Error **errp)
     return NULL;
 }
 
+static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
+                              const BdrvChildRole *role,
+                              BlockReopenQueue *reopen_queue,
+                              uint64_t perm, uint64_t shared,
+                              uint64_t *nperm, uint64_t *nshared)
+{
+    *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+
+    /*
+     * We cannot share RESIZE or WRITE, as this would make the
+     * children differ from each other.
+     */
+    *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
+                          BLK_PERM_WRITE_UNCHANGED))
+             | DEFAULT_PERM_UNCHANGED;
+}
+
 static const char *const quorum_strong_runtime_opts[] = {
     QUORUM_OPT_VOTE_THRESHOLD,
     QUORUM_OPT_BLKVERIFY,
@@ -1143,7 +1160,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
 
-    .bdrv_child_perm                    = bdrv_filter_default_perms,
+    .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
-- 
2.23.0




* [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (4 preceding siblings ...)
  2019-11-11 16:01 ` [PATCH for-5.0 v2 05/23] quorum: Fix child permissions Max Reitz
@ 2019-11-11 16:01 ` Max Reitz
  2019-11-29  9:34   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

After a couple of follow-up patches, this function will replace
bdrv_recurse_is_first_non_filter() in check_to_replace_node().

bdrv_recurse_is_first_non_filter() is not sufficiently specific for
check_to_replace_node(): it allows cases that should not be allowed,
like replacing child nodes of quorum with dissenting data that have more
parents than just quorum.  At the same time, it is too restrictive: it
is perfectly fine to replace filters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h | 10 ++++++++++
 2 files changed, 48 insertions(+)
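
Note (illustration only, not part of the patch): the decision order of
bdrv_recurse_can_replace() can be tried out on a toy graph.  ToyNode and
toy_recurse_can_replace() are invented for this sketch and model a node
with at most one child; QEMU's BlockDriverState and BlockDriver are far
richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node, standing in for a BlockDriverState. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    bool is_filter;     /* node shows exactly its child's data */
    ToyNode *child;     /* single child, or NULL for a leaf */
    /* optional driver hook, modelled on .bdrv_recurse_can_replace */
    bool (*can_replace)(ToyNode *node, ToyNode *to_replace);
};

bool toy_recurse_can_replace(ToyNode *node, ToyNode *to_replace)
{
    if (!node) {
        return false;
    }

    /* A node can always be replaced by something with its own data. */
    if (node == to_replace) {
        return true;
    }

    /* A driver-specific answer takes precedence over the generic one. */
    if (node->can_replace) {
        return node->can_replace(node, to_replace);
    }

    /* Filters without a hook: look through to the child. */
    if (node->is_filter) {
        return toy_recurse_can_replace(node->child, to_replace);
    }

    /* Safe default: anything else may change the visible data. */
    return false;
}
```

With a filter on top of a leaf, both the filter and the leaf are
replaceable; with a non-filter on top, only the top node itself is.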

diff --git a/block.c b/block.c
index 9b1049786a..de53addeb0 100644
--- a/block.c
+++ b/block.c
@@ -6205,6 +6205,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * This function checks whether the given @to_replace is allowed to be
+ * replaced by a node that always shows the same data as @bs.  This is
+ * used for example to verify whether the mirror job can replace
+ * @to_replace by the target mirrored from @bs.
+ * To be replaceable, @bs and @to_replace may either be guaranteed to
+ * always show the same data (because they are only connected through
+ * filters), or some driver may allow replacing one of its children
+ * because it can guarantee that this child's data is not visible at
+ * all (for example, for dissenting quorum children that have no other
+ * parents).
+ */
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace)
+{
+    if (!bs || !bs->drv) {
+        return false;
+    }
+
+    if (bs == to_replace) {
+        return true;
+    }
+
+    /* See what the driver can do */
+    if (bs->drv->bdrv_recurse_can_replace) {
+        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
+    }
+
+    /* For filters without an own implementation, we can recurse on our own */
+    if (bs->drv->is_filter) {
+        BdrvChild *child = bs->file ?: bs->backing;
+        return bdrv_recurse_can_replace(child->bs, to_replace);
+    }
+
+    /* Safe default */
+    return false;
+}
+
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dd033d0b37..75f03dcc38 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -102,6 +102,13 @@ struct BlockDriver {
      */
     bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
                                              BlockDriverState *candidate);
+    /*
+     * Return true if @to_replace can be replaced by a BDS with the
+     * same data as @bs without it affecting @bs's behavior (that is,
+     * without it being visible to @bs's parents).
+     */
+    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
+                                     BlockDriverState *to_replace);
 
     int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
     int (*bdrv_probe_device)(const char *filename);
@@ -1264,6 +1271,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared);
 
+bool bdrv_recurse_can_replace(BlockDriverState *bs,
+                              BlockDriverState *to_replace);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
-- 
2.23.0




* [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (5 preceding siblings ...)
  2019-11-11 16:01 ` [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29  9:41   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 08/23] quorum: Store children in own structure Max Reitz
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkverify.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 304b0a1368..0add3ab483 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -282,6 +282,20 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
     return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
 }
 
+static bool blkverify_recurse_can_replace(BlockDriverState *bs,
+                                          BlockDriverState *to_replace)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+
+    /*
+     * blkverify quits the whole qemu process if there is a mismatch
+     * between bs->file->bs and s->test_file->bs.  Therefore, we
+     * know that both must match bs and we can recurse down to either.
+     */
+    return bdrv_recurse_can_replace(bs->file->bs, to_replace) ||
+           bdrv_recurse_can_replace(s->test_file->bs, to_replace);
+}
+
 static void blkverify_refresh_filename(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
@@ -328,6 +342,7 @@ static BlockDriver bdrv_blkverify = {
 
     .is_filter                        = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
 static void bdrv_blkverify_init(void)
-- 
2.23.0




* [PATCH for-5.0 v2 08/23] quorum: Store children in own structure
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (6 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29  9:46   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced Max Reitz
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

This will be useful when we want to store additional attributes for each
child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 64 ++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 28 deletions(-)
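
Note (illustration only, not part of the patch): the refactoring swaps an
array of pointers for an array of small structs, so every element-size
computation has to use the struct type.  The sketch below shows the
add/remove pattern with plain libc calls.  ToyChild, ToyState and both
functions are invented for this example, and void * stands in for
BdrvChild *.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-child wrapper; further attributes can be added here later. */
typedef struct ToyChild {
    void *child;                /* stand-in for BdrvChild * */
} ToyChild;

typedef struct ToyState {
    ToyChild *children;
    size_t num_children;
} ToyState;

/* Append @child; returns 0 on success, -1 if allocation fails. */
int toy_add_child(ToyState *s, void *child)
{
    ToyChild *grown = realloc(s->children,
                              (s->num_children + 1) * sizeof(ToyChild));
    if (!grown) {
        return -1;
    }
    s->children = grown;
    s->children[s->num_children++] = (ToyChild){ .child = child };
    return 0;
}

/* Remove entry @i and close the gap; the allocation is not shrunk. */
void toy_del_child(ToyState *s, size_t i)
{
    assert(i < s->num_children);
    memmove(&s->children[i], &s->children[i + 1],
            (s->num_children - i - 1) * sizeof(ToyChild));
    s->num_children--;
}
```

Using sizeof(ToyChild) rather than sizeof(void *) in the memmove() is the
detail that matters once the wrapper gains a second member.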

diff --git a/block/quorum.c b/block/quorum.c
index 17b439056f..59cd524502 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -65,9 +65,13 @@ typedef struct QuorumVotes {
     bool (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
 } QuorumVotes;
 
+typedef struct QuorumChild {
+    BdrvChild *child;
+} QuorumChild;
+
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
-    BdrvChild **children;  /* children BlockDriverStates */
+    QuorumChild *children;
     int num_children;      /* children count */
     unsigned next_child_index;  /* the index of the next child that should
                                  * be added
@@ -264,7 +268,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
         }
         QLIST_FOREACH(item, &version->items, next) {
             quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
-                              s->children[item->index]->bs->node_name, 0);
+                              s->children[item->index].child->bs->node_name, 0);
         }
     }
 }
@@ -279,7 +283,7 @@ static void quorum_rewrite_entry(void *opaque)
      * corrupted data.
      * Mask out BDRV_REQ_WRITE_UNCHANGED because this overwrites the
      * area with different data from the other children. */
-    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
+    bdrv_co_pwritev(s->children[co->idx].child, acb->offset, acb->bytes,
                     acb->qiov, acb->flags & ~BDRV_REQ_WRITE_UNCHANGED);
 
     /* Wake up the caller after the last rewrite */
@@ -578,8 +582,8 @@ static void read_quorum_children_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_preadv(s->children[i].child, acb->offset, acb->bytes,
                                &acb->qcrs[i].qiov, 0);
 
     if (sacb->ret == 0) {
@@ -605,7 +609,8 @@ static int read_quorum_children(QuorumAIOCB *acb)
 
     acb->children_read = s->num_children;
     for (i = 0; i < s->num_children; i++) {
-        acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
+        acb->qcrs[i].buf = qemu_blockalign(s->children[i].child->bs,
+                                           acb->qiov->size);
         qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
         qemu_iovec_clone(&acb->qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
     }
@@ -647,8 +652,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
     /* We try to read the next child in FIFO order if we failed to read */
     do {
         n = acb->children_read++;
-        acb->qcrs[n].bs = s->children[n]->bs;
-        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
+        acb->qcrs[n].bs = s->children[n].child->bs;
+        ret = bdrv_co_preadv(s->children[n].child, acb->offset, acb->bytes,
                              acb->qiov, 0);
         if (ret < 0) {
             quorum_report_bad_acb(&acb->qcrs[n], ret);
@@ -688,8 +693,8 @@ static void write_quorum_entry(void *opaque)
     int i = co->idx;
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
+    sacb->bs = s->children[i].child->bs;
+    sacb->ret = bdrv_co_pwritev(s->children[i].child, acb->offset, acb->bytes,
                                 acb->qiov, acb->flags);
     if (sacb->ret == 0) {
         acb->success_count++;
@@ -743,12 +748,12 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->children[0]->bs);
+    result = bdrv_getlength(s->children[0].child->bs);
     if (result < 0) {
         return result;
     }
     for (i = 1; i < s->num_children; i++) {
-        int64_t value = bdrv_getlength(s->children[i]->bs);
+        int64_t value = bdrv_getlength(s->children[i].child->bs);
         if (value < 0) {
             return value;
         }
@@ -774,10 +779,10 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     error_votes.compare = quorum_64bits_compare;
 
     for (i = 0; i < s->num_children; i++) {
-        result = bdrv_co_flush(s->children[i]->bs);
+        result = bdrv_co_flush(s->children[i].child->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0, 0,
-                              s->children[i]->bs->node_name, result);
+                              s->children[i].child->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
         } else {
@@ -803,7 +808,7 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i]->bs,
+        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
                                                      candidate);
         if (perm) {
             return true;
@@ -932,7 +937,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* allocate the children array */
-    s->children = g_new0(BdrvChild *, s->num_children);
+    s->children = g_new0(QuorumChild, s->num_children);
     opened = g_new0(bool, s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
@@ -940,8 +945,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = snprintf(indexstr, 32, "children.%d", i);
         assert(ret < 32);
 
-        s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
-                                         &child_format, false, &local_err);
+        s->children[i].child = bdrv_open_child(NULL, options, indexstr, bs,
+                                               &child_format, false,
+                                               &local_err);
         if (local_err) {
             ret = -EINVAL;
             goto close_exit;
@@ -962,7 +968,7 @@ close_exit:
         if (!opened[i]) {
             continue;
         }
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
     g_free(s->children);
     g_free(opened);
@@ -979,7 +985,7 @@ static void quorum_close(BlockDriverState *bs)
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        bdrv_unref_child(bs, s->children[i]);
+        bdrv_unref_child(bs, s->children[i].child);
     }
 
     g_free(s->children);
@@ -998,8 +1004,8 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
         return;
     }
 
-    assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
-    if (s->num_children == INT_MAX / sizeof(BdrvChild *) ||
+    assert(s->num_children <= INT_MAX / sizeof(QuorumChild));
+    if (s->num_children == INT_MAX / sizeof(QuorumChild) ||
         s->next_child_index == UINT_MAX) {
         error_setg(errp, "Too many children");
         return;
@@ -1022,8 +1028,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
         s->next_child_index--;
         goto out;
     }
-    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
-    s->children[s->num_children++] = child;
+    s->children = g_renew(QuorumChild, s->children, s->num_children + 1);
+    s->children[s->num_children++] = (QuorumChild){
+        .child = child,
+    };
 
 out:
     bdrv_drained_end(bs);
@@ -1036,7 +1044,7 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     int i;
 
     for (i = 0; i < s->num_children; i++) {
-        if (s->children[i] == child) {
+        if (s->children[i].child == child) {
             break;
         }
     }
@@ -1058,8 +1066,8 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
 
     /* We can safely remove this child now */
     memmove(&s->children[i], &s->children[i + 1],
-            (s->num_children - i - 1) * sizeof(BdrvChild *));
-    s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+            (s->num_children - i - 1) * sizeof(QuorumChild));
+    s->children = g_renew(QuorumChild, s->children, --s->num_children);
     bdrv_unref_child(bs, child);
 
     bdrv_drained_end(bs);
@@ -1100,7 +1108,7 @@ static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
 
     for (i = 0; i < s->num_children; i++) {
         qlist_append(children_list,
-                     qobject_ref(s->children[i]->bs->full_open_options));
+                     qobject_ref(s->children[i].child->bs->full_open_options));
     }
 }
 
-- 
2.23.0




* [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (7 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 08/23] quorum: Store children in own structure Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29  9:59   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

We will need this to verify that Quorum can let one of its children be
replaced without breaking anything else.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
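
(Not part of the patch.)  For readers unfamiliar with the permission
system, here is a minimal standalone sketch of what the flag is meant to
achieve.  All names and bit values below (PERM_*, toy_child,
toy_child_perm(), toy_perms_conflict()) are hypothetical stand-ins, not
QEMU's API.  The conflict rule is a simplified assumption: two parents
conflict when one takes a permission the other does not share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for BLK_PERM_*; the bit values are illustrative. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
};

/* Hypothetical, reduced version of QuorumChild. */
struct toy_child {
    bool to_be_replaced;
};

/*
 * Derive the permissions taken on (@nperm) and shared for (@nshared) a
 * child.  If the child is flagged, additionally take WRITE and stop
 * sharing CONSISTENT_READ, mirroring the hunk in quorum_child_perm().
 */
static void toy_child_perm(const struct toy_child *c,
                           uint64_t perm, uint64_t shared,
                           uint64_t *nperm, uint64_t *nshared)
{
    assert(nperm && nshared);

    *nperm = perm;
    *nshared = shared;

    if (c && c->to_be_replaced) {
        /* Prepare for sudden data changes */
        *nperm |= PERM_WRITE;
        *nshared &= ~(uint64_t)PERM_CONSISTENT_READ;
    }
}

/*
 * Simplified conflict rule (an assumption of this sketch): parents A and
 * B conflict if either takes a permission that the other does not share.
 */
static bool toy_perms_conflict(uint64_t a_perm, uint64_t a_shared,
                               uint64_t b_perm, uint64_t b_shared)
{
    return (a_perm & ~b_shared) || (b_perm & ~a_shared);
}
```

Under these assumptions, a second parent that takes CONSISTENT_READ and
shares only CONSISTENT_READ coexists with the unflagged child, but
conflicts once to_be_replaced is set.  That conflict is the signal the
next patch looks for.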

diff --git a/block/quorum.c b/block/quorum.c
index 59cd524502..3a824e77e3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,13 @@ typedef struct QuorumVotes {
 
 typedef struct QuorumChild {
     BdrvChild *child;
+
+    /*
+     * If set, check whether this node can be replaced without any
+     * other parent noticing: Unshare CONSISTENT_READ, and take the
+     * WRITE permission.
+     */
+    bool to_be_replaced;
 } QuorumChild;
 
 /* the following structure holds the state of one quorum instance */
@@ -1128,6 +1135,16 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
                               uint64_t perm, uint64_t shared,
                               uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        if (s->children[i].child == c) {
+            break;
+        }
+    }
+    assert(!c || i < s->num_children);
+
     *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 
     /*
@@ -1137,6 +1154,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
     *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
                           BLK_PERM_WRITE_UNCHANGED))
              | DEFAULT_PERM_UNCHANGED;
+
+    if (c && s->children[i].to_be_replaced) {
+        /* Prepare for sudden data changes */
+        *nperm |= BLK_PERM_WRITE;
+        *nshared &= ~BLK_PERM_CONSISTENT_READ;
+    }
 }
 
 static const char *const quorum_strong_runtime_opts[] = {
-- 
2.23.0




* [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (8 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29 10:18   ` Vladimir Sementsov-Ogievskiy
  2020-02-05 15:55   ` Kevin Wolf
  2019-11-11 16:02 ` [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace() Max Reitz
                   ` (13 subsequent siblings)
  23 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
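
(Not part of the patch.)  The function below follows a "probe, then
revert" pattern: flag the matching immediate child, see whether the
permission refresh succeeds, and undo the flag regardless of the result.
Here is a minimal standalone sketch of that pattern.  Everything in it is
hypothetical (toy_node, toy_quorum_child, toy_refresh_perms(),
toy_can_replace()).  In particular, toy_refresh_perms() only imitates
bdrv_child_refresh_perms() by failing when some other parent relies on
stable data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: counts other parents that rely on stable data. */
struct toy_node {
    int other_parents_needing_stable_data;
};

/* Hypothetical, reduced version of QuorumChild. */
struct toy_quorum_child {
    struct toy_node *node;
    bool to_be_replaced;
};

/*
 * Stand-in for a permission refresh: it fails only if the child is
 * flagged for replacement while another parent needs stable data.
 */
static bool toy_refresh_perms(const struct toy_quorum_child *c)
{
    return !(c->to_be_replaced &&
             c->node->other_parents_needing_stable_data > 0);
}

/*
 * Only immediate children are considered; there is deliberately no
 * recursion into the children's own chains.
 */
static bool toy_can_replace(struct toy_quorum_child *children, size_t n,
                            const struct toy_node *to_replace)
{
    for (size_t i = 0; i < n; i++) {
        bool ok, reverted;

        if (children[i].node != to_replace) {
            continue;
        }

        /* Probe: would anybody object to sudden data changes? */
        children[i].to_be_replaced = true;
        ok = toy_refresh_perms(&children[i]);

        /* Revert: restoring the previous state must not fail */
        children[i].to_be_replaced = false;
        reverted = toy_refresh_perms(&children[i]);
        assert(reverted);
        (void)reverted;

        return ok;
    }

    return false;
}
```

Note that the probe leaves no trace: the flag is cleared before the
result is returned, whatever the result is.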

diff --git a/block/quorum.c b/block/quorum.c
index 3a824e77e3..8ee03e9baf 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static bool quorum_recurse_can_replace(BlockDriverState *bs,
+                                       BlockDriverState *to_replace)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        /*
+         * We have no idea whether our children show the same data as
+         * this node (@bs).  It is actually highly likely that
+         * @to_replace does not, because replacing a broken child is
+         * one of the main use cases here.
+         *
+         * We do know that the new BDS will match @bs, so replacing
+         * any of our children by it will be safe.  It cannot change
+         * the data this quorum node presents to its parents.
+         *
+         * However, replacing @to_replace by @bs in any of our
+         * children's chains may change visible data somewhere in
+         * there.  We therefore cannot recurse down those chains with
+         * bdrv_recurse_can_replace().
+         * (More formally, bdrv_recurse_can_replace() requires that
+         * @to_replace will be replaced by something matching the @bs
+         * passed to it.  We cannot guarantee that.)
+         *
+         * Thus, we can only check whether any of our immediate
+         * children matches @to_replace.
+         *
+         * (In the future, we might add a function to recurse down a
+         * chain that checks that nothing there cares about a change
+         * in data from the respective child in question.  For
+         * example, most filters do not care when their child's data
+         * suddenly changes, as long as their parents do not care.)
+         */
+        if (s->children[i].child->bs == to_replace) {
+            Error *local_err = NULL;
+
+            /*
+             * We now have to ensure that there is no other parent
+             * that cares about replacing this child by a node with
+             * potentially different data.
+             */
+            s->children[i].to_be_replaced = true;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
+
+            /* Revert permissions */
+            s->children[i].to_be_replaced = false;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
+
+            if (local_err) {
+                error_free(local_err);
+                return false;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
 {
 
@@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
 };
-- 
2.23.0




* [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (9 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29 11:07   ` Vladimir Sementsov-Ogievskiy
  2020-02-05 15:57   ` Kevin Wolf
  2019-11-11 16:02 ` [PATCH for-5.0 v2 12/23] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
                   ` (12 subsequent siblings)
  23 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Let check_to_replace_node() use the more specialized
bdrv_recurse_can_replace() instead of
bdrv_recurse_is_first_non_filter(), which is too restrictive.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index de53addeb0..7608f21570 100644
--- a/block.c
+++ b/block.c
@@ -6243,6 +6243,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
     return false;
 }
 
+/*
+ * Check whether the given @node_name can be replaced by a node that
+ * has the same data as @parent_bs.  If so, return @node_name's BDS;
+ * NULL otherwise.
+ *
+ * @node_name must be a (recursive) *child of @parent_bs (or this
+ * function will return NULL).
+ *
+ * The result (whether the node can be replaced or not) is only valid
+ * for as long as no graph changes occur.
+ */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp)
 {
@@ -6267,8 +6278,11 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
      * Another benefit is that this tests exclude backing files which are
      * blocked by the backing blockers.
      */
-    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-        error_setg(errp, "Only top most non filter can be replaced");
+    if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
+        error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
+                   "because it cannot be guaranteed that doing so would not "
+                   "lead to an abrupt change of visible data",
+                   node_name, parent_bs->node_name);
         to_replace_bs = NULL;
         goto out;
     }
-- 
2.23.0




* [PATCH for-5.0 v2 12/23] block: Remove bdrv_recurse_is_first_non_filter()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (10 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-11 16:02 ` [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing Max Reitz
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

It no longer has any users.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c                   | 33 ---------------------------------
 block/blkverify.c         | 15 ---------------
 block/copy-on-read.c      |  9 ---------
 block/quorum.c            | 18 ------------------
 block/replication.c       |  7 -------
 block/throttle.c          |  8 --------
 include/block/block.h     |  4 ----
 include/block/block_int.h |  8 --------
 8 files changed, 102 deletions(-)

diff --git a/block.c b/block.c
index 7608f21570..0159f8e510 100644
--- a/block.c
+++ b/block.c
@@ -6172,39 +6172,6 @@ int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
     return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque, errp);
 }
 
-/* This function will be called by the bdrv_recurse_is_first_non_filter method
- * of block filter and by bdrv_is_first_non_filter.
- * It is used to test if the given bs is the candidate or recurse more in the
- * node graph.
- */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate)
-{
-    /* return false if basic checks fails */
-    if (!bs || !bs->drv) {
-        return false;
-    }
-
-    /* the code reached a non block filter driver -> check if the bs is
-     * the same as the candidate. It's the recursion termination condition.
-     */
-    if (!bs->drv->is_filter) {
-        return bs == candidate;
-    }
-    /* Down this path the driver is a block filter driver */
-
-    /* If the block filter recursion method is defined use it to recurse down
-     * the node graph.
-     */
-    if (bs->drv->bdrv_recurse_is_first_non_filter) {
-        return bs->drv->bdrv_recurse_is_first_non_filter(bs, candidate);
-    }
-
-    /* the driver is a block filter but don't allow to recurse -> return false
-     */
-    return false;
-}
-
 /*
  * This function checks whether the given @to_replace is allowed to be
  * replaced by a node that always shows the same data as @bs.  This is
diff --git a/block/blkverify.c b/block/blkverify.c
index 0add3ab483..ba6b1853ae 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -268,20 +268,6 @@ static int blkverify_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(s->test_file->bs);
 }
 
-static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                  BlockDriverState *candidate)
-{
-    BDRVBlkverifyState *s = bs->opaque;
-
-    bool perm = bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-
-    if (perm) {
-        return true;
-    }
-
-    return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
-}
-
 static bool blkverify_recurse_can_replace(BlockDriverState *bs,
                                           BlockDriverState *to_replace)
 {
@@ -341,7 +327,6 @@ static BlockDriver bdrv_blkverify = {
     .bdrv_co_flush                    = blkverify_co_flush,
 
     .is_filter                        = true,
-    .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace         = blkverify_recurse_can_replace,
 };
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index e95223d3cb..242d3ff055 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -118,13 +118,6 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 }
 
 
-static bool cor_recurse_is_first_non_filter(BlockDriverState *bs,
-                                            BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
-
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
 
@@ -143,8 +136,6 @@ static BlockDriver bdrv_copy_on_read = {
 
     .bdrv_co_block_status               = bdrv_co_block_status_from_file,
 
-    .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/quorum.c b/block/quorum.c
index 8ee03e9baf..1974e2ffa8 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -808,23 +808,6 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     return result;
 }
 
-static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
-                                               BlockDriverState *candidate)
-{
-    BDRVQuorumState *s = bs->opaque;
-    int i;
-
-    for (i = 0; i < s->num_children; i++) {
-        bool perm = bdrv_recurse_is_first_non_filter(s->children[i].child->bs,
-                                                     candidate);
-        if (perm) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
 static bool quorum_recurse_can_replace(BlockDriverState *bs,
                                        BlockDriverState *to_replace)
 {
@@ -1255,7 +1238,6 @@ static BlockDriver bdrv_quorum = {
     .bdrv_child_perm                    = quorum_child_perm,
 
     .is_filter                          = true,
-    .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
diff --git a/block/replication.c b/block/replication.c
index 99532ce521..d6681b6c84 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -306,12 +306,6 @@ out:
     return ret;
 }
 
-static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                    BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
     Error *local_err = NULL;
@@ -699,7 +693,6 @@ static BlockDriver bdrv_replication = {
     .bdrv_co_writev             = replication_co_writev,
 
     .is_filter                  = true,
-    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
     .has_variable_length        = true,
     .strong_runtime_opts        = replication_strong_runtime_opts,
diff --git a/block/throttle.c b/block/throttle.c
index 0349f42257..71f4bb0ad1 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -207,12 +207,6 @@ static void throttle_reopen_abort(BDRVReopenState *reopen_state)
     reopen_state->opaque = NULL;
 }
 
-static bool throttle_recurse_is_first_non_filter(BlockDriverState *bs,
-                                                 BlockDriverState *candidate)
-{
-    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
-}
-
 static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
 {
     ThrottleGroupMember *tgm = bs->opaque;
@@ -252,8 +246,6 @@ static BlockDriver bdrv_throttle = {
     .bdrv_co_pwrite_zeroes              =   throttle_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   =   throttle_co_pdiscard,
 
-    .bdrv_recurse_is_first_non_filter   =   throttle_recurse_is_first_non_filter,
-
     .bdrv_attach_aio_context            =   throttle_attach_aio_context,
     .bdrv_detach_aio_context            =   throttle_detach_aio_context,
 
diff --git a/include/block/block.h b/include/block/block.h
index 8f6a0cad9c..764a217de6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -401,10 +401,6 @@ int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
                        BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
                        Error **errp);
 
-/* external snapshots */
-bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
-                                      BlockDriverState *candidate);
-
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
                                         const char *node_name, Error **errp);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 75f03dcc38..589a797fab 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -94,14 +94,6 @@ struct BlockDriver {
      * must implement them and return -ENOTSUP.
      */
     bool is_filter;
-    /* for snapshots block filter like Quorum can implement the
-     * following recursive callback.
-     * It's purpose is to recurse on the filter children while calling
-     * bdrv_recurse_is_first_non_filter on them.
-     * For a sample implementation look in the future Quorum block filter.
-     */
-    bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
-                                             BlockDriverState *candidate);
     /*
      * Return true if @to_replace can be replaced by a BDS with the
      * same data as @bs without it affecting @bs's behavior (that is,
-- 
2.23.0




* [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (11 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 12/23] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29 11:18   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 14/23] quorum: Stop marking it as a filter Max Reitz
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

There is no guarantee that we can still replace the node we want to
replace at the end of the mirror job.  Double-check by calling
bdrv_recurse_can_replace().

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index f0f2d9dff1..68a4404666 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -695,7 +695,19 @@ static int mirror_exit_common(Job *job)
          * drain potential other users of the BDS before changing the graph. */
         assert(s->in_drain);
         bdrv_drained_begin(target_bs);
-        bdrv_replace_node(to_replace, target_bs, &local_err);
+        /*
+         * Cannot use check_to_replace_node() here, because that would
+         * check for an op blocker on @to_replace, and we have our own
+         * there.
+         */
+        if (bdrv_recurse_can_replace(src, to_replace)) {
+            bdrv_replace_node(to_replace, target_bs, &local_err);
+        } else {
+            error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                       "because it can no longer be guaranteed that doing so "
+                       "would not lead to an abrupt change of visible data",
+                       to_replace->node_name, target_bs->node_name);
+        }
         bdrv_drained_end(target_bs);
         if (local_err) {
             error_report_err(local_err);
-- 
2.23.0




* [PATCH for-5.0 v2 14/23] quorum: Stop marking it as a filter
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (12 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-11 16:02 ` [PATCH for-5.0 v2 15/23] mirror: Prevent loops Max Reitz
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Quorum is not a filter, for example because it cannot guarantee which of
its children will serve the next request.  Thus, any of its children may
differ from the data visible to quorum's parents.

We have other filters with multiple children, but they differ in this
aspect:

- blkverify quits the whole qemu process if its children differ.  As
  such, we can always skip it when we want to skip it (as a filter node)
  by going to any of its children.  Both have the same data.

- replication generally serves requests from bs->file, so this is its
  only actually filtered child.

- Block job filters currently only have one child, but they will
  probably get more children in the future.  Still, they will always
  have only one actually filtered child.

Having "filters" as a dedicated node category only makes sense if you
can skip them by going to one fixed child that always shows the same
data as the filter node.  Quorum cannot fulfill this, so it is not a
filter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/quorum.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 1974e2ffa8..8cd13a7b91 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1237,7 +1237,6 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_child_perm                    = quorum_child_perm,
 
-    .is_filter                          = true,
     .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
-- 
2.23.0




* [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (13 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 14/23] quorum: Stop marking it as a filter Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-29 12:01   ` Vladimir Sementsov-Ogievskiy
  2019-12-02 12:12   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 16/23] iotests: Use complete_and_wait() in 155 Max Reitz
                   ` (8 subsequent siblings)
  23 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

While bdrv_replace_node() will not follow through with it, a specific
@replaces asks the mirror job to create a loop.

For example, say both the source and the target share a child where the
source is a filter; by letting @replaces point to the common child, you
ask for a loop.

Or if you use @replaces in drive-mirror with sync=none and
mode=absolute-paths, you generally ask for a loop (@replaces must point
to a child of the source, and sync=none makes the source the backing
file of the target after the job).

bdrv_replace_node() will not create those loops, but by doing so, it
ignores the user-requested configuration, which is not ideal either.
(In the first example above, the target's child will remain what it was,
which may still be reasonable.  But in the second example, the target
will just not become a child of the source, which is precisely what was
requested with @replaces.)

So prevent such configurations, both before the job, and before it
actually completes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 30 ++++++++++++++++++++++++
 block/mirror.c            | 19 +++++++++++++++-
 blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
 include/block/block_int.h |  3 +++
 4 files changed, 98 insertions(+), 2 deletions(-)
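
(Not part of the patch.)  To illustrate the @min_level semantics of
bdrv_is_child_of(), here is a minimal standalone sketch on a toy graph.
The toy_bds type and toy_is_child_of() are hypothetical; only the
recursion mirrors the function added to block.c below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical, reduced node type with a fixed-size child array. */
struct toy_bds {
    const char *name;
    struct toy_bds *children[TOY_MAX_CHILDREN];
    size_t num_children;
};

/*
 * Return true iff @child can be reached from @parent over at least
 * @min_level edges.  @min_level is a lower bound, not an exact depth.
 */
static bool toy_is_child_of(const struct toy_bds *child,
                            const struct toy_bds *parent, int min_level)
{
    if (child == parent && min_level <= 0) {
        return true;
    }

    if (!parent) {
        return false;
    }

    for (size_t i = 0; i < parent->num_children; i++) {
        /* Each edge we follow lowers the remaining minimum by one */
        if (toy_is_child_of(child, parent->children[i], min_level - 1)) {
            return true;
        }
    }

    return false;
}
```

With a chain target -> source -> base, source satisfies min_level == 1
relative to target but not min_level == 2, while base satisfies both.
This is why mirror_exit_common() uses 2: @to_replace is still allowed to
be an immediate child of the target (the sync=none case), but nothing
deeper.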

diff --git a/block.c b/block.c
index 0159f8e510..e3922a0474 100644
--- a/block.c
+++ b/block.c
@@ -6259,6 +6259,36 @@ out:
     return to_replace_bs;
 }
 
+/*
+ * Return true iff @child is a (recursive) child of @parent, with at
+ * least @min_level edges between them.
+ *
+ * (If @min_level == 0, return true if @child == @parent.  For
+ * @min_level == 1, @child needs to be at least a real child; for
+ * @min_level == 2, it needs to be at least a grand-child; and so on.)
+ */
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level)
+{
+    BdrvChild *c;
+
+    if (child == parent && min_level <= 0) {
+        return true;
+    }
+
+    if (!parent) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &parent->children, next) {
+        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 /**
  * Iterates through the list of runtime option keys that are said to
  * be "strong" for a BDS.  An option is called "strong" if it changes
diff --git a/block/mirror.c b/block/mirror.c
index 68a4404666..b258c7e98b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
          * there.
          */
         if (bdrv_recurse_can_replace(src, to_replace)) {
-            bdrv_replace_node(to_replace, target_bs, &local_err);
+            /*
+             * It is OK for @to_replace to be an immediate child of
+             * @target_bs, because that is what happens with
+             * drive-mirror sync=none mode=absolute-paths: target_bs's
+             * backing file will be the source node, which is also
+             * to_replace (by default).
+             * bdrv_replace_node() handles this case by not letting
+             * target_bs->backing point to itself, but to the source
+             * still.
+             */
+            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
+                bdrv_replace_node(to_replace, target_bs, &local_err);
+            } else {
+                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                           "because the former is now a child of the latter, "
+                           "and doing so would thus create a loop",
+                           to_replace->node_name, target_bs->node_name);
+            }
         } else {
             error_setg(&local_err, "Can no longer replace '%s' by '%s', "
                        "because it can no longer be guaranteed that doing so "
diff --git a/blockdev.c b/blockdev.c
index 9dc2238bf3..d29f147f72 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     }
 
     if (has_replaces) {
-        BlockDriverState *to_replace_bs;
+        BlockDriverState *to_replace_bs, *target_backing_bs;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
+        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
+            error_setg(errp, "Replacing %s by %s would result in a loop, "
+                       "because the former is a child of the latter",
+                       to_replace_bs->node_name, target->node_name);
+            return;
+        }
+
+        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
+            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
+        {
+            /*
+             * While we do not quite know what OPEN_BACKING_CHAIN
+             * (used for mode=existing) will yield, it is probably
+             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
+             * because that is our best guess.
+             */
+            switch (sync) {
+            case MIRROR_SYNC_MODE_FULL:
+                target_backing_bs = NULL;
+                break;
+
+            case MIRROR_SYNC_MODE_TOP:
+                target_backing_bs = backing_bs(bs);
+                break;
+
+            case MIRROR_SYNC_MODE_NONE:
+                target_backing_bs = bs;
+                break;
+
+            default:
+                abort();
+            }
+        } else {
+            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
+            target_backing_bs = backing_bs(target);
+        }
+
+        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
+            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
+                       "result in a loop, because the former would be a child "
+                       "of the latter's backing file ('%s') after the mirror "
+                       "job", to_replace_bs->node_name, target->node_name,
+                       target_backing_bs->node_name);
+            return;
+        }
+
         replace_aio_context = bdrv_get_aio_context(to_replace_bs);
         aio_context_acquire(replace_aio_context);
         replace_size = bdrv_getlength(to_replace_bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 589a797fab..7064a1a4fa 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 bool bdrv_recurse_can_replace(BlockDriverState *bs,
                               BlockDriverState *to_replace);
 
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
-- 
2.23.0




* [PATCH for-5.0 v2 16/23] iotests: Use complete_and_wait() in 155
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (14 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 15/23] mirror: Prevent loops Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-11 16:02 ` [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041 Max Reitz
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

This way, we get to see errors during the completion phase.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/155 | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index e19485911c..d7ef2579d3 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -163,12 +163,7 @@ class MirrorBaseClass(BaseClass):
 
         self.assert_qmp(result, 'return', {})
 
-        self.vm.event_wait('BLOCK_JOB_READY')
-
-        result = self.vm.qmp('block-job-complete', device='mirror-job')
-        self.assert_qmp(result, 'return', {})
-
-        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+        self.complete_and_wait('mirror-job')
 
     def testFull(self):
         self.runMirror('full')
-- 
2.23.0




* [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (15 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 16/23] iotests: Use complete_and_wait() in 155 Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 12:03   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path() Max Reitz
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

We can use this decorator above TestRepairQuorum.setUp() to skip all
quorum tests with a single line.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
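A minimal sketch of why a single decorator on setUp() can cover every
test in a class.  It relies only on standard unittest behaviour (a
unittest.SkipTest raised in setUp() marks that test as skipped).  The
names skip_if_missing and AVAILABLE_DRIVERS are hypothetical, and this
is not the implementation of iotests.skip_if_unsupported:

```python
import functools
import unittest

# Hypothetical stand-in for the set of drivers the binary supports.
AVAILABLE_DRIVERS = {'qcow2', 'raw'}

def skip_if_missing(required):
    """Skip the decorated method unless all drivers in @required exist."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            missing = sorted(set(required) - AVAILABLE_DRIVERS)
            if missing:
                # Raised from setUp(), this skips the test being set up.
                raise unittest.SkipTest('unsupported: %s' % ', '.join(missing))
            return func(*args, **kwargs)
        return wrapper
    return decorator

class QuorumLikeTests(unittest.TestCase):
    @skip_if_missing(['quorum'])
    def setUp(self):
        self.ready = True

    def test_a(self):
        self.assertTrue(self.ready)

    def test_b(self):
        self.assertTrue(self.ready)

def run_sketch():
    """Run the class above and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuorumLikeTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because setUp() runs before each test method, both tests end up in
run_sketch().skipped without any per-test check.
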
 tests/qemu-iotests/041 | 39 +++------------------------------------
 1 file changed, 3 insertions(+), 36 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 3c60c07b01..2ab59e9c53 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -871,6 +871,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
     image_len = 1 * 1024 * 1024 # MB
     IMAGES = [ quorum_img1, quorum_img2, quorum_img3 ]
 
+    @iotests.skip_if_unsupported(['quorum'])
     def setUp(self):
         self.vm = iotests.VM()
 
@@ -894,9 +895,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
         #assemble the quorum block device from the individual files
         args = { "driver": "quorum", "node-name": "quorum0",
                  "vote-threshold": 2, "children": [ "img0", "img1", "img2" ] }
-        if iotests.supports_quorum():
-            result = self.vm.qmp("blockdev-add", **args)
-            self.assert_qmp(result, 'return', {})
+        result = self.vm.qmp("blockdev-add", **args)
+        self.assert_qmp(result, 'return', {})
 
 
     def tearDown(self):
@@ -909,9 +909,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
                 pass
 
     def test_complete(self):
-        if not iotests.supports_quorum():
-            return
-
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
@@ -928,9 +925,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_cancel(self):
-        if not iotests.supports_quorum():
-            return
-
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
@@ -945,9 +939,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.vm.shutdown()
 
     def test_cancel_after_ready(self):
-        if not iotests.supports_quorum():
-            return
-
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
@@ -964,9 +955,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_pause(self):
-        if not iotests.supports_quorum():
-            return
-
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
@@ -992,9 +980,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
                         'target image does not match source after mirroring')
 
     def test_medium_not_found(self):
-        if not iotests.supports_quorum():
-            return
-
         if iotests.qemu_default_machine != 'pc':
             return
 
@@ -1006,9 +991,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_image_not_found(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
                              sync='full', node_name='repair0', replaces='img1',
                              mode='existing', target=quorum_repair_img,
@@ -1016,9 +998,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_device_not_found(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('drive-mirror', job_id='job0',
                              device='nonexistent', sync='full',
                              node_name='repair0',
@@ -1027,9 +1006,6 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_wrong_sync_mode(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('drive-mirror', device='quorum0', job_id='job0',
                              node_name='repair0',
                              replaces='img1',
@@ -1037,27 +1013,18 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_no_node_name(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
                              sync='full', replaces='img1',
                              target=quorum_repair_img, format=iotests.imgfmt)
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_nonexistent_replaces(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('drive-mirror', job_id='job0', device='quorum0',
                              sync='full', node_name='repair0', replaces='img77',
                              target=quorum_repair_img, format=iotests.imgfmt)
         self.assert_qmp(result, 'error/class', 'GenericError')
 
     def test_after_a_quorum_snapshot(self):
-        if not iotests.supports_quorum():
-            return
-
         result = self.vm.qmp('blockdev-snapshot-sync', node_name='img1',
                              snapshot_file=quorum_snapshot_file,
                              snapshot_node_name="snap1");
-- 
2.23.0




* [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (16 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041 Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 12:59   ` Vladimir Sementsov-Ogievskiy
  2019-12-13 11:27   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041 Max Reitz
                   ` (5 subsequent siblings)
  23 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
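A standalone sketch of the path walk described in the comment added
below, run against a hand-written graph instead of a live VM.  The
function name resolve_block_path and the GRAPH contents are
hypothetical; only the field names ('nodes', 'edges', 'id', 'name',
'parent', 'child') are taken from what the patch reads.  Unlike
assert_block_path(), this sketch returns None for a path that cannot
be followed rather than asserting:

```python
def resolve_block_path(graph, root, path):
    """Return the name of the node under @path below @root, or None."""
    parts = path.split('/')
    # As in the patch, the path must begin with a slash and be non-empty.
    assert parts[0] == '' and len(parts) > 1, 'path must begin with a slash'

    node = next((n for n in graph['nodes'] if n['name'] == root), None)
    for child_name in parts[1:]:
        if node is None:
            return None
        # Find the edge leaving the current node with the requested name.
        child_id = next((e['child'] for e in graph['edges']
                         if e['parent'] == node['id'] and
                            e['name'] == child_name), None)
        node = next((n for n in graph['nodes'] if n['id'] == child_id), None)
    return node['name'] if node is not None else None

# Hypothetical graph: a quorum node with two children.
GRAPH = {
    'nodes': [
        {'id': 1, 'name': 'quorum0'},
        {'id': 2, 'name': 'img0'},
        {'id': 3, 'name': 'repair0'},
    ],
    'edges': [
        {'parent': 1, 'child': 2, 'name': 'children.0'},
        {'parent': 1, 'child': 3, 'name': 'children.1'},
    ],
}
```

With this graph, resolve_block_path(GRAPH, 'quorum0', '/children.1')
yields 'repair0', and any path that leaves the graph yields None.
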
 tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index d34305ce69..3e03320ce3 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
 
         return fields.items() <= ret.items()
 
+    """
+    Check whether the node under the given path in the block graph is
+    @expected_node.
+
+    @root is the node name of the node where the @path is rooted.
+
+    @path is a string that consists of child names separated by
+    slashes.  It must begin with a slash.
+
+    Examples for @root + @path:
+      - root="qcow2-node", path="/backing/file"
+      - root="quorum-node", path="/children.2/file"
+
+    Hypothetically, @path could be empty, in which case it would point
+    to @root.  However, in practice this case is not useful and hence
+    not allowed.
+
+    @expected_node may be None.
+
+    @graph may be None or the result of an x-debug-query-block-graph
+    call that has already been performed.
+    """
+    def assert_block_path(self, root, path, expected_node, graph=None):
+        if graph is None:
+            graph = self.qmp('x-debug-query-block-graph')['return']
+
+        iter_path = iter(path.split('/'))
+
+        # Must start with a /
+        assert next(iter_path) == ''
+
+        node = next((node for node in graph['nodes'] if node['name'] == root),
+                    None)
+
+        for path_node in iter_path:
+            assert node is not None, 'Cannot follow path %s' % path
+
+            try:
+                node_id = next(edge['child'] for edge in graph['edges'] \
+                                             if edge['parent'] == node['id'] and
+                                                edge['name'] == path_node)
+
+                node = next(node for node in graph['nodes'] \
+                                 if node['id'] == node_id)
+            except StopIteration:
+                node = None
+
+        assert node is not None or expected_node is None, \
+               'No node found under %s (but expected %s)' % \
+               (path, expected_node)
+
+        assert expected_node is not None or node is None, \
+               'Found node %s under %s (but expected none)' % \
+               (node['name'], path)
+
+        if node is not None and expected_node is not None:
+            assert node['name'] == expected_node, \
+                   'Found node %s under %s (but expected %s)' % \
+                   (node['name'], path, expected_node)
 
 index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
 
-- 
2.23.0




* [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (17 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path() Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 13:32   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 20/23] iotests: Use self.image_len in TestRepairQuorum Max Reitz
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 2ab59e9c53..d636cb7f1d 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -918,8 +918,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait(drive="job0")
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
+        self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
         self.vm.shutdown()
         self.assertTrue(iotests.compare_images(quorum_img2, quorum_repair_img),
                         'target image does not match source after mirroring')
@@ -1041,9 +1040,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
         self.complete_and_wait('job0')
         self.assert_has_block_node("repair0", quorum_repair_img)
-        # TODO: a better test requiring some QEMU infrastructure will be added
-        #       to check that this file is really driven by quorum
-        self.vm.shutdown()
+        self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
 
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
-- 
2.23.0




* [PATCH for-5.0 v2 20/23] iotests: Use self.image_len in TestRepairQuorum
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (18 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041 Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-11-11 16:02 ` [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces Max Reitz
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

041's TestRepairQuorum has its own image_len, so there is no need to
refer to TestSingleDrive's.  (This patch allows commenting out
TestSingleDrive to speed up 041 while developing tests.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/041 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index d636cb7f1d..0c1af45639 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -881,7 +881,7 @@ class TestRepairQuorum(iotests.QMPTestCase):
         # Add each individual quorum images
         for i in self.IMAGES:
             qemu_img('create', '-f', iotests.imgfmt, i,
-                     str(TestSingleDrive.image_len))
+                     str(self.image_len))
             # Assign a node name to each quorum image in order to manipulate
             # them
             opts = "node-name=img%i" % self.IMAGES.index(i)
-- 
2.23.0




* [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (19 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 20/23] iotests: Use self.image_len in TestRepairQuorum Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 14:40   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters Max Reitz
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Add two tests to see that you cannot replace a Quorum child with the
mirror job while the child is in use by a different parent.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 70 +++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |  4 +--
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 0c1af45639..ab0cb5b42f 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -20,6 +20,7 @@
 
 import time
 import os
+import re
 import iotests
 from iotests import qemu_img, qemu_io
 
@@ -34,6 +35,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
 quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
 quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
 
+nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')
+
 class TestSingleDrive(iotests.QMPTestCase):
     image_len = 1 * 1024 * 1024 # MB
     qmp_cmd = 'drive-mirror'
@@ -901,7 +904,8 @@ class TestRepairQuorum(iotests.QMPTestCase):
 
     def tearDown(self):
         self.vm.shutdown()
-        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file ]:
+        for i in self.IMAGES + [ quorum_repair_img, quorum_snapshot_file,
+                                 nbd_sock_path ]:
             # Do a try/except because the test may have deleted some images
             try:
                 os.remove(i)
@@ -1042,6 +1046,70 @@ class TestRepairQuorum(iotests.QMPTestCase):
         self.assert_has_block_node("repair0", quorum_repair_img)
         self.vm.assert_block_path('quorum0', '/children.1', 'repair0')
 
+    """
+    Check that we cannot replace a Quorum child when it has other
+    parents.
+    """
+    def test_with_other_parent(self):
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'error/desc',
+                        "Cannot replace 'img1' by a node mirrored from "
+                        "'quorum0', because it cannot be guaranteed that doing "
+                        "so would not lead to an abrupt change of visible data")
+
+    """
+    The same as test_with_other_parent(), but add the NBD server only
+    when the mirror job is already running.
+    """
+    def test_with_other_parents_after_mirror_start(self):
+        result = self.vm.qmp('nbd-server-start',
+                             addr={
+                                 'type': 'unix',
+                                 'data': {'path': nbd_sock_path}
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='quorum0',
+                             sync='full', node_name='repair0', replaces='img1',
+                             target=quorum_repair_img, format=iotests.imgfmt)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('nbd-server-add', device='img1')
+        self.assert_qmp(result, 'return', {})
+
+        # The full error message goes to stderr, we will check it later
+        self.complete_and_wait('mirror',
+                               completion_error='Operation not permitted')
+
+        # Should not have been replaced
+        self.vm.assert_block_path('quorum0', '/children.1', 'img1')
+
+        # Check the full error message now
+        self.vm.shutdown()
+        log = self.vm.get_log()
+        log = re.sub(r'^\[I \d+\.\d+\] OPENED\n', '', log)
+        log = re.sub(r'^Formatting.*\n', '', log)
+        log = re.sub(r'\n\[I \+\d+\.\d+\] CLOSED\n?$', '', log)
+        log = re.sub(r'^qemu-system-[^:]*: ', '', log)
+
+        self.assertEqual(log,
+                         "Can no longer replace 'img1' by 'repair0', because " +
+                         "it can no longer be guaranteed that doing so would " +
+                         "not lead to an abrupt change of visible data")
+
+
 # Test mirroring with a source that does not have any parents (not even a
 # BlockBackend)
 class TestOrphanedSource(iotests.QMPTestCase):
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index f496be9197..ffc779b4d1 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-...........................................................................................
+.............................................................................................
 ----------------------------------------------------------------------
-Ran 91 tests
+Ran 93 tests
 
 OK
-- 
2.23.0




* [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (20 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 15:58   ` Vladimir Sementsov-Ogievskiy
  2019-11-11 16:02 ` [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops Max Reitz
  2019-11-29 12:24 ` [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Vladimir Sementsov-Ogievskiy
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 46 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |  4 ++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index ab0cb5b42f..9a00cf6f7b 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1200,6 +1200,52 @@ class TestOrphanedSource(iotests.QMPTestCase):
         self.assertFalse('mirror-filter' in nodes,
                          'Mirror filter node did not disappear')
 
+# Test cases for @replaces that do not necessarily involve Quorum
+class TestReplaces(iotests.QMPTestCase):
+    # Each of these test cases needs its own block graph, so do not
+    # create any nodes here
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        for img in (test_img, target_img):
+            try:
+                os.remove(img)
+            except OSError:
+                pass
+
+    """
+    Check that we can replace filter nodes.
+    """
+    @iotests.skip_if_unsupported(['copy-on-read'])
+    def test_replace_filter(self):
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'copy-on-read',
+                                 'node-name': 'filter0',
+                                 'file': {
+                                     'driver': 'copy-on-read',
+                                     'node-name': 'filter1',
+                                     'file': {
+                                         'driver': 'null-co'
+                                     }
+                                 }
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add',
+                             node_name='target', driver='null-co')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror', job_id='mirror', device='filter0',
+                             target='target', sync='full', replaces='filter1')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.assert_block_path('filter0', '/file', 'target')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index ffc779b4d1..877b76fd31 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.............................................................................................
+..............................................................................................
 ----------------------------------------------------------------------
-Ran 93 tests
+Ran 94 tests
 
 OK
-- 
2.23.0




* [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (21 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters Max Reitz
@ 2019-11-11 16:02 ` Max Reitz
  2019-12-03 17:03   ` Vladimir Sementsov-Ogievskiy
  2019-11-29 12:24 ` [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Vladimir Sementsov-Ogievskiy
  23 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-11 16:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Alberto Garcia,
	qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
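A rough sketch of the kind of check these tests exercise: replacing a
node by the target creates a loop if the node to be replaced is
reachable from the node that will become the target's backing file.
This is an illustration on a plain edge list, not the code in
blockdev.c; the helper names and the 'file-node' name are
hypothetical, and the sketch treats a node as reachable from itself,
which is an assumption:

```python
def is_descendant(edges, ancestor, node):
    """Return True if @node is reachable from @ancestor via child edges."""
    seen = set()
    stack = [ancestor]
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow every (parent, child) edge leaving the current node.
        stack.extend(child for parent, child in edges if parent == current)
    return False

def replace_would_loop(edges, to_replace, target_backing):
    """Would attaching the target above @target_backing and then letting
    it replace @to_replace create a cycle?"""
    if target_backing is None:
        # No backing file on the target, so there is nothing to loop into.
        return False
    return is_descendant(edges, target_backing, to_replace)

# The graph from test_loop: source --file--> filtered --file--> ...
EDGES = [('source', 'filtered'), ('filtered', 'file-node')]
```

With sync=none the target's backing file would be 'source', so
replace_would_loop(EDGES, 'filtered', 'source') is True; with a target
that gets no backing file (None here), no loop is possible.
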
 tests/qemu-iotests/041     | 235 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 237 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 9a00cf6f7b..0e43bb699d 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1246,6 +1246,241 @@ class TestReplaces(iotests.QMPTestCase):
 
         self.vm.assert_block_path('filter0', '/file', 'target')
 
+    """
+    See what happens when the @sync/@replaces configuration dictates
+    creating a loop.
+    """
+    @iotests.skip_if_unsupported(['throttle'])
+    def test_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
+
+        # Dummy group so we can create a NOP filter
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'throttle',
+                                 'node-name': 'source',
+                                 'throttle-group': 'tg0',
+                                 'file': {
+                                     'driver': iotests.imgfmt,
+                                     'node-name': 'filtered',
+                                     'file': {
+                                         'driver': 'file',
+                                         'filename': test_img
+                                     }
+                                 }
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        # Block graph is now:
+        #   source[throttle] --file--> filtered[imgfmt] --file--> ...
+
+        result = self.vm.qmp('drive-mirror', job_id='mirror', device='source',
+                             target=target_img, format=iotests.imgfmt,
+                             node_name='target', sync='none',
+                             replaces='filtered')
+
+        """
+        Block graph before mirror exits would be (ignoring mirror_top):
+          source[throttle] --file--> filtered[imgfmt] --file--> ...
+          target[imgfmt] --file--> ...
+
+        Then, because of sync=none and drive-mirror in absolute-paths mode,
+        the source is attached to the target:
+          source[throttle] --file--> filtered[imgfmt] --file--> ...
+                 ^
+              backing
+                 |
+            target[imgfmt] --file--> ...
+
+        Replacing filtered by target would yield:
+          source[throttle] --file--> target[imgfmt] --file--> ...
+                 ^                        |
+                 +------- backing --------+
+
+        I.e., a loop.  bdrv_replace_node() detects this and simply
+        does not let source's file link point to target.  However,
+        that means that target cannot really replace source.
+
+        drive-mirror should detect this and not allow this case.
+        """
+
+        self.assert_qmp(result, 'error/desc',
+                        "Replacing 'filtered' by 'target' with this sync " + \
+                        "mode would result in a loop, because the former " + \
+                        "would be a child of the latter's backing file " + \
+                        "('source') after the mirror job")
+
+    """
+    Test what happens when there would be no loop with the pre-mirror
+    configuration, but something changes during the mirror job that asks
+    for a loop to be created during completion.
+    """
+    @iotests.skip_if_unsupported(['copy-on-read', 'quorum'])
+    def test_loop_during_mirror(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
+
+        """
+        In this test, we are going to mirror from a node that is a
+        filter above some file "common-base".  The target is a quorum
+        node (with just an unrelated null-co child).
+
+        We will ask the mirror job to replace common-base by the
+        target upon completion.  That is a completely valid
+        configuration so far.
+
+        However, while the job is running, we add common-base as an
+        (indirect[1]) child to the target quorum node.  This way,
+        completing the job as requested would yield a loop, because
+        the target would be supposed to replace common-base -- which
+        is its own (indirect) child.
+
+        [1] It needs to be an indirect child, because if it were a
+        direct child, the mirror job would simply end by effectively
+        injecting the target above common-base.  This is the same
+        effect as when using sync=none: The target ends up above the
+        source.
+
+        So only loops that have a length of more than one node are
+        forbidden, which means common-base must be an indirect child
+        of the target.
+
+        (Furthermore, we are going to use x-blockdev-change to add
+        common-base as a child to the target.  This command only
+        allows doing so for nodes that have no parent yet.
+        common-base will have a parent already, though, namely the
+        source node.  Therefore, this is another reason why we need at
+        least one node above common-base, so this parent can become
+        target's child during the mirror.)
+        """
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'null-co',
+                                 'node-name': 'common-base',
+                                 'read-zeroes': True,
+                                 'size': 1 * 1024 * 1024
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'copy-on-read',
+                                 'node-name': 'source',
+                                 'file': 'common-base'
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        """
+        As explained above, we have to create a parent above
+        common-base.
+
+        We cannot use any parent that would forward the RESIZE
+        permission, because the job takes it on the target, but
+        unshares it on the source: After the x-blockdev-change
+        operation during the mirror job, this parent will be a child
+        of the target, so common-base will be an (indirect) child of
+        both the mirror's source and target.  Thus, the job would
+        conflict with itself.
+
+        Therefore, we make common-base a backing child of a $imgfmt
+        node.  Unfortunately, we cannot let the mirror job replace a
+        node that acts as a backing child somewhere (because of an op
+        blocker), so we put another raw node between the $imgfmt node
+        and common-base.
+        """
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': iotests.imgfmt,
+                                 'node-name': 'base-parent',
+                                 'file': {
+                                     'driver': 'file',
+                                     'filename': test_img
+                                 },
+                                 'backing': {
+                                     'driver': 'raw',
+                                     'file': 'common-base'
+                                 }
+                             })
+
+        """
+        Add a quorum node with a single child, we will add base-parent
+        to prepare a loop later.
+        (We do not care about this single child at all, but it is
+        impossible to create a quorum node without any children.  We
+        will ignore this child from now on.)
+        """
+        result = self.vm.qmp('blockdev-add', **{
+                                 'driver': 'quorum',
+                                 'node-name': 'target',
+                                 'vote-threshold': 1,
+                                 'children': [
+                                     {
+                                         'driver': 'null-co',
+                                         'read-zeroes': True,
+                                         'size': 1 * 1024 * 1024
+                                     }
+                                 ]
+                             })
+        self.assert_qmp(result, 'return', {})
+
+        """
+        Current block graph:
+
+        base-parent[$imgfmt] --backing--> [raw]
+                                            |
+                                           file
+                                            v
+              source[COR] --file--> common-base[null-co]
+
+        target[quorum]
+
+
+        The following blockdev-mirror asks for this graph post-mirror:
+
+        base-parent[$imgfmt] --backing--> [raw]
+                                            |
+                                           file
+                                            v
+                source[COR] --file--> target[quorum]
+
+        That would be a valid configuration without any loops.
+        """
+
+        result = self.vm.qmp('blockdev-mirror', job_id='mirror',
+                             device='source', target='target', sync='full',
+                             replaces='common-base')
+        self.assert_qmp(result, 'return', {})
+
+        """
+        However, now we will make base-parent a child of target.
+        Before the mirror job completes, that is still completely
+        valid:
+
+                                             source
+                                               |
+                                               v
+        target -> base-parent -> [raw] -> common-base
+        """
+
+        result = self.vm.qmp('x-blockdev-change',
+                             parent='target', node='base-parent')
+        self.assert_qmp(result, 'return', {})
+
+        """
+        However, post-mirror, we thus ask for a loop:
+
+        source -> target (replaced common-base) -> base-parent
+                                  ^                    |
+                                  |                    v
+                                  +----------------- [raw]
+
+        bdrv_replace_node() would not allow such a configuration, but
+        we should not pretend we can create it, so the mirror job
+        should fail during completion.
+        """
+
+        self.complete_and_wait('mirror',
+                               completion_error='Operation not permitted')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index 877b76fd31..20a8158b99 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-..............................................................................................
+................................................................................................
 ----------------------------------------------------------------------
-Ran 94 tests
+Ran 96 tests
 
 OK
-- 
2.23.0
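
For readers following test_loop_during_mirror, the graph argument from
its comments can be checked with a toy model.  This is only a sketch: the
helper name requested_graph_has_loop() is hypothetical, the quorum node's
null-co child is ignored (as in the test), and the model naively
redirects every edge, so it describes the graph that was asked for, not
what bdrv_replace_node() actually does.  It does not model the
direct-child special case from footnote [1].

```python
def requested_graph_has_loop(children, to_replace, replacement):
    """Return True if redirecting all edges from to_replace to
    replacement would make replacement reachable from itself."""
    # Point every parent of to_replace at replacement instead
    redirected = {
        parent: [replacement if kid == to_replace else kid for kid in kids]
        for parent, kids in children.items()
    }
    # Depth-first walk starting at replacement's children
    stack = list(redirected.get(replacement, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == replacement:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(redirected.get(node, []))
    return False


# Graph when blockdev-mirror is started
before = {
    'source': ['common-base'],
    'base-parent': ['raw'],
    'raw': ['common-base'],
    'target': [],
}
# Graph after x-blockdev-change made base-parent a child of target
after = dict(before, target=['base-parent'])
```

With these two graphs, the model reports no loop for "before" and a loop
for "after", which matches the reasoning in the test comments.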



^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 05/23] quorum: Fix child permissions
  2019-11-11 16:01 ` [PATCH for-5.0 v2 05/23] quorum: Fix child permissions Max Reitz
@ 2019-11-29  9:14   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29  9:14 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:01, Max Reitz wrote:
> Quorum cannot share WRITE or RESIZE on its children.  Presumably, it
> only does so because as a filter, it seemed intuitively correct to point
> its .bdrv_child_perm to bdrv_filter_default_perm().
> 
> However, it is not really a filter, and bdrv_filter_default_perm() does
> not work for it, so we have to provide a custom .bdrv_child_perm
> implementation.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace()
  2019-11-11 16:01 ` [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace() Max Reitz
@ 2019-11-29  9:34   ` Vladimir Sementsov-Ogievskiy
  2019-11-29 10:23     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29  9:34 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:01, Max Reitz wrote:
> After a couple of follow-up patches, this function will replace
> bdrv_recurse_is_first_non_filter() in check_to_replace_node().
> 
> bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
> check_to_replace_node() (it allows cases that should not be allowed,
> like replacing child nodes of quorum with dissenting data that have more
> parents than just quorum), and it is too restrictive (it is perfectly
> fine to replace filters).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
>   include/block/block_int.h | 10 ++++++++++
>   2 files changed, 48 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 9b1049786a..de53addeb0 100644
> --- a/block.c
> +++ b/block.c
> @@ -6205,6 +6205,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +/*
> + * This function checks whether the given @to_replace is allowed to be
> + * replaced by a node that always shows the same data as @bs.  This is
> + * used for example to verify whether the mirror job can replace
> + * @to_replace by the target mirrored from @bs.
> + * To be replaceable, @bs and @to_replace may either be guaranteed to
> + * always show the same data (because they are only connected through
> + * filters), or some driver may allow replacing one of its children
> + * because it can guarantee that this child's data is not visible at
> + * all (for example, for dissenting quorum children that have no other
> + * parents).
> + */
> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
> +                              BlockDriverState *to_replace)
> +{
> +    if (!bs || !bs->drv) {
> +        return false;
> +    }
> +
> +    if (bs == to_replace) {
> +        return true;
> +    }
> +
> +    /* See what the driver can do */
> +    if (bs->drv->bdrv_recurse_can_replace) {
> +        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
> +    }
> +
> +    /* For filters without an own implementation, we can recurse on our own */
> +    if (bs->drv->is_filter) {
> +        BdrvChild *child = bs->file ?: bs->backing;

should we check that child != NULL ?

> +        return bdrv_recurse_can_replace(child->bs, to_replace);
> +    }
> +
> +    /* Safe default */
> +    return false;
> +}
> +
>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>                                           const char *node_name, Error **errp)
>   {
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index dd033d0b37..75f03dcc38 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -102,6 +102,13 @@ struct BlockDriver {
>        */
>       bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>                                                BlockDriverState *candidate);
> +    /*
> +     * Return true if @to_replace can be replaced by a BDS with the
> +     * same data as @bs without it affecting @bs's behavior (that is,
> +     * without it being visible to @bs's parents).
> +     */
> +    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
> +                                     BlockDriverState *to_replace);
>   
>       int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
>       int (*bdrv_probe_device)(const char *filename);
> @@ -1264,6 +1271,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>                                  uint64_t perm, uint64_t shared,
>                                  uint64_t *nperm, uint64_t *nshared);
>   
> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
> +                              BlockDriverState *to_replace);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> 
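
To make the question about child != NULL concrete, here is a toy Python
model of the recursion.  It is a sketch under assumptions, not the QEMU
code: the Node class and its field names are hypothetical stand-ins for
BlockDriverState, and "child" stands in for bs->file ?: bs->backing.  In
this model a filter without a child simply yields False, which is the
behaviour an explicit NULL check would give.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(eq=False)
class Node:
    name: str
    is_filter: bool = False
    # Stand-in for bs->file ?: bs->backing
    child: Optional["Node"] = None
    # Stand-in for a driver-specific .bdrv_recurse_can_replace()
    driver_can_replace: Optional[Callable[["Node", "Node"], bool]] = None


def recurse_can_replace(bs, to_replace):
    if bs is None:
        # Covers a filter that has no child (the case asked about)
        return False
    if bs is to_replace:
        return True
    # See what the driver can do
    if bs.driver_can_replace is not None:
        return bs.driver_can_replace(bs, to_replace)
    # Filters without an own implementation: recurse generically
    if bs.is_filter:
        return recurse_can_replace(bs.child, to_replace)
    # Safe default
    return False
```

For example, a filter above a node reports that node as replaceable,
while a non-filter parent or a childless filter does not.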


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-11-29  9:41   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29  9:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 08/23] quorum: Store children in own structure
  2019-11-11 16:02 ` [PATCH for-5.0 v2 08/23] quorum: Store children in own structure Max Reitz
@ 2019-11-29  9:46   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29  9:46 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> This will be useful when we want to store additional attributes for each
> child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

You forgot my
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced
  2019-11-11 16:02 ` [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced Max Reitz
@ 2019-11-29  9:59   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29  9:59 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 59cd524502..3a824e77e3 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>   
>   typedef struct QuorumChild {
>       BdrvChild *child;
> +
> +    /*
> +     * If set, check whether this node can be replaced without any
> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> +     * WRITE permission.
> +     */
> +    bool to_be_replaced;
>   } QuorumChild;
>   
>   /* the following structure holds the state of one quorum instance */
> @@ -1128,6 +1135,16 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
>                                 uint64_t perm, uint64_t shared,
>                                 uint64_t *nperm, uint64_t *nshared)
>   {
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +

loop is still useless if c == NULL...

if (c) {

> +    for (i = 0; i < s->num_children; i++) {
> +        if (s->children[i].child == c) {
> +            break;
> +        }
> +    }


        assert(i < s->num_children);

}

> +
>       *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
>   
>       /*
> @@ -1137,6 +1154,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
>       *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
>                             BLK_PERM_WRITE_UNCHANGED))
>                | DEFAULT_PERM_UNCHANGED;
> +
> +    if (c && s->children[i].to_be_replaced) {
> +        /* Prepare for sudden data changes */
> +        *nperm |= BLK_PERM_WRITE;
> +        *nshared &= ~BLK_PERM_CONSISTENT_READ;
> +    }
>   }
>   
>   static const char *const quorum_strong_runtime_opts[] = {
> 

with or without "if (c)":

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
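
A toy Python model of the permission logic with the "if (c)" guard may
help to see what the suggestion amounts to.  This is a sketch only: the
bit values and the function signature are assumptions made for
illustration, and the quorum children are modelled as a plain list of
(child, to_be_replaced) pairs.

```python
# Illustrative permission bits (assumed values)
CONSISTENT_READ = 0x01
WRITE = 0x02
WRITE_UNCHANGED = 0x04
RESIZE = 0x08
GRAPH_MOD = 0x10
ALL = 0x1f

PASSTHROUGH = CONSISTENT_READ | WRITE | WRITE_UNCHANGED | RESIZE
UNCHANGED = ALL & ~PASSTHROUGH


def quorum_child_perm(children, c, perm, shared):
    """children: list of (child, to_be_replaced); c may be None."""
    to_be_replaced = False
    if c is not None:
        # Only look the child up when there is one to look up
        matches = [flag for child, flag in children if child is c]
        assert matches, "c must be one of the quorum children"
        to_be_replaced = matches[0]

    nperm = perm & PASSTHROUGH
    nshared = (shared & (CONSISTENT_READ | WRITE_UNCHANGED)) | UNCHANGED

    if to_be_replaced:
        # Prepare for sudden data changes
        nperm |= WRITE
        nshared &= ~CONSISTENT_READ
    return nperm, nshared
```

With c set to None, the lookup is skipped entirely and only the default
permissions are computed.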


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
@ 2019-11-29 10:18   ` Vladimir Sementsov-Ogievskiy
  2019-11-29 12:50     ` Max Reitz
  2020-02-05 15:55   ` Kevin Wolf
  1 sibling, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 10:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 3a824e77e3..8ee03e9baf 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;

bdrv_child_refresh_perms returns int, so I suggest instead:


bool ok;

> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);

ok = !bdrv_child_refresh_perms(bs, s->children[i].child, NULL);

> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);

return ok;

Or something similar with "int ret;", "ret = ...;" and "return !ret;".

> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
> 

with or without my suggestion:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
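
The probe-and-revert pattern with a plain return code can be sketched as
follows.  This is a toy Python model, not the QEMU API: refresh_perms()
here is a hypothetical callback that returns 0 on success and a negative
value on failure, and the child is modelled as a dict.

```python
def quorum_can_replace_child(child, refresh_perms):
    # Probe: would the stricter permissions be accepted?
    child['to_be_replaced'] = True
    ok = refresh_perms(child) == 0

    # Revert permissions; this is expected to always succeed
    child['to_be_replaced'] = False
    ret = refresh_perms(child)
    assert ret == 0, 'reverting permissions must not fail'

    return ok


def make_refresh_perms(other_parent_needs_consistent_read):
    """Build a fake refresh function for the model."""
    def refresh_perms(child):
        if child['to_be_replaced'] and other_parent_needs_consistent_read:
            return -1  # stands in for a negative errno
        return 0
    return refresh_perms
```

The result depends only on the return code of the probe, so no Error
object has to be allocated and freed.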

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace()
  2019-11-29  9:34   ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 10:23     ` Max Reitz
  2019-11-29 11:04       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-29 10:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 29.11.19 10:34, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:01, Max Reitz wrote:
>> After a couple of follow-up patches, this function will replace
>> bdrv_recurse_is_first_non_filter() in check_to_replace_node().
>>
>> bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
>> check_to_replace_node() (it allows cases that should not be allowed,
>> like replacing child nodes of quorum with dissenting data that have more
>> parents than just quorum), and it is too restrictive (it is perfectly
>> fine to replace filters).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
>>   include/block/block_int.h | 10 ++++++++++
>>   2 files changed, 48 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 9b1049786a..de53addeb0 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6205,6 +6205,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +/*
>> + * This function checks whether the given @to_replace is allowed to be
>> + * replaced by a node that always shows the same data as @bs.  This is
>> + * used for example to verify whether the mirror job can replace
>> + * @to_replace by the target mirrored from @bs.
>> + * To be replaceable, @bs and @to_replace may either be guaranteed to
>> + * always show the same data (because they are only connected through
>> + * filters), or some driver may allow replacing one of its children
>> + * because it can guarantee that this child's data is not visible at
>> + * all (for example, for dissenting quorum children that have no other
>> + * parents).
>> + */
>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>> +                              BlockDriverState *to_replace)
>> +{
>> +    if (!bs || !bs->drv) {
>> +        return false;
>> +    }
>> +
>> +    if (bs == to_replace) {
>> +        return true;
>> +    }
>> +
>> +    /* See what the driver can do */
>> +    if (bs->drv->bdrv_recurse_can_replace) {
>> +        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
>> +    }
>> +
>> +    /* For filters without an own implementation, we can recurse on our own */
>> +    if (bs->drv->is_filter) {
>> +        BdrvChild *child = bs->file ?: bs->backing;
> 
> should we check that child != NULL ?

I’d say that normally (once they are open) filters must have a child,
and so I’d make it an assertion.  But then again an assertion isn’t much
better than the dereferencing that follows, I think. :?

Max

>> +        return bdrv_recurse_can_replace(child->bs, to_replace);
>> +    }
>> +
>> +    /* Safe default */
>> +    return false;
>> +}
>> +
>>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>                                           const char *node_name, Error **errp)
>>   {
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index dd033d0b37..75f03dcc38 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -102,6 +102,13 @@ struct BlockDriver {
>>        */
>>       bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>>                                                BlockDriverState *candidate);
>> +    /*
>> +     * Return true if @to_replace can be replaced by a BDS with the
>> +     * same data as @bs without it affecting @bs's behavior (that is,
>> +     * without it being visible to @bs's parents).
>> +     */
>> +    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
>> +                                     BlockDriverState *to_replace);
>>   
>>       int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
>>       int (*bdrv_probe_device)(const char *filename);
>> @@ -1264,6 +1271,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>                                  uint64_t perm, uint64_t shared,
>>                                  uint64_t *nperm, uint64_t *nshared);
>>   
>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>> +                              BlockDriverState *to_replace);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>>
> 
> 




^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace()
  2019-11-29 10:23     ` Max Reitz
@ 2019-11-29 11:04       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 11:04 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

29.11.2019 13:23, Max Reitz wrote:
> On 29.11.19 10:34, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2019 19:01, Max Reitz wrote:
>>> After a couple of follow-up patches, this function will replace
>>> bdrv_recurse_is_first_non_filter() in check_to_replace_node().
>>>
>>> bdrv_recurse_is_first_non_filter() is both not sufficiently specific for
>>> check_to_replace_node() (it allows cases that should not be allowed,
>>> like replacing child nodes of quorum with dissenting data that have more
>>> parents than just quorum), and it is too restrictive (it is perfectly
>>> fine to replace filters).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block.c                   | 38 ++++++++++++++++++++++++++++++++++++++
>>>    include/block/block_int.h | 10 ++++++++++
>>>    2 files changed, 48 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index 9b1049786a..de53addeb0 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -6205,6 +6205,44 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
>>>        return false;
>>>    }
>>>    
>>> +/*
>>> + * This function checks whether the given @to_replace is allowed to be
>>> + * replaced by a node that always shows the same data as @bs.  This is
>>> + * used for example to verify whether the mirror job can replace
>>> + * @to_replace by the target mirrored from @bs.
>>> + * To be replaceable, @bs and @to_replace may either be guaranteed to
>>> + * always show the same data (because they are only connected through
>>> + * filters), or some driver may allow replacing one of its children
>>> + * because it can guarantee that this child's data is not visible at
>>> + * all (for example, for dissenting quorum children that have no other
>>> + * parents).
>>> + */
>>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>> +                              BlockDriverState *to_replace)
>>> +{
>>> +    if (!bs || !bs->drv) {
>>> +        return false;
>>> +    }
>>> +
>>> +    if (bs == to_replace) {
>>> +        return true;
>>> +    }
>>> +
>>> +    /* See what the driver can do */
>>> +    if (bs->drv->bdrv_recurse_can_replace) {
>>> +        return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
>>> +    }
>>> +
>>> +    /* For filters without an own implementation, we can recurse on our own */
>>> +    if (bs->drv->is_filter) {
>>> +        BdrvChild *child = bs->file ?: bs->backing;
>>
>> should we check that child != NULL ?
> 
> I’d say that normally (once they are open) filters must have a child,
> and so I’d make it an assertion.  But then again an assertion isn’t much
> better than the dereferencing that follows, I think. :?
> 
> Max

OK then.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> 
>>> +        return bdrv_recurse_can_replace(child->bs, to_replace);
>>> +    }
>>> +
>>> +    /* Safe default */
>>> +    return false;
>>> +}
>>> +
>>>    BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>>                                            const char *node_name, Error **errp)
>>>    {
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index dd033d0b37..75f03dcc38 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -102,6 +102,13 @@ struct BlockDriver {
>>>         */
>>>        bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>>>                                                 BlockDriverState *candidate);
>>> +    /*
>>> +     * Return true if @to_replace can be replaced by a BDS with the
>>> +     * same data as @bs without it affecting @bs's behavior (that is,
>>> +     * without it being visible to @bs's parents).
>>> +     */
>>> +    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
>>> +                                     BlockDriverState *to_replace);
>>>    
>>>        int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
>>>        int (*bdrv_probe_device)(const char *filename);
>>> @@ -1264,6 +1271,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>>                                   uint64_t perm, uint64_t shared,
>>>                                   uint64_t *nperm, uint64_t *nshared);
>>>    
>>> +bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>> +                              BlockDriverState *to_replace);
>>> +
>>>    /*
>>>     * Default implementation for drivers to pass bdrv_co_block_status() to
>>>     * their file.
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace() Max Reitz
@ 2019-11-29 11:07   ` Vladimir Sementsov-Ogievskiy
  2020-02-05 15:57   ` Kevin Wolf
  1 sibling, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 11:07 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Let check_to_replace_node() use the more specialized
> bdrv_recurse_can_replace() instead of
> bdrv_recurse_is_first_non_filter(), which is too restrictive.

or not restrictive enough in the case of quorum

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index de53addeb0..7608f21570 100644
> --- a/block.c
> +++ b/block.c
> @@ -6243,6 +6243,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
>       return false;
>   }
>   
> +/*
> + * Check whether the given @node_name can be replaced by a node that
> + * has the same data as @parent_bs.  If so, return @node_name's BDS;
> + * NULL otherwise.
> + *
> + * @node_name must be a (recursive) *child of @parent_bs (or this
> + * function will return NULL).
> + *
> + * The result (whether the node can be replaced or not) is only valid
> + * for as long as no graph changes occur.

Actually, it is only valid for as long as neither graph changes nor
permission changes or updates occur.

> + */
>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>                                           const char *node_name, Error **errp)
>   {
> @@ -6267,8 +6278,11 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>        * Another benefit is that this tests exclude backing files which are
>        * blocked by the backing blockers.
>        */
> -    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
> -        error_setg(errp, "Only top most non filter can be replaced");
> +    if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
> +        error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
> +                   "because it cannot be guaranteed that doing so would not "
> +                   "lead to an abrupt change of visible data",
> +                   node_name, parent_bs->node_name);
>           to_replace_bs = NULL;
>           goto out;
>       }
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing
  2019-11-11 16:02 ` [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing Max Reitz
@ 2019-11-29 11:18   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 11:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> There is no guarantee that we can still replace the node we want to
> replace at the end of the mirror job.  Double-check by calling
> bdrv_recurse_can_replace().
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/mirror.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index f0f2d9dff1..68a4404666 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -695,7 +695,19 @@ static int mirror_exit_common(Job *job)
>            * drain potential other users of the BDS before changing the graph. */
>           assert(s->in_drain);
>           bdrv_drained_begin(target_bs);
> -        bdrv_replace_node(to_replace, target_bs, &local_err);
> +        /*
> +         * Cannot use check_to_replace_node() here, because that would
> +         * check for an op blocker on @to_replace, and we have our own
> +         * there.
> +         */

Interesting: check_to_replace_node() would acquire the AioContext of src.

Here we acquire the AioContext only if s->to_replace is set (above this hunk). Isn't that a bug?

If it is, it is preexisting and not directly related to this patch, so here:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> +        if (bdrv_recurse_can_replace(src, to_replace)) {
> +            bdrv_replace_node(to_replace, target_bs, &local_err);
> +        } else {
> +            error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +                       "because it can no longer be guaranteed that doing so "
> +                       "would not lead to an abrupt change of visible data",
> +                       to_replace->node_name, target_bs->node_name);
> +        }
>           bdrv_drained_end(target_bs);
>           if (local_err) {
>               error_report_err(local_err);
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-11 16:02 ` [PATCH for-5.0 v2 15/23] mirror: Prevent loops Max Reitz
@ 2019-11-29 12:01   ` Vladimir Sementsov-Ogievskiy
  2019-11-29 13:46     ` Max Reitz
  2019-12-02 12:12   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 12:01 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> While bdrv_replace_node() will not follow through with it, a specific
> @replaces asks the mirror job to create a loop.
> 
> For example, say both the source and the target share a child where the
> source is a filter; by letting @replaces point to the common child, you
> ask for a loop.
> 
> Or if you use @replaces in drive-mirror with sync=none and
> mode=absolute-paths, you generally ask for a loop (@replaces must point
> to a child of the source, and sync=none makes the source the backing
> file of the target after the job).
> 
> bdrv_replace_node() will not create those loops, but by doing so, it
> ignores the user-requested configuration, which is not ideal either.
> (In the first example above, the target's child will remain what it was,
> which may still be reasonable.  But in the second example, the target
> will just not become a child of the source, which is precisely what was
> requested with @replaces.)
> 
> So prevent such configurations, both before the job, and before it
> actually completes.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c                   | 30 ++++++++++++++++++++++++
>   block/mirror.c            | 19 +++++++++++++++-
>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>   include/block/block_int.h |  3 +++
>   4 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0159f8e510..e3922a0474 100644
> --- a/block.c
> +++ b/block.c
> @@ -6259,6 +6259,36 @@ out:
>       return to_replace_bs;
>   }
>   
> +/*
> + * Return true iff @child is a (recursive) child of @parent, with at
> + * least @min_level edges between them.
> + *
> + * (If @min_level == 0, return true if @child == @parent.  For
> + * @min_level == 1, @child needs to be at least a real child; for
> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
> + */
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level)
> +{
> +    BdrvChild *c;
> +
> +    if (child == parent && min_level <= 0) {
> +        return true;
> +    }
> +
> +    if (!parent) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &parent->children, next) {
> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   /**
>    * Iterates through the list of runtime option keys that are said to
>    * be "strong" for a BDS.  An option is called "strong" if it changes
> diff --git a/block/mirror.c b/block/mirror.c
> index 68a4404666..b258c7e98b 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>            * there.
>            */
>           if (bdrv_recurse_can_replace(src, to_replace)) {
> -            bdrv_replace_node(to_replace, target_bs, &local_err);
> +            /*
> +             * It is OK for @to_replace to be an immediate child of
> +             * @target_bs, because that is what happens with
> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
> +             * backing file will be the source node, which is also
> +             * to_replace (by default).
> +             * bdrv_replace_node() handles this case by not letting
> +             * target_bs->backing point to itself, but to the source
> +             * still.
> +             */

Hmm. So we want the following case to be valid:

(other parents of source) ----> source = to_replace <--- backing --- target

becomes

(other parents of source) ----> target --- backing ---> source

But it seems to me that the following is no less valid:

(other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target

becomes

(other parents of source) ----> target --- backing ---> X --- backing ---> source

What we actually want to prevent is the case where to_replace is not the source, but a (possibly
indirect) child of the source.

Also, with your check you still allow a silent no-change in the following case:

source --- backing --> to_replace <-- backing -- target

====

In other words, replacing makes sense only if to_replace has some other parents that are not
(possibly indirect) children of target. And the only known such case is when, at the same time,
to_replace == source.

So, shouldn't the check below be

if (to_replace == src || !bdrv_is_child_of(to_replace, target_bs, 1)) {

or, maybe, to also allow replacing filters above src while keeping the backing link:

if (bdrv_is_child_of(src, to_replace, 0) || !bdrv_is_child_of(to_replace, target_bs, 1)) {
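
To make the difference between these conditions easier to see, here is a minimal
sketch on a toy graph. It is an illustration only and not QEMU code: Node,
is_child_of(), patch_allows(), review_allows() and run_scenarios() are
hypothetical names. is_child_of() follows the same recursion as the
bdrv_is_child_of() added by this patch, and review_allows() is the first of the
two alternatives above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a BlockDriverState: a name plus child pointers. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
    int num_children;
} Node;

/* Same recursion as the patch's bdrv_is_child_of(), on the toy graph. */
static bool is_child_of(const Node *child, const Node *parent, int min_level)
{
    if (child == parent && min_level <= 0) {
        return true;
    }
    if (!parent) {
        return false;
    }
    for (int i = 0; i < parent->num_children; i++) {
        if (is_child_of(child, parent->children[i], min_level - 1)) {
            return true;
        }
    }
    return false;
}

/* Check as posted: refuse only if to_replace is a grand-child or deeper. */
static bool patch_allows(const Node *to_replace, const Node *target)
{
    return !is_child_of(to_replace, target, 2);
}

/* First alternative from the review (assumed reading of the condition). */
static bool review_allows(const Node *src, const Node *to_replace,
                          const Node *target)
{
    return to_replace == src || !is_child_of(to_replace, target, 1);
}

/* Evaluate both checks on the three graphs drawn in the review. */
static void run_scenarios(bool patch[3], bool review[3])
{
    assert(patch && review);

    /* Case 0: target --backing--> source, to_replace == source */
    Node src0 = { "source", { NULL }, 0 };
    Node tgt0 = { "target", { &src0 }, 1 };
    patch[0] = patch_allows(&src0, &tgt0);
    review[0] = review_allows(&src0, &src0, &tgt0);

    /* Case 1: target --backing--> X --backing--> source, to_replace == source */
    Node src1 = { "source", { NULL }, 0 };
    Node x1 = { "X", { &src1 }, 1 };
    Node tgt1 = { "target", { &x1 }, 1 };
    patch[1] = patch_allows(&src1, &tgt1);
    review[1] = review_allows(&src1, &src1, &tgt1);

    /* Case 2: source --backing--> to_replace <--backing-- target */
    Node base2 = { "to_replace", { NULL }, 0 };
    Node src2 = { "source", { &base2 }, 1 };
    Node tgt2 = { "target", { &base2 }, 1 };
    patch[2] = patch_allows(&base2, &tgt2);
    review[2] = review_allows(&src2, &base2, &tgt2);
}
```

On these toy graphs, the posted check accepts cases 0 and 2 and rejects case 1,
while the first alternative accepts cases 0 and 1 and rejects case 2. The sketch
says nothing about bdrv_recurse_can_replace(), which the real code consults first.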

> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
> +                bdrv_replace_node(to_replace, target_bs, &local_err);
> +            } else {
> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +                           "because the former is now a child of the latter, "
> +                           "and doing so would thus create a loop",
> +                           to_replace->node_name, target_bs->node_name);
> +            }
>           } else {
>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>                          "because it can no longer be guaranteed that doing so "
> diff --git a/blockdev.c b/blockdev.c
> index 9dc2238bf3..d29f147f72 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>       }
>   
>       if (has_replaces) {
> -        BlockDriverState *to_replace_bs;
> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>           AioContext *replace_aio_context;
>           int64_t bs_size, replace_size;
>   
> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>               return;
>           }
>   
> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
> +                       "because the former is a child of the latter",
> +                       to_replace_bs->node_name, target->node_name);
> +            return;
> +        }
> +
> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
> +        {
> +            /*
> +             * While we do not quite know what OPEN_BACKING_CHAIN
> +             * (used for mode=existing) will yield, it is probably
> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
> +             * because that is our best guess.
> +             */
> +            switch (sync) {
> +            case MIRROR_SYNC_MODE_FULL:
> +                target_backing_bs = NULL;
> +                break;
> +
> +            case MIRROR_SYNC_MODE_TOP:
> +                target_backing_bs = backing_bs(bs);
> +                break;
> +
> +            case MIRROR_SYNC_MODE_NONE:
> +                target_backing_bs = bs;
> +                break;
> +
> +            default:
> +                abort();
> +            }
> +        } else {
> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
> +            target_backing_bs = backing_bs(target);
> +        }
> +
> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
> +                       "result in a loop, because the former would be a child "
> +                       "of the latter's backing file ('%s') after the mirror "
> +                       "job", to_replace_bs->node_name, target->node_name,
> +                       target_backing_bs->node_name);
> +            return;
> +        }
> +
>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>           aio_context_acquire(replace_aio_context);
>           replace_size = bdrv_getlength(to_replace_bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 589a797fab..7064a1a4fa 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>                                 BlockDriverState *to_replace);
>   
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node()
  2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
                   ` (22 preceding siblings ...)
  2019-11-11 16:02 ` [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops Max Reitz
@ 2019-11-29 12:24 ` Vladimir Sementsov-Ogievskiy
  2019-11-29 12:49   ` Max Reitz
  23 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 12:24 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

The last 3 iotests patches don't apply now. Do you have a branch pushed somewhere?

11.11.2019 19:01, Max Reitz wrote:
> Based-on: <20191108123455.39445-1-mreitz@redhat.com>
> (“iotests: Test failing mirror complete”)
> 
> (Because both add cases to 041.)
> 
> 
> Hi,
> 
> For what this series does, see the cover letter of v1:
> 
> https://lists.nongnu.org/archive/html/qemu-block/2019-09/msg01027.html
> 
> 
> Now, in v2 I’ve addressed Vladimir’s comments:
> - Patch 5: Extend explanation in the commit message
> - Patch 6: Prefer driver-specific .bdrv_recurse_can_replace()
>             implementation before the generic one for filters
> - Patch 8: Some more s/BdrvChild \*/QuorumChild/
> - Patch 15: Fix typo in the commit message
> - Patch 17: Added
> - Patch 18:
>    - Split @path into @root + @path
>    - In one instance, use x = next(y, z) instead of
>      try: x = next(y); except StopIteration: x = z;
>    - %s/'''/"""/
> - Patch 19: Fallout from the patch 18 changes
> - Patch 20: Fix in the commit message (uncommenting -> commenting out)
> - Patch 21:
>    - Check full stderr message by inspecting the VM log
>    - Fallout from the patch 18 changes
>    - %s/'''/"""/
> - Patch 22:
>    - Skip case if COR is unsupported
>    - Fallout from the patch 18 changes
>    - %s/'''/"""/
> - Patch 23:
>    - Added more comments
>    - Skip cases if throttle/COR/quorum (as appropriate) is unsupported
>    - Use imgfmt instead of hard-coding qcow2
>    - Fallout from the patch 18 changes
>    - %s/'''/"""/
> 
> 
> git-backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/23:[----] [--] 'blockdev: Allow external snapshots everywhere'
> 002/23:[----] [--] 'blockdev: Allow resizing everywhere'
> 003/23:[----] [--] 'block: Drop bdrv_is_first_non_filter()'
> 004/23:[----] [--] 'iotests: Let 041 use -blockdev for quorum children'
> 005/23:[----] [--] 'quorum: Fix child permissions'
> 006/23:[0012] [FC] 'block: Add bdrv_recurse_can_replace()'
> 007/23:[----] [--] 'blkverify: Implement .bdrv_recurse_can_replace()'
> 008/23:[0006] [FC] 'quorum: Store children in own structure'
> 009/23:[----] [--] 'quorum: Add QuorumChild.to_be_replaced'
> 010/23:[----] [--] 'quorum: Implement .bdrv_recurse_can_replace()'
> 011/23:[----] [--] 'block: Use bdrv_recurse_can_replace()'
> 012/23:[----] [--] 'block: Remove bdrv_recurse_is_first_non_filter()'
> 013/23:[----] [--] 'mirror: Double-check immediately before replacing'
> 014/23:[----] [--] 'quorum: Stop marking it as a filter'
> 015/23:[----] [--] 'mirror: Prevent loops'
> 016/23:[----] [--] 'iotests: Use complete_and_wait() in 155'
> 017/23:[down] 'iotests: Use skip_if_unsupported decorator in 041'
> 018/23:[0037] [FC] 'iotests: Add VM.assert_block_path()'
> 019/23:[0004] [FC] 'iotests: Resolve TODOs in 041'
> 020/23:[----] [--] 'iotests: Use self.image_len in TestRepairQuorum'
> 021/23:[0027] [FC] 'iotests: Add tests for invalid Quorum @replaces'
> 022/23:[0007] [FC] 'iotests: Check that @replaces can replace filters'
> 023/23:[0141] [FC] 'iotests: Mirror must not attempt to create loops'
> 
> 
> Max Reitz (23):
>    blockdev: Allow external snapshots everywhere
>    blockdev: Allow resizing everywhere
>    block: Drop bdrv_is_first_non_filter()
>    iotests: Let 041 use -blockdev for quorum children
>    quorum: Fix child permissions
>    block: Add bdrv_recurse_can_replace()
>    blkverify: Implement .bdrv_recurse_can_replace()
>    quorum: Store children in own structure
>    quorum: Add QuorumChild.to_be_replaced
>    quorum: Implement .bdrv_recurse_can_replace()
>    block: Use bdrv_recurse_can_replace()
>    block: Remove bdrv_recurse_is_first_non_filter()
>    mirror: Double-check immediately before replacing
>    quorum: Stop marking it as a filter
>    mirror: Prevent loops
>    iotests: Use complete_and_wait() in 155
>    iotests: Use skip_if_unsupported decorator in 041
>    iotests: Add VM.assert_block_path()
>    iotests: Resolve TODOs in 041
>    iotests: Use self.image_len in TestRepairQuorum
>    iotests: Add tests for invalid Quorum @replaces
>    iotests: Check that @replaces can replace filters
>    iotests: Mirror must not attempt to create loops
> 
>   block.c                       | 115 ++++++----
>   block/blkverify.c             |  20 +-
>   block/copy-on-read.c          |   9 -
>   block/mirror.c                |  31 ++-
>   block/quorum.c                | 161 +++++++++++---
>   block/replication.c           |   7 -
>   block/throttle.c              |   8 -
>   blockdev.c                    |  58 ++++-
>   include/block/block.h         |   5 -
>   include/block/block_int.h     |  19 +-
>   tests/qemu-iotests/041        | 402 ++++++++++++++++++++++++++++++----
>   tests/qemu-iotests/041.out    |   4 +-
>   tests/qemu-iotests/155        |   7 +-
>   tests/qemu-iotests/iotests.py |  59 +++++
>   14 files changed, 715 insertions(+), 190 deletions(-)
> 


-- 
Best regards,
Vladimir


* Re: [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node()
  2019-11-29 12:24 ` [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Vladimir Sementsov-Ogievskiy
@ 2019-11-29 12:49   ` Max Reitz
  2019-11-29 12:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-29 12:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 29.11.19 13:24, Vladimir Sementsov-Ogievskiy wrote:
> The last 3 iotests patches don't apply now. Do you have a branch pushed somewhere?

Hm, it’s based on “iotests: Test failing mirror complete”, maybe because
of that.

Does this work?

https://git.xanclic.moe/XanClic/qemu.git fix-can-replace-v2

Max



* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2019-11-29 10:18   ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 12:50     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-29 12:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 29.11.19 11:18, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 3a824e77e3..8ee03e9baf 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
>> +            Error *local_err = NULL;
> 
> bdrv_child_refresh_perms returns int, so I suggest instead:

Good to know. :-)

> bool ok;
> 
>> +
>> +            /*
>> +             * We now have to ensure that there is no other parent
>> +             * that cares about replacing this child by a node with
>> +             * potentially different data.
>> +             */
>> +            s->children[i].to_be_replaced = true;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> 
> ok = !bdrv_child_refresh_perms(bs, s->children[i].child, NULL);
> 
>> +
>> +            /* Revert permissions */
>> +            s->children[i].to_be_replaced = false;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> 
> return ok;
> 
> Or similar with // int ret; // ret = // return !ret; //

Sounds good.
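
For reference, the suggested shape (try the permission change, remember whether
it worked, always revert) can be sketched as below. This is a toy model under
stated assumptions, not the quorum code: ToyChild, toy_refresh_perms() and
toy_can_replace() are hypothetical, and toy_refresh_perms() only imitates the
convention that bdrv_child_refresh_perms() returns 0 on success and a negative
value on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a quorum child and its permission state. */
typedef struct ToyChild {
    bool to_be_replaced;        /* flag toggled around the trial refresh */
    bool other_parent_objects;  /* another parent cares about data changes */
    int refresh_calls;          /* number of refreshes, for illustration */
} ToyChild;

/* Toy permission refresh: 0 on success, negative on failure. */
static int toy_refresh_perms(ToyChild *c)
{
    c->refresh_calls++;
    if (c->to_be_replaced && c->other_parent_objects) {
        return -1;
    }
    return 0;
}

/* Try the stricter permissions, record the outcome, then always revert. */
static bool toy_can_replace(ToyChild *c)
{
    bool ok;
    int ret;

    c->to_be_replaced = true;
    ok = !toy_refresh_perms(c);

    /* Revert permissions; going back to the previous state must not fail. */
    c->to_be_replaced = false;
    ret = toy_refresh_perms(c);
    assert(ret == 0);
    (void)ret;

    return ok;
}
```

The point of returning a plain bool is that the trial refresh needs no Error
object at all; only the revert is treated as something that must succeed.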

Max



* Re: [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node()
  2019-11-29 12:49   ` Max Reitz
@ 2019-11-29 12:55     ` Vladimir Sementsov-Ogievskiy
  2019-11-29 13:08       ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 12:55 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

29.11.2019 15:49, Max Reitz wrote:
> On 29.11.19 13:24, Vladimir Sementsov-Ogievskiy wrote:
>> The last 3 iotests patches don't apply now. Do you have a branch pushed somewhere?
> 
> Hm, it’s based on “iotests: Test failing mirror complete”, maybe because
> of that.
> 
> Does this work?
> 
> https://git.xanclic.moe/XanClic/qemu.git fix-can-replace-v2
> 

Hmm, I remember that in the past, fetching from this repo always hung for me.
But now it works, thanks.


-- 
Best regards,
Vladimir


* Re: [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node()
  2019-11-29 12:55     ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 13:08       ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-29 13:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 29.11.19 13:55, Vladimir Sementsov-Ogievskiy wrote:
> 29.11.2019 15:49, Max Reitz wrote:
>> On 29.11.19 13:24, Vladimir Sementsov-Ogievskiy wrote:
>>> The last 3 iotests patches don't apply now. Do you have a branch pushed somewhere?
>>
>> Hm, it’s based on “iotests: Test failing mirror complete”, maybe because
>> of that.
>>
>> Does this work?
>>
>> https://git.xanclic.moe/XanClic/qemu.git fix-can-replace-v2
>>
> 
> Hmm, I remember that in the past, fetching from this repo always hung for me.
> But now it works, thanks.

Possibly because I disabled Gravatar fetching. :-)

(No idea why that would affect git, but I do know it made the normal
browser representation quicker.)

Max



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-29 12:01   ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 13:46     ` Max Reitz
  2019-11-29 13:55       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-29 13:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 29.11.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> While bdrv_replace_node() will not follow through with it, a specific
>> @replaces asks the mirror job to create a loop.
>>
>> For example, say both the source and the target share a child where the
>> source is a filter; by letting @replaces point to the common child, you
>> ask for a loop.
>>
>> Or if you use @replaces in drive-mirror with sync=none and
>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>> to a child of the source, and sync=none makes the source the backing
>> file of the target after the job).
>>
>> bdrv_replace_node() will not create those loops, but by doing so, it
>> ignores the user-requested configuration, which is not ideal either.
>> (In the first example above, the target's child will remain what it was,
>> which may still be reasonable.  But in the second example, the target
>> will just not become a child of the source, which is precisely what was
>> requested with @replaces.)
>>
>> So prevent such configurations, both before the job, and before it
>> actually completes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c                   | 30 ++++++++++++++++++++++++
>>   block/mirror.c            | 19 +++++++++++++++-
>>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>   include/block/block_int.h |  3 +++
>>   4 files changed, 98 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 0159f8e510..e3922a0474 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6259,6 +6259,36 @@ out:
>>       return to_replace_bs;
>>   }
>>   
>> +/*
>> + * Return true iff @child is a (recursive) child of @parent, with at
>> + * least @min_level edges between them.
>> + *
>> + * (If @min_level == 0, return true if @child == @parent.  For
>> + * @min_level == 1, @child needs to be at least a real child; for
>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>> + */
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (child == parent && min_level <= 0) {
>> +        return true;
>> +    }
>> +
>> +    if (!parent) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &parent->children, next) {
>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>   /**
>>    * Iterates through the list of runtime option keys that are said to
>>    * be "strong" for a BDS.  An option is called "strong" if it changes
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 68a4404666..b258c7e98b 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>            * there.
>>            */
>>           if (bdrv_recurse_can_replace(src, to_replace)) {
>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            /*
>> +             * It is OK for @to_replace to be an immediate child of
>> +             * @target_bs, because that is what happens with
>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>> +             * backing file will be the source node, which is also
>> +             * to_replace (by default).
>> +             * bdrv_replace_node() handles this case by not letting
>> +             * target_bs->backing point to itself, but to the source
>> +             * still.
>> +             */
> 
> Hmm. So we want the following case to be valid:
> 
> (other parents of source) ----> source = to_replace <--- backing --- target
> 
> becomes
> 
> (other parents of source) ----> target --- backing ---> source
> 
> But it seems to me that the following is no less valid:
> 
> (other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target
> 
> becomes
> 
> (other parents of source) ----> target --- backing ---> X --- backing ---> source

I think it is less valid.  The first case works with sync=none, because
target is initially empty and then you just copy all new data, so the
target keeps looking like the source.

But in the second case, there are intermediate nodes that mean that
target may well not look like the source.

(Yes, you have the same problem if you use sync=none or sync=full to a
completely independent node.  But that still means that while the first
case is always valid, the second may be problematic.)

> What we actually want to prevent is the case where to_replace is not the source, but a (possibly
> indirect) child of the source.
> 
> Also, with your check you still allow a silent no-change in the following case:
> 
> source --- backing --> to_replace <-- backing -- target

You mean if source is a filter on to_replace?  (Because otherwise you
can’t replace that node.)

Is that really a no-change?  Shouldn’t we get

source --> target --> to_replace

?  (And what else would you expect?)

So maybe we don’t want to prevent that, because I think it can make sense.

Max

> ====
> 
> In other words, replacing makes sense only if to_replace has some other parents that are not
> (possibly indirect) children of target. And the only known such case is when, at the same time,
> to_replace == source.
> 
> So, shouldn't the check below be
> 
> if (to_replace == src || !bdrv_is_child_of(to_replace, target_bs, 1)) {
> 
> or, maybe, to also allow replacing filters above src while keeping the backing link:
> 
> if (bdrv_is_child_of(src, to_replace, 0) || !bdrv_is_child_of(to_replace, target_bs, 1)) {
> 
>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            } else {
>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>> +                           "because the former is now a child of the latter, "
>> +                           "and doing so would thus create a loop",
>> +                           to_replace->node_name, target_bs->node_name);
>> +            }
>>           } else {
>>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>                          "because it can no longer be guaranteed that doing so "
>> diff --git a/blockdev.c b/blockdev.c
>> index 9dc2238bf3..d29f147f72 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>       }
>>   
>>       if (has_replaces) {
>> -        BlockDriverState *to_replace_bs;
>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>           AioContext *replace_aio_context;
>>           int64_t bs_size, replace_size;
>>   
>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>               return;
>>           }
>>   
>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>> +                       "because the former is a child of the latter",
>> +                       to_replace_bs->node_name, target->node_name);
>> +            return;
>> +        }
>> +
>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>> +        {
>> +            /*
>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>> +             * (used for mode=existing) will yield, it is probably
>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>> +             * because that is our best guess.
>> +             */
>> +            switch (sync) {
>> +            case MIRROR_SYNC_MODE_FULL:
>> +                target_backing_bs = NULL;
>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_TOP:
>> +                target_backing_bs = backing_bs(bs);
>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_NONE:
>> +                target_backing_bs = bs;
>> +                break;
>> +
>> +            default:
>> +                abort();
>> +            }
>> +        } else {
>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>> +            target_backing_bs = backing_bs(target);
>> +        }
>> +
>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>> +                       "result in a loop, because the former would be a child "
>> +                       "of the latter's backing file ('%s') after the mirror "
>> +                       "job", to_replace_bs->node_name, target->node_name,
>> +                       target_backing_bs->node_name);
>> +            return;
>> +        }
>> +
>>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>>           aio_context_acquire(replace_aio_context);
>>           replace_size = bdrv_getlength(to_replace_bs);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 589a797fab..7064a1a4fa 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>                                 BlockDriverState *to_replace);
>>   
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>>
> 
> 




* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-29 13:46     ` Max Reitz
@ 2019-11-29 13:55       ` Vladimir Sementsov-Ogievskiy
  2019-11-29 14:17         ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 13:55 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

29.11.2019 16:46, Max Reitz wrote:
> On 29.11.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2019 19:02, Max Reitz wrote:
>>> While bdrv_replace_node() will not follow through with it, a specific
>>> @replaces asks the mirror job to create a loop.
>>>
>>> For example, say both the source and the target share a child where the
>>> source is a filter; by letting @replaces point to the common child, you
>>> ask for a loop.
>>>
>>> Or if you use @replaces in drive-mirror with sync=none and
>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>> to a child of the source, and sync=none makes the source the backing
>>> file of the target after the job).
>>>
>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>> ignores the user-requested configuration, which is not ideal either.
>>> (In the first example above, the target's child will remain what it was,
>>> which may still be reasonable.  But in the second example, the target
>>> will just not become a child of the source, which is precisely what was
>>> requested with @replaces.)
>>>
>>> So prevent such configurations, both before the job, and before it
>>> actually completes.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block.c                   | 30 ++++++++++++++++++++++++
>>>    block/mirror.c            | 19 +++++++++++++++-
>>>    blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>    include/block/block_int.h |  3 +++
>>>    4 files changed, 98 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 0159f8e510..e3922a0474 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -6259,6 +6259,36 @@ out:
>>>        return to_replace_bs;
>>>    }
>>>    
>>> +/*
>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>> + * least @min_level edges between them.
>>> + *
>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>> + * @min_level == 1, @child needs to be at least a real child; for
>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>> + */
>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>> +                      int min_level)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (child == parent && min_level <= 0) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (!parent) {
>>> +        return false;
>>> +    }
>>> +
>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>> +            return true;
>>> +        }
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>    /**
>>>     * Iterates through the list of runtime option keys that are said to
>>>     * be "strong" for a BDS.  An option is called "strong" if it changes
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 68a4404666..b258c7e98b 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>             * there.
>>>             */
>>>            if (bdrv_recurse_can_replace(src, to_replace)) {
>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>> +            /*
>>> +             * It is OK for @to_replace to be an immediate child of
>>> +             * @target_bs, because that is what happens with
>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>> +             * backing file will be the source node, which is also
>>> +             * to_replace (by default).
>>> +             * bdrv_replace_node() handles this case by not letting
>>> +             * target_bs->backing point to itself, but to the source
>>> +             * still.
>>> +             */
>>
>> Hmm.. So, we want the following valid case:
>>
>> (other parents of source) ----> source = to_replace <--- backing --- target
>>
>> becomes
>>
>> (other parents of source) ----> target --- backing ---> source
>>
>> But it seems to me that the following is no less valid:
>>
>> (other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target
>>
>> becomes
>>
>> (other parents of source) ----> target --- backing ---> X --- backing ---> source
> 
> I think it is less valid.  The first case works with sync=none, because
> target is initially empty and then you just copy all new data, so the
> target keeps looking like the source.
> 
> But in the second case, there are intermediate nodes that mean that
> target may well not look like the source.

Maybe it's valid if the target node is a filter? Or, otherwise, if its backing node is a filter,
but this seems less useful.

> 
> (Yes, you have the same problem if you use sync=none or sync=full to a
> completely independent node.  But that still means that while the first
> case is always valid, the second may be problematic.)
> 
>> And what we actually want to prevent is the case where to_replace is not the source, but a child
>> (maybe not a direct one) of the source.
>>
>> Also, with your check you still allow a silent no-change in the following case:
>>
>> source --- backing --> to_replace <-- backing -- target
> 
> You mean if source is a filter on to_replace?  (Because otherwise you
> can’t replace that node.)
> 
> Is that really a no-change?  Shouldn’t we get
> 
> source --> target --> to_replace

Ah, yes, it's OK.

> 
> ?  (And what else would you expect?)
> 
> So maybe we don’t want to prevent that, because I think it can make sense.
> 
> Max
> 
>> ====
>>
>> In other words, replacing makes sense only if to_replace has some other parents that are not
>> children (maybe not direct ones) of target. And the only known such case is when, at the same time,
>> to_replace == source.
>>
>> So, shouldn't the following be
>>
>> if (to_replace == src || !bdrv_is_child_of(to_replace, target_bs, 1)) {
>>
>> or, maybe, to also allow replacing filters above src, keeping the backing link:
>>
>> if (bdrv_is_child_of(src, to_replace, 0) || !bdrv_is_child_of(to_replace, target_bs, 1)) {
>>
>>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>>> +            } else {
>>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>> +                           "because the former is now a child of the latter, "
>>> +                           "and doing so would thus create a loop",
>>> +                           to_replace->node_name, target_bs->node_name);
>>> +            }
>>>            } else {
>>>                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>                           "because it can no longer be guaranteed that doing so "
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 9dc2238bf3..d29f147f72 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>        }
>>>    
>>>        if (has_replaces) {
>>> -        BlockDriverState *to_replace_bs;
>>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>>            AioContext *replace_aio_context;
>>>            int64_t bs_size, replace_size;
>>>    
>>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>                return;
>>>            }
>>>    
>>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>>> +                       "because the former is a child of the latter",
>>> +                       to_replace_bs->node_name, target->node_name);
>>> +            return;
>>> +        }
>>> +
>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>> +        {
>>> +            /*
>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>> +             * (used for mode=existing) will yield, it is probably
>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>> +             * because that is our best guess.
>>> +             */
>>> +            switch (sync) {
>>> +            case MIRROR_SYNC_MODE_FULL:
>>> +                target_backing_bs = NULL;
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_TOP:
>>> +                target_backing_bs = backing_bs(bs);
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_NONE:
>>> +                target_backing_bs = bs;
>>> +                break;
>>> +
>>> +            default:
>>> +                abort();
>>> +            }
>>> +        } else {
>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>> +            target_backing_bs = backing_bs(target);
>>> +        }
>>> +
>>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>>> +                       "result in a loop, because the former would be a child "
>>> +                       "of the latter's backing file ('%s') after the mirror "
>>> +                       "job", to_replace_bs->node_name, target->node_name,
>>> +                       target_backing_bs->node_name);
>>> +            return;
>>> +        }
>>> +
>>>            replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>>>            aio_context_acquire(replace_aio_context);
>>>            replace_size = bdrv_getlength(to_replace_bs);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 589a797fab..7064a1a4fa 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>>    bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>>                                  BlockDriverState *to_replace);
>>>    
>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>> +                      int min_level);
>>> +
>>>    /*
>>>     * Default implementation for drivers to pass bdrv_co_block_status() to
>>>     * their file.
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-29 13:55       ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 14:17         ` Max Reitz
  2019-11-29 14:26           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-11-29 14:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel



On 29.11.19 14:55, Vladimir Sementsov-Ogievskiy wrote:
> 29.11.2019 16:46, Max Reitz wrote:
>> On 29.11.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.11.2019 19:02, Max Reitz wrote:
>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>> @replaces asks the mirror job to create a loop.
>>>>
>>>> For example, say both the source and the target share a child where the
>>>> source is a filter; by letting @replaces point to the common child, you
>>>> ask for a loop.
>>>>
>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>> to a child of the source, and sync=none makes the source the backing
>>>> file of the target after the job).
>>>>
>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>> ignores the user-requested configuration, which is not ideal either.
>>>> (In the first example above, the target's child will remain what it was,
>>>> which may still be reasonable.  But in the second example, the target
>>>> will just not become a child of the source, which is precisely what was
>>>> requested with @replaces.)
>>>>
>>>> So prevent such configurations, both before the job, and before it
>>>> actually completes.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    block.c                   | 30 ++++++++++++++++++++++++
>>>>    block/mirror.c            | 19 +++++++++++++++-
>>>>    blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>    include/block/block_int.h |  3 +++
>>>>    4 files changed, 98 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 0159f8e510..e3922a0474 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -6259,6 +6259,36 @@ out:
>>>>        return to_replace_bs;
>>>>    }
>>>>    
>>>> +/*
>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>> + * least @min_level edges between them.
>>>> + *
>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>> + */
>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>> +                      int min_level)
>>>> +{
>>>> +    BdrvChild *c;
>>>> +
>>>> +    if (child == parent && min_level <= 0) {
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    if (!parent) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>> +            return true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>>    /**
>>>>     * Iterates through the list of runtime option keys that are said to
>>>>     * be "strong" for a BDS.  An option is called "strong" if it changes
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index 68a4404666..b258c7e98b 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>             * there.
>>>>             */
>>>>            if (bdrv_recurse_can_replace(src, to_replace)) {
>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>> +            /*
>>>> +             * It is OK for @to_replace to be an immediate child of
>>>> +             * @target_bs, because that is what happens with
>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>> +             * backing file will be the source node, which is also
>>>> +             * to_replace (by default).
>>>> +             * bdrv_replace_node() handles this case by not letting
>>>> +             * target_bs->backing point to itself, but to the source
>>>> +             * still.
>>>> +             */
>>>
>>> Hmm.. So, we want the following valid case:
>>>
>>> (other parents of source) ----> source = to_replace <--- backing --- target
>>>
>>> becomes
>>>
>>> (other parents of source) ----> target --- backing ---> source
>>>
>>> But it seems to me that the following is no less valid:
>>>
>>> (other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target
>>>
>>> becomes
>>>
>>> (other parents of source) ----> target --- backing ---> X --- backing ---> source
>>
>> I think it is less valid.  The first case works with sync=none, because
>> target is initially empty and then you just copy all new data, so the
>> target keeps looking like the source.
>>
>> But in the second case, there are intermediate nodes that mean that
>> target may well not look like the source.
> 
> Maybe it's valid if the target node is a filter? Or, otherwise, if its backing node is a filter,
> but this seems less useful.

The question to me is whether it’s really useful.  The thing is that
maybe bdrv_replace_node() can make sense of it.  But still, from the
user’s perspective, they kind of are asking for a loop whenever
to_replace is a child of target.  It just so happens that we must allow
one of these cases because it’s the default case for sync=none.

So I’d rather forbid all such cases, because it should be understandable
to users why...
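
To make the distinction between the allowed and the forbidden depth concrete, here is a minimal standalone sketch. It assumes a toy Node type in place of BlockDriverState, and add_child() and completion_would_loop() are made-up names for illustration; only the recursion follows bdrv_is_child_of() from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a block node; this is not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
} Node;

/* Hypothetical helper: attach @child below @parent. */
static void add_child(Node *parent, Node *child)
{
    assert(parent->n_children < MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

/*
 * Same shape as bdrv_is_child_of() in the patch: true iff @child can be
 * reached from @parent over at least @min_level edges.
 */
static bool is_child_of(const Node *child, const Node *parent, int min_level)
{
    if (child == parent && min_level <= 0) {
        return true;
    }
    if (!parent) {
        return false;
    }
    for (size_t i = 0; i < parent->n_children; i++) {
        if (is_child_of(child, parent->children[i], min_level - 1)) {
            return true;
        }
    }
    return false;
}

/*
 * The completion-time rule from the mirror.c hunk: an immediate child is
 * tolerated, a grandchild (or anything deeper) is refused.
 */
static bool completion_would_loop(const Node *to_replace, const Node *target)
{
    return is_child_of(to_replace, target, 2);
}
```

With target --> source, completion_would_loop(source, target) is false, which is the sync=none default case.  With target --> X --> source it is true, which is the case discussed above.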

Max



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-29 14:17         ` Max Reitz
@ 2019-11-29 14:26           ` Vladimir Sementsov-Ogievskiy
  2019-11-29 14:38             ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-11-29 14:26 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

29.11.2019 17:17, Max Reitz wrote:
> On 29.11.19 14:55, Vladimir Sementsov-Ogievskiy wrote:
>> 29.11.2019 16:46, Max Reitz wrote:
>>> On 29.11.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
>>>> 11.11.2019 19:02, Max Reitz wrote:
>>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>>> @replaces asks the mirror job to create a loop.
>>>>>
>>>>> For example, say both the source and the target share a child where the
>>>>> source is a filter; by letting @replaces point to the common child, you
>>>>> ask for a loop.
>>>>>
>>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>>> to a child of the source, and sync=none makes the source the backing
>>>>> file of the target after the job).
>>>>>
>>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>>> ignores the user-requested configuration, which is not ideal either.
>>>>> (In the first example above, the target's child will remain what it was,
>>>>> which may still be reasonable.  But in the second example, the target
>>>>> will just not become a child of the source, which is precisely what was
>>>>> requested with @replaces.)
>>>>>
>>>>> So prevent such configurations, both before the job, and before it
>>>>> actually completes.
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>     block.c                   | 30 ++++++++++++++++++++++++
>>>>>     block/mirror.c            | 19 +++++++++++++++-
>>>>>     blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>>     include/block/block_int.h |  3 +++
>>>>>     4 files changed, 98 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 0159f8e510..e3922a0474 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -6259,6 +6259,36 @@ out:
>>>>>         return to_replace_bs;
>>>>>     }
>>>>>     
>>>>> +/*
>>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>>> + * least @min_level edges between them.
>>>>> + *
>>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>>> + */
>>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>>> +                      int min_level)
>>>>> +{
>>>>> +    BdrvChild *c;
>>>>> +
>>>>> +    if (child == parent && min_level <= 0) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (!parent) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>>> +            return true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * Iterates through the list of runtime option keys that are said to
>>>>>      * be "strong" for a BDS.  An option is called "strong" if it changes
>>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>>> index 68a4404666..b258c7e98b 100644
>>>>> --- a/block/mirror.c
>>>>> +++ b/block/mirror.c
>>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>>              * there.
>>>>>              */
>>>>>             if (bdrv_recurse_can_replace(src, to_replace)) {
>>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>> +            /*
>>>>> +             * It is OK for @to_replace to be an immediate child of
>>>>> +             * @target_bs, because that is what happens with
>>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>>> +             * backing file will be the source node, which is also
>>>>> +             * to_replace (by default).
>>>>> +             * bdrv_replace_node() handles this case by not letting
>>>>> +             * target_bs->backing point to itself, but to the source
>>>>> +             * still.
>>>>> +             */
>>>>
>>>> Hmm.. So, we want the following valid case:
>>>>
>>>> (other parents of source) ----> source = to_replace <--- backing --- target
>>>>
>>>> becomes
>>>>
>>>> (other parents of source) ----> target --- backing ---> source
>>>>
>>>> But it seems to me that the following is no less valid:
>>>>
>>>> (other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target
>>>>
>>>> becomes
>>>>
>>>> (other parents of source) ----> target --- backing ---> X --- backing ---> source
>>>
>>> I think it is less valid.  The first case works with sync=none, because
>>> target is initially empty and then you just copy all new data, so the
>>> target keeps looking like the source.
>>>
>>> But in the second case, there are intermediate nodes that mean that
>>> target may well not look like the source.
>>
>> Maybe it's valid if the target node is a filter? Or, otherwise, if its backing node is a filter,
>> but this seems less useful.
> 
> The question to me is whether it’s really useful.  The thing is that
> maybe bdrv_replace_node() can make sense of it.  But still, from the
> user’s perspective, they kind of are asking for a loop whenever
> to_replace is a child of target.  It just so happens that we must allow
> one of these cases because it’s the default case for sync=none.
> 
> So I’d rather forbid all such cases, because it should be understandable
> to users why...
> 

Okay, I don't have any more arguments :) Honestly, I just feel that relying on the existence
of chains of some hardcoded length between nodes is not a good generic criterion...

bdrv_replace_node() never creates loops. Maybe just document this behavior in
QAPI? And (maybe) return an error if we see that bdrv_replace_node() will be a no-op?

And if it is not a no-op, maybe the user isn't trying to create a loop, but instead
is a power user who knows how bdrv_replace_node() works and wants exactly this
behavior?
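
A rough sketch of what such a no-op check could look like, on a toy graph. It assumes that a loop-avoiding replace skips exactly those edges that are owned by the new node itself; whether that matches bdrv_replace_node() in every case is not something this sketch establishes. Node, add_child(), count_movable_edges() and replace_would_be_noop() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a block node; this is not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
} Node;

/* Hypothetical helper: attach @child below @parent. */
static void add_child(Node *parent, Node *child)
{
    assert(parent->n_children < MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

/*
 * Count the edges pointing at @from that a loop-avoiding replace by @to
 * would actually move.  Edges owned by @to itself are skipped, because
 * repointing them would make @to its own child.
 */
static size_t count_movable_edges(Node *const *nodes, size_t n_nodes,
                                  const Node *from, const Node *to)
{
    size_t movable = 0;

    for (size_t i = 0; i < n_nodes; i++) {
        if (nodes[i] == to) {
            continue;
        }
        for (size_t j = 0; j < nodes[i]->n_children; j++) {
            if (nodes[i]->children[j] == from) {
                movable++;
            }
        }
    }
    return movable;
}

/* The replace changes nothing if no edge at all would be moved. */
static bool replace_would_be_noop(Node *const *nodes, size_t n_nodes,
                                  const Node *from, const Node *to)
{
    return count_movable_edges(nodes, n_nodes, from, to) == 0;
}
```

If the target is the only parent of to_replace, nothing would move and an error could be returned; as soon as some other parent exists, one edge moves and the replace is no longer a no-op.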

-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-29 14:26           ` Vladimir Sementsov-Ogievskiy
@ 2019-11-29 14:38             ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-11-29 14:38 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel



On 29.11.19 15:26, Vladimir Sementsov-Ogievskiy wrote:
> 29.11.2019 17:17, Max Reitz wrote:
>> On 29.11.19 14:55, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.11.2019 16:46, Max Reitz wrote:
>>>> On 29.11.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 11.11.2019 19:02, Max Reitz wrote:
>>>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>>>> @replaces asks the mirror job to create a loop.
>>>>>>
>>>>>> For example, say both the source and the target share a child where the
>>>>>> source is a filter; by letting @replaces point to the common child, you
>>>>>> ask for a loop.
>>>>>>
>>>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>>>> to a child of the source, and sync=none makes the source the backing
>>>>>> file of the target after the job).
>>>>>>
>>>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>>>> ignores the user-requested configuration, which is not ideal either.
>>>>>> (In the first example above, the target's child will remain what it was,
>>>>>> which may still be reasonable.  But in the second example, the target
>>>>>> will just not become a child of the source, which is precisely what was
>>>>>> requested with @replaces.)
>>>>>>
>>>>>> So prevent such configurations, both before the job, and before it
>>>>>> actually completes.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     block.c                   | 30 ++++++++++++++++++++++++
>>>>>>     block/mirror.c            | 19 +++++++++++++++-
>>>>>>     blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>>>     include/block/block_int.h |  3 +++
>>>>>>     4 files changed, 98 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block.c b/block.c
>>>>>> index 0159f8e510..e3922a0474 100644
>>>>>> --- a/block.c
>>>>>> +++ b/block.c
>>>>>> @@ -6259,6 +6259,36 @@ out:
>>>>>>         return to_replace_bs;
>>>>>>     }
>>>>>>     
>>>>>> +/*
>>>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>>>> + * least @min_level edges between them.
>>>>>> + *
>>>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>>>> + */
>>>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>>>> +                      int min_level)
>>>>>> +{
>>>>>> +    BdrvChild *c;
>>>>>> +
>>>>>> +    if (child == parent && min_level <= 0) {
>>>>>> +        return true;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!parent) {
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>>>> +            return true;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * Iterates through the list of runtime option keys that are said to
>>>>>>      * be "strong" for a BDS.  An option is called "strong" if it changes
>>>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>>>> index 68a4404666..b258c7e98b 100644
>>>>>> --- a/block/mirror.c
>>>>>> +++ b/block/mirror.c
>>>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>>>              * there.
>>>>>>              */
>>>>>>             if (bdrv_recurse_can_replace(src, to_replace)) {
>>>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>>> +            /*
>>>>>> +             * It is OK for @to_replace to be an immediate child of
>>>>>> +             * @target_bs, because that is what happens with
>>>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>>>> +             * backing file will be the source node, which is also
>>>>>> +             * to_replace (by default).
>>>>>> +             * bdrv_replace_node() handles this case by not letting
>>>>>> +             * target_bs->backing point to itself, but to the source
>>>>>> +             * still.
>>>>>> +             */
>>>>>
>>>>> Hmm.. So, we want the following valid case:
>>>>>
>>>>> (other parents of source) ----> source = to_replace <--- backing --- target
>>>>>
>>>>> becomes
>>>>>
>>>>> (other parents of source) ----> target --- backing ---> source
>>>>>
>>>>> But it seems to me that the following is no less valid:
>>>>>
>>>>> (other parents of source) ----> source = to_replace <--- backing --- X <--- backing --- target
>>>>>
>>>>> becomes
>>>>>
>>>>> (other parents of source) ----> target --- backing ---> X --- backing ---> source
>>>>
>>>> I think it is less valid.  The first case works with sync=none, because
>>>> target is initially empty and then you just copy all new data, so the
>>>> target keeps looking like the source.
>>>>
>>>> But in the second case, there are intermediate nodes that mean that
>>>> target may well not look like the source.
>>>
>>> Maybe it's valid if the target node is a filter? Or, otherwise, if its backing node is a filter,
>>> but this seems less useful.
>>
>> The question to me is whether it’s really useful.  The thing is that
>> maybe bdrv_replace_node() can make sense of it.  But still, from the
>> user’s perspective, they kind of are asking for a loop whenever
>> to_replace is a child of target.  It just so happens that we must allow
>> one of these cases because it’s the default case for sync=none.
>>
>> So I’d rather forbid all such cases, because it should be understandable
>> to users why...
>>
> 
> Okay, I don't have any more arguments :) Honestly, I just feel that relying on the existence
> of chains of some hardcoded length between nodes is not a good generic criterion...
> 
> bdrv_replace_node() never creates loops. Maybe just document this behavior in
> QAPI? And (maybe) return an error if we see that bdrv_replace_node() will be a no-op?
> 
> And if it is not a no-op, maybe the user isn't trying to create a loop, but instead
> is a power user who knows how bdrv_replace_node() works and wants exactly this
> behavior?

I don’t know whether that’s a good point.  We have strong restrictions
on @replaces anyway (that’s the point of this series, to fix them).  So
if we want to loosen those restrictions and allow the user to do
anything they want because it’s their job to be careful, that would be a
whole different series.

Also, one of the examples in the commit message is using @replaces with
drive-mirror sync=none mode=absolute-paths.  @replaces must be a child
of the source.  So what will happen is that it’s replaced and then we
can’t attach the source as the backing file of the target.  So the
target will probably just read garbage, given that it now lacks the
source as its backing file.

So I’m not sold on “If the user knows what’ll happen, it’s all good”.
Because I don’t think they’ll really know.  I’d rather keep it tight
until someone complains.

Max


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-11-11 16:02 ` [PATCH for-5.0 v2 15/23] mirror: Prevent loops Max Reitz
  2019-11-29 12:01   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-02 12:12   ` Vladimir Sementsov-Ogievskiy
  2019-12-09 14:43     ` Max Reitz
  1 sibling, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-02 12:12 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> While bdrv_replace_node() will not follow through with it, a specific
> @replaces asks the mirror job to create a loop.
> 
> For example, say both the source and the target share a child where the
> source is a filter; by letting @replaces point to the common child, you
> ask for a loop.
> 
> Or if you use @replaces in drive-mirror with sync=none and
> mode=absolute-paths, you generally ask for a loop (@replaces must point
> to a child of the source, and sync=none makes the source the backing
> file of the target after the job).
> 
> bdrv_replace_node() will not create those loops, but by doing so, it
> ignores the user-requested configuration, which is not ideal either.
> (In the first example above, the target's child will remain what it was,
> which may still be reasonable.  But in the second example, the target
> will just not become a child of the source, which is precisely what was
> requested with @replaces.)
> 
> So prevent such configurations, both before the job, and before it
> actually completes.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c                   | 30 ++++++++++++++++++++++++
>   block/mirror.c            | 19 +++++++++++++++-
>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>   include/block/block_int.h |  3 +++
>   4 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0159f8e510..e3922a0474 100644
> --- a/block.c
> +++ b/block.c
> @@ -6259,6 +6259,36 @@ out:
>       return to_replace_bs;
>   }
>   
> +/*
> + * Return true iff @child is a (recursive) child of @parent, with at
> + * least @min_level edges between them.
> + *
> + * (If @min_level == 0, return true if @child == @parent.  For
> + * @min_level == 1, @child needs to be at least a real child; for
> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
> + */
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level)
> +{
> +    BdrvChild *c;
> +
> +    if (child == parent && min_level <= 0) {
> +        return true;
> +    }
> +
> +    if (!parent) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &parent->children, next) {
> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   /**
>    * Iterates through the list of runtime option keys that are said to
>    * be "strong" for a BDS.  An option is called "strong" if it changes
> diff --git a/block/mirror.c b/block/mirror.c
> index 68a4404666..b258c7e98b 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>            * there.
>            */
>           if (bdrv_recurse_can_replace(src, to_replace)) {
> -            bdrv_replace_node(to_replace, target_bs, &local_err);
> +            /*
> +             * It is OK for @to_replace to be an immediate child of
> +             * @target_bs, because that is what happens with
> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
> +             * backing file will be the source node, which is also
> +             * to_replace (by default).
> +             * bdrv_replace_node() handles this case by not letting
> +             * target_bs->backing point to itself, but to the source
> +             * still.
> +             */
> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
> +                bdrv_replace_node(to_replace, target_bs, &local_err);
> +            } else {
> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +                           "because the former is now a child of the latter, "
> +                           "and doing so would thus create a loop",
> +                           to_replace->node_name, target_bs->node_name);
> +            }

You could swap the if and else branches and drop the "!".

>           } else {
>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>                          "because it can no longer be guaranteed that doing so "
> diff --git a/blockdev.c b/blockdev.c
> index 9dc2238bf3..d29f147f72 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>       }
>   
>       if (has_replaces) {
> -        BlockDriverState *to_replace_bs;
> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>           AioContext *replace_aio_context;
>           int64_t bs_size, replace_size;
>   
> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>               return;
>           }
>   
> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
> +                       "because the former is a child of the latter",
> +                       to_replace_bs->node_name, target->node_name);
> +            return;
> +        }

Here min_level=1, so we don't handle the case described in mirror_exit_common.
I don't see why: blockdev_mirror_common is called from qmp_drive_mirror,
including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS.

What am I missing?
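
(For illustration, a minimal Python model of the min_level semantics of
bdrv_is_child_of() as written in the patch. This is only a sketch: the Node
class and the node names are hypothetical stand-ins for BlockDriverState, and
none of this is QEMU code. It shows how min_level=1 and min_level=2 differ for
a direct child.)

```python
class Node:
    """Hypothetical stand-in for BlockDriverState: a name plus child nodes."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def is_child_of(child, parent, min_level):
    """Follow the same logic as bdrv_is_child_of() in the patch."""
    # Same node counts only if no further edges are required
    if child is parent and min_level <= 0:
        return True
    if parent is None:
        return False
    # Each edge we descend reduces the number of edges still required
    return any(is_child_of(child, c, min_level - 1) for c in parent.children)


# Assumed example graph: top --> target --> source
source = Node('source')
target = Node('target', [source])
top = Node('top', [target])

# source is a direct child of target: accepted at min_level=1 ...
direct_at_1 = is_child_of(source, target, 1)
# ... but not at min_level=2, which requires at least a grandchild
direct_at_2 = is_child_of(source, target, 2)
# source is a grandchild of top, so min_level=2 matches there
grandchild_at_2 = is_child_of(source, top, 2)
```

With this model, a check using min_level=1 rejects a direct child, while a
check using min_level=2 lets it through and only rejects deeper descendants.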

> +
> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
> +        {
> +            /*
> +             * While we do not quite know what OPEN_BACKING_CHAIN
> +             * (used for mode=existing) will yield, it is probably
> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
> +             * because that is our best guess.
> +             */
> +            switch (sync) {
> +            case MIRROR_SYNC_MODE_FULL:
> +                target_backing_bs = NULL;
> +                break;
> +
> +            case MIRROR_SYNC_MODE_TOP:
> +                target_backing_bs = backing_bs(bs);
> +                break;
> +
> +            case MIRROR_SYNC_MODE_NONE:
> +                target_backing_bs = bs;
> +                break;
> +
> +            default:
> +                abort();
> +            }
> +        } else {
> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
> +            target_backing_bs = backing_bs(target);
> +        }
> +
> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
> +                       "result in a loop, because the former would be a child "
> +                       "of the latter's backing file ('%s') after the mirror "
> +                       "job", to_replace_bs->node_name, target->node_name,
> +                       target_backing_bs->node_name);
> +            return;
> +        }

Hmm, so for MIRROR_SYNC_MODE_NONE we disallow to_replace == src?

> +
>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>           aio_context_acquire(replace_aio_context);
>           replace_size = bdrv_getlength(to_replace_bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 589a797fab..7064a1a4fa 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>                                 BlockDriverState *to_replace);
>   
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041
  2019-11-11 16:02 ` [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041 Max Reitz
@ 2019-12-03 12:03   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 12:03 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> We can use this decorator above TestRepairQuorum.setUp() to skip all
> quorum tests with a single line.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path() Max Reitz
@ 2019-12-03 12:59   ` Vladimir Sementsov-Ogievskiy
  2019-12-09 15:10     ` Max Reitz
  2019-12-13 11:27   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 12:59 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
>   1 file changed, 59 insertions(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index d34305ce69..3e03320ce3 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>   
>           return fields.items() <= ret.items()
>   
> +    """
> +    Check whether the node under the given path in the block graph is
> +    @expected_node.
> +
> +    @root is the node name of the node where the @path is rooted.
> +
> +    @path is a string that consists of child names separated by
> +    slashes.  It must begin with a slash.

Why do you need this slash? To stress that we are starting from the root?
But the root is not global; it's selected by the previous argument, so to me the
path is more relative than absolute.

> +
> +    Examples for @root + @path:
> +      - root="qcow2-node", path="/backing/file"
> +      - root="quorum-node", path="/children.2/file"
> +
> +    Hypothetically, @path could be empty, in which case it would point
> +    to @root.  However, in practice this case is not useful and hence
> +    not allowed.

1. path can't be empty since, according to the previous point, it must start with '/'
2. path can be '/', which does exactly what you don't allow, and I don't see
where that is restricted in the code

> +
> +    @expected_node may be None.

Which means that we assert the existence of the path except for its last element,
yes? It's worth mentioning this behavior here.

> +
> +    @graph may be None or the result of an x-debug-query-block-graph
> +    call that has already been performed.
> +    """
> +    def assert_block_path(self, root, path, expected_node, graph=None):
> +        if graph is None:
> +            graph = self.qmp('x-debug-query-block-graph')['return']
> +
> +        iter_path = iter(path.split('/'))
> +
> +        # Must start with a /
> +        assert next(iter_path) == ''
> +
> +        node = next((node for node in graph['nodes'] if node['name'] == root),
> +                    None)
> +
> +        for path_node in iter_path:
> +            assert node is not None, 'Cannot follow path %s' % path
> +
> +            try:
> +                node_id = next(edge['child'] for edge in graph['edges'] \
> +                                             if edge['parent'] == node['id'] and
> +                                                edge['name'] == path_node)
> +
> +                node = next(node for node in graph['nodes'] \
> +                                 if node['id'] == node_id)

This line can't fail. If it fails, it means there is a bug in x-debug-query-block-graph,
so I'd prefer to move it out of the try/except block.

> +            except StopIteration:
> +                node = None
> +
> +        assert node is not None or expected_node is None, \
> +               'No node found under %s (but expected %s)' % \
> +               (path, expected_node)
> +
> +        assert expected_node is not None or node is None, \
> +               'Found node %s under %s (but expected none)' % \
> +               (node['name'], path)
> +
> +        if node is not None and expected_node is not None:

[1]
The second part of the condition is already asserted by the previous assertion.

> +            assert node['name'] == expected_node, \
> +                   'Found node %s under %s (but expected %s)' % \
> +                   (node['name'], path, expected_node)

IMHO, it would be easier to read like:

           if node is None:
               assert expected_node is None, \
                  'No node found under %s (but expected %s)' % \
                  (path, expected_node)
           else:
               assert expected_node is not None, \
                  'Found node %s under %s (but expected none)' % \
                  (node['name'], path)

               assert node['name'] == expected_node, \
                      'Found node %s under %s (but expected %s)' % \
                      (node['name'], path, expected_node)

Or even just

           if node is None:
               assert expected_node is None, \
                  'No node found under %s (but expected %s)' % \
                  (path, expected_node)
           else:
               assert node['name'] == expected_node, \
                      'Found node %s under %s (but expected %s)' % \
                      (node['name'], path, expected_node)

(I've checked:
 >>> 'erger %s erg' % None
'erger None erg'

Also, %-style formatting is old; as I understand it, it's better to always use .format().
)
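
To show how the simplified assertion and .format() could fit together, here is
a standalone sketch. It is not the patch itself: find_node(), the free-standing
assert_block_path() and the sample graph are hypothetical. Only the field names
('nodes', 'edges', 'id', 'name', 'parent', 'child') are taken from the code
quoted above.

```python
def find_node(graph, root, path):
    """Follow @path (e.g. '/backing/file') from @root; return a node dict or None."""
    parts = path.split('/')
    assert parts[0] == '', 'path must start with a /'

    node = next((n for n in graph['nodes'] if n['name'] == root), None)
    for child_name in parts[1:]:
        assert node is not None, 'Cannot follow path {}'.format(path)
        child_id = next((e['child'] for e in graph['edges']
                         if e['parent'] == node['id'] and
                            e['name'] == child_name),
                        None)
        if child_id is None:
            node = None
        else:
            # An edge pointing at an unknown node would be a bug in the graph
            # data, so this lookup deliberately has no fallback
            node = next(n for n in graph['nodes'] if n['id'] == child_id)
    return node


def assert_block_path(graph, root, path, expected_node):
    """Assert that the node under @root + @path is @expected_node (may be None)."""
    node = find_node(graph, root, path)
    if node is None:
        assert expected_node is None, \
            'No node found under {} (but expected {})'.format(
                path, expected_node)
    else:
        assert node['name'] == expected_node, \
            'Found node {} under {} (but expected {})'.format(
                node['name'], path, expected_node)


# Made-up sample data in the assumed shape
graph = {
    'nodes': [{'id': 1, 'name': 'qcow2-node'},
              {'id': 2, 'name': 'base'},
              {'id': 3, 'name': 'base-file'}],
    'edges': [{'parent': 1, 'child': 2, 'name': 'backing'},
              {'parent': 2, 'child': 3, 'name': 'file'}],
}

assert_block_path(graph, 'qcow2-node', '/backing/file', 'base-file')
assert_block_path(graph, 'qcow2-node', '/file', None)
```

The two-branch form needs no separate "expected none" assertion, because the
name comparison in the else branch already fails when expected_node is None.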

>   
>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>   
> 
-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041
  2019-11-11 16:02 ` [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041 Max Reitz
@ 2019-12-03 13:32   ` Vladimir Sementsov-Ogievskiy
  2019-12-03 13:33     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 13:32 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041
  2019-12-03 13:32   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-03 13:33     ` Vladimir Sementsov-Ogievskiy
  2019-12-09 15:15       ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 13:33 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

03.12.2019 16:32, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 


Oops, stop. Why do you remove the line "self.vm.shutdown()"?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces
  2019-11-11 16:02 ` [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces Max Reitz
@ 2019-12-03 14:40   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 14:40 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Add two tests to see that you cannot replace a Quorum child with the
> mirror job while the child is in use by a different parent.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters
  2019-11-11 16:02 ` [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters Max Reitz
@ 2019-12-03 15:58   ` Vladimir Sementsov-Ogievskiy
  2019-12-09 15:17     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 15:58 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 46 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/041.out |  4 ++--
>   2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index ab0cb5b42f..9a00cf6f7b 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1200,6 +1200,52 @@ class TestOrphanedSource(iotests.QMPTestCase):
>           self.assertFalse('mirror-filter' in nodes,
>                            'Mirror filter node did not disappear')
>   
> +# Test cases for @replaces that do not necessarily involve Quorum
> +class TestReplaces(iotests.QMPTestCase):
> +    # Each of these test cases needs their own block graph, so do not
> +    # create any nodes here
> +    def setUp(self):
> +        self.vm = iotests.VM()
> +        self.vm.launch()
> +
> +    def tearDown(self):
> +        self.vm.shutdown()
> +        for img in (test_img, target_img):
> +            try:
> +                os.remove(img)
> +            except OSError:
> +                pass
> +
> +    """
> +    Check that we can replace filter nodes.
> +    """

PEP8 says that the docstring should appear after the "def" line.
(This applies to the previous patch too.)
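
A small illustration of the difference (the class and method names here are
hypothetical, not from 041): a string literal placed before the def is just an
expression statement that is evaluated and discarded, whereas one placed
directly after the def line becomes the method's __doc__.

```python
class Example:
    # A first statement that is not a string, so that the string literal
    # below is not picked up as the class docstring either
    x = 0

    """
    Placed before the def: evaluated and discarded.
    """
    def before(self):
        pass

    def after(self):
        """Placed after the def: becomes the docstring."""
        pass


doc_before = Example.before.__doc__
doc_after = Example.after.__doc__
```

Tools such as help() and unittest's verbose test descriptions read __doc__, so
only the second form is visible to them.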

> +    @iotests.skip_if_unsupported(['copy-on-read'])
> +    def test_replace_filter(self):
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'copy-on-read',
> +                                 'node-name': 'filter0',
> +                                 'file': {
> +                                     'driver': 'copy-on-read',
> +                                     'node-name': 'filter1',
> +                                     'file': {
> +                                         'driver': 'null-co'
> +                                     }
> +                                 }
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add',
> +                             node_name='target', driver='null-co')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-mirror', job_id='mirror', device='filter0',
> +                             target='target', sync='full', replaces='filter1')
> +        self.assert_qmp(result, 'return', {})
> +
> +        self.complete_and_wait('mirror')
> +
> +        self.vm.assert_block_path('filter0', '/file', 'target')
> +
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2', 'qed'],
>                    supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index ffc779b4d1..877b76fd31 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -.............................................................................................
> +..............................................................................................
>   ----------------------------------------------------------------------
> -Ran 93 tests
> +Ran 94 tests
>   
>   OK
> 


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops
  2019-11-11 16:02 ` [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops Max Reitz
@ 2019-12-03 17:03   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-03 17:03 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 235 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/041.out |   4 +-
>   2 files changed, 237 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index 9a00cf6f7b..0e43bb699d 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1246,6 +1246,241 @@ class TestReplaces(iotests.QMPTestCase):
>   
>           self.vm.assert_block_path('filter0', '/file', 'target')
>   
> +    """
> +    See what happens when the @sync/@replaces configuration dictates
> +    creating a loop.
> +    """
> +    @iotests.skip_if_unsupported(['throttle'])
> +    def test_loop(self):
> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
> +
> +        # Dummy group so we can create a NOP filter
> +        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg0')
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'throttle',
> +                                 'node-name': 'source',
> +                                 'throttle-group': 'tg0',
> +                                 'file': {
> +                                     'driver': iotests.imgfmt,
> +                                     'node-name': 'filtered',
> +                                     'file': {
> +                                         'driver': 'file',
> +                                         'filename': test_img
> +                                     }
> +                                 }
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        # Block graph is now:
> +        #   source[throttle] --file--> filtered[imgfmt] --file--> ...
> +
> +        result = self.vm.qmp('drive-mirror', job_id='mirror', device='source',
> +                             target=target_img, format=iotests.imgfmt,
> +                             node_name='target', sync='none',
> +                             replaces='filtered')
> +
> +        """
> +        Block graph before mirror exits would be (ignoring mirror_top):
> +          source[throttle] --file--> filtered[imgfmt] --file--> ...
> +          target[imgfmt] --file--> ...
> +
> +        Then, because of sync=none and drive-mirror in absolute-paths mode,
> +        the source is attached to the target:
> +          source[throttle] --file--> filtered[imgfmt] --file--> ...
> +                 ^
> +              backing
> +                 |
> +            target[imgfmt] --file--> ...
> +
> +        Replacing filtered by target would yield:
> +          source[throttle] --file--> target[imgfmt] --file--> ...
> +                 ^                        |
> +                 +------- backing --------+
> +
> +        I.e., a loop.  bdrv_replace_node() detects this and simply
> +        does not let source's file link point to target.  However,
> +        that means that target cannot really replace source.
> +
> +        drive-mirror should detect this and not allow this case.
> +        """
> +
> +        self.assert_qmp(result, 'error/desc',
> +                        "Replacing 'filtered' by 'target' with this sync " + \
> +                        "mode would result in a loop, because the former " + \
> +                        "would be a child of the latter's backing file " + \
> +                        "('source') after the mirror job")
> +
> +    """
> +    Test what happens when there would be no loop with the pre-mirror
> +    configuration, but something changes during the mirror job that asks
> +    for a loop to be created during completion.
> +    """
> +    @iotests.skip_if_unsupported(['copy-on-read', 'quorum'])
> +    def test_loop_during_mirror(self):
> +        qemu_img('create', '-f', iotests.imgfmt, test_img, str(1 * 1024 * 1024))
> +
> +        """
> +        In this test, we are going to mirror from a node that is a
> +        filter above some file "common-base".  The target is a quorum
> +        node (with just an unrelated null-co child).
> +
> +        We will ask the mirror job to replace common-base by the
> +        target upon completion.  That is a completely valid
> +        configuration so far.
> +
> +        However, while the job is running, we add common-base as an
> +        (indirect[1]) child to the target quorum node.  This way,
> +        completing the job as requested would yield a loop, because
> +        the target would be supposed to replace common-base -- which
> +        is its own (indirect) child.
> +
> +        [1] It needs to be an indirect child, because if it were a
> +        direct child, the mirror job would simply end by effectively
> +        injecting the target above common-base.  This is the same
> +        effect as when using sync=none: The target ends up above the
> +        source.
> +
> +        So only loops that have a length of more than one node are
> +        forbidden, which means common-base must be an indirect child
> +        of the target.
> +
> +        (Furthermore, we are going to use x-blockdev-change to add
> +        common-base as a child to the target.  This command only
> +        allows doing so for nodes that have no parent yet.
> +        common-base will have a parent already, though, namely the
> +        source node.  Therefore, this is another reason why we need at
> +        least one node above common-base, so this parent can become
> +        target's child during the mirror.)
> +        """
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'null-co',
> +                                 'node-name': 'common-base',
> +                                 'read-zeroes': True,
> +                                 'size': 1 * 1024 * 1024
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'copy-on-read',
> +                                 'node-name': 'source',
> +                                 'file': 'common-base'
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        """
> +        As explained above, we have to create a parent above
> +        common-base.
> +
> +        We cannot use any parent that would forward the RESIZE
> +        permission, because the job takes it on the target, but
> +        unshares it on the source: After the x-blockdev-change
> +        operation during the mirror job, this parent will be a child
> +        of the target, so common-base will be an (indirect) child of
> +        both the mirror's source and target.  Thus, the job would
> +        conflict with itself.
> +
> +        Therefore, we make common-base a backing child of a $imgfmt
> +        node.  Unfortunately, we cannot let the mirror job replace a
> +        node that acts as a backing child somewhere (because of an op
> +        blocker), so we put another raw node between the $imgfmt node
> +        and common-base.
> +        """
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': iotests.imgfmt,
> +                                 'node-name': 'base-parent',
> +                                 'file': {
> +                                     'driver': 'file',
> +                                     'filename': test_img
> +                                 },
> +                                 'backing': {
> +                                     'driver': 'raw',
> +                                     'file': 'common-base'
> +                                 }
> +                             })

self.assert_qmp(result, 'return', {})

> +
> +        """
> +        Add a quorum node with a single child, we will add base-parent
> +        to prepare a loop later.
> +        (We do not care about this single child at all, but it is
> +        impossible to create a quorum node without any children.  We
> +        will ignore this child from now on.)
> +        """
> +        result = self.vm.qmp('blockdev-add', **{
> +                                 'driver': 'quorum',
> +                                 'node-name': 'target',
> +                                 'vote-threshold': 1,
> +                                 'children': [
> +                                     {
> +                                         'driver': 'null-co',
> +                                         'read-zeroes': True,
> +                                         'size': 1 * 1024 * 1024
> +                                     }
> +                                 ]
> +                             })
> +        self.assert_qmp(result, 'return', {})
> +
> +        """
> +        Current block graph:
> +
> +        base-parent[$imgfmt] --backing--> [raw]
> +                                            |
> +                                           file
> +                                            v
> +              source[COR] --file--> common-base[null-co]
> +
> +        target[quorum]
> +
> +
> +        The following blockdev-mirror asks for this graph post-mirror:
> +
> +        base-parent[$imgfmt] --backing--> [raw]
> +                                            |
> +                                           file
> +                                            v
> +                source[COR] --file--> target[quorum]
> +
> +        That would be a valid configuration without any loops.
> +        """
> +
> +        result = self.vm.qmp('blockdev-mirror', job_id='mirror',
> +                             device='source', target='target', sync='full',
> +                             replaces='common-base')
> +        self.assert_qmp(result, 'return', {})
> +
> +        """
> +        However, now we will make base-parent a child of target.
> +        Before the mirror job completes, that is still completely
> +        valid:
> +
> +                                             source
> +                                               |
> +                                               v
> +        target -> base-parent -> [raw] -> common-base
> +        """
> +
> +        result = self.vm.qmp('x-blockdev-change',
> +                             parent='target', node='base-parent')
> +        self.assert_qmp(result, 'return', {})
> +
> +        """
> +        However, post-mirror, we thus ask for a loop:
> +
> +        source -> target (replaced common-base) -> base-parent
> +                                  ^                    |
> +                                  |                    v
> +                                  +----------------- [raw]
> +
> +        bdrv_replace_node() would not allow such a configuration, but
> +        we should not pretend we can create it, so the mirror job
> +        should fail during completion.
> +        """
> +
> +        self.complete_and_wait('mirror',
> +                               completion_error='Operation not permitted')

Thanks for the exhaustive comments!

> +
>   if __name__ == '__main__':
>       iotests.main(supported_fmts=['qcow2', 'qed'],
>                    supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index 877b76fd31..20a8158b99 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -..............................................................................................
> +................................................................................................
>   ----------------------------------------------------------------------
> -Ran 94 tests
> +Ran 96 tests
>   
>   OK
> 


With the forgotten assertion added:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir
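
The loop that the test case above provokes can be reproduced outside of QEMU. What follows is only a toy model, assuming the graph shapes drawn in the test's comments; the helper names (reachable, has_loop, replace_node) are made up for illustration and are not QEMU functions. Unlike the real bdrv_replace_node(), this replace_node() blindly redirects every edge, which is exactly what makes the requested loop visible.

```python
# Toy model, not QEMU code: each key is a parent node, each value lists its
# children.  "raw" stands for the unnamed [raw] node in the diagrams.

def reachable(graph, start, goal):
    """Return True if goal can be reached from start via one or more edges."""
    stack = list(graph.get(start, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False


def has_loop(graph):
    """Return True if any node can reach itself."""
    return any(reachable(graph, node, node) for node in graph)


def replace_node(graph, old, new):
    """Return a copy of graph in which every edge to old points to new."""
    return {parent: [new if child == old else child for child in children]
            for parent, children in graph.items() if parent != old}


before_change = {
    'base-parent': ['raw'],
    'raw': ['common-base'],
    'source': ['common-base'],
    'target': [],
    'common-base': [],
}
# x-blockdev-change makes base-parent a child of target.
after_change = dict(before_change, target=['base-parent'])

loop_without_change = has_loop(
    replace_node(before_change, 'common-base', 'target'))
loop_with_change = has_loop(
    replace_node(after_change, 'common-base', 'target'))
```

In this model the replacement is loop-free as long as target has no children, and it becomes cyclic once base-parent is attached to target, which matches the point at which the test expects the job to fail.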



* Re: [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere
  2019-11-11 16:01 ` [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere Max Reitz
@ 2019-12-06 14:04   ` Alberto Garcia
  2019-12-09 13:56     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Alberto Garcia @ 2019-12-06 14:04 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

On Mon 11 Nov 2019 05:01:55 PM CET, Max Reitz wrote:
> @@ -3177,11 +3177,6 @@ void qmp_block_resize(bool has_device, const char *device,
>      aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(aio_context);
>  
> -    if (!bdrv_is_first_non_filter(bs)) {
> -        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
> -        goto out;
> -    }
> -

What happens with this case now?

https://lists.gnu.org/archive/html/qemu-block/2019-11/msg00793.html

Berto



* Re: [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere
  2019-12-06 14:04   ` Alberto Garcia
@ 2019-12-09 13:56     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-12-09 13:56 UTC (permalink / raw)
  To: Alberto Garcia, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 06.12.19 15:04, Alberto Garcia wrote:
> On Mon 11 Nov 2019 05:01:55 PM CET, Max Reitz wrote:
>> @@ -3177,11 +3177,6 @@ void qmp_block_resize(bool has_device, const char *device,
>>      aio_context = bdrv_get_aio_context(bs);
>>      aio_context_acquire(aio_context);
>>  
>> -    if (!bdrv_is_first_non_filter(bs)) {
>> -        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
>> -        goto out;
>> -    }
>> -
> 
> What happens with this case now?
> 
> https://lists.gnu.org/archive/html/qemu-block/2019-11/msg00793.html

As far as I understand, we have a bug there and we’ll fix it in 5.0.
It’s just that in one case, it wasn’t visible because resize wasn’t
allowed on some nodes (where I think it should actually be allowed,
hence this patch).

So I think we should allow resize on those nodes (this patch) and fix
the bug, and that should be fine then.

Max



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-12-02 12:12   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-09 14:43     ` Max Reitz
  2019-12-13 11:18       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-12-09 14:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> While bdrv_replace_node() will not follow through with it, a specific
>> @replaces asks the mirror job to create a loop.
>>
>> For example, say both the source and the target share a child where the
>> source is a filter; by letting @replaces point to the common child, you
>> ask for a loop.
>>
>> Or if you use @replaces in drive-mirror with sync=none and
>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>> to a child of the source, and sync=none makes the source the backing
>> file of the target after the job).
>>
>> bdrv_replace_node() will not create those loops, but by doing so, it
>> ignores the user-requested configuration, which is not ideally either.
>> (In the first example above, the target's child will remain what it was,
>> which may still be reasonable.  But in the second example, the target
>> will just not become a child of the source, which is precisely what was
>> requested with @replaces.)
>>
>> So prevent such configurations, both before the job, and before it
>> actually completes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c                   | 30 ++++++++++++++++++++++++
>>   block/mirror.c            | 19 +++++++++++++++-
>>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>   include/block/block_int.h |  3 +++
>>   4 files changed, 98 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 0159f8e510..e3922a0474 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6259,6 +6259,36 @@ out:
>>       return to_replace_bs;
>>   }
>>   
>> +/*
>> + * Return true iff @child is a (recursive) child of @parent, with at
>> + * least @min_level edges between them.
>> + *
>> + * (If @min_level == 0, return true if @child == @parent.  For
>> + * @min_level == 1, @child needs to be at least a real child; for
>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>> + */
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (child == parent && min_level <= 0) {
>> +        return true;
>> +    }
>> +
>> +    if (!parent) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &parent->children, next) {
>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>   /**
>>    * Iterates through the list of runtime option keys that are said to
>>    * be "strong" for a BDS.  An option is called "strong" if it changes
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 68a4404666..b258c7e98b 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>            * there.
>>            */
>>           if (bdrv_recurse_can_replace(src, to_replace)) {
>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            /*
>> +             * It is OK for @to_replace to be an immediate child of
>> +             * @target_bs, because that is what happens with
>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>> +             * backing file will be the source node, which is also
>> +             * to_replace (by default).
>> +             * bdrv_replace_node() handles this case by not letting
>> +             * target_bs->backing point to itself, but to the source
>> +             * still.
>> +             */
>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            } else {
>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>> +                           "because the former is now a child of the latter, "
>> +                           "and doing so would thus create a loop",
>> +                           to_replace->node_name, target_bs->node_name);
>> +            }
> 
> you may swap if and else branch, dropping "!" mark..

Yes, but I just personally prefer to have the error case in the else branch.

>>           } else {
>>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>                          "because it can no longer be guaranteed that doing so "
>> diff --git a/blockdev.c b/blockdev.c
>> index 9dc2238bf3..d29f147f72 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>       }
>>   
>>       if (has_replaces) {
>> -        BlockDriverState *to_replace_bs;
>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>           AioContext *replace_aio_context;
>>           int64_t bs_size, replace_size;
>>   
>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>               return;
>>           }
>>   
>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>> +                       "because the former is a child of the latter",
>> +                       to_replace_bs->node_name, target->node_name);
>> +            return;
>> +        }
> 
> here min_level=1, so we don't handle the case, described in mirror_exit_common..
> I don't see why.. blockdev_mirror_common is called from qmp_drive_mirror,
> including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS..
> 
> What I'm missing?

Hmm.  Well.

If it broke drive-mirror sync=none, I suppose I would have noticed by
running the iotests.  But I didn’t, and that’s because this code here is
reached only if the user actually specified @replaces.  (As opposed to
the mirror_exit_common code, where @to_replace may simply be @src if not
overridden by the user.)

The only reason why I allow it in mirror_exit_common is because we have
to.  But if the user manually specifies this configuration, we can’t
guarantee it’s safe.

OTOH, well, if we allow it for drive-mirror sync=none, why not allow it
when manually specified with blockdev-mirror?

What’s your opinion?

>> +
>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>> +        {
>> +            /*
>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>> +             * (used for mode=existing) will yield, it is probably
>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>> +             * because that is our best guess.
>> +             */
>> +            switch (sync) {
>> +            case MIRROR_SYNC_MODE_FULL:
>> +                target_backing_bs = NULL;
>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_TOP:
>> +                target_backing_bs = backing_bs(bs);
>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_NONE:
>> +                target_backing_bs = bs;
>> +                break;
>> +
>> +            default:
>> +                abort();
>> +            }
>> +        } else {
>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>> +            target_backing_bs = backing_bs(target);
>> +        }
>> +
>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>> +                       "result in a loop, because the former would be a child "
>> +                       "of the latter's backing file ('%s') after the mirror "
>> +                       "job", to_replace_bs->node_name, target->node_name,
>> +                       target_backing_bs->node_name);
>> +            return;
>> +        }
> 
> hmm.. so for MODE_NONE we disallow to_replace == src?

I suppose that’s basically the same as above.  Should we allow this case
when specified explicitly by the user?

Max

>> +
>>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>>           aio_context_acquire(replace_aio_context);
>>           replace_size = bdrv_getlength(to_replace_bs);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 589a797fab..7064a1a4fa 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>                                 BlockDriverState *to_replace);
>>   
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>>
> 
> 
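
To make the @min_level discussion above easier to follow, here is a small Python transcription of the quoted bdrv_is_child_of(). It is only a sketch: the Node class is a hypothetical stand-in for BlockDriverState, and the three-node graph assumes the drive-mirror sync=none situation described in the mirror_exit_common() comment (the source ends up as the target's backing child).

```python
class Node:
    """Hypothetical stand-in for a BlockDriverState: a name plus children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def is_child_of(child, parent, min_level):
    """Transcription of the quoted bdrv_is_child_of() logic."""
    if child is parent and min_level <= 0:
        return True
    if parent is None:
        return False
    return any(is_child_of(child, c, min_level - 1) for c in parent.children)


base = Node('base')
source = Node('source', [base])
# Assumed post-job state for sync=none: source is the target's backing child.
target = Node('target', [source])

# blockdev_mirror_common() uses min_level=1: immediate children count.
rejected_by_precheck = is_child_of(source, target, 1)
# mirror_exit_common() uses min_level=2: immediate children are tolerated.
rejected_at_completion = is_child_of(source, target, 2)
```

With these inputs the min_level=1 check matches the immediate child while the min_level=2 check does not, which is the asymmetry Vladimir is asking about.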




* Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-12-03 12:59   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-09 15:10     ` Max Reitz
  2019-12-13 11:26       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-12-09 15:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 03.12.19 13:59, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 59 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index d34305ce69..3e03320ce3 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>>   
>>           return fields.items() <= ret.items()
>>   
>> +    """
>> +    Check whether the node under the given path in the block graph is
>> +    @expected_node.
>> +
>> +    @root is the node name of the node where the @path is rooted.
>> +
>> +    @path is a string that consists of child names separated by
>> +    slashes.  It must begin with a slash.
> 
> Why do you need this slash?

I don’t.  It just looked better to me.

(One reason would be so it could be empty to refer to @root, but as I
said that isn’t very useful.)

> To stress that we are starting from root?
> But root is not global, it's selected by previous argument, so for me the
> path is more like relative than absolute..
> 
>> +
>> +    Examples for @root + @path:
>> +      - root="qcow2-node", path="/backing/file"
>> +      - root="quorum-node", path="/children.2/file"
>> +
>> +    Hypothetically, @path could be empty, in which case it would point
>> +    to @root.  However, in practice this case is not useful and hence
>> +    not allowed.
> 
> 1. path can't be empty, as accordingly to previous point, it must start with '/'

Hence “hypothetically”.

> 2. path can be '/', which does exactly what you don't allow, and I don't see,
> where it is restricted in code

No, it doesn’t.  That refers to a child of @root with an empty name.

>> +
>> +    @expected_node may be None.
> 
> Which means that, we assert existence of the path except its last element,
> yes? Worth mention this behavior here.

“(All elements of the path but the leaf must still exist.)”?  OK.

>> +
>> +    @graph may be None or the result of an x-debug-query-block-graph
>> +    call that has already been performed.
>> +    """
>> +    def assert_block_path(self, root, path, expected_node, graph=None):
>> +        if graph is None:
>> +            graph = self.qmp('x-debug-query-block-graph')['return']
>> +
>> +        iter_path = iter(path.split('/'))
>> +
>> +        # Must start with a /
>> +        assert next(iter_path) == ''
>> +
>> +        node = next((node for node in graph['nodes'] if node['name'] == root),
>> +                    None)
>> +
>> +        for path_node in iter_path:
>> +            assert node is not None, 'Cannot follow path %s' % path
>> +
>> +            try:
>> +                node_id = next(edge['child'] for edge in graph['edges'] \
>> +                                             if edge['parent'] == node['id'] and
>> +                                                edge['name'] == path_node)
>> +
>> +                node = next(node for node in graph['nodes'] \
>> +                                 if node['id'] == node_id)
> 
> this line cant fail. If it fail, it means a bug in x-debug-query-block-graph, so,
> I'd prefer to move it out of try:except block.

But that makes the code uglier, in my opinion.  We’d then have to set
node_id to e.g. None in the except branch (or rather just abolish the
try-except then) and check whether it’s None before assigning node.
Like this:

node_id = next(..., None)

if node_id is not None:
    node = next(...)
else:
    node = None

I prefer the current try-except construct over that.

>> +            except StopIteration:
>> +                node = None
>> +
>> +        assert node is not None or expected_node is None, \
>> +               'No node found under %s (but expected %s)' % \
>> +               (path, expected_node)
>> +
>> +        assert expected_node is not None or node is None, \
>> +               'Found node %s under %s (but expected none)' % \
>> +               (node['name'], path)
>> +
>> +        if node is not None and expected_node is not None:
> 
> [1]
> second part of condition already asserted by previous assertion

Yes, but I wanted to cover all four cases explicitly.  (In the usual 00,
01, 10, 11 manner.  Well, except it’s 10, 01, 11, 00.)

>> +            assert node['name'] == expected_node, \
>> +                   'Found node %s under %s (but expected %s)' % \
>> +                   (node['name'], path, expected_node)
> 
> IMHO, it would be easier to read like:
> 
>            if node is None:
>                assert  expected_node is None, \
>                   'No node found under %s (but expected %s)' % \
>                   (path, expected_node)
>            else:
>                assert expected_node is not None, \
>                   'Found node %s under %s (but expected none)' % \
>                   (node['name'], path)
> 
>                assert node['name'] == expected_node, \
>                       'Found node %s under %s (but expected %s)' % \
>                       (node['name'], path, expected_node)
> 
> Or even just
> 
>            if node is None:
>                assert expected_node is None, \
>                   'No node found under %s (but expected %s)' % \
>                   (path, expected_node)
>            else:
>                assert node['name'] == expected_node, \
>                       'Found node %s under %s (but expected %s)' % \
>                       (node['name'], path, expected_node)

Works for me, too.

> (I've checked:
>  >>> 'erger %s erg' % None
> 'erger None erg'
> 
> Also, %-style formatting is old, as I understand it's better always use .format()
> )

OK.

Max

>>   
>>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>>   
>>
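
For reference, the variant discussed above (next() with a default instead of try/except, str.format() instead of %-formatting, and a single comparison at the end) could look roughly like the following. This is a sketch, not the patch: find_node_by_path and check_block_path are hypothetical names, and the sample graph is hand-written with only the keys the quoted code reads ('nodes' with 'id' and 'name', 'edges' with 'parent', 'child' and 'name').

```python
def find_node_by_path(graph, root, path):
    """Follow path (e.g. '/backing/file') from root.

    Returns the name of the node found there, or None if the last path
    element does not exist.  All earlier elements must exist.
    """
    parts = path.split('/')
    assert parts[0] == '' and len(parts) > 1, 'path must begin with a slash'

    node = next((n for n in graph['nodes'] if n['name'] == root), None)
    for child_name in parts[1:]:
        assert node is not None, 'Cannot follow path {}'.format(path)
        node_id = next((e['child'] for e in graph['edges']
                        if e['parent'] == node['id']
                        and e['name'] == child_name),
                       None)
        node = (None if node_id is None else
                next(n for n in graph['nodes'] if n['id'] == node_id))
    return None if node is None else node['name']


def check_block_path(graph, root, path, expected_node):
    """Assert that the node under root + path is expected_node (or absent)."""
    found = find_node_by_path(graph, root, path)
    assert found == expected_node, \
        'Found {} under {} (but expected {})'.format(found, path,
                                                     expected_node)


# Hand-written sample data, not real x-debug-query-block-graph output.
sample_graph = {
    'nodes': [{'id': 1, 'name': 'top'},
              {'id': 2, 'name': 'mid'},
              {'id': 3, 'name': 'base'}],
    'edges': [{'parent': 1, 'child': 2, 'name': 'backing'},
              {'parent': 2, 'child': 3, 'name': 'file'}],
}

check_block_path(sample_graph, 'top', '/backing/file', 'base')
check_block_path(sample_graph, 'top', '/backing/backing', None)
```

Note that a path of '/' looks up a child with an empty name here, in line with Max's reading, so it does not resolve to @root.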




* Re: [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041
  2019-12-03 13:33     ` Vladimir Sementsov-Ogievskiy
@ 2019-12-09 15:15       ` Max Reitz
  2019-12-13 11:31         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-12-09 15:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 03.12.19 14:33, Vladimir Sementsov-Ogievskiy wrote:
> 03.12.2019 16:32, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2019 19:02, Max Reitz wrote:
>>> Signed-off-by: Max Reitz<mreitz@redhat.com>
>>
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
> 
> 
> Oops, stop. Why do you remove line "self.vm.shutdown()" ?

Because we don’t need it.  tearDown() does it anyway.  I suppose I
should mention it in the commit message.

Max



* Re: [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters
  2019-12-03 15:58   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-09 15:17     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-12-09 15:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 03.12.19 16:58, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/041     | 46 ++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/041.out |  4 ++--
>>   2 files changed, 48 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index ab0cb5b42f..9a00cf6f7b 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -1200,6 +1200,52 @@ class TestOrphanedSource(iotests.QMPTestCase):
>>           self.assertFalse('mirror-filter' in nodes,
>>                            'Mirror filter node did not disappear')
>>   
>> +# Test cases for @replaces that do not necessarily involve Quorum
>> +class TestReplaces(iotests.QMPTestCase):
>> +    # Each of these test cases needs their own block graph, so do not
>> +    # create any nodes here
>> +    def setUp(self):
>> +        self.vm = iotests.VM()
>> +        self.vm.launch()
>> +
>> +    def tearDown(self):
>> +        self.vm.shutdown()
>> +        for img in (test_img, target_img):
>> +            try:
>> +                os.remove(img)
>> +            except OSError:
>> +                pass
>> +
>> +    """
>> +    Check that we can replace filter nodes.
>> +    """
> 
> PEP8 says, that doc string should appear after "def" line.
> (this applies to previous patch too)

OK.  I just noticed that in some previous patch I also left them having
single quotes, which I should fix.

Max
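
The practical difference between the two placements is small but visible; the docstring conventions themselves are spelled out in PEP 257. In the sketch below (class and method names are made up), a string literal placed before "def" is just an expression statement, whereas one placed as the first statement of the function body becomes its __doc__.

```python
class Before:
    def setUp(self):
        pass

    """
    Check that we can replace filter nodes.
    """
    def test_case(self):
        pass


class After:
    def setUp(self):
        pass

    def test_case(self):
        """
        Check that we can replace filter nodes.
        """
        pass


doc_before = Before.test_case.__doc__
doc_after = After.test_case.__doc__
```

Here doc_before is None, while doc_after carries the description, so tools such as help() or unittest's verbose output can only pick up the second form.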



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-12-09 14:43     ` Max Reitz
@ 2019-12-13 11:18       ` Vladimir Sementsov-Ogievskiy
  2019-12-20 11:39         ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-13 11:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

09.12.2019 17:43, Max Reitz wrote:
> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2019 19:02, Max Reitz wrote:
>>> While bdrv_replace_node() will not follow through with it, a specific
>>> @replaces asks the mirror job to create a loop.
>>>
>>> For example, say both the source and the target share a child where the
>>> source is a filter; by letting @replaces point to the common child, you
>>> ask for a loop.
>>>
>>> Or if you use @replaces in drive-mirror with sync=none and
>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>> to a child of the source, and sync=none makes the source the backing
>>> file of the target after the job).
>>>
>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>> ignores the user-requested configuration, which is not ideally either.
>>> (In the first example above, the target's child will remain what it was,
>>> which may still be reasonable.  But in the second example, the target
>>> will just not become a child of the source, which is precisely what was
>>> requested with @replaces.)
>>>
>>> So prevent such configurations, both before the job, and before it
>>> actually completes.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block.c                   | 30 ++++++++++++++++++++++++
>>>    block/mirror.c            | 19 +++++++++++++++-
>>>    blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>    include/block/block_int.h |  3 +++
>>>    4 files changed, 98 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 0159f8e510..e3922a0474 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -6259,6 +6259,36 @@ out:
>>>        return to_replace_bs;
>>>    }
>>>    
>>> +/*
>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>> + * least @min_level edges between them.
>>> + *
>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>> + * @min_level == 1, @child needs to be at least a real child; for
>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>> + */
>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>> +                      int min_level)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (child == parent && min_level <= 0) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (!parent) {
>>> +        return false;
>>> +    }
>>> +
>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>> +            return true;
>>> +        }
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>    /**
>>>     * Iterates through the list of runtime option keys that are said to
>>>     * be "strong" for a BDS.  An option is called "strong" if it changes
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 68a4404666..b258c7e98b 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>             * there.
>>>             */
>>>            if (bdrv_recurse_can_replace(src, to_replace)) {
>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>> +            /*
>>> +             * It is OK for @to_replace to be an immediate child of
>>> +             * @target_bs, because that is what happens with
>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>> +             * backing file will be the source node, which is also
>>> +             * to_replace (by default).
>>> +             * bdrv_replace_node() handles this case by not letting
>>> +             * target_bs->backing point to itself, but to the source
>>> +             * still.
>>> +             */
>>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>>> +            } else {
>>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>> +                           "because the former is now a child of the latter, "
>>> +                           "and doing so would thus create a loop",
>>> +                           to_replace->node_name, target_bs->node_name);
>>> +            }
>>
>> you may swap if and else branch, dropping "!" mark..
> 
> Yes, but I just personally prefer to have the error case in the else branch.
> 
>>>            } else {
>>>                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>                           "because it can no longer be guaranteed that doing so "
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 9dc2238bf3..d29f147f72 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>        }
>>>    
>>>        if (has_replaces) {
>>> -        BlockDriverState *to_replace_bs;
>>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>>            AioContext *replace_aio_context;
>>>            int64_t bs_size, replace_size;
>>>    
>>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>                return;
>>>            }
>>>    
>>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>>> +                       "because the former is a child of the latter",
>>> +                       to_replace_bs->node_name, target->node_name);
>>> +            return;
>>> +        }
>>
>> here min_level=1, so we don't handle the case, described in mirror_exit_common..
>> I don't see why.. blockdev_mirror_common is called from qmp_drive_mirror,
>> including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS..
>>
>> What I'm missing?
> 
> Hmm.  Well.
> 
> If it broke drive-mirror sync=none, I suppose I would have noticed by
> running the iotests.  But I didn’t, and that’s because this code here is
> reached only if the user actually specified @replaces.  (As opposed to
> the mirror_exit_common code, where @to_replace may simply be @src if not
> overridden by the user.)
> 
> The only reason why I allow it in mirror_exit_common is because we have
> to.  But if the user manually specifies this configuration, we can’t
> guarantee it’s safe.
> 
> OTOH, well, if we allow it for drive-mirror sync=none, why not allow it
> when manually specified with blockdev-mirror?
> 
> What’s your opinion?

Hmm, I think that allowing to_replace to be a direct backing child of the
target (like in mirror_exit_common) is safe enough. The user doesn't know
that such a replacement also includes replacing the target's own child,
which would lead to the loop. It's not obvious. And the behavior of
bdrv_replace_node(), which simply doesn't create this loop, doesn't
seem too tricky. Hmm..

We could mention in the QAPI spec that replacing doesn't break the backing
link of the target, so that the behavior is fully defined.

But should we allow @replaces to be some other (not backing and not filtered)
child of the target?

> 
>>> +
>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>> +        {
>>> +            /*
>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>> +             * (used for mode=existing) will yield, it is probably
>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>> +             * because that is our best guess.
>>> +             */
>>> +            switch (sync) {
>>> +            case MIRROR_SYNC_MODE_FULL:
>>> +                target_backing_bs = NULL;
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_TOP:
>>> +                target_backing_bs = backing_bs(bs);
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_NONE:
>>> +                target_backing_bs = bs;
>>> +                break;
>>> +
>>> +            default:
>>> +                abort();
>>> +            }
>>> +        } else {
>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>> +            target_backing_bs = backing_bs(target);
>>> +        }
>>> +
>>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>>> +                       "result in a loop, because the former would be a child "
>>> +                       "of the latter's backing file ('%s') after the mirror "
>>> +                       "job", to_replace_bs->node_name, target->node_name,
>>> +                       target_backing_bs->node_name);
>>> +            return;
>>> +        }
>>
>> hmm.. so for MODE_NONE we disallow to_replace == src?
> 
> I suppose that’s basically the same as above.  Should we allow this case
> when specified explicitly by the user?
> 

I'm leaning a bit more toward allowing it, for consistency with the automatic path where
@replaces is unspecified. Are we sure that nobody uses it?

> 
>>> +
>>>            replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>>>            aio_context_acquire(replace_aio_context);
>>>            replace_size = bdrv_getlength(to_replace_bs);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 589a797fab..7064a1a4fa 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -1266,6 +1266,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>>    bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>>                                  BlockDriverState *to_replace);
>>>    
>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>> +                      int min_level);
>>> +
>>>    /*
>>>     * Default implementation for drivers to pass bdrv_co_block_status() to
>>>     * their file.
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-12-09 15:10     ` Max Reitz
@ 2019-12-13 11:26       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-13 11:26 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

09.12.2019 18:10, Max Reitz wrote:
> On 03.12.19 13:59, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2019 19:02, Max Reitz wrote:
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
>>>    1 file changed, 59 insertions(+)
>>>
>>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>>> index d34305ce69..3e03320ce3 100644
>>> --- a/tests/qemu-iotests/iotests.py
>>> +++ b/tests/qemu-iotests/iotests.py
>>> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>>>    
>>>            return fields.items() <= ret.items()
>>>    
>>> +    """
>>> +    Check whether the node under the given path in the block graph is
>>> +    @expected_node.
>>> +
>>> +    @root is the node name of the node where the @path is rooted.
>>> +
>>> +    @path is a string that consists of child names separated by
>>> +    slashes.  It must begin with a slash.
>>
>> Why do you need this slash?
> 
> I don’t.  It just looked better to me.
> 
> (One reason would be so it could be empty to refer to @root, but as I
> said that isn’t very useful.)
> 
>> To stress that we are starting from root?
>> But root is not global, it's selected by previous argument, so for me the
>> path is more like relative than absolute..
>>
>>> +
>>> +    Examples for @root + @path:
>>> +      - root="qcow2-node", path="/backing/file"
>>> +      - root="quorum-node", path="/children.2/file"
>>> +
>>> +    Hypothetically, @path could be empty, in which case it would point
>>> +    to @root.  However, in practice this case is not useful and hence
>>> +    not allowed.
>>
>> 1. path can't be empty, as according to the previous point, it must start with '/'
> 
> Hence “hypothetically”.
> 
>> 2. path can be '/', which does exactly what you don't allow, and I don't see
>> where it is restricted in the code
> 
> No, it doesn’t.  That refers to a child of @root with an empty name.

Hmm, yes, OK.

> 
>>> +
>>> +    @expected_node may be None.
>>
>> Which means that we assert the existence of the path except for its last element,
>> yes? Worth mentioning this behavior here.
> 
> “(All elements of the path but the leaf must still exist.)”?  OK.

OK

> 
>>> +
>>> +    @graph may be None or the result of an x-debug-query-block-graph
>>> +    call that has already been performed.
>>> +    """
>>> +    def assert_block_path(self, root, path, expected_node, graph=None):
>>> +        if graph is None:
>>> +            graph = self.qmp('x-debug-query-block-graph')['return']
>>> +
>>> +        iter_path = iter(path.split('/'))
>>> +
>>> +        # Must start with a /
>>> +        assert next(iter_path) == ''
>>> +
>>> +        node = next((node for node in graph['nodes'] if node['name'] == root),
>>> +                    None)
>>> +
>>> +        for path_node in iter_path:
>>> +            assert node is not None, 'Cannot follow path %s' % path
>>> +
>>> +            try:
>>> +                node_id = next(edge['child'] for edge in graph['edges'] \
>>> +                                             if edge['parent'] == node['id'] and
>>> +                                                edge['name'] == path_node)
>>> +
>>> +                node = next(node for node in graph['nodes'] \
>>> +                                 if node['id'] == node_id)
>>
>> This line can't fail. If it fails, it means a bug in x-debug-query-block-graph, so
>> I'd prefer to move it out of the try-except block.
> 
> But that makes the code uglier, in my opinion.  We’d then have to set
> node_id to e.g. None in the except branch (or rather just abolish the
> try-except then) and check whether it’s None before assigning node.
> Like this:
> 
> node_id = next(..., None)
> 
> if node_id is not None:
>      node = next(...)
> else:
>      node = None
> 
> I prefer the current try-except construct over that.

OK
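
For reference, the two shapes discussed above can be put side by side. This
is only a sketch: step_try_except() and step_default() are hypothetical
helper names, and the graph data is made up, using the nodes/edges layout
that the quoted patch reads from x-debug-query-block-graph.

```python
def step_try_except(graph, node, child_name):
    """Follow one child edge; both lookups share a single try-except."""
    try:
        node_id = next(edge['child'] for edge in graph['edges']
                       if edge['parent'] == node['id'] and
                          edge['name'] == child_name)
        return next(n for n in graph['nodes'] if n['id'] == node_id)
    except StopIteration:
        return None


def step_default(graph, node, child_name):
    """Same step, using next(..., None) for the edge lookup only."""
    node_id = next((edge['child'] for edge in graph['edges']
                    if edge['parent'] == node['id'] and
                       edge['name'] == child_name), None)
    if node_id is None:
        return None
    # A miss here would now raise StopIteration instead of being hidden.
    return next(n for n in graph['nodes'] if n['id'] == node_id)


# Made-up graph: "top" has a backing child "base".
graph = {
    'nodes': [{'id': 1, 'name': 'top'}, {'id': 2, 'name': 'base'}],
    'edges': [{'parent': 1, 'child': 2, 'name': 'backing'}],
}
top = graph['nodes'][0]
```

Both return the same result for a consistent graph; they differ only in
whether an inconsistent reply (an edge pointing to an unknown node id) is
reported as "no node" or surfaces as an exception.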

> 
>>> +            except StopIteration:
>>> +                node = None
>>> +
>>> +        assert node is not None or expected_node is None, \
>>> +               'No node found under %s (but expected %s)' % \
>>> +               (path, expected_node)
>>> +
>>> +        assert expected_node is not None or node is None, \
>>> +               'Found node %s under %s (but expected none)' % \
>>> +               (node['name'], path)
>>> +
>>> +        if node is not None and expected_node is not None:
>>
>> [1]
>> second part of condition already asserted by previous assertion
> 
> Yes, but I wanted to cover all four cases explicitly.  (In the usual 00,
> 01, 10, 11 manner.  Well, except it’s 10, 01, 11, 00.)
> 
>>> +            assert node['name'] == expected_node, \
>>> +                   'Found node %s under %s (but expected %s)' % \
>>> +                   (node['name'], path, expected_node)
>>
>> IMHO, it would be easier to read like:
>>
>>             if node is None:
>>                 assert expected_node is None, \
>>                    'No node found under %s (but expected %s)' % \
>>                    (path, expected_node)
>>             else:
>>                 assert expected_node is not None, \
>>                    'Found node %s under %s (but expected none)' % \
>>                    (node['name'], path)
>>
>>                 assert node['name'] == expected_node, \
>>                        'Found node %s under %s (but expected %s)' % \
>>                        (node['name'], path, expected_node)
>>
>> Or even just
>>
>>             if node is None:
>>                 assert expected_node is None, \
>>                    'No node found under %s (but expected %s)' % \
>>                    (path, expected_node)
>>             else:
>>                 assert node['name'] == expected_node, \
>>                        'Found node %s under %s (but expected %s)' % \
>>                        (node['name'], path, expected_node)
> 
> Works for me, too.
> 
>> (I've checked:
>>   >>> 'erger %s erg' % None
>> 'erger None erg'
>>
>> Also, %-style formatting is old; as I understand it, it's better to always use .format()
>> )
> 
> OK.
> 
> Max
> 
>>>    
>>>    index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>>>    
>>>
> 
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path() Max Reitz
  2019-12-03 12:59   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-13 11:27   ` Vladimir Sementsov-Ogievskiy
  2019-12-20 11:42     ` Max Reitz
  1 sibling, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-13 11:27 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

11.11.2019 19:02, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
>   1 file changed, 59 insertions(+)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index d34305ce69..3e03320ce3 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>   
>           return fields.items() <= ret.items()
>   
> +    """
> +    Check whether the node under the given path in the block graph is
> +    @expected_node.
> +
> +    @root is the node name of the node where the @path is rooted.
> +
> +    @path is a string that consists of child names separated by
> +    slashes.  It must begin with a slash.
> +
> +    Examples for @root + @path:
> +      - root="qcow2-node", path="/backing/file"
> +      - root="quorum-node", path="/children.2/file"
> +
> +    Hypothetically, @path could be empty, in which case it would point
> +    to @root.  However, in practice this case is not useful and hence
> +    not allowed.
> +
> +    @expected_node may be None.
> +
> +    @graph may be None or the result of an x-debug-query-block-graph
> +    call that has already been performed.
> +    """
> +    def assert_block_path(self, root, path, expected_node, graph=None):
> +        if graph is None:
> +            graph = self.qmp('x-debug-query-block-graph')['return']
> +
> +        iter_path = iter(path.split('/'))
> +
> +        # Must start with a /
> +        assert next(iter_path) == ''
> +
> +        node = next((node for node in graph['nodes'] if node['name'] == root),
> +                    None)
> +
> +        for path_node in iter_path:

I'd rename path_node to child or edge, to not interfere with block nodes here.

> +            assert node is not None, 'Cannot follow path %s' % path
> +
> +            try:
> +                node_id = next(edge['child'] for edge in graph['edges'] \
> +                                             if edge['parent'] == node['id'] and
> +                                                edge['name'] == path_node)
> +
> +                node = next(node for node in graph['nodes'] \
> +                                 if node['id'] == node_id)
> +            except StopIteration:
> +                node = None
> +
> +        assert node is not None or expected_node is None, \
> +               'No node found under %s (but expected %s)' % \
> +               (path, expected_node)
> +
> +        assert expected_node is not None or node is None, \
> +               'Found node %s under %s (but expected none)' % \
> +               (node['name'], path)
> +
> +        if node is not None and expected_node is not None:
> +            assert node['name'] == expected_node, \
> +                   'Found node %s under %s (but expected %s)' % \
> +                   (node['name'], path, expected_node)
>   
>   index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
>   
> 


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041
  2019-12-09 15:15       ` Max Reitz
@ 2019-12-13 11:31         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-13 11:31 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

09.12.2019 18:15, Max Reitz wrote:
> On 03.12.19 14:33, Vladimir Sementsov-Ogievskiy wrote:
>> 03.12.2019 16:32, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.11.2019 19:02, Max Reitz wrote:
>>>> Signed-off-by: Max Reitz<mreitz@redhat.com>
>>>
>>>
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>
>>
>> Oops, stop. Why do you remove line "self.vm.shutdown()" ?
> 
> Because we don’t need it.  tearDown() does it anyway.  I suppose I
> should mention it in the commit message.
> 

Yes...

But actually, it would be better to remove the extra shutdown from all test cases, not just
one, and then that would be a separate patch.

Extra shutdown is left in (considering only class TestRepairQuorum):
test_pause
test_cancel_after_ready
test_cancel
test_complete
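
To illustrate why the explicit call is redundant, here is a small
self-contained sketch. FakeVM and ExampleTest are hypothetical stand-ins and
not the classes from 041; the only thing shown is that unittest runs
tearDown() after every test method, so a shutdown placed there already
covers each case.

```python
import unittest


class FakeVM:
    """Hypothetical stand-in for a VM object; it only counts shutdowns."""
    def __init__(self):
        self.shutdown_calls = 0

    def shutdown(self):
        self.shutdown_calls += 1


class ExampleTest(unittest.TestCase):
    vms = []

    def setUp(self):
        self.vm = FakeVM()
        ExampleTest.vms.append(self.vm)

    def tearDown(self):
        # Runs after every test method, whether it passed or failed.
        self.vm.shutdown()

    def test_without_explicit_shutdown(self):
        pass

    def test_with_explicit_shutdown(self):
        self.vm.shutdown()  # redundant: tearDown() shuts down again


def run_example():
    """Run both tests and return the result plus per-VM shutdown counts."""
    ExampleTest.vms = []
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result, sorted(vm.shutdown_calls for vm in ExampleTest.vms)
```

The VM of the test with the explicit call is shut down twice, the other one
once, and both tests pass either way.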


-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-12-13 11:18       ` Vladimir Sementsov-Ogievskiy
@ 2019-12-20 11:39         ` Max Reitz
  2019-12-20 11:55           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2019-12-20 11:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel



On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
> 09.12.2019 17:43, Max Reitz wrote:
>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.11.2019 19:02, Max Reitz wrote:
>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>> @replaces asks the mirror job to create a loop.
>>>>
>>>> For example, say both the source and the target share a child where the
>>>> source is a filter; by letting @replaces point to the common child, you
>>>> ask for a loop.
>>>>
>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>> to a child of the source, and sync=none makes the source the backing
>>>> file of the target after the job).
>>>>
>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>> ignores the user-requested configuration, which is not ideal either.
>>>> (In the first example above, the target's child will remain what it was,
>>>> which may still be reasonable.  But in the second example, the target
>>>> will just not become a child of the source, which is precisely what was
>>>> requested with @replaces.)
>>>>
>>>> So prevent such configurations, both before the job, and before it
>>>> actually completes.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    block.c                   | 30 ++++++++++++++++++++++++
>>>>    block/mirror.c            | 19 +++++++++++++++-
>>>>    blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>    include/block/block_int.h |  3 +++
>>>>    4 files changed, 98 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 0159f8e510..e3922a0474 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -6259,6 +6259,36 @@ out:
>>>>        return to_replace_bs;
>>>>    }
>>>>    
>>>> +/*
>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>> + * least @min_level edges between them.
>>>> + *
>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>> + */
>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>> +                      int min_level)
>>>> +{
>>>> +    BdrvChild *c;
>>>> +
>>>> +    if (child == parent && min_level <= 0) {
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    if (!parent) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>> +            return true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>>    /**
>>>>     * Iterates through the list of runtime option keys that are said to
>>>>     * be "strong" for a BDS.  An option is called "strong" if it changes
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index 68a4404666..b258c7e98b 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>             * there.
>>>>             */
>>>>            if (bdrv_recurse_can_replace(src, to_replace)) {
>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>> +            /*
>>>> +             * It is OK for @to_replace to be an immediate child of
>>>> +             * @target_bs, because that is what happens with
>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>> +             * backing file will be the source node, which is also
>>>> +             * to_replace (by default).
>>>> +             * bdrv_replace_node() handles this case by not letting
>>>> +             * target_bs->backing point to itself, but to the source
>>>> +             * still.
>>>> +             */
>>>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>>>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>>>> +            } else {
>>>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>> +                           "because the former is now a child of the latter, "
>>>> +                           "and doing so would thus create a loop",
>>>> +                           to_replace->node_name, target_bs->node_name);
>>>> +            }
>>>
>>> you may swap if and else branch, dropping "!" mark..
>>
>> Yes, but I just personally prefer to have the error case in the else branch.
>>
>>>>            } else {
>>>>                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>>                           "because it can no longer be guaranteed that doing so "
>>>> diff --git a/blockdev.c b/blockdev.c
>>>> index 9dc2238bf3..d29f147f72 100644
>>>> --- a/blockdev.c
>>>> +++ b/blockdev.c
>>>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>        }
>>>>    
>>>>        if (has_replaces) {
>>>> -        BlockDriverState *to_replace_bs;
>>>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>>>            AioContext *replace_aio_context;
>>>>            int64_t bs_size, replace_size;
>>>>    
>>>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>                return;
>>>>            }
>>>>    
>>>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>>>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>>>> +                       "because the former is a child of the latter",
>>>> +                       to_replace_bs->node_name, target->node_name);
>>>> +            return;
>>>> +        }
>>>
>>> here min_level=1, so we don't handle the case, described in mirror_exit_common..
>>> I don't see why.. blockdev_mirror_common is called from qmp_drive_mirror,
>>> including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS..
>>>
>>> What I'm missing?
>>
>> Hmm.  Well.
>>
>> If it broke drive-mirror sync=none, I suppose I would have noticed by
>> running the iotests.  But I didn’t, and that’s because this code here is
>> reached only if the user actually specified @replaces.  (As opposed to
>> the mirror_exit_common code, where @to_replace may simply be @src if not
>> overridden by the user.)
>>
>> The only reason why I allow it in mirror_exit_common is because we have
>> to.  But if the user manually specifies this configuration, we can’t
>> guarantee it’s safe.
>>
>> OTOH, well, if we allow it for drive-mirror sync=none, why not allow it
>> when manually specified with blockdev-mirror?
>>
>> What’s your opinion?
> 
> Hmm, I think, that allowing to_replaces to be direct backing child of target
> (like in mirror_exit_common) is safe enough. User doesn't know that
> such replacing includes also replacing own child of the target,
> which leads to the loop.. It's not obvious. And behavior of
> bdrv_replace_node() which just doesn't create this loop, doesn't
> seem something too tricky. Hmm..
> 
> We could mention in qapi spec, that replacing doesn't break backing
> link of the target, for it to be absolutely defined.
> 
> But should we allow replaces to be some other (not backing and not filtered)
> child of target?..

Well, my opinion is that this is a bit of a weird thing to do and that it
basically does ask for a loop.

I’m OK with excluding the sync=none case, because (1) that’s so
obviously a loop that it can’t be what the user honestly wants; (2) how
it’s resolved is rather obvious, too: There is exactly one edge that
causes the loop, so you simply don’t change that one; (3) drive-mirror
sync=none does this case automatically, so we should probably allow
users to do it manually with blockdev-mirror, too.
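
For illustration, the @min_level semantics of bdrv_is_child_of() from the
quoted block.c hunk can be modelled in a few lines of Python. This is a
sketch only: Node and is_child_of() are hypothetical names, and the graph is
a made-up sync=none layout where the target's backing child is the source.

```python
class Node:
    """Hypothetical minimal graph node: a name plus a list of children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def is_child_of(child, parent, min_level):
    """Mirror the recursion of the quoted C function, one edge per level."""
    if child is parent and min_level <= 0:
        return True
    if parent is None:
        return False
    return any(is_child_of(child, c, min_level - 1)
               for c in parent.children)


# Made-up layout after a sync=none mirror: target -> source -> base.
base = Node("base")
source = Node("source", [base])
target = Node("target", [source])

direct_child = is_child_of(source, target, 1)     # True
deeper_only = is_child_of(source, target, 2)      # False
grandchild = is_child_of(base, target, 2)         # True
```

With min_level == 2, as used in mirror_exit_common, the immediate backing
child is let through and only deeper descendants are rejected; with
min_level == 1, as in the blockdev.c pre-check, the immediate child is
rejected as well.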

>>>> +
>>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>>> +        {
>>>> +            /*
>>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>>> +             * (used for mode=existing) will yield, it is probably
>>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>>> +             * because that is our best guess.
>>>> +             */
>>>> +            switch (sync) {
>>>> +            case MIRROR_SYNC_MODE_FULL:
>>>> +                target_backing_bs = NULL;
>>>> +                break;
>>>> +
>>>> +            case MIRROR_SYNC_MODE_TOP:
>>>> +                target_backing_bs = backing_bs(bs);
>>>> +                break;
>>>> +
>>>> +            case MIRROR_SYNC_MODE_NONE:
>>>> +                target_backing_bs = bs;
>>>> +                break;
>>>> +
>>>> +            default:
>>>> +                abort();
>>>> +            }
>>>> +        } else {
>>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>>> +            target_backing_bs = backing_bs(target);
>>>> +        }
>>>> +
>>>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>>>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>>>> +                       "result in a loop, because the former would be a child "
>>>> +                       "of the latter's backing file ('%s') after the mirror "
>>>> +                       "job", to_replace_bs->node_name, target->node_name,
>>>> +                       target_backing_bs->node_name);
>>>> +            return;
>>>> +        }
>>>
>>> hmm.. so for MODE_NONE we disallow to_replace == src?
>>
>> I suppose that’s basically the same as above.  Should we allow this case
>> when specified explicitly by the user?
>>
> 
> I'm a bit more closer to allowing it, for consistency with automatic path, with
> unspecified replaces. Are we sure that nobody uses it?

Well, there are multiple cases, as shown in the commit message.  I think
that for drive-mirror sync=none, nobody uses @replaces, because it just
doesn’t work.

But, well, that’s just because drive-mirror does graph manipulation that
blockdev-mirror doesn’t (i.e., changing the target’s backing file on
completion).  So maybe we should just prevent loops for drive-mirror,
but let the user do what they want when they use blockdev-mirror?
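
As a rough illustration of the graph manipulation in question, the selection
of target_backing_bs from the quoted blockdev.c hunk can be written as a
small Python function. This is a sketch under my own assumptions: the mode
strings are hypothetical stand-ins for the MIRROR_* constants, and
backing_of is a made-up dict from node name to the name of its backing node.

```python
def predicted_target_backing(backing_mode, sync, source, target, backing_of):
    """Return the node expected to back the target after the job, or None."""
    if backing_mode in ('source-backing-chain', 'open-backing-chain'):
        if sync == 'full':
            return None                    # target stands on its own
        if sync == 'top':
            return backing_of.get(source)  # target sits on source's backing
        if sync == 'none':
            return source                  # target sits on the source itself
        raise ValueError('unexpected sync mode: {}'.format(sync))
    if backing_mode == 'leave-backing-chain':
        return backing_of.get(target)      # whatever the target has already
    raise ValueError('unexpected backing mode: {}'.format(backing_mode))


# Made-up chain: "source" is backed by "base"; "target" has no backing node.
backing_of = {'source': 'base'}
with_none = predicted_target_backing('source-backing-chain', 'none',
                                     'source', 'target', backing_of)
```

Only the first branch changes the target's backing file on completion; the
last branch leaves the target's chain as it already is.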

Max




* Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
  2019-12-13 11:27   ` Vladimir Sementsov-Ogievskiy
@ 2019-12-20 11:42     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-12-20 11:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel



On 13.12.19 12:27, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 59 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 59 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index d34305ce69..3e03320ce3 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>>   
>>           return fields.items() <= ret.items()
>>   
>> +    """
>> +    Check whether the node under the given path in the block graph is
>> +    @expected_node.
>> +
>> +    @root is the node name of the node where the @path is rooted.
>> +
>> +    @path is a string that consists of child names separated by
>> +    slashes.  It must begin with a slash.
>> +
>> +    Examples for @root + @path:
>> +      - root="qcow2-node", path="/backing/file"
>> +      - root="quorum-node", path="/children.2/file"
>> +
>> +    Hypothetically, @path could be empty, in which case it would point
>> +    to @root.  However, in practice this case is not useful and hence
>> +    not allowed.
>> +
>> +    @expected_node may be None.
>> +
>> +    @graph may be None or the result of an x-debug-query-block-graph
>> +    call that has already been performed.
>> +    """
>> +    def assert_block_path(self, root, path, expected_node, graph=None):
>> +        if graph is None:
>> +            graph = self.qmp('x-debug-query-block-graph')['return']
>> +
>> +        iter_path = iter(path.split('/'))
>> +
>> +        # Must start with a /
>> +        assert next(iter_path) == ''
>> +
>> +        node = next((node for node in graph['nodes'] if node['name'] == root),
>> +                    None)
>> +
>> +        for path_node in iter_path:
> 
> I'd rename path_node to child or edge, to not interfere with block nodes here.

Sure.  Or maybe child_name.
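
With that rename, and with the simplified assertions and .format() suggested
elsewhere in the review, the helper might end up looking roughly like the
sketch below. It is a standalone function for illustration only: it takes
the graph as an argument instead of querying QMP, and the sample graph is
made up.

```python
def assert_block_path(graph, root, path, expected_node):
    """Check that the node found under root + path is expected_node."""
    iter_path = iter(path.split('/'))

    # Must start with a /
    assert next(iter_path) == ''

    node = next((n for n in graph['nodes'] if n['name'] == root), None)

    for child_name in iter_path:
        assert node is not None, 'Cannot follow path {}'.format(path)
        try:
            node_id = next(edge['child'] for edge in graph['edges']
                           if edge['parent'] == node['id'] and
                              edge['name'] == child_name)
            node = next(n for n in graph['nodes'] if n['id'] == node_id)
        except StopIteration:
            node = None

    if node is None:
        assert expected_node is None, \
            'No node found under {} (but expected {})'.format(
                path, expected_node)
    else:
        assert node['name'] == expected_node, \
            'Found node {} under {} (but expected {})'.format(
                node['name'], path, expected_node)


# Made-up graph: top --backing--> mid --file--> proto
graph = {
    'nodes': [{'id': 1, 'name': 'top'}, {'id': 2, 'name': 'mid'},
              {'id': 3, 'name': 'proto'}],
    'edges': [{'parent': 1, 'child': 2, 'name': 'backing'},
              {'parent': 2, 'child': 3, 'name': 'file'}],
}
assert_block_path(graph, 'top', '/backing/file', 'proto')
assert_block_path(graph, 'top', '/backing/backing', None)
```
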

Max




* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-12-20 11:39         ` Max Reitz
@ 2019-12-20 11:55           ` Vladimir Sementsov-Ogievskiy
  2019-12-20 12:10             ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-12-20 11:55 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, Alberto Garcia, qemu-devel

20.12.2019 14:39, Max Reitz wrote:
> On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
>> 09.12.2019 17:43, Max Reitz wrote:
>>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
>>>> 11.11.2019 19:02, Max Reitz wrote:
>>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>>> @replaces asks the mirror job to create a loop.
>>>>>
>>>>> For example, say both the source and the target share a child where the
>>>>> source is a filter; by letting @replaces point to the common child, you
>>>>> ask for a loop.
>>>>>
>>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>>> to a child of the source, and sync=none makes the source the backing
>>>>> file of the target after the job).
>>>>>
>>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>>> ignores the user-requested configuration, which is not ideal either.
>>>>> (In the first example above, the target's child will remain what it was,
>>>>> which may still be reasonable.  But in the second example, the target
>>>>> will just not become a child of the source, which is precisely what was
>>>>> requested with @replaces.)
>>>>>
>>>>> So prevent such configurations, both before the job, and before it
>>>>> actually completes.
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>     block.c                   | 30 ++++++++++++++++++++++++
>>>>>     block/mirror.c            | 19 +++++++++++++++-
>>>>>     blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>>     include/block/block_int.h |  3 +++
>>>>>     4 files changed, 98 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 0159f8e510..e3922a0474 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -6259,6 +6259,36 @@ out:
>>>>>         return to_replace_bs;
>>>>>     }
>>>>>     
>>>>> +/*
>>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>>> + * least @min_level edges between them.
>>>>> + *
>>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>>> + */
>>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>>> +                      int min_level)
>>>>> +{
>>>>> +    BdrvChild *c;
>>>>> +
>>>>> +    if (child == parent && min_level <= 0) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (!parent) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>>> +            return true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * Iterates through the list of runtime option keys that are said to
>>>>>      * be "strong" for a BDS.  An option is called "strong" if it changes
>>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>>> index 68a4404666..b258c7e98b 100644
>>>>> --- a/block/mirror.c
>>>>> +++ b/block/mirror.c
>>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>>              * there.
>>>>>              */
>>>>>             if (bdrv_recurse_can_replace(src, to_replace)) {
>>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>> +            /*
>>>>> +             * It is OK for @to_replace to be an immediate child of
>>>>> +             * @target_bs, because that is what happens with
>>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>>> +             * backing file will be the source node, which is also
>>>>> +             * to_replace (by default).
>>>>> +             * bdrv_replace_node() handles this case by not letting
>>>>> +             * target_bs->backing point to itself, but to the source
>>>>> +             * still.
>>>>> +             */
>>>>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>>>>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>> +            } else {
>>>>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>>> +                           "because the former is now a child of the latter, "
>>>>> +                           "and doing so would thus create a loop",
>>>>> +                           to_replace->node_name, target_bs->node_name);
>>>>> +            }
>>>>
>>>> you may swap if and else branch, dropping "!" mark..
>>>
>>> Yes, but I just personally prefer to have the error case in the else branch.
>>>
>>>>>             } else {
>>>>>                 error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>>>                            "because it can no longer be guaranteed that doing so "
>>>>> diff --git a/blockdev.c b/blockdev.c
>>>>> index 9dc2238bf3..d29f147f72 100644
>>>>> --- a/blockdev.c
>>>>> +++ b/blockdev.c
>>>>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>>         }
>>>>>     
>>>>>         if (has_replaces) {
>>>>> -        BlockDriverState *to_replace_bs;
>>>>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>>>>             AioContext *replace_aio_context;
>>>>>             int64_t bs_size, replace_size;
>>>>>     
>>>>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>>                 return;
>>>>>             }
>>>>>     
>>>>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>>>>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>>>>> +                       "because the former is a child of the latter",
>>>>> +                       to_replace_bs->node_name, target->node_name);
>>>>> +            return;
>>>>> +        }
>>>>
>>>> here min_level=1, so we don't handle the case, described in mirror_exit_common..
>>>> I don't see why.. blockdev_mirror_common is called from qmp_drive_mirror,
>>>> including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS..
>>>>
>>>> What I'm missing?
>>>
>>> Hmm.  Well.
>>>
>>> If it broke drive-mirror sync=none, I suppose I would have noticed by
>>> running the iotests.  But I didn’t, and that’s because this code here is
>>> reached only if the user actually specified @replaces.  (As opposed to
>>> the mirror_exit_common code, where @to_replace may simply be @src if not
>>> overridden by the user.)
>>>
>>> The only reason why I allow it in mirror_exit_common is because we have
>>> to.  But if the user manually specifies this configuration, we can’t
>>> guarantee it’s safe.
>>>
>>> OTOH, well, if we allow it for drive-mirror sync=none, why not allow it
>>> when manually specified with blockdev-mirror?
>>>
>>> What’s your opinion?
>>
>> Hmm, I think, that allowing to_replaces to be direct backing child of target
>> (like in mirror_exit_common) is safe enough. User doesn't know that
>> such replacing includes also replacing own child of the target,
>> which leads to the loop.. It's not obvious. And behavior of
>> bdrv_replace_node() which just doesn't create this loop, doesn't
>> seem something too tricky. Hmm..
>>
>> We could mention in qapi spec, that replacing doesn't break backing
>> link of the target, for it to be absolutely defined.
>>
>> But should we allow replaces to be some other (not backing and not filtered)
>> child of target?..
> 
> Well, my opinion is that this is a bit of weird thing to do and that it
> basically does ask for a loop.
> 
> I’m OK with excluding the sync=none case, because (1) that’s so
> obviously a loop that it can’t be what the user honestly wants; (2) how
> it’s resolved is rather obvious, too: There is exactly one edge that
> causes the loop, so you simply don’t change that one; (3) drive-mirror
> sync=none does this case automatically, so we should probably allow
> users to do it manually with blockdev-mirror, too.
> 
>>>>> +
>>>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>>>> +        {
>>>>> +            /*
>>>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>>>> +             * (used for mode=existing) will yield, it is probably
>>>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>>>> +             * because that is our best guess.
>>>>> +             */
>>>>> +            switch (sync) {
>>>>> +            case MIRROR_SYNC_MODE_FULL:
>>>>> +                target_backing_bs = NULL;
>>>>> +                break;
>>>>> +
>>>>> +            case MIRROR_SYNC_MODE_TOP:
>>>>> +                target_backing_bs = backing_bs(bs);
>>>>> +                break;
>>>>> +
>>>>> +            case MIRROR_SYNC_MODE_NONE:
>>>>> +                target_backing_bs = bs;
>>>>> +                break;
>>>>> +
>>>>> +            default:
>>>>> +                abort();
>>>>> +            }
>>>>> +        } else {
>>>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>>>> +            target_backing_bs = backing_bs(target);
>>>>> +        }
>>>>> +
>>>>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>>>>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>>>>> +                       "result in a loop, because the former would be a child "
>>>>> +                       "of the latter's backing file ('%s') after the mirror "
>>>>> +                       "job", to_replace_bs->node_name, target->node_name,
>>>>> +                       target_backing_bs->node_name);
>>>>> +            return;
>>>>> +        }
>>>>
>>>> hmm.. so for MODE_NONE we disallow to_replace == src?
>>>
>>> I suppose that’s basically the same as above.  Should we allow this case
>>> when specified explicitly by the user?
>>>
>>
>> I'm a bit more closer to allowing it, for consistency with automatic path, with
>> unspecified replaces. Are we sure that nobody uses it?
> 
> Well, there are multiple cases, as shown in the commit message.  I think
> that for drive-mirror sync=none, nobody uses @replaces, because it just
> doesn’t work.
> 
> But, well, that’s just because drive-mirror does graph manipulation that
> blockdev-mirror doesn’t (i.e., changing the target’s backing file on
> completion).  So maybe we should just prevent loops for drive-mirror,
> but let the user do what they want when they use blockdev-mirror?
> 

Well, in the end the question is how much to restrict things when we don't
know whether they are useful or not. I don't know. In the end, I think I'm OK
with either of the ways we discussed, or with this patch as is. If it breaks
some existing scenario, it will be easy to fix.
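
For readers following the min_level argument in the quoted patch, here is a
minimal toy model of the check, assuming a simplified node type. Node and
is_child_of() are illustrative names, not QEMU's BlockDriverState or the
actual bdrv_is_child_of(); only the recursion mirrors the quoted patch, and
the toy graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a block node (hypothetical, not a QEMU type). */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
    size_t num_children;
} Node;

/*
 * Return true iff @child can be reached from @parent over at least
 * @min_level edges.  min_level == 0 also accepts child == parent,
 * min_level == 1 requires at least a direct child, and so on.
 */
static bool is_child_of(const Node *child, const Node *parent, int min_level)
{
    if (child == parent && min_level <= 0) {
        return true;
    }

    if (!parent) {
        return false;
    }

    for (size_t i = 0; i < parent->num_children; i++) {
        /* Each edge we follow reduces the number of edges still required. */
        if (is_child_of(child, parent->children[i], min_level - 1)) {
            return true;
        }
    }

    return false;
}
```

With a chain target -> source -> base, the source is found with min_level 1
but not with min_level 2. That matches the asymmetry discussed above: the
mirror_exit_common check (min_level 2) tolerates the sync=none layout, while
the blockdev_mirror_common check (min_level 1) rejects it.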

-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
  2019-12-20 11:55           ` Vladimir Sementsov-Ogievskiy
@ 2019-12-20 12:10             ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2019-12-20 12:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Alberto Garcia, qemu-devel


On 20.12.19 12:55, Vladimir Sementsov-Ogievskiy wrote:
> 20.12.2019 14:39, Max Reitz wrote:
>> On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.12.2019 17:43, Max Reitz wrote:
>>>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 11.11.2019 19:02, Max Reitz wrote:
>>>>>> While bdrv_replace_node() will not follow through with it, a specific
>>>>>> @replaces asks the mirror job to create a loop.
>>>>>>
>>>>>> For example, say both the source and the target share a child where the
>>>>>> source is a filter; by letting @replaces point to the common child, you
>>>>>> ask for a loop.
>>>>>>
>>>>>> Or if you use @replaces in drive-mirror with sync=none and
>>>>>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>>>>>> to a child of the source, and sync=none makes the source the backing
>>>>>> file of the target after the job).
>>>>>>
>>>>>> bdrv_replace_node() will not create those loops, but by doing so, it
>>>>>> ignores the user-requested configuration, which is not ideal either.
>>>>>> (In the first example above, the target's child will remain what it was,
>>>>>> which may still be reasonable.  But in the second example, the target
>>>>>> will just not become a child of the source, which is precisely what was
>>>>>> requested with @replaces.)
>>>>>>
>>>>>> So prevent such configurations, both before the job, and before it
>>>>>> actually completes.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     block.c                   | 30 ++++++++++++++++++++++++
>>>>>>     block/mirror.c            | 19 +++++++++++++++-
>>>>>>     blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>>>>>     include/block/block_int.h |  3 +++
>>>>>>     4 files changed, 98 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block.c b/block.c
>>>>>> index 0159f8e510..e3922a0474 100644
>>>>>> --- a/block.c
>>>>>> +++ b/block.c
>>>>>> @@ -6259,6 +6259,36 @@ out:
>>>>>>         return to_replace_bs;
>>>>>>     }
>>>>>>     
>>>>>> +/*
>>>>>> + * Return true iff @child is a (recursive) child of @parent, with at
>>>>>> + * least @min_level edges between them.
>>>>>> + *
>>>>>> + * (If @min_level == 0, return true if @child == @parent.  For
>>>>>> + * @min_level == 1, @child needs to be at least a real child; for
>>>>>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>>>>>> + */
>>>>>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>>>>>> +                      int min_level)
>>>>>> +{
>>>>>> +    BdrvChild *c;
>>>>>> +
>>>>>> +    if (child == parent && min_level <= 0) {
>>>>>> +        return true;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!parent) {
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>>>>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>>>>>> +            return true;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * Iterates through the list of runtime option keys that are said to
>>>>>>      * be "strong" for a BDS.  An option is called "strong" if it changes
>>>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>>>> index 68a4404666..b258c7e98b 100644
>>>>>> --- a/block/mirror.c
>>>>>> +++ b/block/mirror.c
>>>>>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>>>>>              * there.
>>>>>>              */
>>>>>>             if (bdrv_recurse_can_replace(src, to_replace)) {
>>>>>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>>> +            /*
>>>>>> +             * It is OK for @to_replace to be an immediate child of
>>>>>> +             * @target_bs, because that is what happens with
>>>>>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>>>>>> +             * backing file will be the source node, which is also
>>>>>> +             * to_replace (by default).
>>>>>> +             * bdrv_replace_node() handles this case by not letting
>>>>>> +             * target_bs->backing point to itself, but to the source
>>>>>> +             * still.
>>>>>> +             */
>>>>>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>>>>>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>>> +            } else {
>>>>>> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>>>> +                           "because the former is now a child of the latter, "
>>>>>> +                           "and doing so would thus create a loop",
>>>>>> +                           to_replace->node_name, target_bs->node_name);
>>>>>> +            }
>>>>>
>>>>> you may swap if and else branch, dropping "!" mark..
>>>>
>>>> Yes, but I just personally prefer to have the error case in the else branch.
>>>>
>>>>>>             } else {
>>>>>>                 error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>>>>>>                            "because it can no longer be guaranteed that doing so "
>>>>>> diff --git a/blockdev.c b/blockdev.c
>>>>>> index 9dc2238bf3..d29f147f72 100644
>>>>>> --- a/blockdev.c
>>>>>> +++ b/blockdev.c
>>>>>> @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>>>         }
>>>>>>     
>>>>>>         if (has_replaces) {
>>>>>> -        BlockDriverState *to_replace_bs;
>>>>>> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>>>>>>             AioContext *replace_aio_context;
>>>>>>             int64_t bs_size, replace_size;
>>>>>>     
>>>>>> @@ -3839,6 +3839,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>>>>>                 return;
>>>>>>             }
>>>>>>     
>>>>>> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
>>>>>> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
>>>>>> +                       "because the former is a child of the latter",
>>>>>> +                       to_replace_bs->node_name, target->node_name);
>>>>>> +            return;
>>>>>> +        }
>>>>>
>>>>> here min_level=1, so we don't handle the case, described in mirror_exit_common..
>>>>> I don't see why.. blockdev_mirror_common is called from qmp_drive_mirror,
>>>>> including the case with MIRROR_SYNC_MODE_NONE and NEW_IMAGE_MODE_ABSOLUTE_PATHS..
>>>>>
>>>>> What I'm missing?
>>>>
>>>> Hmm.  Well.
>>>>
>>>> If it broke drive-mirror sync=none, I suppose I would have noticed by
>>>> running the iotests.  But I didn’t, and that’s because this code here is
>>>> reached only if the user actually specified @replaces.  (As opposed to
>>>> the mirror_exit_common code, where @to_replace may simply be @src if not
>>>> overridden by the user.)
>>>>
>>>> The only reason why I allow it in mirror_exit_common is because we have
>>>> to.  But if the user manually specifies this configuration, we can’t
>>>> guarantee it’s safe.
>>>>
>>>> OTOH, well, if we allow it for drive-mirror sync=none, why not allow it
>>>> when manually specified with blockdev-mirror?
>>>>
>>>> What’s your opinion?
>>>
>>> Hmm, I think, that allowing to_replaces to be direct backing child of target
>>> (like in mirror_exit_common) is safe enough. User doesn't know that
>>> such replacing includes also replacing own child of the target,
>>> which leads to the loop.. It's not obvious. And behavior of
>>> bdrv_replace_node() which just doesn't create this loop, doesn't
>>> seem something too tricky. Hmm..
>>>
>>> We could mention in qapi spec, that replacing doesn't break backing
>>> link of the target, for it to be absolutely defined.
>>>
>>> But should we allow replaces to be some other (not backing and not filtered)
>>> child of target?..
>>
>> Well, my opinion is that this is a bit of weird thing to do and that it
>> basically does ask for a loop.
>>
>> I’m OK with excluding the sync=none case, because (1) that’s so
>> obviously a loop that it can’t be what the user honestly wants; (2) how
>> it’s resolved is rather obvious, too: There is exactly one edge that
>> causes the loop, so you simply don’t change that one; (3) drive-mirror
>> sync=none does this case automatically, so we should probably allow
>> users to do it manually with blockdev-mirror, too.
>>
>>>>>> +
>>>>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>>>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>>>>> +        {
>>>>>> +            /*
>>>>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>>>>> +             * (used for mode=existing) will yield, it is probably
>>>>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>>>>> +             * because that is our best guess.
>>>>>> +             */
>>>>>> +            switch (sync) {
>>>>>> +            case MIRROR_SYNC_MODE_FULL:
>>>>>> +                target_backing_bs = NULL;
>>>>>> +                break;
>>>>>> +
>>>>>> +            case MIRROR_SYNC_MODE_TOP:
>>>>>> +                target_backing_bs = backing_bs(bs);
>>>>>> +                break;
>>>>>> +
>>>>>> +            case MIRROR_SYNC_MODE_NONE:
>>>>>> +                target_backing_bs = bs;
>>>>>> +                break;
>>>>>> +
>>>>>> +            default:
>>>>>> +                abort();
>>>>>> +            }
>>>>>> +        } else {
>>>>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>>>>> +            target_backing_bs = backing_bs(target);
>>>>>> +        }
>>>>>> +
>>>>>> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
>>>>>> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
>>>>>> +                       "result in a loop, because the former would be a child "
>>>>>> +                       "of the latter's backing file ('%s') after the mirror "
>>>>>> +                       "job", to_replace_bs->node_name, target->node_name,
>>>>>> +                       target_backing_bs->node_name);
>>>>>> +            return;
>>>>>> +        }
>>>>>
>>>>> hmm.. so for MODE_NONE we disallow to_replace == src?
>>>>
>>>> I suppose that’s basically the same as above.  Should we allow this case
>>>> when specified explicitly by the user?
>>>>
>>>
>>> I'm a bit more closer to allowing it, for consistency with automatic path, with
>>> unspecified replaces. Are we sure that nobody uses it?
>>
>> Well, there are multiple cases, as shown in the commit message.  I think
>> that for drive-mirror sync=none, nobody uses @replaces, because it just
>> doesn’t work.
>>
>> But, well, that’s just because drive-mirror does graph manipulation that
>> blockdev-mirror doesn’t (i.e., changing the target’s backing file on
>> completion).  So maybe we should just prevent loops for drive-mirror,
>> but let the user do what they want when they use blockdev-mirror?
>>
> 
> Well, the question finally is, how much to restrict from things for which we
> don't know are they useful or not. I don't know) I think, finally, I'm OK with
> either way we discussed, or with this patch as is. If it breaks some existing
> scenario it will be easy to fix.

OK.  I hope next-year-me has a good and consistent idea on what to do.

Max



* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
  2019-11-29 10:18   ` Vladimir Sementsov-Ogievskiy
@ 2020-02-05 15:55   ` Kevin Wolf
  2020-02-05 16:03     ` Kevin Wolf
  2020-02-06 10:21     ` Max Reitz
  1 sibling, 2 replies; 75+ messages in thread
From: Kevin Wolf @ 2020-02-05 15:55 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block

Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 3a824e77e3..8ee03e9baf 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>      return false;
>  }
>  
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);

Quite a hack. The two obvious problems are:

1. We can't guarantee that we can actually revert the permissions. I
   think we ignore failure to loosen permissions meanwhile so that at
   least the &error_abort doesn't trigger, but bs could still be in the
   wrong state afterwards.

   It would be cleaner to use check+abort instead of actually setting
   the new permission.

2. As aborting the permission change makes more obvious, we're checking
   something that might not be true any more when we actually make the
   change.

Pragmatically, a hack might be good enough here, but it should be
documented as such (with a short explanation of its shortcomings) at
least.
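
To illustrate the check+abort idea from point 1, here is a rough sketch on a
toy permission model. ToyParent, the PERM_* bits and perms_compatible() are
made-up names for illustration and not QEMU's permission API. The only point
is that the check is a pure function that never touches the current state, so
there is nothing to revert afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for the toy model. */
#define PERM_READ  0x1u
#define PERM_WRITE 0x2u
#define PERM_ALL   (PERM_READ | PERM_WRITE)

/* One parent of a shared node: what it uses, and what it lets others use. */
typedef struct ToyParent {
    uint32_t perm;
    uint32_t shared;
} ToyParent;

/*
 * Check whether parent @idx could switch to (@new_perm, @new_shared)
 * without conflicting with any other parent.  Nothing is modified, so a
 * failed check leaves no state behind that would need to be reverted.
 */
static bool perms_compatible(const ToyParent *parents, size_t n, size_t idx,
                             uint32_t new_perm, uint32_t new_shared)
{
    for (size_t i = 0; i < n; i++) {
        if (i == idx) {
            continue;
        }
        /* We must not use anything the other parent does not share... */
        if (new_perm & ~parents[i].shared) {
            return false;
        }
        /* ...and the other parent must not use anything we stop sharing. */
        if (parents[i].perm & ~new_shared) {
            return false;
        }
    }
    return true;
}
```

This does not address point 2: the answer is only valid as long as no other
parent changes its permissions between the check and the actual replacement.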

Kevin




* Re: [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace()
  2019-11-11 16:02 ` [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace() Max Reitz
  2019-11-29 11:07   ` Vladimir Sementsov-Ogievskiy
@ 2020-02-05 15:57   ` Kevin Wolf
  1 sibling, 0 replies; 75+ messages in thread
From: Kevin Wolf @ 2020-02-05 15:57 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block

Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
> Let check_to_replace_node() use the more specialized
> bdrv_recurse_can_replace() instead of
> bdrv_recurse_is_first_non_filter(), which is too restrictive.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index de53addeb0..7608f21570 100644
> --- a/block.c
> +++ b/block.c
> @@ -6243,6 +6243,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
>      return false;
>  }
>  
> +/*
> + * Check whether the given @node_name can be replaced by a node that
> + * has the same data as @parent_bs.  If so, return @node_name's BDS;
> + * NULL otherwise.
> + *
> + * @node_name must be a (recursive) *child of @parent_bs (or this
> + * function will return NULL).
> + *
> + * The result (whether the node can be replaced or not) is only valid
> + * for as long as no graph changes occur.
> + */
>  BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>                                          const char *node_name, Error **errp)
>  {
> @@ -6267,8 +6278,11 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>       * Another benefit is that this tests exclude backing files which are
>       * blocked by the backing blockers.
>       */
> -    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
> -        error_setg(errp, "Only top most non filter can be replaced");
> +    if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
> +        error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
> +                   "because it cannot be guaranteed that doing so would not "
> +                   "lead to an abrupt change of visible data",
> +                   node_name, parent_bs->node_name);

If this function is only supposed to be used in the context of the
mirror job, moving it into block/mirror.c could be considered as a
cleanup on top.

Kevin




* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-05 15:55   ` Kevin Wolf
@ 2020-02-05 16:03     ` Kevin Wolf
  2020-02-06 10:21     ` Max Reitz
  1 sibling, 0 replies; 75+ messages in thread
From: Kevin Wolf @ 2020-02-05 16:03 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 05.02.2020 um 16:55 hat Kevin Wolf geschrieben:
> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 62 insertions(+)
> > 
> > diff --git a/block/quorum.c b/block/quorum.c
> > index 3a824e77e3..8ee03e9baf 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >      return false;
> >  }
> >  
> > +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> > +                                       BlockDriverState *to_replace)
> > +{
> > +    BDRVQuorumState *s = bs->opaque;
> > +    int i;
> > +
> > +    for (i = 0; i < s->num_children; i++) {
> > +        /*
> > +         * We have no idea whether our children show the same data as
> > +         * this node (@bs).  It is actually highly likely that
> > +         * @to_replace does not, because replacing a broken child is
> > +         * one of the main use cases here.
> > +         *
> > +         * We do know that the new BDS will match @bs, so replacing
> > +         * any of our children by it will be safe.  It cannot change
> > +         * the data this quorum node presents to its parents.
> > +         *
> > +         * However, replacing @to_replace by @bs in any of our
> > +         * children's chains may change visible data somewhere in
> > +         * there.  We therefore cannot recurse down those chains with
> > +         * bdrv_recurse_can_replace().
> > +         * (More formally, bdrv_recurse_can_replace() requires that
> > +         * @to_replace will be replaced by something matching the @bs
> > +         * passed to it.  We cannot guarantee that.)
> > +         *
> > +         * Thus, we can only check whether any of our immediate
> > +         * children matches @to_replace.
> > +         *
> > +         * (In the future, we might add a function to recurse down a
> > +         * chain that checks that nothing there cares about a change
> > +         * in data from the respective child in question.  For
> > +         * example, most filters do not care when their child's data
> > +         * suddenly changes, as long as their parents do not care.)
> > +         */
> > +        if (s->children[i].child->bs == to_replace) {
> > +            Error *local_err = NULL;
> > +
> > +            /*
> > +             * We now have to ensure that there is no other parent
> > +             * that cares about replacing this child by a node with
> > +             * potentially different data.
> > +             */
> > +            s->children[i].to_be_replaced = true;
> > +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> > +
> > +            /* Revert permissions */
> > +            s->children[i].to_be_replaced = false;
> > +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> 
> Quite a hack. The two obvious problems are:
> 
> 1. We can't guarantee that we can actually revert the permissions. I
>    think we ignore failure to loosen permissions meanwhile so that at
>    least the &error_abort doesn't trigger, but bs could still be in the
>    wrong state afterwards.
> 
>    It would be cleaner to use check+abort instead of actually setting
>    the new permission.
> 
> 2. As aborting the permission change makes more obvious, we're checking
>    something that might not be true any more when we actually make the
>    change.
> 
> Pragmatically, a hack might be good enough here, but it should be
> documented as such (with a short explanation of its shortcomings) at
> least.

Oops, meant to send this as a comment for v3 (which I did apply locally
for review).

Kevin




* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-05 15:55   ` Kevin Wolf
  2020-02-05 16:03     ` Kevin Wolf
@ 2020-02-06 10:21     ` Max Reitz
  2020-02-06 14:42       ` Kevin Wolf
  1 sibling, 1 reply; 75+ messages in thread
From: Max Reitz @ 2020-02-06 10:21 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block


On 05.02.20 16:55, Kevin Wolf wrote:
> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 3a824e77e3..8ee03e9baf 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>      return false;
>>  }
>>  
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
>> +            Error *local_err = NULL;
>> +
>> +            /*
>> +             * We now have to ensure that there is no other parent
>> +             * that cares about replacing this child by a node with
>> +             * potentially different data.
>> +             */
>> +            s->children[i].to_be_replaced = true;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>> +
>> +            /* Revert permissions */
>> +            s->children[i].to_be_replaced = false;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> 
> Quite a hack. The two obvious problems are:
> 
> 1. We can't guarantee that we can actually revert the permissions. I
>    think we ignore failure to loosen permissions meanwhile so that at
>    least the &error_abort doesn't trigger, but bs could still be in the
>    wrong state afterwards.

I thought we guaranteed that loosening permissions never fails.

(Well, you know.  It may “leak” permissions, but we’d never get an error
here so there’s nothing to handle anyway.)

>    It would be cleaner to use check+abort instead of actually setting
>    the new permission.

Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
not use bdrv_check_update_perm() from here as-is.

> 2. As aborting the permission change makes more obvious, we're checking
>    something that might not be true any more when we actually make the
>    change.

True.  I tried to do it right by having a post-replace cleanup function,
but after a while that was just going nowhere, really.  So I just went
with what’s patch 13 here.

But isn’t 13 enough, actually?  It checks can_replace right before
replacing in a drained section.  I can’t imagine the permissions to
change there.

Max

> Pragmatically, a hack might be good enough here, but it should be
> documented as such (with a short explanation of its shortcomings) at
> least.



* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-06 10:21     ` Max Reitz
@ 2020-02-06 14:42       ` Kevin Wolf
  2020-02-06 15:19         ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Kevin Wolf @ 2020-02-06 14:42 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block


Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
> On 05.02.20 16:55, Kevin Wolf wrote:
> > Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 62 insertions(+)
> >>
> >> diff --git a/block/quorum.c b/block/quorum.c
> >> index 3a824e77e3..8ee03e9baf 100644
> >> --- a/block/quorum.c
> >> +++ b/block/quorum.c
> >> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >>      return false;
> >>  }
> >>  
> >> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> >> +                                       BlockDriverState *to_replace)
> >> +{
> >> +    BDRVQuorumState *s = bs->opaque;
> >> +    int i;
> >> +
> >> +    for (i = 0; i < s->num_children; i++) {
> >> +        /*
> >> +         * We have no idea whether our children show the same data as
> >> +         * this node (@bs).  It is actually highly likely that
> >> +         * @to_replace does not, because replacing a broken child is
> >> +         * one of the main use cases here.
> >> +         *
> >> +         * We do know that the new BDS will match @bs, so replacing
> >> +         * any of our children by it will be safe.  It cannot change
> >> +         * the data this quorum node presents to its parents.
> >> +         *
> >> +         * However, replacing @to_replace by @bs in any of our
> >> +         * children's chains may change visible data somewhere in
> >> +         * there.  We therefore cannot recurse down those chains with
> >> +         * bdrv_recurse_can_replace().
> >> +         * (More formally, bdrv_recurse_can_replace() requires that
> >> +         * @to_replace will be replaced by something matching the @bs
> >> +         * passed to it.  We cannot guarantee that.)
> >> +         *
> >> +         * Thus, we can only check whether any of our immediate
> >> +         * children matches @to_replace.
> >> +         *
> >> +         * (In the future, we might add a function to recurse down a
> >> +         * chain that checks that nothing there cares about a change
> >> +         * in data from the respective child in question.  For
> >> +         * example, most filters do not care when their child's data
> >> +         * suddenly changes, as long as their parents do not care.)
> >> +         */
> >> +        if (s->children[i].child->bs == to_replace) {
> >> +            Error *local_err = NULL;
> >> +
> >> +            /*
> >> +             * We now have to ensure that there is no other parent
> >> +             * that cares about replacing this child by a node with
> >> +             * potentially different data.
> >> +             */
> >> +            s->children[i].to_be_replaced = true;
> >> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> >> +
> >> +            /* Revert permissions */
> >> +            s->children[i].to_be_replaced = false;
> >> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> > 
> > Quite a hack. The two obvious problems are:
> > 
> > 1. We can't guarantee that we can actually revert the permissions. I
> >    think we ignore failure to loosen permissions meanwhile so that at
> >    least the &error_abort doesn't trigger, but bs could still be in the
> >    wrong state afterwards.
> 
> I thought we guaranteed that loosening permissions never fails.
> 
> (Well, you know.  It may “leak” permissions, but we’d never get an error
> here so there’s nothing to handle anyway.)

This is what I meant. We ignore the failure (i.e. don't return an error),
but the result still isn't completely correct ("leaked" permissions).

> >    It would be cleaner to use check+abort instead of actually setting
> >    the new permission.
> 
> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
> not use bdrv_check_update_perm() from here as-is.

I'm not saying you need to do it, just that it would be cleaner. :-)
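
As a rough illustration of the check+abort idea, here is a minimal,
self-contained sketch.  It is a toy model, not QEMU code: ToyParent,
the TOY_PERM_* bits and toy_check_update_perm() are all made-up names.
It only assumes the usual rule that one parent's used permissions must
be covered by what every other parent shares.  The point is that the
check is a pure dry run, so there is nothing to revert afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy permission bits; illustrative only, not QEMU's BLK_PERM_* values. */
enum {
    TOY_PERM_READ            = 1u << 0,
    TOY_PERM_WRITE           = 1u << 1,
    TOY_PERM_CONSISTENT_READ = 1u << 2,
};

/* One parent edge on a node: what it uses and what it lets others use. */
typedef struct ToyParent {
    uint32_t perm;    /* permissions this parent takes on the node */
    uint32_t shared;  /* permissions it tolerates other parents taking */
} ToyParent;

/*
 * Dry run: could parent @idx switch to (@new_perm, @new_shared) without
 * conflicting with any other parent?  The array is never modified, so
 * there is no state to roll back, whatever the answer is.
 */
bool toy_check_update_perm(const ToyParent *parents, size_t n, size_t idx,
                           uint32_t new_perm, uint32_t new_shared)
{
    assert(idx < n);

    for (size_t i = 0; i < n; i++) {
        if (i == idx) {
            continue;
        }
        /* We would use something that parent i does not share. */
        if (new_perm & ~parents[i].shared) {
            return false;
        }
        /* Parent i uses something we would no longer share. */
        if (parents[i].perm & ~new_shared) {
            return false;
        }
    }
    return true;
}
```

In this model, "unsharing" a bit for the child that is about to be
replaced fails as soon as another parent still uses that bit, which is
the condition the set-then-revert sequence in the patch probes for.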

> > 2. As aborting the permission change makes more obvious, we're checking
> >    something that might not be true any more when we actually make the
> >    change.
> 
> True.  I tried to do it right by having a post-replace cleanup function,
> but after a while that was just going nowhere, really.  So I just went
> with what’s patch 13 here.
> 
> But isn’t 13 enough, actually?  It checks can_replace right before
> replacing in a drained section.  I can’t imagine the permissions to
> change there.

Permissions are tied to file locks, so an external process can just grab
the locks in between. But if I understand correctly, all we try here is
to have an additional safeguard to prevent the user from doing stupid
things. So I guess not being 100% is fine as long as it's documented in
the code.
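
To make the check-then-act gap concrete, here is a small sketch of the
"re-check immediately before replacing" pattern.  Again a toy model
with hypothetical names (ToyNode, toy_replace_with_recheck(),
toy_can_replace_from_flag()), not the mirror job's actual code.  The
re-check narrows the window in which conditions can change, but it
does not close it: whatever the callback inspects could still change
between the check and the swap if an external party is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a block node; illustrative only. */
typedef struct ToyNode {
    const char *name;
} ToyNode;

/* Callback answering "may @to_replace be swapped out right now?" */
typedef bool (*ToyCanReplaceFn)(const ToyNode *to_replace, void *opaque);

/*
 * Re-run the safeguard directly before the swap.  Returns true if
 * *@slot now points at @replacement, false if the check refused and
 * *@slot was left untouched.
 */
bool toy_replace_with_recheck(ToyNode **slot, ToyNode *replacement,
                              ToyCanReplaceFn can_replace, void *opaque)
{
    assert(slot != NULL && *slot != NULL);
    assert(replacement != NULL && can_replace != NULL);

    if (!can_replace(*slot, opaque)) {
        return false;
    }
    /* Anything outside our control may still change between these lines. */
    *slot = replacement;
    return true;
}

/* Example callback: @opaque points at a bool that says yes or no. */
bool toy_can_replace_from_flag(const ToyNode *to_replace, void *opaque)
{
    (void)to_replace;
    return *(const bool *)opaque;
}
```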

Kevin


* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-06 14:42       ` Kevin Wolf
@ 2020-02-06 15:19         ` Max Reitz
  2020-02-06 15:42           ` Kevin Wolf
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2020-02-06 15:19 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block



On 06.02.20 15:42, Kevin Wolf wrote:
> Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
>> On 05.02.20 16:55, Kevin Wolf wrote:
>>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 62 insertions(+)
>>>>
>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>> index 3a824e77e3..8ee03e9baf 100644
>>>> --- a/block/quorum.c
>>>> +++ b/block/quorum.c
>>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>>>      return false;
>>>>  }
>>>>  
>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>>>> +                                       BlockDriverState *to_replace)
>>>> +{
>>>> +    BDRVQuorumState *s = bs->opaque;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < s->num_children; i++) {
>>>> +        /*
>>>> +         * We have no idea whether our children show the same data as
>>>> +         * this node (@bs).  It is actually highly likely that
>>>> +         * @to_replace does not, because replacing a broken child is
>>>> +         * one of the main use cases here.
>>>> +         *
>>>> +         * We do know that the new BDS will match @bs, so replacing
>>>> +         * any of our children by it will be safe.  It cannot change
>>>> +         * the data this quorum node presents to its parents.
>>>> +         *
>>>> +         * However, replacing @to_replace by @bs in any of our
>>>> +         * children's chains may change visible data somewhere in
>>>> +         * there.  We therefore cannot recurse down those chains with
>>>> +         * bdrv_recurse_can_replace().
>>>> +         * (More formally, bdrv_recurse_can_replace() requires that
>>>> +         * @to_replace will be replaced by something matching the @bs
>>>> +         * passed to it.  We cannot guarantee that.)
>>>> +         *
>>>> +         * Thus, we can only check whether any of our immediate
>>>> +         * children matches @to_replace.
>>>> +         *
>>>> +         * (In the future, we might add a function to recurse down a
>>>> +         * chain that checks that nothing there cares about a change
>>>> +         * in data from the respective child in question.  For
>>>> +         * example, most filters do not care when their child's data
>>>> +         * suddenly changes, as long as their parents do not care.)
>>>> +         */
>>>> +        if (s->children[i].child->bs == to_replace) {
>>>> +            Error *local_err = NULL;
>>>> +
>>>> +            /*
>>>> +             * We now have to ensure that there is no other parent
>>>> +             * that cares about replacing this child by a node with
>>>> +             * potentially different data.
>>>> +             */
>>>> +            s->children[i].to_be_replaced = true;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>>>> +
>>>> +            /* Revert permissions */
>>>> +            s->children[i].to_be_replaced = false;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
>>>
>>> Quite a hack. The two obvious problems are:
>>>
>>> 1. We can't guarantee that we can actually revert the permissions. I
>>>    think we ignore failure to loosen permissions meanwhile so that at
>>>    least the &error_abort doesn't trigger, but bs could still be in the
>>>    wrong state afterwards.
>>
>> I thought we guaranteed that loosening permissions never fails.
>>
>> (Well, you know.  It may “leak” permissions, but we’d never get an error
>> here so there’s nothing to handle anyway.)
> 
> This is what I meant. We ignore the failure (i.e. don't return an error),
> but the result still isn't completely correct ("leaked" permissions).
> 
>>>    It would be cleaner to use check+abort instead of actually setting
>>>    the new permission.
>>
>> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
>> not use bdrv_check_update_perm() from here as-is.
> 
> I'm not saying you need to do it, just that it would be cleaner. :-)

It would.  Thanks for the suggestion, I obviously didn’t think of it.
(Or there’d be a comment on how this is not the best way in theory, but
in practice it’s good enough.)  I suppose I’ll see what I can do.

>>> 2. As aborting the permission change makes more obvious, we're checking
>>>    something that might not be true any more when we actually make the
>>>    change.
>>
>> True.  I tried to do it right by having a post-replace cleanup function,
>> but after a while that was just going nowhere, really.  So I just went
>> with what’s patch 13 here.
>>
>> But isn’t 13 enough, actually?  It checks can_replace right before
>> replacing in a drained section.  I can’t imagine the permissions to
>> change there.
> 
> Permissions are tied to file locks, so an external process can just grab
> the locks in between.

Ah, right, I didn’t think of that.

> But if I understand correctly, all we try here is
> to have an additional safeguard to prevent the user from doing stupid
> things. So I guess not being 100% is fine as long as it's documented in
> the code.

Yes.  I just think it actually would be 100 % in practice, so I wondered
whether it would need to be documented.

You’re right, though, it isn’t 100 %, so it should definitely be
documented.  Maybe something like

In theory, we would have to keep the permissions tightened until the
node is replaced.  In practice, that would require post-replacement
cleanup infrastructure, which we do not have, and which would be
unreasonably complex to implement.  Therefore, all we can do is require
anyone who wants to replace one node by some potentially unrelated other
node (i.e., the mirror job on completion) to invoke
bdrv_recurse_can_replace() immediately before and thus minimize the time
during which some condition may arise that might forbid the swap.

?

Max



* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-06 15:19         ` Max Reitz
@ 2020-02-06 15:42           ` Kevin Wolf
  2020-02-06 16:44             ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Kevin Wolf @ 2020-02-06 15:42 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block


Am 06.02.2020 um 16:19 hat Max Reitz geschrieben:
> On 06.02.20 15:42, Kevin Wolf wrote:
> > Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
> >> On 05.02.20 16:55, Kevin Wolf wrote:
> >>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 62 insertions(+)
> >>>>
> >>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>> index 3a824e77e3..8ee03e9baf 100644
> >>>> --- a/block/quorum.c
> >>>> +++ b/block/quorum.c
> >>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >>>>      return false;
> >>>>  }
> >>>>  
> >>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> >>>> +                                       BlockDriverState *to_replace)
> >>>> +{
> >>>> +    BDRVQuorumState *s = bs->opaque;
> >>>> +    int i;
> >>>> +
> >>>> +    for (i = 0; i < s->num_children; i++) {
> >>>> +        /*
> >>>> +         * We have no idea whether our children show the same data as
> >>>> +         * this node (@bs).  It is actually highly likely that
> >>>> +         * @to_replace does not, because replacing a broken child is
> >>>> +         * one of the main use cases here.
> >>>> +         *
> >>>> +         * We do know that the new BDS will match @bs, so replacing
> >>>> +         * any of our children by it will be safe.  It cannot change
> >>>> +         * the data this quorum node presents to its parents.
> >>>> +         *
> >>>> +         * However, replacing @to_replace by @bs in any of our
> >>>> +         * children's chains may change visible data somewhere in
> >>>> +         * there.  We therefore cannot recurse down those chains with
> >>>> +         * bdrv_recurse_can_replace().
> >>>> +         * (More formally, bdrv_recurse_can_replace() requires that
> >>>> +         * @to_replace will be replaced by something matching the @bs
> >>>> +         * passed to it.  We cannot guarantee that.)
> >>>> +         *
> >>>> +         * Thus, we can only check whether any of our immediate
> >>>> +         * children matches @to_replace.
> >>>> +         *
> >>>> +         * (In the future, we might add a function to recurse down a
> >>>> +         * chain that checks that nothing there cares about a change
> >>>> +         * in data from the respective child in question.  For
> >>>> +         * example, most filters do not care when their child's data
> >>>> +         * suddenly changes, as long as their parents do not care.)
> >>>> +         */
> >>>> +        if (s->children[i].child->bs == to_replace) {
> >>>> +            Error *local_err = NULL;
> >>>> +
> >>>> +            /*
> >>>> +             * We now have to ensure that there is no other parent
> >>>> +             * that cares about replacing this child by a node with
> >>>> +             * potentially different data.
> >>>> +             */
> >>>> +            s->children[i].to_be_replaced = true;
> >>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> >>>> +
> >>>> +            /* Revert permissions */
> >>>> +            s->children[i].to_be_replaced = false;
> >>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> >>>
> >>> Quite a hack. The two obvious problems are:
> >>>
> >>> 1. We can't guarantee that we can actually revert the permissions. I
> >>>    think we ignore failure to loosen permissions meanwhile so that at
> >>>    least the &error_abort doesn't trigger, but bs could still be in the
> >>>    wrong state afterwards.
> >>
> >> I thought we guaranteed that loosening permissions never fails.
> >>
> >> (Well, you know.  It may “leak” permissions, but we’d never get an error
> >> here so there’s nothing to handle anyway.)
> > 
> > This is what I meant. We ignore the failure (i.e. don't return an error),
> > but the result still isn't completely correct ("leaked" permissions).
> > 
> >>>    It would be cleaner to use check+abort instead of actually setting
> >>>    the new permission.
> >>
> >> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
> >> not use bdrv_check_update_perm() from here as-is.
> > 
> > I'm not saying you need to do it, just that it would be cleaner. :-)
> 
> It would.  Thanks for the suggestion, I obviously didn’t think of it.
> (Or there’d be a comment on how this is not the best way in theory, but
> in practice it’s good enough.)  I suppose I’ll see what I can do.
> 
> >>> 2. As aborting the permission change makes more obvious, we're checking
> >>>    something that might not be true any more when we actually make the
> >>>    change.
> >>
> >> True.  I tried to do it right by having a post-replace cleanup function,
> >> but after a while that was just going nowhere, really.  So I just went
> >> with what’s patch 13 here.
> >>
> >> But isn’t 13 enough, actually?  It checks can_replace right before
> >> replacing in a drained section.  I can’t imagine the permissions to
> >> change there.
> > 
> > Permissions are tied to file locks, so an external process can just grab
> > the locks in between.
> 
> Ah, right, I didn’t think of that.
> 
> > But if I understand correctly, all we try here is
> > to have an additional safeguard to prevent the user from doing stupid
> > things. So I guess not being 100% is fine as long as it's documented in
> > the code.
> 
> Yes.  I just think it actually would be 100 % in practice, so I wondered
> whether it would need to be documented.
> 
> You’re right, though, it isn’t 100 %, so it should definitely be
> documented.  Maybe something like
> 
> In theory, we would have to keep the permissions tightened until the
> node is replaced.  In practice, that would require post-replacement
> cleanup infrastructure, which we do not have, and which would be
> unreasonably complex to implement.

Sounds good until here.

> Therefore, all we can do is require
> anyone who wants to replace one node by some potentially unrelated other
> node (i.e., the mirror job on completion) to invoke
> bdrv_recurse_can_replace() immediately before and thus minimize the time
> during which some condition may arise that might forbid the swap.
> 
> ?

This second part of your suggested comment could be dropped, as far as
I'm concerned. If anything, it's part of the contract and would belong
in the bdrv_recurse_can_replace() documentation.

However, I think I would mention why not being 100% is okay: The part
with "additional safeguard to prevent the user from doing stupid
things", and that it doesn't make a difference if the user runs the
correct command.

Kevin


* Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
  2020-02-06 15:42           ` Kevin Wolf
@ 2020-02-06 16:44             ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2020-02-06 16:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Alberto Garcia, qemu-devel, qemu-block



On 06.02.20 16:42, Kevin Wolf wrote:
> Am 06.02.2020 um 16:19 hat Max Reitz geschrieben:
>> On 06.02.20 15:42, Kevin Wolf wrote:
>>> Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
>>>> On 05.02.20 16:55, Kevin Wolf wrote:
>>>>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>> index 3a824e77e3..8ee03e9baf 100644
>>>>>> --- a/block/quorum.c
>>>>>> +++ b/block/quorum.c
>>>>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>>>>>      return false;
>>>>>>  }
>>>>>>  
>>>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>>>>>> +                                       BlockDriverState *to_replace)
>>>>>> +{
>>>>>> +    BDRVQuorumState *s = bs->opaque;
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i < s->num_children; i++) {
>>>>>> +        /*
>>>>>> +         * We have no idea whether our children show the same data as
>>>>>> +         * this node (@bs).  It is actually highly likely that
>>>>>> +         * @to_replace does not, because replacing a broken child is
>>>>>> +         * one of the main use cases here.
>>>>>> +         *
>>>>>> +         * We do know that the new BDS will match @bs, so replacing
>>>>>> +         * any of our children by it will be safe.  It cannot change
>>>>>> +         * the data this quorum node presents to its parents.
>>>>>> +         *
>>>>>> +         * However, replacing @to_replace by @bs in any of our
>>>>>> +         * children's chains may change visible data somewhere in
>>>>>> +         * there.  We therefore cannot recurse down those chains with
>>>>>> +         * bdrv_recurse_can_replace().
>>>>>> +         * (More formally, bdrv_recurse_can_replace() requires that
>>>>>> +         * @to_replace will be replaced by something matching the @bs
>>>>>> +         * passed to it.  We cannot guarantee that.)
>>>>>> +         *
>>>>>> +         * Thus, we can only check whether any of our immediate
>>>>>> +         * children matches @to_replace.
>>>>>> +         *
>>>>>> +         * (In the future, we might add a function to recurse down a
>>>>>> +         * chain that checks that nothing there cares about a change
>>>>>> +         * in data from the respective child in question.  For
>>>>>> +         * example, most filters do not care when their child's data
>>>>>> +         * suddenly changes, as long as their parents do not care.)
>>>>>> +         */
>>>>>> +        if (s->children[i].child->bs == to_replace) {
>>>>>> +            Error *local_err = NULL;
>>>>>> +
>>>>>> +            /*
>>>>>> +             * We now have to ensure that there is no other parent
>>>>>> +             * that cares about replacing this child by a node with
>>>>>> +             * potentially different data.
>>>>>> +             */
>>>>>> +            s->children[i].to_be_replaced = true;
>>>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>>>>>> +
>>>>>> +            /* Revert permissions */
>>>>>> +            s->children[i].to_be_replaced = false;
>>>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
>>>>>
>>>>> Quite a hack. The two obvious problems are:
>>>>>
>>>>> 1. We can't guarantee that we can actually revert the permissions. I
>>>>>    think we ignore failure to loosen permissions meanwhile so that at
>>>>>    least the &error_abort doesn't trigger, but bs could still be in the
>>>>>    wrong state afterwards.
>>>>
>>>> I thought we guaranteed that loosening permissions never fails.
>>>>
>>>> (Well, you know.  It may “leak” permissions, but we’d never get an error
>>>> here so there’s nothing to handle anyway.)
>>>
>>> This is what I meant. We ignore the failure (i.e. don't return an error),
>>> but the result still isn't completely correct ("leaked" permissions).
>>>
>>>>>    It would be cleaner to use check+abort instead of actually setting
>>>>>    the new permission.
>>>>
>>>> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
>>>> not use bdrv_check_update_perm() from here as-is.
>>>
>>> I'm not saying you need to do it, just that it would be cleaner. :-)
>>
>> It would.  Thanks for the suggestion, I obviously didn’t think of it.
>> (Or there’d be a comment on how this is not the best way in theory, but
>> in practice it’s good enough.)  I suppose I’ll see what I can do.
>>
>>>>> 2. As aborting the permission change makes more obvious, we're checking
>>>>>    something that might not be true any more when we actually make the
>>>>>    change.
>>>>
>>>> True.  I tried to do it right by having a post-replace cleanup function,
>>>> but after a while that was just going nowhere, really.  So I just went
>>>> with what’s patch 13 here.
>>>>
>>>> But isn’t 13 enough, actually?  It checks can_replace right before
>>>> replacing in a drained section.  I can’t imagine the permissions to
>>>> change there.
>>>
>>> Permissions are tied to file locks, so an external process can just grab
>>> the locks in between.
>>
>> Ah, right, I didn’t think of that.
>>
>>> But if I understand correctly, all we try here is
>>> to have an additional safeguard to prevent the user from doing stupid
>>> things. So I guess not being 100% is fine as long as it's documented in
>>> the code.
>>
>> Yes.  I just think it actually would be 100 % in practice, so I wondered
>> whether it would need to be documented.
>>
>> You’re right, though, it isn’t 100 %, so it should definitely be
>> documented.  Maybe something like
>>
>> In theory, we would have to keep the permissions tightened until the
>> node is replaced.  In practice, that would require post-replacement
>> cleanup infrastructure, which we do not have, and which would be
>> unreasonably complex to implement.
> 
> Sounds good until here.
> 
>> Therefore, all we can do is require
>> anyone who wants to replace one node by some potentially unrelated other
>> node (i.e., the mirror job on completion) to invoke
>> bdrv_recurse_can_replace() immediately before and thus minimize the time
>> during which some condition may arise that might forbid the swap.
>>
>> ?
> 
> This second part of your suggested comment could be dropped, as far as
> I'm concerned. If anything, it's part of the contract and would belong
> in the bdrv_recurse_can_replace() documentation.
> 
> However, I think I would mention why not being 100% is okay: The part
> with "additional safeguard to prevent the user from doing stupid
> things", and that it doesn't make a difference if the user runs the
> correct command.

OK.

Max



Thread overview: 75+ messages
2019-11-11 16:01 [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Max Reitz
2019-11-11 16:01 ` [PATCH for-5.0 v2 01/23] blockdev: Allow external snapshots everywhere Max Reitz
2019-11-11 16:01 ` [PATCH for-5.0 v2 02/23] blockdev: Allow resizing everywhere Max Reitz
2019-12-06 14:04   ` Alberto Garcia
2019-12-09 13:56     ` Max Reitz
2019-11-11 16:01 ` [PATCH for-5.0 v2 03/23] block: Drop bdrv_is_first_non_filter() Max Reitz
2019-11-11 16:01 ` [PATCH for-5.0 v2 04/23] iotests: Let 041 use -blockdev for quorum children Max Reitz
2019-11-11 16:01 ` [PATCH for-5.0 v2 05/23] quorum: Fix child permissions Max Reitz
2019-11-29  9:14   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:01 ` [PATCH for-5.0 v2 06/23] block: Add bdrv_recurse_can_replace() Max Reitz
2019-11-29  9:34   ` Vladimir Sementsov-Ogievskiy
2019-11-29 10:23     ` Max Reitz
2019-11-29 11:04       ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 07/23] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
2019-11-29  9:41   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 08/23] quorum: Store children in own structure Max Reitz
2019-11-29  9:46   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 09/23] quorum: Add QuorumChild.to_be_replaced Max Reitz
2019-11-29  9:59   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
2019-11-29 10:18   ` Vladimir Sementsov-Ogievskiy
2019-11-29 12:50     ` Max Reitz
2020-02-05 15:55   ` Kevin Wolf
2020-02-05 16:03     ` Kevin Wolf
2020-02-06 10:21     ` Max Reitz
2020-02-06 14:42       ` Kevin Wolf
2020-02-06 15:19         ` Max Reitz
2020-02-06 15:42           ` Kevin Wolf
2020-02-06 16:44             ` Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 11/23] block: Use bdrv_recurse_can_replace() Max Reitz
2019-11-29 11:07   ` Vladimir Sementsov-Ogievskiy
2020-02-05 15:57   ` Kevin Wolf
2019-11-11 16:02 ` [PATCH for-5.0 v2 12/23] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 13/23] mirror: Double-check immediately before replacing Max Reitz
2019-11-29 11:18   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 14/23] quorum: Stop marking it as a filter Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 15/23] mirror: Prevent loops Max Reitz
2019-11-29 12:01   ` Vladimir Sementsov-Ogievskiy
2019-11-29 13:46     ` Max Reitz
2019-11-29 13:55       ` Vladimir Sementsov-Ogievskiy
2019-11-29 14:17         ` Max Reitz
2019-11-29 14:26           ` Vladimir Sementsov-Ogievskiy
2019-11-29 14:38             ` Max Reitz
2019-12-02 12:12   ` Vladimir Sementsov-Ogievskiy
2019-12-09 14:43     ` Max Reitz
2019-12-13 11:18       ` Vladimir Sementsov-Ogievskiy
2019-12-20 11:39         ` Max Reitz
2019-12-20 11:55           ` Vladimir Sementsov-Ogievskiy
2019-12-20 12:10             ` Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 16/23] iotests: Use complete_and_wait() in 155 Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 17/23] iotests: Use skip_if_unsupported decorator in 041 Max Reitz
2019-12-03 12:03   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path() Max Reitz
2019-12-03 12:59   ` Vladimir Sementsov-Ogievskiy
2019-12-09 15:10     ` Max Reitz
2019-12-13 11:26       ` Vladimir Sementsov-Ogievskiy
2019-12-13 11:27   ` Vladimir Sementsov-Ogievskiy
2019-12-20 11:42     ` Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 19/23] iotests: Resolve TODOs in 041 Max Reitz
2019-12-03 13:32   ` Vladimir Sementsov-Ogievskiy
2019-12-03 13:33     ` Vladimir Sementsov-Ogievskiy
2019-12-09 15:15       ` Max Reitz
2019-12-13 11:31         ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 20/23] iotests: Use self.image_len in TestRepairQuorum Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 21/23] iotests: Add tests for invalid Quorum @replaces Max Reitz
2019-12-03 14:40   ` Vladimir Sementsov-Ogievskiy
2019-11-11 16:02 ` [PATCH for-5.0 v2 22/23] iotests: Check that @replaces can replace filters Max Reitz
2019-12-03 15:58   ` Vladimir Sementsov-Ogievskiy
2019-12-09 15:17     ` Max Reitz
2019-11-11 16:02 ` [PATCH for-5.0 v2 23/23] iotests: Mirror must not attempt to create loops Max Reitz
2019-12-03 17:03   ` Vladimir Sementsov-Ogievskiy
2019-11-29 12:24 ` [PATCH for-5.0 v2 00/23] block: Fix check_to_replace_node() Vladimir Sementsov-Ogievskiy
2019-11-29 12:49   ` Max Reitz
2019-11-29 12:55     ` Vladimir Sementsov-Ogievskiy
2019-11-29 13:08       ` Max Reitz
