All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/11] block: Deal with filters
@ 2018-08-09 22:31 Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers Max Reitz
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

Note 1: This series depends on v10 of my “block: Fix some filename
generation issues” series.

Based-on: <20180809213528.14738-1-mreitz@redhat.com>

Note 2: This is technically the first part of my active mirror followup.
But just very technically.  I noticed that that followup started to
consist of two parts, namely (A) fix filtery things in the block layer,
and (B) fix active mirror.  So I decided to split it.  This is part A.
Part B comes later.


When we introduced filters, we did it a bit casually.  Sure, we talked a
lot about them before, but that was mostly discussion about where
implicit filters should be added to the graph (note that we currently
only have two implicit filters, those being mirror and commit).  But in
the end, we really just designated some drivers filters (Quorum,
blkdebug, etc.) and added some specifically (throttle, COR), without
really looking through the block layer to see where issues might occur.

It turns out vast areas of the block layer just don't know about filters
and cannot really handle them.  Many cases will work in practice, in
others, well, too bad, you cannot use some feature because some part
deep inside the block layer looks at your filters and thinks they are
format nodes.

This series sets out to correct a bit of that.  I lost my head many
times and I'm sure this series is incomplete in many ways, but it really
doesn't do any good if it sits on my disk any longer, it needs to go out
now.

The most important patches of this series are patches 3 and 4.  These
introduce functions to encapsulate bs->backing and bs->file accesses.
Because sometimes, bs->backing means COW, sometimes it means filtered
node.  And sometimes, bs->file means metadata storage, and sometimes it
means filtered node.  With this functions, it's always clear what the
caller wants, and it will always get what it wants.

Besides that, patch 3 introduces functions to skip filters which may be
used by parts of the block layer that just don't care about them.


Secondly, the restraints put on mirror's @replaces parameter are
revisited and fixed.


Thirdly, BDS.backing_file is changed to be constant.  I don't quite know
why we modify it whenever we change a BDS's backing file, but that's
definitely not quite right.  This fixes things like being able to
perform a commit on a file (using relative filenames) in a directory
that's not qemu's CWD.


Finally, a number of tests are added.


There are probably many things that are worthy of discussion, of which
only some come to my head, e.g.:

- In which cases do we want to skip filters, in which cases do we want
  to skip implicit filters?
  My approach was to basically never skip explicitly added filters,
  except when it's about finding a file in some tree (e.g. in a backing
  chain).  Maybe there are cases where you think we should skip even
  explicitly added filters.

- I made interesting decisions like “When you mirror from a node, we
  should indeed mirror from that node, but when replacing it, we should
  skip leave all implicit filters on top intact.”  You may disagree with
  that.
  (My reasoning here is that users aren't supposed to know about
  implicit filters, and therefore, they should not intend to remove
  them.  Also, mirror accepts only root nodes as the source, so you
  cannot really specify the node below the implicit filters.  But you
  can use @replaces to drop the implicit filters, if you know they are
  there.)


v2:
- Patch 4: We must clear BDS.exact_filename for filter nodes, or we
  basically end up with a random filename for them.  This is achieved by
  pulling the !drv->is_filter check into an inner condition.
  (Fixes iotests 40 and 184)

- Patch 7: iotest 228 tried to use QAPI's backing_file information
  (which was very inconsistent, so with 228 doing some funky backing
  stuff, it really showed) to get some idea of what qemu thinks about
  the image header.  Let's make it use full-backing-filename (and that
  now really reflects the image header).


Max Reitz (11):
  block: Mark commit and mirror as filter drivers
  blockdev: Check @replaces in blockdev_mirror_common
  block: Filtered children access functions
  block: Storage child access function
  block: Fix check_to_replace_node()
  iotests: Add tests for mirror @replaces loops
  block: Leave BDS.backing_file constant
  iotests: Add filter commit test cases
  iotests: Add filter mirror test cases
  iotests: Add test for commit in sub directory
  iotests: Test committing to overridden backing

 qapi/block-core.json           |   4 +
 include/block/block.h          |   2 +
 include/block/block_int.h      |  53 ++++-
 block.c                        | 341 ++++++++++++++++++++++++++++-----
 block/backup.c                 |   8 +-
 block/block-backend.c          |  16 +-
 block/commit.c                 |  38 ++--
 block/io.c                     |  47 +++--
 block/mirror.c                 |  39 ++--
 block/qapi.c                   |  38 ++--
 block/snapshot.c               |  40 ++--
 block/stream.c                 |  15 +-
 blockdev.c                     | 167 ++++++++++++----
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c                   |   8 +-
 qemu-img.c                     |  24 ++-
 tests/qemu-iotests/020         |  36 ++++
 tests/qemu-iotests/020.out     |  10 +
 tests/qemu-iotests/040         | 191 ++++++++++++++++++
 tests/qemu-iotests/040.out     |   4 +-
 tests/qemu-iotests/041         | 270 +++++++++++++++++++++++++-
 tests/qemu-iotests/041.out     |   4 +-
 tests/qemu-iotests/191.out     |   1 -
 tests/qemu-iotests/228         |   6 +-
 tests/qemu-iotests/228.out     |   6 +-
 25 files changed, 1141 insertions(+), 231 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-09-07 12:37   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common Max Reitz
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

The commit and mirror block nodes are filters, so they should be marked
as such.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/commit.c | 2 ++
 block/mirror.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index 14788b0708..a95b87bb3a 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -261,6 +261,8 @@ static BlockDriver bdrv_commit_top = {
     .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
     .bdrv_close                 = bdrv_commit_top_close,
     .bdrv_child_perm            = bdrv_commit_top_child_perm,
+
+    .is_filter                  = true,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/mirror.c b/block/mirror.c
index b4287a1e2b..5c561c6241 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1457,6 +1457,8 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
     .bdrv_close                 = bdrv_mirror_top_close,
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
+
+    .is_filter                  = true,
 };
 
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-11-09 22:51   ` Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions Max Reitz
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

There is no reason why the constraints we put on @replaces should be
limited to drive-mirror.  Therefore, move the sanity checks from
qmp_drive_mirror() to blockdev_mirror_common() so they apply to
blockdev-mirror as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 55 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9b143d9e72..2d61588a9a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3644,6 +3644,39 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
         sync = MIRROR_SYNC_MODE_FULL;
     }
 
+    if (has_replaces) {
+        BlockDriverState *to_replace_bs;
+        AioContext *replace_aio_context;
+        int64_t bs_size, replace_size;
+
+        bs_size = bdrv_getlength(bs);
+        if (bs_size < 0) {
+            error_setg_errno(errp, -bs_size, "Failed to query device's size");
+            return;
+        }
+
+        to_replace_bs = check_to_replace_node(bs, replaces, errp);
+        if (!to_replace_bs) {
+            return;
+        }
+
+        replace_aio_context = bdrv_get_aio_context(to_replace_bs);
+        aio_context_acquire(replace_aio_context);
+        replace_size = bdrv_getlength(to_replace_bs);
+        aio_context_release(replace_aio_context);
+
+        if (replace_size < 0) {
+            error_setg_errno(errp, -replace_size,
+                             "Failed to query the replacement node's size");
+            return;
+        }
+        if (bs_size != replace_size) {
+            error_setg(errp, "cannot replace image with a mirror image of "
+                             "different size");
+            return;
+        }
+    }
+
     /* pass the node name to replace to mirror start since it's loose coupling
      * and will allow to check whether the node still exist at mirror completion
      */
@@ -3704,33 +3737,11 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     }
 
     if (arg->has_replaces) {
-        BlockDriverState *to_replace_bs;
-        AioContext *replace_aio_context;
-        int64_t replace_size;
-
         if (!arg->has_node_name) {
             error_setg(errp, "a node-name must be provided when replacing a"
                              " named node of the graph");
             goto out;
         }
-
-        to_replace_bs = check_to_replace_node(bs, arg->replaces, &local_err);
-
-        if (!to_replace_bs) {
-            error_propagate(errp, local_err);
-            goto out;
-        }
-
-        replace_aio_context = bdrv_get_aio_context(to_replace_bs);
-        aio_context_acquire(replace_aio_context);
-        replace_size = bdrv_getlength(to_replace_bs);
-        aio_context_release(replace_aio_context);
-
-        if (size != replace_size) {
-            error_setg(errp, "cannot replace image with a mirror image of "
-                             "different size");
-            goto out;
-        }
     }
 
     if (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-11-12 22:17   ` Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 04/11] block: Storage child access function Max Reitz
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

What bs->file and bs->backing mean depends on the node.  For filter
nodes, both signify a node that will eventually receive all R/W
accesses.  For format nodes, bs->file contains metadata and data, and
bs->backing will not receive writes -- instead, writes are COWed to
bs->file.  Usually.

In any case, it is not trivial to guess what a child means exactly with
our currently limited form of expression.  It is better to introduce
some functions that actually guarantee a meaning:

- bdrv_filtered_cow_child() will return the child that receives requests
  filtered through COW.  That is, reads may or may not be forwarded
  (depending on the overlay's allocation status), but writes never go to
  this child.

- bdrv_filtered_rw_child() will return the child that receives requests
  filtered through some very plain process.  Reads and writes issued to
  the parent will go to the child as well (although timing, etc. may be
  modified).

- All drivers but quorum (but quorum is pretty opaque to the general
  block layer anyway) always only have one of these children: All read
  requests must be served from the filtered_rw_child (if it exists), so
  if there was a filtered_cow_child in addition, it would not receive
  any requests at all.
  (The closest here is mirror, where all requests are passed on to the
  source, but with write-blocking, write requests are "COWed" to the
  target.  But that just means that the target is a special child that
  cannot be introspected by the generic block layer functions, and that
  source is a filtered_rw_child.)
  Therefore, we can also add bdrv_filtered_child() which returns that
  one child (or NULL, if there is no filtered child).

Also, many places in the current block layer should be skipping filters
(all filters or just the ones added implicitly, it depends) when going
through a block node chain.  They do not do that currently, but this
patch makes them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json           |   4 +
 include/block/block.h          |   1 +
 include/block/block_int.h      |  33 +++++-
 block.c                        | 184 ++++++++++++++++++++++++++++-----
 block/backup.c                 |   8 +-
 block/block-backend.c          |  16 ++-
 block/commit.c                 |  36 ++++---
 block/io.c                     |  27 ++---
 block/mirror.c                 |  37 ++++---
 block/qapi.c                   |  26 ++---
 block/stream.c                 |  15 ++-
 blockdev.c                     |  84 ++++++++++++---
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c                   |   8 +-
 qemu-img.c                     |  12 ++-
 15 files changed, 363 insertions(+), 132 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f20efc97f7..a71df88eb2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2248,6 +2248,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
diff --git a/include/block/block.h b/include/block/block.h
index 7ef118a704..a01986495d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -452,6 +452,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
                                  const char *node_name,
                                  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base);
 BlockDriverState *bdrv_next_node(BlockDriverState *bs);
 BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 90217512b5..fa9154899d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -91,6 +91,7 @@ struct BlockDriver {
      * certain callbacks that refer to data (see block.c) to their bs->file if
      * the driver doesn't implement them. Drivers that do not wish to forward
      * must implement them and return -ENOTSUP.
+     * Note that filters are not allowed to modify data.
      */
     bool is_filter;
     /* for snapshots block filter like Quorum can implement the
@@ -887,11 +888,6 @@ typedef enum BlockMirrorBackingMode {
     MIRROR_LEAVE_BACKING_CHAIN,
 } BlockMirrorBackingMode;
 
-static inline BlockDriverState *backing_bs(BlockDriverState *bs)
-{
-    return bs->backing ? bs->backing->bs : NULL;
-}
-
 
 /* Essential block drivers which must always be statically linked into qemu, and
  * which therefore can be accessed without using bdrv_find_format() */
@@ -1215,4 +1211,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
+
+static inline BlockDriverState *child_bs(BdrvChild *child)
+{
+    return child ? child->bs : NULL;
+}
+
+static inline BlockDriverState *bdrv_filtered_cow_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_cow_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_rw_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_rw_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index 5118d992c3..61a2fe14eb 100644
--- a/block.c
+++ b/block.c
@@ -532,11 +532,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_blocksizes) {
         return drv->bdrv_probe_blocksizes(bs, bsz);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_blocksizes(bs->file->bs, bsz);
+    } else if (filtered) {
+        return bdrv_probe_blocksizes(filtered, bsz);
     }
 
     return -ENOTSUP;
@@ -551,11 +552,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_geometry) {
         return drv->bdrv_probe_geometry(bs, geo);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_geometry(bs->file->bs, geo);
+    } else if (filtered) {
+        return bdrv_probe_geometry(filtered, geo);
     }
 
     return -ENOTSUP;
@@ -2261,7 +2263,7 @@ static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
 }
 
 /*
- * Sets the backing file link of a BDS. A new reference is created; callers
+ * Sets the bs->backing link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
@@ -2313,7 +2315,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
     QDict *tmp_parent_options = NULL;
     Error *local_err = NULL;
 
-    if (bs->backing != NULL) {
+    if (bdrv_filtered_cow_child(bs) != NULL) {
         goto free_exit;
     }
 
@@ -3722,8 +3724,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs)
 {
-    while (active && bs != backing_bs(active)) {
-        active = backing_bs(active);
+    while (active && bs != bdrv_filtered_bs(active)) {
+        active = bdrv_filtered_bs(active);
     }
 
     return active;
@@ -3926,10 +3928,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
 
 bool bdrv_is_encrypted(BlockDriverState *bs)
 {
-    if (bs->backing && bs->backing->bs->encrypted) {
+    BlockDriverState *filtered = bdrv_filtered_bs(bs);
+    if (bs->encrypted) {
         return true;
     }
-    return bs->encrypted;
+    if (filtered && bdrv_is_encrypted(filtered)) {
+        return true;
+    }
+    return false;
 }
 
 const char *bdrv_get_format_name(BlockDriverState *bs)
@@ -4068,7 +4074,19 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
 {
     while (top && top != base) {
-        top = backing_bs(top);
+        top = bdrv_filtered_bs(top);
+    }
+
+    return top != NULL;
+}
+
+/* Same as bdrv_chain_contains(), but skip implicitly added R/W filter
+ * nodes and do not move past explicitly added R/W filters. */
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
+{
+    top = bdrv_skip_implicit_filters(top);
+    while (top && top != base) {
+        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
     }
 
     return top != NULL;
@@ -4140,20 +4158,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
 
 int bdrv_has_zero_init(BlockDriverState *bs)
 {
+    BlockDriverState *filtered;
+
     if (!bs->drv) {
         return 0;
     }
 
     /* If BS is a copy on write image, it is initialized to
        the contents of the base image, which may not be zeroes.  */
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return 0;
     }
     if (bs->drv->bdrv_has_zero_init) {
         return bs->drv->bdrv_has_zero_init(bs);
     }
-    if (bs->file && bs->drv->is_filter) {
-        return bdrv_has_zero_init(bs->file->bs);
+
+    filtered = bdrv_filtered_rw_bs(bs);
+    if (filtered) {
+        return bdrv_has_zero_init(filtered);
     }
 
     /* safe default */
@@ -4164,7 +4186,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return false;
     }
 
@@ -4198,8 +4220,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
         return -ENOMEDIUM;
     }
     if (!drv->bdrv_get_info) {
-        if (bs->file && drv->is_filter) {
-            return bdrv_get_info(bs->file->bs, bdi);
+        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
+        if (filtered) {
+            return bdrv_get_info(filtered, bdi);
         }
         return -ENOTSUP;
     }
@@ -4301,7 +4324,15 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 
     is_protocol = path_has_protocol(backing_file);
 
-    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
+    /* Being largely a legacy function, skip any filters here
+     * (because filters do not have normal filenames, so they cannot
+     * match anyway; and allowing json:{} filenames is a bit out of
+     * scope) */
+    for (curr_bs = bdrv_skip_rw_filters(bs);
+         bdrv_filtered_cow_child(curr_bs) != NULL;
+         curr_bs = bdrv_backing_chain_next(curr_bs))
+    {
+        BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
 
         /* If either of the filename paths is actually a protocol, then
          * compare unmodified paths; otherwise make paths relative */
@@ -4309,7 +4340,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
             /* Also check against the full backing filename for the image */
@@ -4319,7 +4350,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
                 bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
                 g_free(backing_file_full_ret);
                 if (equal) {
-                    retval = curr_bs->backing->bs;
+                    retval = bs_below;
                     break;
                 }
             }
@@ -4345,7 +4376,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             g_free(filename_tmp);
 
             if (strcmp(backing_file_full, filename_full) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
         }
@@ -5256,9 +5287,10 @@ static bool append_strong_runtime_options(QDict *d, BlockDriverState *bs)
  * would result in exactly bs->backing. */
 static bool bdrv_backing_overridden(BlockDriverState *bs)
 {
-    if (bs->backing) {
-        return strcmp(bs->auto_backing_file,
-                      bs->backing->bs->filename);
+    BlockDriverState *backing = bdrv_filtered_cow_bs(bs);
+
+    if (backing) {
+        return strcmp(bs->auto_backing_file, backing->filename);
     } else {
         /* No backing BDS, so if the image header reports any backing
          * file, it must have been suppressed */
@@ -5341,7 +5373,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
                       qobject_ref(child->bs->full_open_options));
         }
 
-        if (backing_overridden && !bs->backing) {
+        if (backing_overridden && !bdrv_filtered_cow_child(bs)) {
             /* Force no backing file */
             qdict_put_null(opts, "backing");
         }
@@ -5487,3 +5519,105 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
 
     return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
 }
+
+/*
+ * Return the child that @bs acts as an overlay for, and from which data may be
+ * copied in COW or COR operations.  Usually this is the backing file.
+ */
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing;
+}
+
+/*
+ * If @bs acts as a pass-through filter for one of its children,
+ * return that child.  "Pass-through" means that write operations to
+ * @bs are forwarded to that child instead of triggering COW.
+ */
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (!bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing ?: bs->file;
+}
+
+/*
+ * Return any filtered child, independently on how it reacts to write
+ * accesses and whether data is copied onto this BDS through COR.
+ */
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
+{
+    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
+    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
+
+    /* There can only be one filtered child at a time */
+    assert(!(cow_child && rw_child));
+
+    return cow_child ?: rw_child;
+}
+
+static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
+                                           bool stop_on_explicit_filter)
+{
+    BdrvChild *filtered;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    while (!(stop_on_explicit_filter && !bs->implicit)) {
+        filtered = bdrv_filtered_rw_child(bs);
+        if (!filtered) {
+            break;
+        }
+        bs = filtered->bs;
+    }
+    /* Note that this treats nodes with bs->drv == NULL as not being
+     * R/W filters (bs->drv == NULL should be replaced by something
+     * else anyway).
+     * The advantage of this behavior is that this function will thus
+     * always return a non-NULL value (given a non-NULL @bs). */
+
+    return bs;
+}
+
+/*
+ * Return the first BDS that has not been added implicitly or that
+ * does not have an RW-filtered child down the chain starting from @bs
+ * (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, true);
+}
+
+/*
+ * Return the first BDS that does not have an RW-filtered child down
+ * the chain starting from @bs (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, false);
+}
+
+/*
+ * For a backing chain, return the first non-filter backing image.
+ */
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
+{
+    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
+}
diff --git a/block/backup.c b/block/backup.c
index 8630d32926..4ddc0bb632 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -618,6 +618,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t len;
     BlockDriverInfo bdi;
     BackupBlockJob *job = NULL;
+    bool target_does_cow;
     int ret;
 
     assert(bs);
@@ -712,8 +713,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
+    target_does_cow = bdrv_filtered_cow_child(target);
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
+    if (ret == -ENOTSUP && !target_does_cow) {
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
@@ -722,14 +724,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                     "this default, the backup may be unusable",
                     BACKUP_CLUSTER_SIZE_DEFAULT);
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
+    } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
             "which has no backing file");
         error_append_hint(errp,
             "Aborting, since this may create an unusable destination image\n");
         goto error;
-    } else if (ret < 0 && target->backing) {
+    } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
     } else {
diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be..832c5e3838 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2114,11 +2114,17 @@ int blk_commit_all(void)
         AioContext *aio_context = blk_get_aio_context(blk);
 
         aio_context_acquire(aio_context);
-        if (blk_is_inserted(blk) && blk->root->bs->backing) {
-            int ret = bdrv_commit(blk->root->bs);
-            if (ret < 0) {
-                aio_context_release(aio_context);
-                return ret;
+        if (blk_is_inserted(blk)) {
+            BlockDriverState *non_filter;
+
+            /* Legacy function, so skip implicit filters */
+            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
+            if (bdrv_filtered_cow_child(non_filter)) {
+                int ret = bdrv_commit(non_filter);
+                if (ret < 0) {
+                    aio_context_release(aio_context);
+                    return ret;
+                }
             }
         }
         aio_context_release(aio_context);
diff --git a/block/commit.c b/block/commit.c
index a95b87bb3a..3ea8e76a50 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -124,10 +124,9 @@ static void commit_complete(Job *job, void *opaque)
      * filter driver from the backing chain. Do this as the final step so that
      * the 'consistent read' permission can be granted.  */
     if (remove_commit_top_bs) {
-        bdrv_child_try_set_perm(commit_top_bs->backing, 0, BLK_PERM_ALL,
-                                &error_abort);
-        bdrv_replace_node(commit_top_bs, backing_bs(commit_top_bs),
-                          &error_abort);
+        BdrvChild *unfiltered_top = bdrv_filtered_rw_child(commit_top_bs);
+        bdrv_child_try_set_perm(unfiltered_top, 0, BLK_PERM_ALL, &error_abort);
+        bdrv_replace_node(commit_top_bs, unfiltered_top->bs, &error_abort);
     }
 
     bdrv_unref(commit_top_bs);
@@ -331,9 +330,13 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     bdrv_unref(commit_top_bs);
 
     /* Block all nodes between top and base, because they will
-     * disappear from the chain after this operation. */
+     * disappear from the chain after this operation.
+     * Note that this assumes that the user is fine with removing all
+     * nodes (including R/W filters) between top and base.  Assuring
+     * this is the responsibility of the interface (i.e. whoever calls
+     * commit_start()). */
     assert(bdrv_chain_contains(top, base));
-    for (iter = top; iter != base; iter = backing_bs(iter)) {
+    for (iter = top; iter != base; iter = bdrv_filtered_bs(iter)) {
         /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
          * at s->base (if writes are blocked for a node, they are also blocked
          * for its backing file). The other options would be a second filter
@@ -410,20 +413,23 @@ int bdrv_commit(BlockDriverState *bs)
     if (!drv)
         return -ENOMEDIUM;
 
-    if (!bs->backing) {
+    backing_file_bs = bdrv_filtered_cow_bs(bs);
+
+    if (!backing_file_bs) {
         return -ENOTSUP;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
-        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
+        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
+    {
         return -EBUSY;
     }
 
-    ro = bs->backing->bs->read_only;
-    open_flags =  bs->backing->bs->open_flags;
+    ro = backing_file_bs->read_only;
+    open_flags =  backing_file_bs->open_flags;
 
     if (ro) {
-        if (bdrv_reopen(bs->backing->bs, open_flags | BDRV_O_RDWR, NULL)) {
+        if (bdrv_reopen(backing_file_bs, open_flags | BDRV_O_RDWR, NULL)) {
             return -EACCES;
         }
     }
@@ -438,8 +444,6 @@ int bdrv_commit(BlockDriverState *bs)
     }
 
     /* Insert commit_top block node above backing, so we can write to it */
-    backing_file_bs = backing_bs(bs);
-
     commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
                                          &local_err);
     if (commit_top_bs == NULL) {
@@ -525,15 +529,13 @@ ro_cleanup:
     qemu_vfree(buf);
 
     blk_unref(backing);
-    if (backing_file_bs) {
-        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
-    }
+    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
     bdrv_unref(commit_top_bs);
     blk_unref(src);
 
     if (ro) {
         /* ignoring error return here */
-        bdrv_reopen(bs->backing->bs, open_flags & ~BDRV_O_RDWR, NULL);
+        bdrv_reopen(backing_file_bs, open_flags & ~BDRV_O_RDWR, NULL);
     }
 
     return ret;
diff --git a/block/io.c b/block/io.c
index 7100344c7b..8a442d37b2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -120,6 +120,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
     Error *local_err = NULL;
 
     memset(&bs->bl, 0, sizeof(bs->bl));
@@ -148,13 +149,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_iov = IOV_MAX;
     }
 
-    if (bs->backing) {
-        bdrv_refresh_limits(bs->backing->bs, &local_err);
+    if (cow_bs) {
+        bdrv_refresh_limits(cow_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
+        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
     }
 
     /* Then let the driver override it */
@@ -2170,11 +2171,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
     } else if (want_zero) {
+        BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
+
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (bs->backing) {
-            BlockDriverState *bs2 = bs->backing->bs;
-            int64_t size2 = bdrv_getlength(bs2);
+        } else if (cow_bs) {
+            int64_t size2 = bdrv_getlength(cow_bs);
 
             if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
@@ -2239,7 +2241,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     bool first = true;
 
     assert(bs != base);
-    for (p = bs; p != base; p = backing_bs(p)) {
+    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
@@ -2324,7 +2326,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
-    return bdrv_block_status_above(bs, backing_bs(bs),
+    return bdrv_block_status_above(bs, bdrv_filtered_bs(bs),
                                    offset, bytes, pnum, map, file);
 }
 
@@ -2334,7 +2336,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     int ret;
     int64_t dummy;
 
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
+    ret = bdrv_common_block_status_above(bs, bdrv_filtered_bs(bs), false, offset,
                                          bytes, pnum ? pnum : &dummy, NULL,
                                          NULL);
     if (ret < 0) {
@@ -2390,7 +2392,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
             n = pnum_inter;
         }
 
-        intermediate = backing_bs(intermediate);
+        intermediate = bdrv_filtered_bs(intermediate);
     }
 
     *pnum = n;
@@ -3169,8 +3171,9 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
     }
 
     if (!drv->bdrv_co_truncate) {
-        if (bs->file && drv->is_filter) {
-            ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
+        BdrvChild *filtered = bdrv_filtered_rw_child(bs);
+        if (filtered) {
+            ret = bdrv_co_truncate(filtered, offset, prealloc, errp);
             goto out;
         }
         error_setg(errp, "Image format driver does not support resize");
diff --git a/block/mirror.c b/block/mirror.c
index 5c561c6241..85f5742eae 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -618,7 +618,7 @@ static void mirror_exit(Job *job, void *opaque)
     MirrorExitData *data = opaque;
     MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
     AioContext *replace_aio_context = NULL;
-    BlockDriverState *src = s->mirror_top_bs->backing->bs;
+    BlockDriverState *src = bdrv_filtered_rw_bs(s->mirror_top_bs);
     BlockDriverState *target_bs = blk_bs(s->target);
     BlockDriverState *mirror_top_bs = s->mirror_top_bs;
     Error *local_err = NULL;
@@ -644,12 +644,13 @@ static void mirror_exit(Job *job, void *opaque)
 
     /* We don't access the source any more. Dropping any WRITE/RESIZE is
      * required before it could become a backing file of target_bs. */
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
+    bdrv_child_try_set_perm(bdrv_filtered_rw_child(mirror_top_bs),
+                            0, BLK_PERM_ALL, &error_abort);
     if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
+        if (bdrv_backing_chain_next(target_bs) != backing) {
+            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,
+                                &local_err);
             if (local_err) {
                 error_report_err(local_err);
                 data->ret = -EPERM;
@@ -698,9 +699,10 @@ static void mirror_exit(Job *job, void *opaque)
      * valid. Also give up permissions on mirror_top_bs->backing, which might
      * block the removal. */
     block_job_remove_all_bdrv(bjob);
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_child_try_set_perm(bdrv_filtered_rw_child(mirror_top_bs),
+                            0, BLK_PERM_ALL, &error_abort);
+    bdrv_replace_node(mirror_top_bs, bdrv_filtered_rw_bs(mirror_top_bs),
+                      &error_abort);
 
     /* We just changed the BDS the job BB refers to (with either or both of the
      * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -881,7 +883,7 @@ static void coroutine_fn mirror_run(void *opaque)
     } else {
         s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
-    if (backing_filename[0] && !target_bs->backing &&
+    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&
         s->granularity < s->target_cluster_size) {
         s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
@@ -1060,7 +1062,7 @@ static void mirror_complete(Job *job, Error **errp)
     if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
         int ret;
 
-        assert(!target->backing);
+        assert(!bdrv_filtered_cow_child(target));
         ret = bdrv_open_backing_file(target, NULL, "backing", errp);
         if (ret < 0) {
             return;
@@ -1604,7 +1606,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
      * any jobs in them must be blocked */
     if (target_is_backing) {
         BlockDriverState *iter;
-        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
+        for (iter = bdrv_filtered_bs(bs); iter != target;
+             iter = bdrv_filtered_bs(iter))
+        {
             /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
              * ourselves at s->base (if writes are blocked for a node, they are
              * also blocked for its backing file). The other options would be a
@@ -1636,9 +1640,10 @@ fail:
         job_early_fail(&s->common.job);
     }
 
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_child_try_set_perm(bdrv_filtered_rw_child(mirror_top_bs),
+                            0, BLK_PERM_ALL, &error_abort);
+    bdrv_replace_node(mirror_top_bs, bdrv_filtered_rw_bs(mirror_top_bs),
+                      &error_abort);
 
     bdrv_unref(mirror_top_bs);
 }
@@ -1653,14 +1658,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorCopyMode copy_mode, Error **errp)
 {
     bool is_none_mode;
-    BlockDriverState *base;
+    BlockDriverState *base = NULL;
 
     if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         error_setg(errp, "Sync mode 'incremental' not supported");
         return;
     }
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
-    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
+    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
     mirror_start_job(job_id, bs, JOB_DEFAULT, target, replaces,
                      speed, granularity, buf_size, backing_mode,
                      on_source_error, on_target_error, unmap, NULL, NULL,
diff --git a/block/qapi.c b/block/qapi.c
index 430d4b24d4..f2eb83945e 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -149,9 +149,9 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
             return NULL;
         }
 
-        if (bs0->drv && bs0->backing) {
+        if (bs0->drv && bdrv_filtered_cow_child(bs0)) {
             info->backing_file_depth++;
-            bs0 = bs0->backing->bs;
+            bs0 = bdrv_filtered_cow_bs(bs0);
             (*p_image_info)->has_backing_image = true;
             p_image_info = &((*p_image_info)->backing_image);
         } else {
@@ -160,9 +160,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 
         /* Skip automatically inserted nodes that the user isn't aware of for
          * query-block (blk != NULL), but not for query-named-block-nodes */
-        while (blk && bs0->drv && bs0->implicit) {
-            bs0 = backing_bs(bs0);
-            assert(bs0);
+        if (blk) {
+            bs0 = bdrv_skip_implicit_filters(bs0);
         }
     }
 
@@ -342,9 +341,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
     BlockDriverState *bs = blk_bs(blk);
     char *qdev;
 
-    /* Skip automatically inserted nodes that the user isn't aware of */
-    while (bs && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
+    if (bs) {
+        /* Skip automatically inserted nodes that the user isn't aware of */
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     info->device = g_strdup(blk_name(blk));
@@ -501,6 +500,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
+    BlockDriverState *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -513,9 +513,8 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
     /* Skip automatically inserted nodes that the user isn't aware of in
      * a BlockBackend-level command. Stay at the exact node for a node-level
      * command. */
-    while (blk_level && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
-        assert(bs);
+    if (blk_level) {
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     if (bdrv_get_node_name(bs)[0]) {
@@ -530,9 +529,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
         s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
     }
 
-    if (blk_level && bs->backing) {
+    cow_bs = bdrv_filtered_cow_bs(bs);
+    if (blk_level && cow_bs) {
         s->has_backing = true;
-        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
+        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
     }
 
     return s;
diff --git a/block/stream.c b/block/stream.c
index 9264b68a1e..77933ed09e 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -64,10 +64,13 @@ static void stream_complete(Job *job, void *opaque)
     BlockJob *bjob = &s->common;
     StreamCompleteData *data = opaque;
     BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
 
-    if (!job_is_cancelled(job) && bs->backing && data->ret == 0) {
+    if (!job_is_cancelled(job) && bdrv_filtered_cow_child(unfiltered) &&
+        data->ret == 0)
+    {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -75,7 +78,7 @@ static void stream_complete(Job *job, void *opaque)
                 base_fmt = base->drv->format_name;
             }
         }
-        data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        data->ret = bdrv_change_backing_file(unfiltered, base_id, base_fmt);
         bdrv_set_backing_hd(bs, base, &local_err);
         if (local_err) {
             error_report_err(local_err);
@@ -112,7 +115,7 @@ static void coroutine_fn stream_run(void *opaque)
     int64_t n = 0; /* bytes */
     void *buf;
 
-    if (!bs->backing) {
+    if (!bdrv_filtered_child(bs)) {
         goto out;
     }
 
@@ -153,7 +156,7 @@ static void coroutine_fn stream_run(void *opaque)
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), base,
+            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
                                           offset, n, &n);
 
             /* Finish early if end of backing file has been reached */
@@ -252,7 +255,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so block writes
      * and resizes. */
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
+         iter = bdrv_filtered_bs(iter))
+    {
         block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                            BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
                            &error_abort);
diff --git a/blockdev.c b/blockdev.c
index 2d61588a9a..33dd6408c0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1092,7 +1092,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
             return;
         }
 
-        bs = blk_bs(blk);
+        bs = bdrv_skip_implicit_filters(blk_bs(blk));
         aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
 
@@ -1662,7 +1662,7 @@ static void external_snapshot_prepare(BlkActionState *common,
         goto out;
     }
 
-    if (state->new_bs->backing != NULL) {
+    if (bdrv_filtered_cow_child(state->new_bs)) {
         error_setg(errp, "The snapshot already has a backing image");
         goto out;
     }
@@ -3158,6 +3158,11 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         if (!base_bs) {
             goto out;
         }
+        /* Streaming copies data through COR, so all of the filters
+         * between the target and the base are considered.  Therefore,
+         * we can use bdrv_chain_contains() and do not have to use
+         * bdrv_legacy_chain_contains() (which does not go past
+         * explicitly added filters). */
         if (bs == base_bs || !bdrv_chain_contains(bs, base_bs)) {
             error_setg(errp, "Node '%s' is not a backing image of '%s'",
                        base_node, device);
@@ -3169,7 +3174,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
+    for (iter = bs; iter && iter != base_bs; iter = bdrv_filtered_bs(iter)) {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
             goto out;
         }
@@ -3282,7 +3287,9 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
 
     assert(bdrv_get_aio_context(base_bs) == aio_context);
 
-    for (iter = top_bs; iter != backing_bs(base_bs); iter = backing_bs(iter)) {
+    for (iter = top_bs; iter != bdrv_filtered_bs(base_bs);
+         iter = bdrv_filtered_bs(iter))
+    {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
             goto out;
         }
@@ -3293,6 +3300,12 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
         error_setg(errp, "cannot commit an image into itself");
         goto out;
     }
+    if (!bdrv_legacy_chain_contains(top_bs, base_bs)) {
+        /* We have to disallow this until the user can give explicit
+         * consent */
+        error_setg(errp, "Cannot commit through explicit filter nodes");
+        goto out;
+    }
 
     if (top_bs == bs) {
         if (has_backing_file) {
@@ -3384,7 +3397,11 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     /* See if we have a backing HD we can use to create our new image
      * on top of. */
     if (backup->sync == MIRROR_SYNC_MODE_TOP) {
-        source = backing_bs(bs);
+        /* Backup will not replace the source by the target, so none
+         * of the filters skipped here will be removed (in contrast to
+         * mirror).  Therefore, we can skip all of them when looking
+         * for the first COW relationship. */
+        source = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
         if (!source) {
             backup->sync = MIRROR_SYNC_MODE_FULL;
         }
@@ -3404,9 +3421,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     if (backup->mode != NEW_IMAGE_MODE_EXISTING) {
         assert(backup->format);
         if (source) {
-            bdrv_refresh_filename(source);
-            bdrv_img_create(backup->target, backup->format, source->filename,
-                            source->drv->format_name, NULL,
+            /* Implicit filters should not appear in the filename */
+            BlockDriverState *explicit_backing =
+                bdrv_skip_implicit_filters(source);
+
+            bdrv_refresh_filename(explicit_backing);
+            bdrv_img_create(backup->target, backup->format,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name, NULL,
                             size, flags, false, &local_err);
         } else {
             bdrv_img_create(backup->target, backup->format, NULL, NULL, NULL,
@@ -3640,7 +3662,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
+    if (!bdrv_backing_chain_next(bs) && sync == MIRROR_SYNC_MODE_TOP) {
         sync = MIRROR_SYNC_MODE_FULL;
     }
 
@@ -3680,8 +3702,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     /* pass the node name to replace to mirror start since it's loose coupling
      * and will allow to check whether the node still exist at mirror completion
      */
-    mirror_start(job_id, bs, target,
-                 has_replaces ? replaces : NULL,
+    mirror_start(job_id, bs, target, replaces,
                  speed, granularity, buf_size, sync, backing_mode,
                  on_source_error, on_target_error, unmap, filter_node_name,
                  copy_mode, errp);
@@ -3689,7 +3710,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
 
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *source, *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode;
@@ -3698,6 +3719,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int flags;
     int64_t size;
     const char *format = arg->format;
+    const char *replaces_node_name = NULL;
 
     bs = qmp_get_root_bs(arg->device, errp);
     if (!bs) {
@@ -3709,6 +3731,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         return;
     }
 
+    /* If the user has not instructed us otherwise, we should let the
+     * block job run from @bs (thus taking into account all filters on
+     * it) but replace @unfiltered_bs when it finishes (thus not
+     * removing those filters).
+     * (And if there are any explicit filters, we should assume the
+     *  user knows how to use the @replaces option.) */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
@@ -3722,8 +3752,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
-    source = backing_bs(bs);
+    source = bdrv_filtered_cow_bs(unfiltered_bs);
     if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
+        if (bdrv_filtered_bs(unfiltered_bs)) {
+            /* @unfiltered_bs is an explicit filter */
+            error_setg(errp, "Cannot perform sync=top mirror through an "
+                       "explicitly added filter node on the source");
+            goto out;
+        }
         arg->sync = MIRROR_SYNC_MODE_FULL;
     }
     if (arg->sync == MIRROR_SYNC_MODE_NONE) {
@@ -3742,6 +3778,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
                              " named node of the graph");
             goto out;
         }
+        replaces_node_name = arg->replaces;
+    } else if (unfiltered_bs != bs) {
+        replaces_node_name = unfiltered_bs->node_name;
     }
 
     if (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
@@ -3761,6 +3800,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         bdrv_img_create(arg->target, format,
                         NULL, NULL, NULL, size, flags, false, &local_err);
     } else {
+        /* Implicit filters should not appear in the filename */
+        BlockDriverState *explicit_backing = bdrv_skip_implicit_filters(source);
+
         switch (arg->mode) {
         case NEW_IMAGE_MODE_EXISTING:
             break;
@@ -3768,8 +3810,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
             /* create new image with backing file */
             bdrv_refresh_filename(source);
             bdrv_img_create(arg->target, format,
-                            source->filename,
-                            source->drv->format_name,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name,
                             NULL, size, flags, false, &local_err);
             break;
         default:
@@ -3801,7 +3843,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     bdrv_set_aio_context(target_bs, aio_context);
 
     blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
-                           arg->has_replaces, arg->replaces, arg->sync,
+                           !!replaces_node_name, replaces_node_name, arg->sync,
                            backing_mode, arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
                            arg->has_buf_size, arg->buf_size,
@@ -3833,7 +3875,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                          bool has_copy_mode, MirrorCopyMode copy_mode,
                          Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode = MIRROR_LEAVE_BACKING_CHAIN;
@@ -3844,6 +3886,14 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
         return;
     }
 
+    /* Same as in qmp_drive_mirror(): We want to run the job from @bs,
+     * but we want to replace @unfiltered_bs on completion. */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+    if (!has_replaces && unfiltered_bs != bs) {
+        replaces = unfiltered_bs->node_name;
+        has_replaces = true;
+    }
+
     target_bs = bdrv_lookup_bs(target, target, errp);
     if (!target_bs) {
         return;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 477826330c..2890dffc73 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -284,9 +284,7 @@ static int init_dirty_bitmap_migration(void)
         const char *drive_name = bdrv_get_device_or_node_name(bs);
 
         /* skip automatically inserted nodes */
-        while (bs && bs->drv && bs->implicit) {
-            bs = backing_bs(bs);
-        }
+        bs = bdrv_skip_implicit_filters(bs);
 
         for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
              bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..ea654a20c1 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2415,13 +2415,9 @@ void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
         return;
     }
 
-    while (true) {
+    while (bs && !bm) {
         bm = bdrv_find_dirty_bitmap(bs, bitmap);
-        if (bm != NULL || bs->backing == NULL) {
-            break;
-        }
-
-        bs = bs->backing->bs;
+        bs = bdrv_filtered_bs(bs);
     }
 
     if (bm == NULL) {
diff --git a/qemu-img.c b/qemu-img.c
index 0752bbe4d9..307e72c9fd 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -996,7 +996,7 @@ static int img_commit(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     qemu_progress_init(progress, 1.f);
     qemu_progress_print(0.f, 100);
@@ -1013,7 +1013,7 @@ static int img_commit(int argc, char **argv)
         /* This is different from QMP, which by default uses the deepest file in
          * the backing chain (i.e., the very base); however, the traditional
          * behavior of qemu-img commit is using the immediate backing file. */
-        base_bs = backing_bs(bs);
+        base_bs = bdrv_filtered_cow_bs(bs);
         if (!base_bs) {
             error_setg(&local_err, "Image does not have a backing file");
             goto done;
@@ -2438,7 +2438,8 @@ static int img_convert(int argc, char **argv)
          * s.target_backing_sectors has to be negative, which it will
          * be automatically).  The backing file length is used only
          * for optimizations, so such a case is not fatal. */
-        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
+        s.target_backing_sectors =
+            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
     } else {
         s.target_backing_sectors = -1;
     }
@@ -2806,11 +2807,12 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
-        bs = backing_bs(bs);
+        bs = bdrv_filtered_cow_bs(bs);
         if (bs == NULL) {
             ret = 0;
             break;
         }
+        bs = bdrv_skip_implicit_filters(bs);
 
         depth++;
     }
@@ -2940,7 +2942,7 @@ static int img_map(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     if (output_format == OFORMAT_HUMAN) {
         printf("%-16s%-16s%-16s%s\n", "Offset", "Length", "Mapped to", "File");
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 04/11] block: Storage child access function
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (2 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-11-12 22:32   ` Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node() Max Reitz
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

For completeness' sake, add a function for accessing a node's storage
child, too.  For filters, this is there filtered child; for non-filters,
this is bs->file.

Some places are deliberately left unconverted:
- BDS opening/closing functions where bs->file is handled specially
  (which is basically wrong, but at least simplifies probing)
- bdrv_co_block_status_from_file(), because its name implies that it
  points to ->file
- bdrv_snapshot_goto() in one places unrefs bs->file.  Such a
  modification is not covered by this patch and is therefore just
  safeguarded by an additional assert(), but otherwise kept as-is.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  6 +++++
 block.c                   | 51 ++++++++++++++++++++++++++++-----------
 block/io.c                | 20 +++++++++------
 block/qapi.c              |  7 +++---
 block/snapshot.c          | 40 +++++++++++++++++-------------
 5 files changed, 82 insertions(+), 42 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index fa9154899d..d3d8b22155 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1214,6 +1214,7 @@ int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BdrvChild *bdrv_storage_child(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
@@ -1238,4 +1239,9 @@ static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
     return child_bs(bdrv_filtered_child(bs));
 }
 
+static inline BlockDriverState *bdrv_storage_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_storage_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index 61a2fe14eb..a3a121551a 100644
--- a/block.c
+++ b/block.c
@@ -3835,15 +3835,21 @@ exit:
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
+
     if (!drv) {
         return -ENOMEDIUM;
     }
+
     if (drv->bdrv_get_allocated_file_size) {
         return drv->bdrv_get_allocated_file_size(bs);
     }
-    if (bs->file) {
-        return bdrv_get_allocated_file_size(bs->file->bs);
+
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
+        return bdrv_get_allocated_file_size(storage_bs);
     }
+
     return -ENOTSUP;
 }
 
@@ -4252,7 +4258,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
                           const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
@@ -4265,7 +4271,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
 int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_remove_breakpoint) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_remove_breakpoint) {
@@ -4278,7 +4284,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 {
     while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
@@ -4291,7 +4297,7 @@ int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {
@@ -5388,14 +5394,21 @@ void bdrv_refresh_filename(BlockDriverState *bs)
         bs->exact_filename[0] = '\0';
 
         drv->bdrv_refresh_filename(bs);
-    } else if (bs->file) {
-        /* Try to reconstruct valid information from the underlying file */
+    } else if (bdrv_storage_child(bs)) {
+        /* Try to reconstruct valid information from the underlying
+         * file -- this only works for format nodes (filter nodes
+         * cannot be probed and as such must be selected by the user
+         * either through an options dict, or through a special
+         * filename which the filter driver must construct in its
+         * .bdrv_refresh_filename() implementation) */
+        BlockDriverState *storage_bs = bdrv_storage_bs(bs);
 
         bs->exact_filename[0] = '\0';
 
         /*
          * We can use the underlying file's filename if:
          * - it has a filename,
+         * - the current BDS is not a filter,
          * - the file is a protocol BDS, and
          * - opening that file (as this BDS's format) will automatically create
          *   the BDS tree we have right now, that is:
@@ -5404,11 +5417,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
          *   - no non-file child of this BDS has been overridden by the user
          *   Both of these conditions are represented by generate_json_filename.
          */
-        if (bs->file->bs->exact_filename[0] &&
-            bs->file->bs->drv->bdrv_file_open &&
-            !generate_json_filename)
+        if (storage_bs->exact_filename[0] && storage_bs->drv->bdrv_file_open &&
+            !drv->is_filter && !generate_json_filename)
         {
-            strcpy(bs->exact_filename, bs->file->bs->exact_filename);
+            strcpy(bs->exact_filename, storage_bs->exact_filename);
         }
     }
 
@@ -5425,6 +5437,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
 
     if (!drv) {
         error_setg(errp, "Node '%s' is ejected", bs->node_name);
@@ -5435,8 +5448,9 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
         return drv->bdrv_dirname(bs, errp);
     }
 
-    if (bs->file) {
-        return bdrv_dirname(bs->file->bs, errp);
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
+        return bdrv_dirname(storage_bs, errp);
     }
 
     bdrv_refresh_filename(bs);
@@ -5570,6 +5584,15 @@ BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
     return cow_child ?: rw_child;
 }
 
+/*
+ * Return the child that stores the data that is allocated on this
+ * node.  This may or may not include metadata.
+ */
+BdrvChild *bdrv_storage_child(BlockDriverState *bs)
+{
+    return bdrv_filtered_rw_child(bs) ?: bs->file;
+}
+
 static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
                                            bool stop_on_explicit_filter)
 {
diff --git a/block/io.c b/block/io.c
index 8a442d37b2..7fb81287c5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -120,6 +120,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
     Error *local_err = NULL;
 
@@ -134,13 +135,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
                                 drv->bdrv_aio_preadv) ? 1 : 512;
 
     /* Take some limits from the children as a default */
-    if (bs->file) {
-        bdrv_refresh_limits(bs->file->bs, &local_err);
+    if (storage_bs) {
+        bdrv_refresh_limits(storage_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
+        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
     } else {
         bs->bl.min_mem_alignment = 512;
         bs->bl.opt_mem_alignment = getpagesize();
@@ -2412,6 +2413,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     int ret = -ENOTSUP;
 
     bdrv_inc_in_flight(bs);
@@ -2424,8 +2426,8 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         } else {
             ret = drv->bdrv_save_vmstate(bs, qiov, pos);
         }
-    } else if (bs->file) {
-        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+    } else if (storage_bs) {
+        ret = bdrv_co_rw_vmstate(storage_bs, qiov, pos, is_read);
     }
 
     bdrv_dec_in_flight(bs);
@@ -2561,6 +2563,7 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
 
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
+    BlockDriverState *storage_bs;
     int current_gen;
     int ret = 0;
 
@@ -2590,7 +2593,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
     }
 
     /* Write back cached data to the OS even with cache=unsafe */
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_OS);
     if (bs->drv->bdrv_co_flush_to_os) {
         ret = bs->drv->bdrv_co_flush_to_os(bs);
         if (ret < 0) {
@@ -2608,7 +2611,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_DISK);
     if (!bs->drv) {
         /* bs->drv->bdrv_co_flush() might have ejected the BDS
          * (even in case of apparent success) */
@@ -2653,7 +2656,8 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
      * in the case of cache=unsafe, so there are no useless flushes.
      */
 flush_parent:
-    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+    storage_bs = bdrv_storage_bs(bs);
+    ret = storage_bs ? bdrv_co_flush(storage_bs) : 0;
 out:
     /* Notify any pending flushes that we have completed */
     if (ret == 0) {
diff --git a/block/qapi.c b/block/qapi.c
index f2eb83945e..cbee819c13 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -500,7 +500,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
-    BlockDriverState *cow_bs;
+    BlockDriverState *storage_bs, *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -524,9 +524,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 
     s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
 
-    if (bs->file) {
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         s->has_parent = true;
-        s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
+        s->parent = bdrv_query_bds_stats(storage_bs, blk_level);
     }
 
     cow_bs = bdrv_filtered_cow_bs(bs);
diff --git a/block/snapshot.c b/block/snapshot.c
index f9903bc94e..2c5997fc73 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -154,8 +154,9 @@ int bdrv_can_snapshot(BlockDriverState *bs)
     }
 
     if (!drv->bdrv_snapshot_create) {
-        if (bs->file != NULL) {
-            return bdrv_can_snapshot(bs->file->bs);
+        BlockDriverState *storage_bs = bdrv_storage_bs(bs);
+        if (storage_bs) {
+            return bdrv_can_snapshot(storage_bs);
         }
         return 0;
     }
@@ -167,14 +168,15 @@ int bdrv_snapshot_create(BlockDriverState *bs,
                          QEMUSnapshotInfo *sn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_create) {
         return drv->bdrv_snapshot_create(bs, sn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_create(bs->file->bs, sn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_create(storage_bs, sn_info);
     }
     return -ENOTSUP;
 }
@@ -184,6 +186,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
                        Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
     int ret, open_ret;
 
     if (!drv) {
@@ -204,37 +207,38 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         return ret;
     }
 
-    if (bs->file) {
-        BlockDriverState *file;
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         QDict *options = qdict_clone_shallow(bs->options);
         QDict *file_options;
         Error *local_err = NULL;
 
-        file = bs->file->bs;
         /* Prevent it from getting deleted when detached from bs */
-        bdrv_ref(file);
+        bdrv_ref(storage_bs);
 
         qdict_extract_subqdict(options, &file_options, "file.");
         qobject_unref(file_options);
-        qdict_put_str(options, "file", bdrv_get_node_name(file));
+        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
 
         drv->bdrv_close(bs);
+
+        assert(bs->file->bs == storage_bs);
         bdrv_unref_child(bs, bs->file);
         bs->file = NULL;
 
-        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
         open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
         qobject_unref(options);
         if (open_ret < 0) {
-            bdrv_unref(file);
+            bdrv_unref(storage_bs);
             bs->drv = NULL;
             /* A bdrv_snapshot_goto() error takes precedence */
             error_propagate(errp, local_err);
             return ret < 0 ? ret : open_ret;
         }
 
-        assert(bs->file->bs == file);
-        bdrv_unref(file);
+        assert(bs->file->bs == storage_bs);
+        bdrv_unref(storage_bs);
         return ret;
     }
 
@@ -270,6 +274,7 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
                          Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     int ret;
 
     if (!drv) {
@@ -286,8 +291,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 
     if (drv->bdrv_snapshot_delete) {
         ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
-    } else if (bs->file) {
-        ret = bdrv_snapshot_delete(bs->file->bs, snapshot_id, name, errp);
+    } else if (storage_bs) {
+        ret = bdrv_snapshot_delete(storage_bs, snapshot_id, name, errp);
     } else {
         error_setg(errp, "Block format '%s' used by device '%s' "
                    "does not support internal snapshot deletion",
@@ -323,14 +328,15 @@ int bdrv_snapshot_list(BlockDriverState *bs,
                        QEMUSnapshotInfo **psn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_list) {
         return drv->bdrv_snapshot_list(bs, psn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_list(bs->file->bs, psn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_list(storage_bs, psn_info);
     }
     return -ENOTSUP;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node()
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (3 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 04/11] block: Storage child access function Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-11-12 22:47   ` [Qemu-devel] [for 3.1? Qemu-devel] " Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

Currently, check_to_replace_node() only allows mirror to replace a node
in the chain of the source node, and only if it is the first non-filter
node below the source.  Well, technically, the idea is that you can
exactly replace a quorum child by mirroring from quorum.

This has (probably) two reasons:
(1) We do not want to create loops.
(2) @replaces and @device should have exactly the same content so
    replacing them does not cause visible data to change.

This has two issues:
(1) It is overly restrictive.  It is completely fine for @replaces to be
    a filter.
(2) It is not restrictive enough.  You can create loops with this as
    follows:

$ qemu-img create -f qcow2 /tmp/source.qcow2 64M
$ qemu-system-x86_64 -qmp stdio
{"execute": "qmp_capabilities"}
{"execute": "object-add",
 "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
{"execute": "blockdev-add",
 "arguments": {
     "node-name": "source",
     "driver": "throttle",
     "throttle-group": "tg0",
     "file": {
         "node-name": "filtered",
         "driver": "qcow2",
         "file": {
             "driver": "file",
             "filename": "/tmp/source.qcow2"
         } } } }
{"execute": "drive-mirror",
 "arguments": {
     "job-id": "mirror",
     "device": "source",
     "target": "/tmp/target.qcow2",
     "format": "qcow2",
     "node-name": "target",
     "sync" :"none",
     "replaces": "filtered"
 } }
{"execute": "block-job-complete", "arguments": {"device": "mirror"}}

And qemu crashes because of a stack overflow due to the loop being
created (target's backing file is source, so when it replaces filtered,
it points to itself through source).

(blockdev-mirror can be broken similarly.)

So let us make the checks for the two conditions above explicit, which
makes the whole function exactly as restrictive as it needs to be.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h |  1 +
 block.c               | 77 +++++++++++++++++++++++++++++++++++++++----
 blockdev.c            | 32 ++++++++++++++++--
 3 files changed, 101 insertions(+), 9 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a01986495d..a738fef601 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -385,6 +385,7 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+                                        BlockDriverState *backing_bs,
                                         const char *node_name, Error **errp);
 
 /* async block I/O */
diff --git a/block.c b/block.c
index a3a121551a..69b56771f7 100644
--- a/block.c
+++ b/block.c
@@ -5161,7 +5161,55 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
     return false;
 }
 
+static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
+{
+    BdrvChild *c;
+
+    if (!parent) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &parent->children, next) {
+        if (c->bs == child || is_child_of(child, c->bs)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/* Return true if there are only filters in [@top, @base).  Note that
+ * this may include quorum (which bdrv_chain_contains() cannot
+ * handle). */
+static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base)
+{
+    BdrvChild *c;
+
+    if (!top) {
+        return false;
+    }
+
+    if (top == base) {
+        return true;
+    }
+
+    if (!top->drv->is_filter) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &top->children, next) {
+        if (is_filtered_child(c->bs, base)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/* @parent_bs is mirror's source BDS, @backing_bs is the BDS which
+ * will be attached to the target when mirror completes */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+                                        BlockDriverState *backing_bs,
                                         const char *node_name, Error **errp)
 {
     BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
@@ -5180,13 +5228,28 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
         goto out;
     }
 
-    /* We don't want arbitrary node of the BDS chain to be replaced only the top
-     * most non filter in order to prevent data corruption.
-     * Another benefit is that this tests exclude backing files which are
-     * blocked by the backing blockers.
-     */
-    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-        error_setg(errp, "Only top most non filter can be replaced");
+    /* If to_replace_bs is (recursively) a child of backing_bs,
+     * replacing it may create a loop.  We cannot allow that. */
+    if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, backing_bs)) {
+        error_setg(errp, "Replacing this node would result in a loop");
+        to_replace_bs = NULL;
+        goto out;
+    }
+
+    /* Mirror is designed in such a way that when it completes, the
+     * source BDS is seamlessly replaced.  It is therefore not allowed
+     * to replace a BDS where this condition would be violated, as that
+     * would defeat the purpose of mirror and could lead to data
+     * corruption.
+     * Therefore, between parent_bs and to_replace_bs there may be
+     * only filters (and the one on top must be a filter, too), so
+     * their data always stays in sync and mirror can complete and
+     * replace to_replace_bs without any possible corruptions. */
+    if (!is_filtered_child(parent_bs, to_replace_bs) &&
+        !is_filtered_child(to_replace_bs, parent_bs))
+    {
+        error_setg(errp, "The node to be replaced must be connected to the "
+                   "source through filter nodes only");
         to_replace_bs = NULL;
         goto out;
     }
diff --git a/blockdev.c b/blockdev.c
index 33dd6408c0..9ba9dc3043 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3667,7 +3667,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     }
 
     if (has_replaces) {
-        BlockDriverState *to_replace_bs;
+        BlockDriverState *to_replace_bs, *backing_bs;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -3677,7 +3677,35 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
-        to_replace_bs = check_to_replace_node(bs, replaces, errp);
+        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
+            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
+        {
+            /* While we do not quite know what OPEN_BACKING_CHAIN
+             * (used for mode=existing) will yield, it is probably
+             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
+             * because that is our best guess */
+            switch (sync) {
+            case MIRROR_SYNC_MODE_FULL:
+                backing_bs = NULL;
+                break;
+
+            case MIRROR_SYNC_MODE_TOP:
+                backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
+                break;
+
+            case MIRROR_SYNC_MODE_NONE:
+                backing_bs = bs;
+                break;
+
+            default:
+                abort();
+            }
+        } else {
+            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
+            backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target));
+        }
+
+        to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, errp);
         if (!to_replace_bs) {
             return;
         }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (4 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node() Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-08-09 22:47   ` Max Reitz
  2018-11-12 22:54   ` Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant Max Reitz
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

This adds two tests for cases where our old check_to_replace_node()
function failed to detect that executing this job with these parameters
would result in a cyclic graph.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 124 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c20ac7da87..186bd0f031 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1061,5 +1061,129 @@ class TestOrphanedSource(iotests.QMPTestCase):
                              target='dest-ro')
         self.assert_qmp(result, 'error/class', 'GenericError')
 
+# Various tests for the @replaces option (independent of quorum)
+class TestReplaces(iotests.QMPTestCase):
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+    def test_drive_mirror_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'source',
+                    'driver': 'throttle',
+                    'throttle-group': 'tg',
+                    'file': {
+                        'node-name': 'filtered',
+                        'driver': 'qcow2',
+                        'file': {
+                            'driver': 'file',
+                            'filename': test_img
+                        }
+                    }
+                })
+        self.assert_qmp(result, 'return', {})
+
+        # Mirror from @source to @target in sync=none, so that @source
+        # will be @target's backing file; but replace @filtered.
+        # Then, @target's backing file will be @source, whose backing
+        # file is now @target instead of @filtered.  That is a loop.
+        # (But apart from the loop, replacing @filtered instead of
+        # @source is fine, because both are just filtered versions of
+        # each other.)
+        result = self.vm.qmp('drive-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target=target_img,
+                             format=iotests.imgfmt,
+                             node_name='target',
+                             sync='none',
+                             replaces='filtered')
+        if 'error' in result:
+            # This is the correct result
+            self.assert_qmp(result, 'error/class', 'GenericError')
+        else:
+            # This is wrong, but let's run it to the bitter conclusion
+            self.complete_and_wait(drive='mirror')
+            # Fail for good measure, although qemu should have crashed
+            # anyway
+            self.fail('Loop creation was successful')
+
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_blockdev_mirror_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, target_img, '1M')
+
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'source',
+                    'driver': 'throttle',
+                    'throttle-group': 'tg',
+                    'file': {
+                        'node-name': 'middle',
+                        'driver': 'throttle',
+                        'throttle-group': 'tg',
+                        'file': {
+                            'node-name': 'bottom',
+                            'driver': 'qcow2',
+                            'file': {
+                                'driver': 'file',
+                                'filename': test_img
+                            }
+                        }
+                    }
+                })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'target',
+                    'driver': 'qcow2',
+                    'file': {
+                        'driver': 'file',
+                        'filename': target_img
+                    },
+                    'backing': 'middle'
+                })
+
+        # Mirror from @source to @target.  With blockdev-mirror, the
+        # current (old) backing file is retained (which is @middle).
+        # By replacing @bottom, @middle's file will be @target, whose
+        # backing file is @middle again.  That is a loop.
+        # (But apart from the loop, replacing @bottom instead of
+        # @source is fine, because both are just filtered versions of
+        # each other.)
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='full',
+                             replaces='bottom')
+        if 'error' in result:
+            # This is the correct result
+            self.assert_qmp(result, 'error/class', 'GenericError')
+        else:
+            # This is wrong, but let's run it to the bitter conclusion
+            self.complete_and_wait(drive='mirror')
+            # Fail for good measure, although qemu should have crashed
+            # anyway
+            self.fail('Loop creation was successful')
+
+        os.remove(test_img)
+        os.remove(target_img)
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index c28b392b87..d71481b010 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.....................................................................................
+.......................................................................................
 ----------------------------------------------------------------------
-Ran 85 tests
+Ran 87 tests
 
 OK
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (5 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-11-12 23:08   ` Eric Blake
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 08/11] iotests: Add filter commit test cases Max Reitz
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
a change to the reference output of iotest 191.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h  | 14 +++++++++-----
 block.c                    | 29 ++++++++++++++++++++++-------
 block/qapi.c               |  7 ++++---
 qemu-img.c                 | 12 ++++++++++--
 tests/qemu-iotests/191.out |  1 -
 tests/qemu-iotests/228     |  6 +++---
 tests/qemu-iotests/228.out |  6 +++---
 7 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d3d8b22155..8f2c515ec1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -737,11 +737,15 @@ struct BlockDriverState {
     bool walking_aio_notifiers; /* to make removal during iteration safe */
 
     char filename[PATH_MAX];
-    char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
-                                    this file image */
-    /* The backing filename indicated by the image header; if we ever
-     * open this file, then this is replaced by the resulting BDS's
-     * filename (i.e. after a bdrv_refresh_filename() run). */
+    /* If non-zero, the image is a diff of this image file.  Note that
+     * this the name given in the image header and may therefore not
+     * be equal to .backing->bs->filename, and relative paths are
+     * resolved relatively to their overlay. */
+    char backing_file[PATH_MAX];
+    /* The backing filename indicated by the image header.  Contrary
+     * to backing_file, if we ever open this file, auto_backing_file
+     * is replaced by the resulting BDS's filename (i.e. after a
+     * bdrv_refresh_filename() run). */
     char auto_backing_file[PATH_MAX];
     char backing_format[16]; /* if non-zero and backing_file exists */
 
diff --git a/block.c b/block.c
index 69b56771f7..434074b7dd 100644
--- a/block.c
+++ b/block.c
@@ -78,6 +78,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
                                            const BdrvChildRole *child_role,
                                            Error **errp);
 
+static bool bdrv_backing_overridden(BlockDriverState *bs);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -995,10 +997,6 @@ static void bdrv_backing_attach(BdrvChild *c)
     bdrv_refresh_filename(backing_hd);
 
     parent->open_flags &= ~BDRV_O_NO_BACKING;
-    pstrcpy(parent->backing_file, sizeof(parent->backing_file),
-            backing_hd->filename);
-    pstrcpy(parent->backing_format, sizeof(parent->backing_format),
-            backing_hd->drv ? backing_hd->drv->format_name : "");
 
     bdrv_op_block_all(backing_hd, parent->backing_blocker);
     /* Otherwise we won't be able to commit or stream */
@@ -4318,6 +4316,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     char *backing_file_full = NULL;
     char *filename_tmp = NULL;
     int is_protocol = 0;
+    bool filenames_refreshed = false;
     BlockDriverState *curr_bs = NULL;
     BlockDriverState *retval = NULL;
 
@@ -4340,9 +4339,25 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     {
         BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
 
-        /* If either of the filename paths is actually a protocol, then
-         * compare unmodified paths; otherwise make paths relative */
-        if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+        if (bdrv_backing_overridden(curr_bs)) {
+            /* If the backing file was overridden, we can only compare
+             * directly against the backing node's filename */
+
+            if (!filenames_refreshed) {
+                /* This will automatically refresh all of the
+                 * filenames in the rest of the backing chain, so we
+                 * only need to do this once */
+                bdrv_refresh_filename(bs_below);
+                filenames_refreshed = true;
+            }
+
+            if (strcmp(backing_file, bs_below->filename) == 0) {
+                retval = bs_below;
+                break;
+            }
+        } else if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+            /* If either of the filename paths is actually a protocol, then
+             * compare unmodified paths; otherwise make paths relative */
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
diff --git a/block/qapi.c b/block/qapi.c
index cbee819c13..f5288012f5 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -43,7 +43,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
                                         BlockDriverState *bs, Error **errp)
 {
     ImageInfo **p_image_info;
-    BlockDriverState *bs0;
+    BlockDriverState *bs0, *backing;
     BlockDeviceInfo *info;
 
     if (!bs->drv) {
@@ -72,9 +72,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
         info->node_name = g_strdup(bs->node_name);
     }
 
-    if (bs->backing_file[0]) {
+    backing = bdrv_filtered_cow_bs(bs);
+    if (backing) {
         info->has_backing_file = true;
-        info->backing_file = g_strdup(bs->backing_file);
+        info->backing_file = g_strdup(backing->filename);
     }
 
     info->detect_zeroes = bs->detect_zeroes;
diff --git a/qemu-img.c b/qemu-img.c
index 307e72c9fd..6d405fb6d4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3292,7 +3292,7 @@ static int img_rebase(int argc, char **argv)
 
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
-        char backing_name[PATH_MAX];
+        char *backing_name;
         QDict *options = NULL;
 
         if (bs->backing_format[0] != '\0') {
@@ -3306,16 +3306,24 @@ static int img_rebase(int argc, char **argv)
             }
             qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
         }
-        bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
+        backing_name = bdrv_get_full_backing_filename(bs, &local_err);
+        if (local_err) {
+            error_reportf_err(local_err,
+                              "Could not resolve old backing file name: ");
+            ret = -1;
+            goto out;
+        }
         blk_old_backing = blk_new_open(backing_name, NULL,
                                        options, src_flags, &local_err);
         if (!blk_old_backing) {
             error_reportf_err(local_err,
                               "Could not open old backing file '%s': ",
                               backing_name);
+            g_free(backing_name);
             ret = -1;
             goto out;
         }
+        g_free(backing_name);
 
         if (out_baseimg[0]) {
             const char *overlay_filename;
diff --git a/tests/qemu-iotests/191.out b/tests/qemu-iotests/191.out
index 31a0c7d4c4..b87cddc56f 100644
--- a/tests/qemu-iotests/191.out
+++ b/tests/qemu-iotests/191.out
@@ -604,7 +604,6 @@ wrote 65536/65536 bytes at offset 1048576
                     "backing-filename": "TEST_DIR/t.IMGFMT.base",
                     "dirty-flag": false
                 },
-                "backing-filename-format": "IMGFMT",
                 "virtual-size": 67108864,
                 "filename": "TEST_DIR/t.IMGFMT.ovl3",
                 "cluster-size": 65536,
diff --git a/tests/qemu-iotests/228 b/tests/qemu-iotests/228
index a2200efba5..5f46a1c2c8 100755
--- a/tests/qemu-iotests/228
+++ b/tests/qemu-iotests/228
@@ -33,7 +33,7 @@ def log_node_info(node):
 
     log('bs->filename: ' + node['image']['filename'],
         filters=[filter_testfiles, filter_imgfmt])
-    log('bs->backing_file: ' + node['backing_file'],
+    log('bs->backing_file: ' + node['image']['full-backing-filename'],
         filters=[filter_testfiles, filter_imgfmt])
 
     if 'backing-image' in node['image']:
@@ -69,8 +69,8 @@ with iotests.FilePath('base.img') as base_img_path, \
                 },
                 filters=[filter_testfiles, filter_imgfmt])
 
-    # Filename should be plain, and the backing filename should not
-    # contain the "file:" prefix
+    # Filename should be plain, and the backing node filename should
+    # not contain the "file:" prefix
     log_node_info(vm.node_info('node0'))
 
     vm.qmp_log('blockdev-del', node_name='node0')
diff --git a/tests/qemu-iotests/228.out b/tests/qemu-iotests/228.out
index b01de766bc..e9a91c7af0 100644
--- a/tests/qemu-iotests/228.out
+++ b/tests/qemu-iotests/228.out
@@ -4,7 +4,7 @@
 {u'return': {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {'execute': 'blockdev-del', 'arguments': {'node_name': 'node0'}}
@@ -41,7 +41,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {u'return': {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {'execute': 'blockdev-del', 'arguments': {'node_name': 'node0'}}
@@ -55,7 +55,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {u'return': {}}
 
 bs->filename: json:{"backing": {"driver": "null-co"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top.img"}}
-bs->backing_file: null-co://
+bs->backing_file: TEST_DIR/PID-base.img
 bs->backing->bs->filename: null-co://
 
 {'execute': 'blockdev-del', 'arguments': {'node_name': 'node0'}}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 08/11] iotests: Add filter commit test cases
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (6 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 09/11] iotests: Add filter mirror " Max Reitz
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

This patch adds some tests on how commit copes with filter nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 130 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 1beb5e6dab..f0544d6107 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -346,5 +346,135 @@ class TestReopenOverlay(ImageCommitTestCase):
     def test_reopen_overlay(self):
         self.run_commit_test(self.img1, self.img0)
 
+class TestCommitWithFilters(iotests.QMPTestCase):
+    img0 = os.path.join(iotests.test_dir, '0.img')
+    img1 = os.path.join(iotests.test_dir, '1.img')
+    img2 = os.path.join(iotests.test_dir, '2.img')
+    img3 = os.path.join(iotests.test_dir, '3.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img0, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img1, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img2, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img3, '1M')
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                'node-name': 'top-filter',
+                'driver': 'throttle',
+                'throttle-group': 'tg',
+                'file': {
+                    'node-name': 'cow-3',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': self.img3
+                    },
+                    'backing': {
+                        'node-name': 'cow-2',
+                        'driver': iotests.imgfmt,
+                        'file': {
+                            'driver': 'file',
+                            'filename': self.img2
+                        },
+                        'backing': {
+                            'node-name': 'cow-1',
+                            'driver': iotests.imgfmt,
+                            'file': {
+                                'driver': 'file',
+                                'filename': self.img1
+                            },
+                            'backing': {
+                                'node-name': 'bottom-filter',
+                                'driver': 'throttle',
+                                'throttle-group': 'tg',
+                                'file': {
+                                    'node-name': 'cow-0',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img0
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
+            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img3)
+        os.remove(self.img2)
+        os.remove(self.img1)
+        os.remove(self.img0)
+
+    # Filters make for funny filenames, so we cannot just use
+    # self.imgX for the block-commit parameters
+    def get_filename(self, node):
+        return self.vm.node_info(node)['image']['filename']
+
+    def test_filterless_commit(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-2'),
+                             base=self.get_filename('cow-1'))
+        self.assert_qmp(result, 'return', {})
+        self.wait_until_completed(drive='commit')
+
+    def test_commit_through_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-1'),
+                             base=self.get_filename('cow-0'))
+        # Cannot commit through explicitly added filters (yet,
+        # although in the future we probably want to make users use
+        # blockdev-copy for this)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_with_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             base=self.get_filename('cow-2'))
+        # Not specifying @top means active commit, so including the
+        # filter on top (which is not allowed right now)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_without_filter(self):
+        cow3_name = self.get_filename('cow-3')
+
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=cow3_name,
+                             base=self.get_filename('cow-2'))
+        # This is how you'd want to specify committing img3 into img2
+        # disregarding the filter on top of img3 -- but that does not
+        # work, because you can only specify names of backing files
+        # (and img3 is not a backing file).  The solution for this
+        # would be for block-commit to accept node names.
+        # Note that even if it did work, the above command would
+        # result in a non-active commit, because img3 is not the top
+        # node.  Which is wrong, because img3 can still be written to,
+        # so it should be an active commit, but that is a different
+        # story.
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc',
+                        'Top image file %s not found' % cow3_name)
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index e20a75ce4f..49f84261d0 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-.............................
+.................................
 ----------------------------------------------------------------------
-Ran 29 tests
+Ran 33 tests
 
 OK
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 09/11] iotests: Add filter mirror test cases
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (7 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 08/11] iotests: Add filter commit test cases Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 10/11] iotests: Add test for commit in sub directory Max Reitz
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

This patch adds some test cases how mirroring relates to filters.  One
of them tests what happens when you mirror off a filtered COW node, two
others use the mirror filter node as basically our only example of an
implicitly created filter node so far (besides the commit filter).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 146 ++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 186bd0f031..c21c8b36d5 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -20,8 +20,9 @@
 
 import time
 import os
+import json
 import iotests
-from iotests import qemu_img, qemu_io
+from iotests import qemu_img, qemu_img_pipe, qemu_io
 
 backing_img = os.path.join(iotests.test_dir, 'backing.img')
 target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
@@ -1185,5 +1186,148 @@ class TestReplaces(iotests.QMPTestCase):
         os.remove(test_img)
         os.remove(target_img)
 
+# Tests for mirror with filters (and how the mirror filter behaves, as
+# an example for an implicit filter)
+class TestFilters(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, test_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, target_img)
+
+        qemu_io('-c', 'write -P 1 0 512k', backing_img)
+        qemu_io('-c', 'write -P 2 512k 512k', test_img)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'target',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': target_img
+                                },
+                                'backing': None
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        self.filterless_chain = {
+                'node-name': 'source',
+                'driver': iotests.imgfmt,
+                'file': {
+                    'driver': 'file',
+                    'filename': test_img
+                },
+                'backing': {
+                    'node-name': 'backing',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': backing_img
+                    }
+                }
+            }
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+        os.remove(test_img)
+        os.remove(target_img)
+        os.remove(backing_img)
+
+    def test_cor(self):
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'filter',
+                                'driver': 'copy-on-read',
+                                'file': self.filterless_chain
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='filter',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.qmp('blockdev-del', node_name='target')
+
+        target_map = qemu_img_pipe('map', '--output=json', target_img)
+        target_map = json.loads(target_map)
+
+        assert target_map[0]['start'] == 0
+        assert target_map[0]['length'] == 512 * 1024
+        assert target_map[0]['depth'] == 1
+
+        assert target_map[1]['start'] == 512 * 1024
+        assert target_map[1]['length'] == 512 * 1024
+        assert target_map[1]['depth'] == 0
+
+    def test_implicit_mirror_filter(self):
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        # The mirror filter is now an implicit node, so it should be
+        # invisible when querying the backing chain
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'source'
+
+        image_info = device_info['inserted']['image']
+        assert image_info['filename'] == test_img
+        assert image_info['backing-image']['filename'] == backing_img
+
+        self.complete_and_wait('mirror')
+
+    def test_explicit_mirror_filter(self):
+        # Same test as above, but this time we give the mirror filter
+        # a node-name so it will not be invisible
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top',
+                             filter_node_name='mirror-filter')
+        self.assert_qmp(result, 'return', {})
+
+        # With a node-name given to it, the mirror filter should now
+        # be visible
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'mirror-filter'
+
+        self.complete_and_wait('mirror')
+
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index d71481b010..2c448b4239 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.......................................................................................
+..........................................................................................
 ----------------------------------------------------------------------
-Ran 87 tests
+Ran 90 tests
 
 OK
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 10/11] iotests: Add test for commit in sub directory
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (8 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 09/11] iotests: Add filter mirror " Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 11/11] iotests: Test committing to overridden backing Max Reitz
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

Add a test for committing an overlay in a sub directory to one of the
images in its backing chain, using both relative and absolute filenames.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/020     | 36 ++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/020.out | 10 ++++++++++
 2 files changed, 46 insertions(+)

diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index eac5080f83..d0ac33922d 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -32,6 +32,11 @@ _cleanup()
 	_cleanup_test_img
     rm -f "$TEST_IMG.base"
     rm -f "$TEST_IMG.orig"
+
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT.base"
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT.mid"
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT"
+    rmdir "$TEST_DIR/subdir" &> /dev/null
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -135,6 +140,37 @@ $QEMU_IO -c 'writev 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG commit "$TEST_IMG"
 _cleanup
 
+
+echo
+echo 'Testing commit in sub-directory with relative filenames'
+echo
+
+pushd "$TEST_DIR" > /dev/null
+
+mkdir subdir
+
+TEST_IMG="subdir/t.$IMGFMT.base" _make_test_img 1M
+TEST_IMG="subdir/t.$IMGFMT.mid" _make_test_img -b "t.$IMGFMT.base"
+TEST_IMG="subdir/t.$IMGFMT" _make_test_img -b "t.$IMGFMT.mid"
+
+# Should work
+$QEMU_IMG commit -b "t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+# Might theoretically work, but does not in practice (we have to
+# decide between this and the above; and since we always represent
+# backing file names as relative to the overlay, we go for the above)
+$QEMU_IMG commit -b "subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT" 2>&1 | \
+    _filter_imgfmt
+
+# This should work as well
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+popd > /dev/null
+
+# Now let's try with just absolute filenames
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" \
+    "$TEST_DIR/subdir/t.$IMGFMT"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/020.out b/tests/qemu-iotests/020.out
index 4b722b2dd0..228c37dded 100644
--- a/tests/qemu-iotests/020.out
+++ b/tests/qemu-iotests/020.out
@@ -1094,4 +1094,14 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=json:{'driv
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Block job failed: No space left on device
+
+Testing commit in sub-directory with relative filenames
+
+Formatting 'subdir/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'subdir/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.base
+Formatting 'subdir/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.mid
+Image committed.
+qemu-img: Did not find 'subdir/t.IMGFMT.mid' in the backing chain of 'subdir/t.IMGFMT'
+Image committed.
+Image committed.
 *** done
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 11/11] iotests: Test committing to overridden backing
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (9 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 10/11] iotests: Add test for commit in sub directory Max Reitz
@ 2018-08-09 22:31 ` Max Reitz
  2018-08-09 22:33 ` [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:31 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 61 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |  4 +--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index f0544d6107..90c03e745b 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -476,5 +476,66 @@ class TestCommitWithFilters(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/desc',
                         'Top image file %s not found' % cow3_name)
 
+class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
+    img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
+    img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
+    img_top = os.path.join(iotests.test_dir, 'top.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_a, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_b, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', self.img_base_a, \
+                 self.img_top)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        # Use base_b instead of base_a as the backing of top
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'top',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': self.img_top
+                                },
+                                'backing': {
+                                    'node-name': 'base',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img_base_b
+                                    }
+                                }
+                            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img_top)
+        os.remove(self.img_base_a)
+        os.remove(self.img_base_b)
+
+    def test_commit_to_a(self):
+        # Try committing to base_a (which should fail, as top's
+        # backing image is base_b instead)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_a)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_commit_to_b(self):
+        # Try committing to base_b (which should work, since that is
+        # actually top's backing image)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_b)
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.event_wait('BLOCK_JOB_READY')
+        self.vm.qmp('block-job-complete', device='commit')
+        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 49f84261d0..cfa5c0d0e6 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-.................................
+...................................
 ----------------------------------------------------------------------
-Ran 33 tests
+Ran 35 tests
 
 OK
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] block: Deal with filters
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (10 preceding siblings ...)
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 11/11] iotests: Test committing to overridden backing Max Reitz
@ 2018-08-09 22:33 ` Max Reitz
  2018-08-29 13:29 ` Max Reitz
  2018-11-09 22:47 ` Eric Blake
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:33 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 3456 bytes --]

On 2018-08-10 00:31, Max Reitz wrote:

[...]

> v2:
> - Patch 4: We must clear BDS.exact_filename for filter nodes, or we
>   basically end up with a random filename for them.  This is achieved by
>   pulling the !drv->is_filter check into an inner condition.
>   (Fixes iotests 40 and 184)
> 
> - Patch 7: iotest 228 tried to use QAPI's backing_file information
>   (which was very inconsistent, so with 228 doing some funky backing
>   stuff, it really showed) to get some idea of what qemu thinks about
>   the image header.  Let's make it use full-backing-filename (and that
>   now really reflects the image header).

*cough* *cough* *cough*

git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences,
respectively

001/11:[----] [--] 'block: Mark commit and mirror as filter drivers'
002/11:[----] [--] 'blockdev: Check @replaces in blockdev_mirror_common'
003/11:[----] [--] 'block: Filtered children access functions'
004/11:[0005] [FC] 'block: Storage child access function'
005/11:[----] [--] 'block: Fix check_to_replace_node()'
006/11:[----] [--] 'iotests: Add tests for mirror @replaces loops'
007/11:[0012] [FC] 'block: Leave BDS.backing_file constant'
008/11:[----] [--] 'iotests: Add filter commit test cases'
009/11:[----] [--] 'iotests: Add filter mirror test cases'
010/11:[----] [--] 'iotests: Add test for commit in sub directory'
011/11:[----] [--] 'iotests: Test committing to overridden backing'

> Max Reitz (11):
>   block: Mark commit and mirror as filter drivers
>   blockdev: Check @replaces in blockdev_mirror_common
>   block: Filtered children access functions
>   block: Storage child access function
>   block: Fix check_to_replace_node()
>   iotests: Add tests for mirror @replaces loops
>   block: Leave BDS.backing_file constant
>   iotests: Add filter commit test cases
>   iotests: Add filter mirror test cases
>   iotests: Add test for commit in sub directory
>   iotests: Test committing to overridden backing
> 
>  qapi/block-core.json           |   4 +
>  include/block/block.h          |   2 +
>  include/block/block_int.h      |  53 ++++-
>  block.c                        | 341 ++++++++++++++++++++++++++++-----
>  block/backup.c                 |   8 +-
>  block/block-backend.c          |  16 +-
>  block/commit.c                 |  38 ++--
>  block/io.c                     |  47 +++--
>  block/mirror.c                 |  39 ++--
>  block/qapi.c                   |  38 ++--
>  block/snapshot.c               |  40 ++--
>  block/stream.c                 |  15 +-
>  blockdev.c                     | 167 ++++++++++++----
>  migration/block-dirty-bitmap.c |   4 +-
>  nbd/server.c                   |   8 +-
>  qemu-img.c                     |  24 ++-
>  tests/qemu-iotests/020         |  36 ++++
>  tests/qemu-iotests/020.out     |  10 +
>  tests/qemu-iotests/040         | 191 ++++++++++++++++++
>  tests/qemu-iotests/040.out     |   4 +-
>  tests/qemu-iotests/041         | 270 +++++++++++++++++++++++++-
>  tests/qemu-iotests/041.out     |   4 +-
>  tests/qemu-iotests/191.out     |   1 -
>  tests/qemu-iotests/228         |   6 +-
>  tests/qemu-iotests/228.out     |   6 +-
>  25 files changed, 1141 insertions(+), 231 deletions(-)
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
@ 2018-08-09 22:47   ` Max Reitz
  2018-11-12 22:54   ` Eric Blake
  1 sibling, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-09 22:47 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]

On 2018-08-10 00:31, Max Reitz wrote:
> This adds two tests for cases where our old check_to_replace_node()
> function failed to detect that executing this job with these parameters
> would result in a cyclic graph.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/041     | 124 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/041.out |   4 +-
>  2 files changed, 126 insertions(+), 2 deletions(-)

Sometimes, I hate myself a little bit.  I totally forgot to run the test
with QED.

So:

> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index c20ac7da87..186bd0f031 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
> @@ -1061,5 +1061,129 @@ class TestOrphanedSource(iotests.QMPTestCase):

[...]

> +                        'driver': 'qcow2',

[...]

> +                            'driver': 'qcow2',

[...]

> +                    'driver': 'qcow2',

Imagine these to be "'driver': iotests.imgfmt".  With that, it passes.
(I won't send a v3 specifically for this, but it will be included in v3.
Eventually.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] block: Deal with filters
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (11 preceding siblings ...)
  2018-08-09 22:33 ` [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
@ 2018-08-29 13:29 ` Max Reitz
  2018-11-09 22:47 ` Eric Blake
  13 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-08-29 13:29 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

Eric inadvertently pointed me to the fact that qemu-img needs some
additional work.  Its get_block_status() should completely skip R/W
filters (just adding a "bs = bdrv_skip_rw_filters()" at the top of the
loop is enough -- in turn, I can probably drop the
bdrv_skip_implicit_filters() at the bottom), and I suspect the
bdrv_block_status() call in convert_iteration_sectors() should really be

  BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
  bdrv_block_status_above(src_bs, bdrv_backing_chain_next(src_bs), ...);

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers Max Reitz
@ 2018-09-07 12:37   ` Alberto Garcia
  0 siblings, 0 replies; 27+ messages in thread
From: Alberto Garcia @ 2018-09-07 12:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On Fri 10 Aug 2018 12:31:07 AM CEST, Max Reitz wrote:
> The commit and mirror block nodes are filters, so they should be marked
> as such.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/11] block: Deal with filters
  2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
                   ` (12 preceding siblings ...)
  2018-08-29 13:29 ` Max Reitz
@ 2018-11-09 22:47 ` Eric Blake
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2018-11-09 22:47 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> Note 1: This series depends on v10 of my “block: Fix some filename
> generation issues” series.
> 
> Based-on: <20180809213528.14738-1-mreitz@redhat.com>
> 
> Note 2: This is technically the first part of my active mirror followup.
> But just very technically.  I noticed that that followup started to
> consist of two parts, namely (A) fix filtery things in the block layer,
> and (B) fix active mirror.  So I decided to split it.  This is part A.
> Part B comes later.
> 
> 
> When we introduced filters, we did it a bit casually.  Sure, we talked a
> lot about them before, but that was mostly discussion about where
> implicit filters should be added to the graph (note that we currently
> only have two implicit filters, those being mirror and commit).  But in
> the end, we really just designated some drivers filters (Quorum,
> blkdebug, etc.) and added some specifically (throttle, COR), without
> really looking through the block layer to see where issues might occur.
> 
> It turns out vast areas of the block layer just don't know about filters
> and cannot really handle them.  Many cases will work in practice, in
> others, well, too bad, you cannot use some feature because some part
> deep inside the block layer looks at your filters and thinks they are
> format nodes.
> 
> This series sets out to correct a bit of that.  I lost my head many
> times and I'm sure this series is incomplete in many ways, but it really
> doesn't do any good if it sits on my disk any longer, it needs to go out
> now.

Is there any of this series that still needs to go in 3.1?

I'm trying to fix an NBD bug with incorrect alignment in response to 
NBD_CMD_BLOCK_STATUS.  I wanted to use blkdebug to force a 
larger-than-512 minimum alignment exposure to the guest.  But right now, 
nbd/server.c:nbd_export_bitmap() does a braindead:

     while (true) {
         bm = bdrv_find_dirty_bitmap(bs, bitmap);
         if (bm != NULL || bs->backing == NULL) {
             break;
         }

         bs = bs->backing->bs;
     }

which fails to see through a blkdebug filter to find a dirty bitmap in 
the underlying qcow2 BDS (because blkdebug uses bs->file, not 
bs->backing).  It looks like your series will make my task a lot easier, 
but if it's not going in for 3.1, I still need to push my bug fix now, 
and defer the testsuite additions to later when we can more sanely see 
through filters.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common Max Reitz
@ 2018-11-09 22:51   ` Eric Blake
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2018-11-09 22:51 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> There is no reason why the constraints we put on @replaces should be
> limited to drive-mirror.  Therefore, move the sanity checks from
> qmp_drive_mirror() to blockdev_mirror_common() so they apply to
> blockdev-mirror as well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   blockdev.c | 55 ++++++++++++++++++++++++++++++++----------------------
>   1 file changed, 33 insertions(+), 22 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions Max Reitz
@ 2018-11-12 22:17   ` Eric Blake
  2018-11-14 19:52     ` Max Reitz
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2018-11-12 22:17 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node.  For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses.  For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file.  Usually.
> 
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression.  It is better to introduce
> some functions that actually guarantee a meaning:
> 
> - bdrv_filtered_cow_child() will return the child that receives requests
>    filtered through COW.  That is, reads may or may not be forwarded
>    (depending on the overlay's allocation status), but writes never go to
>    this child.
> 
> - bdrv_filtered_rw_child() will return the child that receives requests
>    filtered through some very plain process.  Reads and writes issued to
>    the parent will go to the child as well (although timing, etc. may be
>    modified).
> 
> - All drivers but quorum (but quorum is pretty opaque to the general
>    block layer anyway) always only have one of these children: All read
>    requests must be served from the filtered_rw_child (if it exists), so
>    if there was a filtered_cow_child in addition, it would not receive
>    any requests at all.
>    (The closest here is mirror, where all requests are passed on to the
>    source, but with write-blocking, write requests are "COWed" to the
>    target.  But that just means that the target is a special child that
>    cannot be introspected by the generic block layer functions, and that
>    source is a filtered_rw_child.)
>    Therefore, we can also add bdrv_filtered_child() which returns that
>    one child (or NULL, if there is no filtered child).
> 
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain.  They do not do that currently, but this
> patch makes them.

The description makes sense; now on to the code.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json           |   4 +
>   include/block/block.h          |   1 +
>   include/block/block_int.h      |  33 +++++-
>   block.c                        | 184 ++++++++++++++++++++++++++++-----
>   block/backup.c                 |   8 +-
>   block/block-backend.c          |  16 ++-
>   block/commit.c                 |  36 ++++---
>   block/io.c                     |  27 ++---
>   block/mirror.c                 |  37 ++++---
>   block/qapi.c                   |  26 ++---
>   block/stream.c                 |  15 ++-
>   blockdev.c                     |  84 ++++++++++++---
>   migration/block-dirty-bitmap.c |   4 +-
>   nbd/server.c                   |   8 +-
>   qemu-img.c                     |  12 ++-
>   15 files changed, 363 insertions(+), 132 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index f20efc97f7..a71df88eb2 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2248,6 +2248,10 @@
>   # On successful completion the image file is updated to drop the backing file
>   # and the BLOCK_JOB_COMPLETED event is emitted.

Context: this is part of block-stream.

>   #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.

That is, if we have:

base <- filter1 <- active <- filter2

and request a block-stream with "top":"filter2", it is no different in 
effect than if we had requested "top":"active", since filter nodes can't 
be stream targets.  Makes sense.

What happens if we request "base":"filter1"? Do we want to require base 
to be a non-filter node?

> +++ b/include/block/block_int.h
> @@ -91,6 +91,7 @@ struct BlockDriver {
>        * certain callbacks that refer to data (see block.c) to their bs->file if
>        * the driver doesn't implement them. Drivers that do not wish to forward
>        * must implement them and return -ENOTSUP.
> +     * Note that filters are not allowed to modify data.

They can modify offsets and timing, but not data?  Even if it is an 
encryption filter?  I'm trying to figure out if LUKS behaves like a filter.

> +++ b/block.c
> @@ -532,11 +532,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>   int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);

Is it worth a micro-optimization of not calling this...

>   
>       if (drv && drv->bdrv_probe_blocksizes) {
>           return drv->bdrv_probe_blocksizes(bs, bsz);

...until after checking drv->bdrv_probe_blocksizes?

> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_blocksizes(bs->file->bs, bsz);
> +    } else if (filtered) {
> +        return bdrv_probe_blocksizes(filtered, bsz);
>       }

But I don't mind if you leave it as written.

Is blkdebug a filter, or something else?  That's a case of something 
that DOES change block sizes in relation to the child that it is 
filtering.  If we have qcow2 -> blkdebug -> file, and the qcow2 format 
layer wants to know the blocksizes of its child, does it really always 
want the sizes of 'file' rather than the (possibly changed) sizes of 
'blkdebug'?

>   
>       return -ENOTSUP;
> @@ -551,11 +552,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>   
>       if (drv && drv->bdrv_probe_geometry) {
>           return drv->bdrv_probe_geometry(bs, geo);
> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_geometry(bs->file->bs, geo);
> +    } else if (filtered) {
> +        return bdrv_probe_geometry(filtered, geo);
>       }

At least you're consistent on skipping filters.

> @@ -4068,7 +4074,19 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
>   {
>       while (top && top != base) {
> -        top = backing_bs(top);
> +        top = bdrv_filtered_bs(top);
> +    }
> +
> +    return top != NULL;
> +}
> +
> +/* Same as bdrv_chain_contains(), but skip implicitly added R/W filter
> + * nodes and do not move past explicitly added R/W filters. */
> +bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
> +{
> +    top = bdrv_skip_implicit_filters(top);
> +    while (top && top != base) {
> +        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
>       }

Is there a goal of getting rid of bdrv_legacy_chain_contains() in the 
future?  If so, should the commit message and/or code comments mention that?

>   
>       return top != NULL;
> @@ -4140,20 +4158,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>   
>   int bdrv_has_zero_init(BlockDriverState *bs)
>   {
> +    BlockDriverState *filtered;
> +
>       if (!bs->drv) {
>           return 0;
>       }
>   
>       /* If BS is a copy on write image, it is initialized to
>          the contents of the base image, which may not be zeroes.  */
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(bs)) {
>           return 0;

Not for this patch, but should we ask the filtered_cow_child if it is 
known to be all-zero content before blindly returning 0 here? Some 
children may be able to efficiently report if they have all-zero content 
[for example, see the recent thread about NBD performace drop due to 
explicitly zeroing the remote device, which could be skipped if it is 
known that the remote device started life uninitialized]

>       }
>       if (bs->drv->bdrv_has_zero_init) {
>           return bs->drv->bdrv_has_zero_init(bs);
>       }
> -    if (bs->file && bs->drv->is_filter) {
> -        return bdrv_has_zero_init(bs->file->bs);
> +
> +    filtered = bdrv_filtered_rw_bs(bs);
> +    if (filtered) {
> +        return bdrv_has_zero_init(filtered);
>       }

You argued earlier that a filter can't change contents - but is that 
just guest-visible contents? If LUKS is a filter node, then a file that 
is zero-initialized is NOT zero-initialized after passing through LUKS 
encryption (decrypting the zeros returns garbage; conversely, writing 
zeros into LUKS results in random-looking bits in the file).  I guess 
I'm leaning more and more towards LUKS is not a filter, but a format.

> @@ -4198,8 +4220,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>           return -ENOMEDIUM;
>       }
>       if (!drv->bdrv_get_info) {
> -        if (bs->file && drv->is_filter) {
> -            return bdrv_get_info(bs->file->bs, bdi);
> +        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
> +        if (filtered) {
> +            return bdrv_get_info(filtered, bdi);

Is this right for blkdebug?

> @@ -5487,3 +5519,105 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
>   
>       return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
>   }
> +
> +/*
> + * Return the child that @bs acts as an overlay for, and from which data may be
> + * copied in COW or COR operations.  Usually this is the backing file.
> + */
> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (bs->drv->is_filter) {
> +        return NULL;
> +    }

Here, filters end the search...

> +
> +    return bs->backing;
> +}
> +
> +/*
> + * If @bs acts as a pass-through filter for one of its children,
> + * return that child.  "Pass-through" means that write operations to
> + * @bs are forwarded to that child instead of triggering COW.
> + */
> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (!bs->drv->is_filter) {
> +        return NULL;
> +    }

...while here, non-filters end the search. I think I follow your 
semantics (we were abusing bs->backing for filters, and your code is now 
trying to distinguish what was really meant)

> +
> +    return bs->backing ?: bs->file;
> +}
> +
> +/*
> + * Return any filtered child, independently on how it reacts to write

s/on/of/

> + * accesses and whether data is copied onto this BDS through COR.
> + */
> +BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
> +{
> +    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
> +    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
> +
> +    /* There can only be one filtered child at a time */
> +    assert(!(cow_child && rw_child));
> +
> +    return cow_child ?: rw_child;
> +}
> +
> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
> +                                           bool stop_on_explicit_filter)
> +{
> +    BdrvChild *filtered;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> +        filtered = bdrv_filtered_rw_child(bs);
> +        if (!filtered) {
> +            break;
> +        }
> +        bs = filtered->bs;
> +    }
> +    /* Note that this treats nodes with bs->drv == NULL as not being
> +     * R/W filters (bs->drv == NULL should be replaced by something
> +     * else anyway).
> +     * The advantage of this behavior is that this function will thus
> +     * always return a non-NULL value (given a non-NULL @bs). */
> +
> +    return bs;
> +}
> +
> +/*
> + * Return the first BDS that has not been added implicitly or that
> + * does not have an RW-filtered child down the chain starting from @bs
> + * (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, true);
> +}
> +
> +/*
> + * Return the first BDS that does not have an RW-filtered child down
> + * the chain starting from @bs (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, false);
> +}
> +
> +/*
> + * For a backing chain, return the first non-filter backing image.
> + */
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
> +{
> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
> +}

Makes sense to me.

> diff --git a/block/backup.c b/block/backup.c
> index 8630d32926..4ddc0bb632 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -618,6 +618,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       int64_t len;
>       BlockDriverInfo bdi;
>       BackupBlockJob *job = NULL;
> +    bool target_does_cow;
>       int ret;
>   
>       assert(bs);
> @@ -712,8 +713,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       /* If there is no backing file on the target, we cannot rely on COW if our
>        * backup cluster size is smaller than the target cluster size. Even for
>        * targets with a backing file, try to avoid COW if possible. */
> +    target_does_cow = bdrv_filtered_cow_child(target);
>       ret = bdrv_get_info(target, &bdi);
> -    if (ret == -ENOTSUP && !target->backing) {
> +    if (ret == -ENOTSUP && !target_does_cow) {

And now we're starting to see the bug fixes - a backup job to a 
throttled node should behave the same as backing up to the original node 
before throttling was added.

> @@ -410,20 +413,23 @@ int bdrv_commit(BlockDriverState *bs)
>       if (!drv)
>           return -ENOMEDIUM;
>   
> -    if (!bs->backing) {
> +    backing_file_bs = bdrv_filtered_cow_bs(bs);
> +
> +    if (!backing_file_bs) {
>           return -ENOTSUP;
>       }

Here, the old code exits early without bs->backing...

>   
>       if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
> +    {
>           return -EBUSY;
>       }
>   
> -    ro = bs->backing->bs->read_only;
> -    open_flags =  bs->backing->bs->open_flags;
> +    ro = backing_file_bs->read_only;
> +    open_flags =  backing_file_bs->open_flags;
>   
>       if (ro) {
> -        if (bdrv_reopen(bs->backing->bs, open_flags | BDRV_O_RDWR, NULL)) {
> +        if (bdrv_reopen(backing_file_bs, open_flags | BDRV_O_RDWR, NULL)) {
>               return -EACCES;
>           }
>       }
> @@ -438,8 +444,6 @@ int bdrv_commit(BlockDriverState *bs)
>       }
>   
>       /* Insert commit_top block node above backing, so we can write to it */
> -    backing_file_bs = backing_bs(bs);
> -

...then set backing_file_bs (presumably to something always non-null)...

>       commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
>                                            &local_err);
>       if (commit_top_bs == NULL) {
> @@ -525,15 +529,13 @@ ro_cleanup:
>       qemu_vfree(buf);
>   
>       blk_unref(backing);
> -    if (backing_file_bs) {
> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> -    }
> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);

...then looks like it has a dead always-true conditional. The new code 
is thus a bit smarter.

> +++ b/block/io.c
> @@ -120,6 +120,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
>       Error *local_err = NULL;
>   
>       memset(&bs->bl, 0, sizeof(bs->bl));
> @@ -148,13 +149,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>           bs->bl.max_iov = IOV_MAX;
>       }
>   
> -    if (bs->backing) {
> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
> +    if (cow_bs) {
> +        bdrv_refresh_limits(cow_bs, &local_err);
>           if (local_err) {
>               error_propagate(errp, local_err);
>               return;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
> +        bdrv_merge_limits(&bs->bl, &cow_bs->bl);

Is this doing the right things with blkdebug?

> +++ b/blockdev.c

> @@ -3293,6 +3300,12 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>           error_setg(errp, "cannot commit an image into itself");
>           goto out;
>       }
> +    if (!bdrv_legacy_chain_contains(top_bs, base_bs)) {
> +        /* We have to disallow this until the user can give explicit
> +         * consent */
> +        error_setg(errp, "Cannot commit through explicit filter nodes");
> +        goto out;
> +    }

Makes sense. I guess the argument here is that the API now fails where 
it could previously succeed, but the earlier success was questionable 
and probably broke rather than doing what the user thought it might.

> @@ -3722,8 +3752,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>       }
>   
>       flags = bs->open_flags | BDRV_O_RDWR;
> -    source = backing_bs(bs);
> +    source = bdrv_filtered_cow_bs(unfiltered_bs);
>       if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
> +        if (bdrv_filtered_bs(unfiltered_bs)) {
> +            /* @unfiltered_bs is an explicit filter */
> +            error_setg(errp, "Cannot perform sync=top mirror through an "
> +                       "explicitly added filter node on the source");
> +            goto out;
> +        }

Again, a failure now where the previous code was probably questionable. 
Seems okay.

> +++ b/nbd/server.c
> @@ -2415,13 +2415,9 @@ void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
>           return;
>       }
>   
> -    while (true) {
> +    while (bs && !bm) {
>           bm = bdrv_find_dirty_bitmap(bs, bitmap);
> -        if (bm != NULL || bs->backing == NULL) {
> -            break;
> -        }
> -
> -        bs = bs->backing->bs;
> +        bs = bdrv_filtered_bs(bs);

Yep, this is the rewrite that I recently realized I need for using 
blkdebug to artificially change block limits during NBD testing.

Overall looks good, but I'm not sure if any of my questions, or rebasing 
to master, will require a respin, so I'll wait a bit before giving R-b 
in case you want to respond to my comments first.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] block: Storage child access function
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 04/11] block: Storage child access function Max Reitz
@ 2018-11-12 22:32   ` Eric Blake
  2018-11-14 19:56     ` Max Reitz
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2018-11-12 22:32 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> For completeness' sake, add a function for accessing a node's storage
> child, too.  For filters, this is there filtered child; for non-filters,

s/there/their/

> this is bs->file.
> 
> Some places are deliberately left unconverted:
> - BDS opening/closing functions where bs->file is handled specially
>    (which is basically wrong, but at least simplifies probing)
> - bdrv_co_block_status_from_file(), because its name implies that it
>    points to ->file

I'm wondering if we can clean up block_status to let filters have a NULL 
callback and io.c do the right thing automatically, rather than the 
current approach of filters assigning the callback to the common helper 
routine.  Maybe later in the series.

> - bdrv_snapshot_goto() in one places unrefs bs->file.  Such a

s/places/place/

>    modification is not covered by this patch and is therefore just
>    safeguarded by an additional assert(), but otherwise kept as-is.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

> +++ b/block/snapshot.c

> @@ -204,37 +207,38 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>           return ret;
>       }
>   
> -    if (bs->file) {
> -        BlockDriverState *file;
> +    storage_bs = bdrv_storage_bs(bs);
> +    if (storage_bs) {
>           QDict *options = qdict_clone_shallow(bs->options);
>           QDict *file_options;
>           Error *local_err = NULL;
>   
> -        file = bs->file->bs;
>           /* Prevent it from getting deleted when detached from bs */
> -        bdrv_ref(file);
> +        bdrv_ref(storage_bs);
>   
>           qdict_extract_subqdict(options, &file_options, "file.");
>           qobject_unref(file_options);
> -        qdict_put_str(options, "file", bdrv_get_node_name(file));
> +        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
>   
>           drv->bdrv_close(bs);
> +
> +        assert(bs->file->bs == storage_bs);

At first glance, this assertion...

>           bdrv_unref_child(bs, bs->file);
>           bs->file = NULL;
>   
> -        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
> +        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
>           open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
>           qobject_unref(options);
>           if (open_ret < 0) {
> -            bdrv_unref(file);
> +            bdrv_unref(storage_bs);
>               bs->drv = NULL;
>               /* A bdrv_snapshot_goto() error takes precedence */
>               error_propagate(errp, local_err);
>               return ret < 0 ? ret : open_ret;
>           }
>   
> -        assert(bs->file->bs == file);
> -        bdrv_unref(file);
> +        assert(bs->file->bs == storage_bs);

...looks like a duplicate of this one. But looking closer, I see 
bs->file = NULL followed by drv->bdrv_open() in between which should 
reassign bs->file, so having the assertion on both ends makes sense.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [for 3.1? Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node()
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node() Max Reitz
@ 2018-11-12 22:47   ` Eric Blake
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2018-11-12 22:47 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> Currently, check_to_replace_node() only allows mirror to replace a node
> in the chain of the source node, and only if it is the first non-filter
> node below the source.  Well, technically, the idea is that you can
> exactly replace a quorum child by mirroring from quorum.
> 
> This has (probably) two reasons:
> (1) We do not want to create loops.
> (2) @replaces and @device should have exactly the same content so
>      replacing them does not cause visible data to change.
> 
> This has two issues:
> (1) It is overly restrictive.  It is completely fine for @replaces to be
>      a filter.
> (2) It is not restrictive enough.  You can create loops with this as
>      follows:
> 
> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
> $ qemu-system-x86_64 -qmp stdio
> {"execute": "qmp_capabilities"}
> {"execute": "object-add",
>   "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
> {"execute": "blockdev-add",
>   "arguments": {
>       "node-name": "source",
>       "driver": "throttle",
>       "throttle-group": "tg0",
>       "file": {
>           "node-name": "filtered",
>           "driver": "qcow2",
>           "file": {
>               "driver": "file",
>               "filename": "/tmp/source.qcow2"
>           } } } }
> {"execute": "drive-mirror",
>   "arguments": {
>       "job-id": "mirror",
>       "device": "source",
>       "target": "/tmp/target.qcow2",
>       "format": "qcow2",
>       "node-name": "target",
>       "sync" :"none",
>       "replaces": "filtered"
>   } }
> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
> 
> And qemu crashes because of a stack overflow due to the loop being
> created (target's backing file is source, so when it replaces filtered,
> it points to itself through source).

Sounds like good material for inclusion in 3.1.

> 
> (blockdev-mirror can be broken similarly.)
> 
> So let us make the checks for the two conditions above explicit, which
> makes the whole function exactly as restrictive as it needs to be.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
  2018-08-09 22:47   ` Max Reitz
@ 2018-11-12 22:54   ` Eric Blake
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Blake @ 2018-11-12 22:54 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> This adds two tests for cases where our old check_to_replace_node()
> function failed to detect that executing this job with these parameters
> would result in a cyclic graph.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 124 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/041.out |   4 +-
>   2 files changed, 126 insertions(+), 2 deletions(-)
> 

With your followup amendments to allow qed testing (hmm, you mention 
you'd be posting a v3),

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant
  2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant Max Reitz
@ 2018-11-12 23:08   ` Eric Blake
  2018-11-14 20:01     ` Max Reitz
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2018-11-12 23:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 8/9/18 5:31 PM, Max Reitz wrote:
> Parts of the block layer treat BDS.backing_file as if it were whatever
> the image header says (i.e., if it is a relative path, it is relative to
> the overlay), other parts treat it like a cache for
> bs->backing->bs->filename (relative paths are relative to the CWD).
> Considering bs->backing->bs->filename exists, let us make it mean the
> former.
> 
> Among other things, this now allows the user to specify a base when
> using qemu-img to commit an image file in a directory that is not the
> CWD (assuming, everything uses relative filenames).
> 
> Before this patch:
> 
> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

Three failures in a row - no way to commit short of changing your 
working directory.

> 
> After this patch:
> 
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> Image committed.
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> Image committed.

Yay, that looks saner.

> 
> With this change, bdrv_find_backing_image() must look at whether the
> user has overridden a BDS's backing file.  If so, it can no longer use
> bs->backing_file, but must instead compare the given filename against
> the backing node's filename directly.
> 
> Note that this changes the QAPI output for a node's backing_file.  We
> had very inconsistent output there (sometimes what the image header
> said, sometimes the actual filename of the backing image).  This
> inconsistent output was effectively useless, so we have to decide one
> way or the other.  Considering that bs->backing_file usually at runtime
> contained the path to the image relative to qemu's CWD (or absolute),
> this patch changes QAPI's backing_file to always report the
> bs->backing->bs->filename from now on.  If you want to receive the image
> header information, you have to refer to full-backing-filename.
> 
> This necessitates a change to iotest 228.  The interesting information
> it really wanted is the image header, and it can get that now, but it
> has to use full-backing-filename instead of backing_file.  Because of
> this patch's changes to bs->backing_file's behavior, we also need some
> reference output changes.
> 
> Along with the changes to bs->backing_file, stop updating
> BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
> a change to the reference output of iotest 191.

Good explanations for the test changes.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h  | 14 +++++++++-----
>   block.c                    | 29 ++++++++++++++++++++++-------
>   block/qapi.c               |  7 ++++---
>   qemu-img.c                 | 12 ++++++++++--
>   tests/qemu-iotests/191.out |  1 -
>   tests/qemu-iotests/228     |  6 +++---
>   tests/qemu-iotests/228.out |  6 +++---
>   7 files changed, 51 insertions(+), 24 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d3d8b22155..8f2c515ec1 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -737,11 +737,15 @@ struct BlockDriverState {
>       bool walking_aio_notifiers; /* to make removal during iteration safe */
>   
>       char filename[PATH_MAX];
> -    char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
> -                                    this file image */
> -    /* The backing filename indicated by the image header; if we ever
> -     * open this file, then this is replaced by the resulting BDS's
> -     * filename (i.e. after a bdrv_refresh_filename() run). */
> +    /* If non-zero, the image is a diff of this image file.  Note that

Pre-existing, but that sentence might read nicer as:

If not empty, this image is a diff in relation to backing_file.

> +     * this the name given in the image header and may therefore not

"this the name" is wrong; did you mean "this is the name" or "this name" 
or "the name"?

> +     * be equal to .backing->bs->filename, and relative paths are
> +     * resolved relatively to their overlay. */
> +    char backing_file[PATH_MAX];
> +    /* The backing filename indicated by the image header.  Contrary
> +     * to backing_file, if we ever open this file, auto_backing_file
> +     * is replaced by the resulting BDS's filename (i.e. after a
> +     * bdrv_refresh_filename() run). */
>       char auto_backing_file[PATH_MAX];
>       char backing_format[16]; /* if non-zero and backing_file exists */
>   

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions
  2018-11-12 22:17   ` Eric Blake
@ 2018-11-14 19:52     ` Max Reitz
  2019-02-13 16:42       ` Max Reitz
  0 siblings, 1 reply; 27+ messages in thread
From: Max Reitz @ 2018-11-14 19:52 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 17476 bytes --]

On 12.11.18 23:17, Eric Blake wrote:
> On 8/9/18 5:31 PM, Max Reitz wrote:
>> What bs->file and bs->backing mean depends on the node.  For filter
>> nodes, both signify a node that will eventually receive all R/W
>> accesses.  For format nodes, bs->file contains metadata and data, and
>> bs->backing will not receive writes -- instead, writes are COWed to
>> bs->file.  Usually.
>>
>> In any case, it is not trivial to guess what a child means exactly with
>> our currently limited form of expression.  It is better to introduce
>> some functions that actually guarantee a meaning:
>>
>> - bdrv_filtered_cow_child() will return the child that receives requests
>>    filtered through COW.  That is, reads may or may not be forwarded
>>    (depending on the overlay's allocation status), but writes never go to
>>    this child.
>>
>> - bdrv_filtered_rw_child() will return the child that receives requests
>>    filtered through some very plain process.  Reads and writes issued to
>>    the parent will go to the child as well (although timing, etc. may be
>>    modified).
>>
>> - All drivers but quorum (but quorum is pretty opaque to the general
>>    block layer anyway) always only have one of these children: All read
>>    requests must be served from the filtered_rw_child (if it exists), so
>>    if there was a filtered_cow_child in addition, it would not receive
>>    any requests at all.
>>    (The closest here is mirror, where all requests are passed on to the
>>    source, but with write-blocking, write requests are "COWed" to the
>>    target.  But that just means that the target is a special child that
>>    cannot be introspected by the generic block layer functions, and that
>>    source is a filtered_rw_child.)
>>    Therefore, we can also add bdrv_filtered_child() which returns that
>>    one child (or NULL, if there is no filtered child).
>>
>> Also, many places in the current block layer should be skipping filters
>> (all filters or just the ones added implicitly, it depends) when going
>> through a block node chain.  They do not do that currently, but this
>> patch makes them.
> 
> The description makes sense; now on to the code.
> 
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json           |   4 +
>>   include/block/block.h          |   1 +
>>   include/block/block_int.h      |  33 +++++-
>>   block.c                        | 184 ++++++++++++++++++++++++++++-----
>>   block/backup.c                 |   8 +-
>>   block/block-backend.c          |  16 ++-
>>   block/commit.c                 |  36 ++++---
>>   block/io.c                     |  27 ++---
>>   block/mirror.c                 |  37 ++++---
>>   block/qapi.c                   |  26 ++---
>>   block/stream.c                 |  15 ++-
>>   blockdev.c                     |  84 ++++++++++++---
>>   migration/block-dirty-bitmap.c |   4 +-
>>   nbd/server.c                   |   8 +-
>>   qemu-img.c                     |  12 ++-
>>   15 files changed, 363 insertions(+), 132 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index f20efc97f7..a71df88eb2 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2248,6 +2248,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
> 
> Context: this is part of block-stream.
> 
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
>> +# not specified) instead of modifying @device itself.
> 
> That is, if we have:
> 
> base <- filter1 <- active <- filter2
> 
> and request a block-stream with "top":"filter2", it is no different in
> effect than if we had requested "top":"active", since filter nodes can't
> be stream targets.  Makes sense.
> 
> What happens if we request "base":"filter1"? Do we want to require base
> to be a non-filter node?

Well, then you get this after streaming:

base <- active <- filter2

There is no good reason why you'd stream to remove filters (just doing a
reopen should be enough), but why not.  We can make the backing pointer
point to any child, so it doesn't matter what the child is.  The problem
is that we can only write backing file strings into actual COW overlay
nodes, so it does matter what the parent is.

>> +++ b/include/block/block_int.h
>> @@ -91,6 +91,7 @@ struct BlockDriver {
>>        * certain callbacks that refer to data (see block.c) to their
>> bs->file if
>>        * the driver doesn't implement them. Drivers that do not wish
>> to forward
>>        * must implement them and return -ENOTSUP.
>> +     * Note that filters are not allowed to modify data.
> 
> They can modify offsets and timing, but not data?  Even if it is an
> encryption filter?  I'm trying to figure out if LUKS behaves like a filter.

It doesn't.  It's a format.

First of all, LUKS has metadata, so it definitely is a format.

Second, even if it didn't, I think it is a very, very useful convention
to declare filters as things that do not modify data.  If a block driver
does modify data, there is absolutely no point in handling it any
different than a normal format driver.

>> +++ b/block.c
>> @@ -532,11 +532,12 @@ int bdrv_create_file(const char *filename,
>> QemuOpts *opts, Error **errp)
>>   int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
> 
> Is it worth a micro-optimization of not calling this...
> 
>>         if (drv && drv->bdrv_probe_blocksizes) {
>>           return drv->bdrv_probe_blocksizes(bs, bsz);
> 
> ...until after checking drv->bdrv_probe_blocksizes?

I don't know? :-)

I wouldn't think so, as bdrv_filtered_rw_bs() is just so simple.

>> -    } else if (drv && drv->is_filter && bs->file) {
>> -        return bdrv_probe_blocksizes(bs->file->bs, bsz);
>> +    } else if (filtered) {
>> +        return bdrv_probe_blocksizes(filtered, bsz);
>>       }
> 
> But I don't mind if you leave it as written.
> 
> Is blkdebug a filter, or something else?

I would have said it's a filter.

> That's a case of something
> that DOES change block sizes in relation to the child that it is
> filtering.  If we have qcow2 -> blkdebug -> file, and the qcow2 format
> layer wants to know the blocksizes of its child, does it really always
> want the sizes of 'file' rather than the (possibly changed) sizes of
> 'blkdebug'?

Hm.  See, that's why this series is so difficult, because all these
questions keep popping up. :-)

This is a very good question indeed.  I think for all filters but
blkdebug it makes sense to just pass this through to the filtered child,
because this should fundamentally go down to the protocol layer anyway.

However, when looking at who uses this function at all, it appears that
this is just used for guest device configuration (so the guest device's
cluster size matches the hosts).  qcow2 doesn't support this at all, so
if you use qcow2, you'll just get the default of BDRV_SECTOR_SIZE.  If
you want to override the auto-detection, you can set a device-level
option.  So I don't think we need support in blkdebug to emulate a
different block size, because of two reasons:

First, it wouldn't be a test of the block layer, because the block layer
really doesn't care about this (internally).

So second, it would only be a test of a guest.  But if you want to test
that, you can always just set the device-level option.

>>         return -ENOTSUP;
>> @@ -551,11 +552,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs,
>> BlockSizes *bsz)
>>   int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>>         if (drv && drv->bdrv_probe_geometry) {
>>           return drv->bdrv_probe_geometry(bs, geo);
>> -    } else if (drv && drv->is_filter && bs->file) {
>> -        return bdrv_probe_geometry(bs->file->bs, geo);
>> +    } else if (filtered) {
>> +        return bdrv_probe_geometry(filtered, geo);
>>       }
> 
> At least you're consistent on skipping filters.

I tried my best to come up with something that makes sense.  Sometimes
it made me nearly go insane.

>> @@ -4068,7 +4074,19 @@ BlockDriverState *bdrv_lookup_bs(const char
>> *device,
>>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
>>   {
>>       while (top && top != base) {
>> -        top = backing_bs(top);
>> +        top = bdrv_filtered_bs(top);
>> +    }
>> +
>> +    return top != NULL;
>> +}
>> +
>> +/* Same as bdrv_chain_contains(), but skip implicitly added R/W filter
>> + * nodes and do not move past explicitly added R/W filters. */
>> +bool bdrv_legacy_chain_contains(BlockDriverState *top,
>> BlockDriverState *base)
>> +{
>> +    top = bdrv_skip_implicit_filters(top);
>> +    while (top && top != base) {
>> +        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
>>       }
> 
> Is there a goal of getting rid of bdrv_legacy_chain_contains() in the
> future?  If so, should the commit message and/or code comments mention
> that?

The only thing that's using it is qmp_block_commit.  I think the
long-term goal is to get rid of the commit job and replace it by
blockdev-copy, which would make the use of that function moot, but I
suppose we have to keep it around as long as block-commit is there.

>>         return top != NULL;
>> @@ -4140,20 +4158,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>>     int bdrv_has_zero_init(BlockDriverState *bs)
>>   {
>> +    BlockDriverState *filtered;
>> +
>>       if (!bs->drv) {
>>           return 0;
>>       }
>>         /* If BS is a copy on write image, it is initialized to
>>          the contents of the base image, which may not be zeroes.  */
>> -    if (bs->backing) {
>> +    if (bdrv_filtered_cow_child(bs)) {
>>           return 0;
> 
> Not for this patch, but should we ask the filtered_cow_child if it is
> known to be all-zero content before blindly returning 0 here? Some
> children may be able to efficiently report if they have all-zero content
> [for example, see the recent thread about NBD performace drop due to
> explicitly zeroing the remote device, which could be skipped if it is
> known that the remote device started life uninitialized]

The question is, why would you have an empty backing file?

>>       }
>>       if (bs->drv->bdrv_has_zero_init) {
>>           return bs->drv->bdrv_has_zero_init(bs);
>>       }
>> -    if (bs->file && bs->drv->is_filter) {
>> -        return bdrv_has_zero_init(bs->file->bs);
>> +
>> +    filtered = bdrv_filtered_rw_bs(bs);
>> +    if (filtered) {
>> +        return bdrv_has_zero_init(filtered);
>>       }
> 
> You argued earlier that a filter can't change contents - but is that
> just guest-visible contents? If LUKS is a filter node, then a file that
> is zero-initialized is NOT zero-initialized after passing through LUKS
> encryption (decrypting the zeros returns garbage; conversely, writing
> zeros into LUKS results in random-looking bits in the file).  I guess
> I'm leaning more and more towards LUKS is not a filter, but a format.

Yeah.  I think it's just not useful to consider LUKS a filter, because
if filters can change data -- then what good is having the "filter"
category?

>> @@ -4198,8 +4220,9 @@ int bdrv_get_info(BlockDriverState *bs,
>> BlockDriverInfo *bdi)
>>           return -ENOMEDIUM;
>>       }
>>       if (!drv->bdrv_get_info) {
>> -        if (bs->file && drv->is_filter) {
>> -            return bdrv_get_info(bs->file->bs, bdi);
>> +        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>> +        if (filtered) {
>> +            return bdrv_get_info(filtered, bdi);
> 
> Is this right for blkdebug?

I think it is.  If it wants to intercept this function, it's free to
implement .bdrv_get_info.

The alternative is returning -ENOTSUP, and I don't see how that's any
better than passing data through from the child.

>> @@ -5487,3 +5519,105 @@ bool
>> bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
>>         return drv->bdrv_can_store_new_dirty_bitmap(bs, name,
>> granularity, errp);
>>   }
>> +
>> +/*
>> + * Return the child that @bs acts as an overlay for, and from which
>> data may be
>> + * copied in COW or COR operations.  Usually this is the backing file.
>> + */
>> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
>> +{
>> +    if (!bs || !bs->drv) {
>> +        return NULL;
>> +    }
>> +
>> +    if (bs->drv->is_filter) {
>> +        return NULL;
>> +    }
> 
> Here, filters end the search...

Yes, because COW parents have is_filter == false...

>> +
>> +    return bs->backing;
>> +}
>> +
>> +/*
>> + * If @bs acts as a pass-through filter for one of its children,
>> + * return that child.  "Pass-through" means that write operations to
>> + * @bs are forwarded to that child instead of triggering COW.
>> + */
>> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
>> +{
>> +    if (!bs || !bs->drv) {
>> +        return NULL;
>> +    }
>> +
>> +    if (!bs->drv->is_filter) {
>> +        return NULL;
>> +    }
> 
> ...while here, non-filters end the search. I think I follow your
> semantics (we were abusing bs->backing for filters, and your code is now
> trying to distinguish what was really meant)

...while R/W filter parents have is_filter == true.  So that's why it's
the opposite.

>> +
>> +    return bs->backing ?: bs->file;
>> +}
>> +
>> +/*
>> + * Return any filtered child, independently on how it reacts to write
> 
> s/on/of/

Indeed.

>> + * accesses and whether data is copied onto this BDS through COR.
>> + */

[...]

>> +++ b/block/io.c
>> @@ -120,6 +120,7 @@ static void bdrv_merge_limits(BlockLimits *dst,
>> const BlockLimits *src)
>>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
>>       Error *local_err = NULL;
>>         memset(&bs->bl, 0, sizeof(bs->bl));
>> @@ -148,13 +149,13 @@ void bdrv_refresh_limits(BlockDriverState *bs,
>> Error **errp)
>>           bs->bl.max_iov = IOV_MAX;
>>       }
>>   -    if (bs->backing) {
>> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
>> +    if (cow_bs) {
>> +        bdrv_refresh_limits(cow_bs, &local_err);
>>           if (local_err) {
>>               error_propagate(errp, local_err);
>>               return;
>>           }
>> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
>> +        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
> 
> Is this doing the right things with blkdebug?

First, blkdebug doesn't have a COW child, does it?

Second, we still always invoke the driver's implementation (if there is
one).  All of the code at the beginning of the function just chooses
some defaults.  So blkdebug can still override everything.

But there is indeed something wrong here.  And that is: What is with R/W
filter drivers that use bs->backing?  After this patch, they won't get
any defaults.

So I think the change that is needed is:
- The bs->file branch should be transformed into a bdrv_storage_bs()
  branch (this is done by the next patch already, good)
- The bs->backing branch should be transformed into a bdrv_filtered_bs()
  branch

Then we have the following cases:
- R/W filters will go into the second branch rather than the first, but
  that's OK, because the code is the same anyway.
  (But all filters that used bs->backing already did go into the second
  branch, so...)
- COW nodes (with both a storage child and a filtered child) will
  continue to go into both branches and get a joined result.
- Non-COW format nodes will continue to go into the first branch.

Before we have bs_storage_bs() (that is, before the next patch), I think
it's OK to make filter nodes that use bs->file go into both branches
(because bs->file is set for them, so they'll go into the first branch,
and then, as filters, they'll go into the second branch).


So I think all that's needed is s/cow_bs/filtered_bs/ and
s/bdrv_filtered_cow_bs/bdrv_filtered_bs/.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] block: Storage child access function
  2018-11-12 22:32   ` Eric Blake
@ 2018-11-14 19:56     ` Max Reitz
  0 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-11-14 19:56 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]

On 12.11.18 23:32, Eric Blake wrote:
> On 8/9/18 5:31 PM, Max Reitz wrote:
>> For completeness' sake, add a function for accessing a node's storage
>> child, too.  For filters, this is there filtered child; for non-filters,
> 
> s/there/their/
> 
>> this is bs->file.
>>
>> Some places are deliberately left unconverted:
>> - BDS opening/closing functions where bs->file is handled specially
>>    (which is basically wrong, but at least simplifies probing)
>> - bdrv_co_block_status_from_file(), because its name implies that it
>>    points to ->file
> 
> I'm wondering if we can clean up block_status to let filters have a NULL
> callback and io.c do the right thing automatically, rather than the
> current approach of filters assigning the callback to the common helper
> routine.  Maybe later in the series.

Hm!  I didn't even think of that.  Yes, everything that uses
bdrv_co_block_status_from_*() seems to be an R/W filter, and which they
use simply depends on whether they use bs->backing or bs->file.  Well,
blkdebug is the exception, because it really wants to put an assertion
there, but inlining the code there shouldn't be the showstopper.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant
  2018-11-12 23:08   ` Eric Blake
@ 2018-11-14 20:01     ` Max Reitz
  0 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2018-11-14 20:01 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6198 bytes --]

On 13.11.18 00:08, Eric Blake wrote:
> On 8/9/18 5:31 PM, Max Reitz wrote:
>> Parts of the block layer treat BDS.backing_file as if it were whatever
>> the image header says (i.e., if it is a relative path, it is relative to
>> the overlay), other parts treat it like a cache for
>> bs->backing->bs->filename (relative paths are relative to the CWD).
>> Considering bs->backing->bs->filename exists, let us make it mean the
>> former.
>>
>> Among other things, this now allows the user to specify a base when
>> using qemu-img to commit an image file in a directory that is not the
>> CWD (assuming, everything uses relative filenames).
>>
>> Before this patch:
>>
>> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
>> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
>> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'mid.qcow2' in the backing chain of
>> 'foo/top.qcow2'
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of
>> 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of
>> 'foo/top.qcow2'
> 
> Three failures in a row - no way to commit short of changing your
> working directory.
> 
>>
>> After this patch:
>>
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> Image committed.
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of
>> 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> Image committed.
> 
> Yay, that looks saner.
> 
>>
>> With this change, bdrv_find_backing_image() must look at whether the
>> user has overridden a BDS's backing file.  If so, it can no longer use
>> bs->backing_file, but must instead compare the given filename against
>> the backing node's filename directly.
>>
>> Note that this changes the QAPI output for a node's backing_file.  We
>> had very inconsistent output there (sometimes what the image header
>> said, sometimes the actual filename of the backing image).  This
>> inconsistent output was effectively useless, so we have to decide one
>> way or the other.  Considering that bs->backing_file usually at runtime
>> contained the path to the image relative to qemu's CWD (or absolute),
>> this patch changes QAPI's backing_file to always report the
>> bs->backing->bs->filename from now on.  If you want to receive the image
>> header information, you have to refer to full-backing-filename.
>>
>> This necessitates a change to iotest 228.  The interesting information
>> it really wanted is the image header, and it can get that now, but it
>> has to use full-backing-filename instead of backing_file.  Because of
>> this patch's changes to bs->backing_file's behavior, we also need some
>> reference output changes.
>>
>> Along with the changes to bs->backing_file, stop updating
>> BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
>> a change to the reference output of iotest 191.
> 
> Good explanations for the test changes.
> 
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h  | 14 +++++++++-----
>>   block.c                    | 29 ++++++++++++++++++++++-------
>>   block/qapi.c               |  7 ++++---
>>   qemu-img.c                 | 12 ++++++++++--
>>   tests/qemu-iotests/191.out |  1 -
>>   tests/qemu-iotests/228     |  6 +++---
>>   tests/qemu-iotests/228.out |  6 +++---
>>   7 files changed, 51 insertions(+), 24 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index d3d8b22155..8f2c515ec1 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -737,11 +737,15 @@ struct BlockDriverState {
>>       bool walking_aio_notifiers; /* to make removal during iteration
>> safe */
>>         char filename[PATH_MAX];
>> -    char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
>> -                                    this file image */
>> -    /* The backing filename indicated by the image header; if we ever
>> -     * open this file, then this is replaced by the resulting BDS's
>> -     * filename (i.e. after a bdrv_refresh_filename() run). */
>> +    /* If non-zero, the image is a diff of this image file.  Note that
> 
> Pre-existing, but that sentence might read nicer as:
> 
> If not empty, this image is a diff in relation to backing_file.
> 
>> +     * this the name given in the image header and may therefore not
> 
> "this the name" is wrong; did you mean "this is the name" or "this name"
> or "the name"?

Would any of the latter two make more grammatical sense? O:-)

Will fix.

I'll also fix the "may" wording.  Like this it sounds as if this is not
allowed to be equal to .backing->bs->filename, which of course is not
true.  It may or may not be equal.  So, I'll reword to:

"If not empty, this image is a diff in relation to backing_file.  Note
that this is the name given in the image header and therefore may or may
not be equal to .backing->bs->filename.  If this field contains a
relative path, it is to be resolved relatively to the overlay's location."


Thanks for reviewing!

Max

>> +     * be equal to .backing->bs->filename, and relative paths are
>> +     * resolved relatively to their overlay. */
>> +    char backing_file[PATH_MAX];
>> +    /* The backing filename indicated by the image header.  Contrary
>> +     * to backing_file, if we ever open this file, auto_backing_file
>> +     * is replaced by the resulting BDS's filename (i.e. after a
>> +     * bdrv_refresh_filename() run). */
>>       char auto_backing_file[PATH_MAX];
>>       char backing_format[16]; /* if non-zero and backing_file exists */
>>   
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions
  2018-11-14 19:52     ` Max Reitz
@ 2019-02-13 16:42       ` Max Reitz
  0 siblings, 0 replies; 27+ messages in thread
From: Max Reitz @ 2019-02-13 16:42 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2042 bytes --]

On 14.11.18 20:52, Max Reitz wrote:
> On 12.11.18 23:17, Eric Blake wrote:
>> On 8/9/18 5:31 PM, Max Reitz wrote:

[...]

>>> +++ b/block/io.c
>>> @@ -120,6 +120,7 @@ static void bdrv_merge_limits(BlockLimits *dst,
>>> const BlockLimits *src)
>>>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>>   {
>>>       BlockDriver *drv = bs->drv;
>>> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
>>>       Error *local_err = NULL;
>>>         memset(&bs->bl, 0, sizeof(bs->bl));
>>> @@ -148,13 +149,13 @@ void bdrv_refresh_limits(BlockDriverState *bs,
>>> Error **errp)
>>>           bs->bl.max_iov = IOV_MAX;
>>>       }
>>>   -    if (bs->backing) {
>>> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
>>> +    if (cow_bs) {
>>> +        bdrv_refresh_limits(cow_bs, &local_err);
>>>           if (local_err) {
>>>               error_propagate(errp, local_err);
>>>               return;
>>>           }
>>> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
>>> +        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
>>
>> Is this doing the right things with blkdebug?
> 
> First, blkdebug doesn't have a COW child, does it?
> 
> Second, we still always invoke the driver's implementation (if there is
> one).  All of the code at the beginning of the function just chooses
> some defaults.  So blkdebug can still override everything.
> 
> But there is indeed something wrong here.  And that is: What is with R/W
> filter drivers that use bs->backing?  After this patch, they won't get
> any defaults.

Hm, yeah, but after the next one they're alright because
bdrv_storage_bs() returns filtered children.  So the issue is just the
span in between...

I suppose I can solve this by assigning bs->file to storage_bs, or
bs->backing if both bs->file and cow_bs are NULL.  And then put a FIXME
behind it that the next patch will solve.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-02-13 16:43 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-09 22:31 [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 01/11] block: Mark commit and mirror as filter drivers Max Reitz
2018-09-07 12:37   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 02/11] blockdev: Check @replaces in blockdev_mirror_common Max Reitz
2018-11-09 22:51   ` Eric Blake
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 03/11] block: Filtered children access functions Max Reitz
2018-11-12 22:17   ` Eric Blake
2018-11-14 19:52     ` Max Reitz
2019-02-13 16:42       ` Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 04/11] block: Storage child access function Max Reitz
2018-11-12 22:32   ` Eric Blake
2018-11-14 19:56     ` Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 05/11] block: Fix check_to_replace_node() Max Reitz
2018-11-12 22:47   ` [Qemu-devel] [for 3.1? Qemu-devel] " Eric Blake
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
2018-08-09 22:47   ` Max Reitz
2018-11-12 22:54   ` Eric Blake
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 07/11] block: Leave BDS.backing_file constant Max Reitz
2018-11-12 23:08   ` Eric Blake
2018-11-14 20:01     ` Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 08/11] iotests: Add filter commit test cases Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 09/11] iotests: Add filter mirror " Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 10/11] iotests: Add test for commit in sub directory Max Reitz
2018-08-09 22:31 ` [Qemu-devel] [PATCH v2 11/11] iotests: Test committing to overridden backing Max Reitz
2018-08-09 22:33 ` [Qemu-devel] [PATCH v2 00/11] block: Deal with filters Max Reitz
2018-08-29 13:29 ` Max Reitz
2018-11-09 22:47 ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.