* [Qemu-devel] [PATCH v4 00/11] block: Deal with filters
@ 2019-04-10 20:20 ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

Note: This is technically the first part of my active mirror followup.
But just very technically.  I noticed that that followup started to
consist of two parts, namely (A) fix filtery things in the block layer,
and (B) fix active mirror.  So I decided to split it.  This is part A.
Part B is “mirror: Mainly coroutine refinements”.


When we introduced filters, we did it a bit casually.  Sure, we talked a
lot about them before, but that was mostly discussion about where
implicit filters should be added to the graph (note that we currently
only have two implicit filters, those being mirror and commit).  But in
the end, we really just designated some drivers as filters (Quorum,
blkdebug, etc.) and added some specifically (throttle, COR), without
really looking through the block layer to see where issues might occur.

It turns out vast areas of the block layer just don’t know about filters
and cannot really handle them.  Many cases will work in practice; in
others, well, too bad: you cannot use some feature because some part
deep inside the block layer looks at your filters and thinks they are
format nodes.

This series sets out to correct a bit of that.  I lost my head many
times and I’m sure this series is incomplete in many ways, but it really
doesn’t do any good sitting on my disk any longer; it needs to go out
now.

The most important patches of this series are patches 2 and 3.  These
introduce functions to encapsulate bs->backing and bs->file accesses,
because sometimes bs->backing means the COW child and sometimes it means
the filtered node.  Likewise, sometimes bs->file means metadata storage,
and sometimes it means the filtered node.  With these functions, it’s
always clear what the caller wants, and it will always get what it wants.

Besides that, patch 2 introduces functions to skip filters which may be
used by parts of the block layer that just don’t care about them.


Secondly, the restrictions put on mirror’s @replaces parameter are
revisited and fixed.


Thirdly, BDS.backing_file is changed to be constant.  I don’t quite know
why we modify it whenever we change a BDS’s backing file, but that’s
definitely not quite right.  Among other things, this makes it possible
to perform a commit on a file (using relative filenames) in a directory
that’s not qemu’s CWD.
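To illustrate why the CWD matters: a relative backing filename is meant to be interpreted relative to the directory of the image that references it, not relative to qemu's working directory.  The helper below is a hypothetical sketch of that rule in plain C, assuming POSIX-style '/' separators; resolve_backing_path() is not a QEMU function, and QEMU uses its own path helpers for this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not QEMU's API): resolve @backing relative to the
 * directory containing @overlay and write the result to @out.
 * Returns false if @out was too small and the result got truncated. */
static bool resolve_backing_path(const char *overlay, const char *backing,
                                 char *out, size_t out_len)
{
    assert(out && out_len > 0);
    const char *slash = strrchr(overlay, '/');
    int n;

    if (backing[0] == '/' || !slash) {
        /* Absolute backing path, or the overlay itself was given relative
         * to the CWD without a directory part: use the name as-is. */
        n = snprintf(out, out_len, "%s", backing);
    } else {
        /* Keep the overlay's "dir/" prefix, then append the relative name. */
        int dir_len = (int)(slash - overlay + 1);
        n = snprintf(out, out_len, "%.*s%s", dir_len, overlay, backing);
    }
    return n >= 0 && (size_t)n < out_len;
}
```

For example, "base.qcow2" referenced from /images/sub/top.qcow2 resolves to /images/sub/base.qcow2 no matter which directory qemu was started from.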


Finally, a number of tests are added.


There are probably many things that are worthy of discussion, of which
only some come to mind, e.g.:

- In which cases do we want to skip all filters, and in which cases do
  we want to skip only implicit filters?
  My approach was to basically never skip explicitly added filters,
  except when it’s about finding a file in some tree (e.g. in a backing
  chain).  Maybe there are cases where you think we should skip even
  explicitly added filters.
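The difference between the two kinds of skipping can be shown on a toy graph.  The sketch below assumes a simplified node type (Node, with a single 'filtered' pointer) that is not QEMU's BlockDriverState; skip_implicit_filters() and skip_all_filters() are hypothetical stand-ins for the bdrv_skip_implicit_filters() and bdrv_skip_rw_filters() functions that patch 2 adds, not their actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, not QEMU's BlockDriverState.  @filtered is the node's
 * filtered child (NULL if none); @implicit marks filters that qemu
 * inserted on its own, such as mirror-top or commit-top. */
typedef struct Node {
    const char *name;
    bool is_filter;
    bool implicit;
    struct Node *filtered;
} Node;

/* Skip only the filters the user never asked for. */
static Node *skip_implicit_filters(Node *bs)
{
    while (bs && bs->implicit) {
        assert(bs->is_filter); /* only filters are inserted implicitly */
        bs = bs->filtered;
    }
    return bs;
}

/* Skip every filter, explicit or implicit, e.g. when looking for the
 * node that actually holds the data in a backing chain. */
static Node *skip_all_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->filtered;
    }
    return bs;
}
```

With an implicit mirror-top on top of an explicitly added throttle node on top of a disk, skipping implicit filters stops at throttle (the filter the user added stays visible), while skipping all filters reaches the disk.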

- I made interesting decisions like “When you mirror from a node, we
  should indeed mirror from that node, but when replacing it, we should
  leave all implicit filters on top intact.”  You may disagree with
  that.
  (My reasoning here is that users aren’t supposed to know about
  implicit filters, and therefore, they should not intend to remove
  them.  Also, mirror accepts only root nodes as the source, so you
  cannot really specify the node below the implicit filters.  But you
  can use @replaces to drop the implicit filters, if you know they are
  there.)

- bdrv_query_bds_stats() is changed: “parent” now means storage,
  “backing” means COW.  This is what makes sense, although it breaks
  compatibility; but only for filters that use bs->backing for the
  filtered child (i.e. mirror top and commit top).  The alternatives
  would be:
  - Leave everything as it is.  But this means that whenever you add
    another filter (throttle or COR), the backing chain is still broken
    because they use bs->file for their filtered child.  So this is not
    really an option.
  - Present all filtered children under “backing”.  We would need to
    present them under “parent” as well, though, if they are referenced
    as bs->file; otherwise, this too would break compatibility and would
    not be any better.
    This seems rather broken because we may present the same node twice
    (once as “parent”, once as “backing”).
    Well, or we decide to break compatibility here, too, but to me it
    seems wrong to present filtered nodes under “backing” but not under
    “parent”.

  So I went for the solution that makes the most sense to me.


v4:
- Dropped patch 2 because it’s in master
- (New) patch 2:
  - Fixed the description of BDS.is_filter (requested by Kevin); I
    didn’t do that before patch 1 because the description is kind of
    wrong today already (Quorum does not have a bs->file but has been
    marked a filter from the start).  It was only kind of wrong
    because the description just claims that some callbacks get
    automatically passed to bs->file, not that the filter must have
    bs->file present.  But then patch 1 is correct without adjusting the
    description, and we only need to do so here, in patch 2.  (Because
    now the callbacks may be passed to bs->backing, too.)
  - Rebase conflicts, mostly due to *backing_chain_frozen() and reopen
    stuff in general (we usually still want to access bs->backing
    instead of using the new functions because this is specifically
    about bs->backing, not about the COW child).
- Patch 7: Keep iotest 245 passing


git-backport-diff against v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/11:[----] [--] 'block: Mark commit and mirror as filter drivers'
002/11:[0034] [FC] 'block: Filtered children access functions'
003/11:[----] [--] 'block: Storage child access function'
004/11:[----] [-C] 'block: Inline bdrv_co_block_status_from_*()'
005/11:[----] [--] 'block: Fix check_to_replace_node()'
006/11:[----] [--] 'iotests: Add tests for mirror @replaces loops'
007/11:[0004] [FC] 'block: Leave BDS.backing_file constant'
008/11:[----] [--] 'iotests: Add filter commit test cases'
009/11:[----] [--] 'iotests: Add filter mirror test cases'
010/11:[----] [--] 'iotests: Add test for commit in sub directory'
011/11:[----] [--] 'iotests: Test committing to overridden backing'


Max Reitz (11):
  block: Mark commit and mirror as filter drivers
  block: Filtered children access functions
  block: Storage child access function
  block: Inline bdrv_co_block_status_from_*()
  block: Fix check_to_replace_node()
  iotests: Add tests for mirror @replaces loops
  block: Leave BDS.backing_file constant
  iotests: Add filter commit test cases
  iotests: Add filter mirror test cases
  iotests: Add test for commit in sub directory
  iotests: Test committing to overridden backing

 qapi/block-core.json           |   4 +
 include/block/block.h          |   2 +
 include/block/block_int.h      |  87 +++++---
 block.c                        | 381 +++++++++++++++++++++++++++------
 block/backup.c                 |   8 +-
 block/blkdebug.c               |   7 +-
 block/blklogwrites.c           |   1 -
 block/block-backend.c          |  16 +-
 block/commit.c                 |  36 ++--
 block/copy-on-read.c           |   2 -
 block/io.c                     | 102 ++++-----
 block/mirror.c                 |  24 ++-
 block/qapi.c                   |  42 ++--
 block/snapshot.c               |  40 ++--
 block/stream.c                 |  13 +-
 block/throttle.c               |   1 -
 blockdev.c                     | 122 +++++++++--
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c                   |   6 +-
 qemu-img.c                     |  41 ++--
 tests/qemu-iotests/020         |  36 ++++
 tests/qemu-iotests/020.out     |  10 +
 tests/qemu-iotests/040         | 191 +++++++++++++++++
 tests/qemu-iotests/040.out     |   4 +-
 tests/qemu-iotests/041         | 270 ++++++++++++++++++++++-
 tests/qemu-iotests/041.out     |   4 +-
 tests/qemu-iotests/184.out     |   7 +-
 tests/qemu-iotests/191.out     |   1 -
 tests/qemu-iotests/204.out     |   1 +
 tests/qemu-iotests/228         |   6 +-
 tests/qemu-iotests/228.out     |   6 +-
 tests/qemu-iotests/245         |   4 +-
 32 files changed, 1194 insertions(+), 285 deletions(-)

-- 
2.20.1

* [Qemu-devel] [PATCH v4 01/11] block: Mark commit and mirror as filter drivers
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

The commit and mirror block nodes are filters, so they should be marked
as such.  (Strictly speaking, BDS.is_filter's documentation states that
a filter's child must be bs->file.  The following patch will relax this
restriction, however.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/commit.c | 2 ++
 block/mirror.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index ba60fef58a..02eab34925 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -257,6 +257,8 @@ static BlockDriver bdrv_commit_top = {
     .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
     .bdrv_child_perm            = bdrv_commit_top_child_perm,
+
+    .is_filter                  = true,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/mirror.c b/block/mirror.c
index ff15cfb197..8b2404051f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1489,6 +1489,8 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
+
+    .is_filter                  = true,
 };
 
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
-- 
2.20.1

* [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

What bs->file and bs->backing mean depends on the node.  For filter
nodes, both signify a node that will eventually receive all R/W
accesses.  For format nodes, bs->file contains metadata and data, and
bs->backing will not receive writes -- instead, writes are COWed to
bs->file.  Usually.

In any case, it is not trivial to guess what a child means exactly with
our currently limited form of expression.  It is better to introduce
some functions that actually guarantee a meaning:

- bdrv_filtered_cow_child() will return the child that receives requests
  filtered through COW.  That is, reads may or may not be forwarded
  (depending on the overlay's allocation status), but writes never go to
  this child.

- bdrv_filtered_rw_child() will return the child that receives requests
  filtered through some very plain process.  Reads and writes issued to
  the parent will go to the child as well (although timing, etc. may be
  modified).

- All drivers but quorum (which is pretty opaque to the general block
  layer anyway) have at most one of these children: All read requests
  must be served from the filtered_rw_child (if it exists), so if there
  was a filtered_cow_child in addition, it would not receive any
  requests at all.
  (The closest exception is mirror, where all requests are passed on to
  the source, but with write-blocking, write requests are "COWed" to the
  target.  But that just means that the target is a special child that
  cannot be introspected by the generic block layer functions, and that
  the source is a filtered_rw_child.)
  Therefore, we can also add bdrv_filtered_child() which returns that
  one child (or NULL, if there is no filtered child).
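A minimal model of these three accessors, assuming a simplified node with optional file/backing children and an is_filter flag.  Node, Child and the lowercase function names are hypothetical; this is a sketch of the semantics described above, not the code in this patch, and quorum (which has many children) is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical, heavily simplified stand-ins for BdrvChild and
 * BlockDriverState. */
typedef struct Child {
    Node *bs;
} Child;

struct Node {
    const char *name;
    bool is_filter; /* driver forwards all reads and writes to its child */
    Child *file;    /* format: data + metadata; filter: maybe the filtered child */
    Child *backing; /* format: COW backing; filter: maybe the filtered child */
};

/* Child that receives reads only where the overlay has no data of its
 * own, and never receives writes: a (non-filter) node's backing child. */
static Child *filtered_cow_child(Node *bs)
{
    return bs->is_filter ? NULL : bs->backing;
}

/* Child that receives every read and write: a filter's single child,
 * whether it is attached as bs->file or as bs->backing. */
static Child *filtered_rw_child(Node *bs)
{
    if (!bs->is_filter) {
        return NULL;
    }
    assert(!(bs->file && bs->backing)); /* only one filtered child */
    return bs->file ? bs->file : bs->backing;
}

/* A node never has both kinds, so return whichever one exists. */
static Child *filtered_child(Node *bs)
{
    Child *cow = filtered_cow_child(bs);
    Child *rw = filtered_rw_child(bs);

    assert(!(cow && rw));
    return cow ? cow : rw;
}
```

In this model, a throttle filter (child attached as file) and mirror-top (child attached as backing) both report that child through filtered_rw_child(), while a qcow2 node reports its backing file only through filtered_cow_child(); its bs->file is neither.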

Also, many places in the current block layer should be skipping filters
(all filters or just the ones added implicitly, it depends) when going
through a block node chain.  They currently do not, but this patch makes
them do so.

One example of this is qemu-img map, which should skip filters and only
look at the COW elements in the graph.  The change to iotest 204's
reference output shows how using blkdebug on top of a COW node used to
make qemu-img map disregard the rest of the backing chain, but with this
patch, the allocation in the base image is reported correctly.
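The following sketch shows what it means for a chain walk to skip filters.  It assumes a toy node type with a per-layer allocation bitmap; find_allocating_layer() and backing_chain_next() are hypothetical helpers (the latter only loosely named after the bdrv_backing_chain_next() declaration in this patch, whose body is not shown here), not what qemu-img map actually calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, not QEMU's BlockDriverState.  Filters only use @filtered;
 * format nodes may have a COW @backing and a bitmap @allocated in which
 * bit N set means block N is stored in this layer. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *filtered; /* filters only */
    struct Node *backing;  /* format nodes only */
    unsigned allocated;
} Node;

static Node *skip_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->filtered;
    }
    return bs;
}

/* Hypothetical: skip filters, follow the COW link, skip filters again. */
static Node *backing_chain_next(Node *bs)
{
    bs = skip_filters(bs);
    return bs ? skip_filters(bs->backing) : NULL;
}

/* Return the first non-filter layer at or below @top that stores
 * @block, or NULL if the block is unallocated in the whole chain. */
static Node *find_allocating_layer(Node *top, unsigned block)
{
    assert(block < sizeof(unsigned) * 8);
    for (Node *bs = skip_filters(top); bs; bs = backing_chain_next(bs)) {
        if (bs->allocated & (1u << block)) {
            return bs;
        }
    }
    return NULL;
}
```

With blkdebug on top of top.qcow2, which in turn is backed by base.qcow2, a block stored only in the base is still attributed to the base instead of the walk ending at the filter, which is the situation iotest 204 exercises.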

Furthermore, a note should be made that sometimes we do want to access
bs->backing directly.  This is whenever the operation in question is not
about accessing the COW child, but the "backing" child, be it COW or
not.  This is the case in functions such as bdrv_open_backing_file() or
whenever we have to deal with the special behavior of @backing as a
blockdev option, which is that it does not default to null like all
other child references do.

Finally, the query functions (query-block and query-named-block-nodes)
are modified to return any filtered child under "backing", not just
bs->backing or COW children.  This is so that filters do not interrupt
the reported backing chain.  This changes the output of iotest 184, as
the throttled node now appears as a backing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json           |   4 +
 include/block/block.h          |   1 +
 include/block/block_int.h      |  40 +++++--
 block.c                        | 210 +++++++++++++++++++++++++++------
 block/backup.c                 |   8 +-
 block/block-backend.c          |  16 ++-
 block/commit.c                 |  33 +++---
 block/io.c                     |  45 ++++---
 block/mirror.c                 |  21 ++--
 block/qapi.c                   |  30 +++--
 block/stream.c                 |  13 +-
 blockdev.c                     |  88 +++++++++++---
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c                   |   6 +-
 qemu-img.c                     |  29 ++---
 tests/qemu-iotests/184.out     |   7 +-
 tests/qemu-iotests/204.out     |   1 +
 17 files changed, 411 insertions(+), 145 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..dbd9286e4a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2502,6 +2502,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
diff --git a/include/block/block.h b/include/block/block.h
index c7a26199aa..2005664f14 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -467,6 +467,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
                                  const char *node_name,
                                  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base);
 BlockDriverState *bdrv_next_node(BlockDriverState *bs);
 BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 01e855a066..b22b1164f8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -90,9 +90,11 @@ struct BlockDriver {
     int instance_size;
 
     /* set to true if the BlockDriver is a block filter. Block filters pass
-     * certain callbacks that refer to data (see block.c) to their bs->file if
-     * the driver doesn't implement them. Drivers that do not wish to forward
-     * must implement them and return -ENOTSUP.
+     * certain callbacks that refer to data (see block.c) to their bs->file
+     * or bs->backing (whichever one exists) if the driver doesn't implement
+     * them. Drivers that do not wish to forward must implement them and return
+     * -ENOTSUP.
+     * Note that filters are not allowed to modify data.
      */
     bool is_filter;
     /* for snapshots block filter like Quorum can implement the
@@ -906,11 +908,6 @@ typedef enum BlockMirrorBackingMode {
     MIRROR_LEAVE_BACKING_CHAIN,
 } BlockMirrorBackingMode;
 
-static inline BlockDriverState *backing_bs(BlockDriverState *bs)
-{
-    return bs->backing ? bs->backing->bs : NULL;
-}
-
 
 /* Essential block drivers which must always be statically linked into qemu, and
  * which therefore can be accessed without using bdrv_find_format() */
@@ -1243,4 +1240,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
+
+static inline BlockDriverState *child_bs(BdrvChild *child)
+{
+    return child ? child->bs : NULL;
+}
+
+static inline BlockDriverState *bdrv_filtered_cow_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_cow_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_rw_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_rw_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index 16615bc876..e8f6febda0 100644
--- a/block.c
+++ b/block.c
@@ -556,11 +556,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_blocksizes) {
         return drv->bdrv_probe_blocksizes(bs, bsz);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_blocksizes(bs->file->bs, bsz);
+    } else if (filtered) {
+        return bdrv_probe_blocksizes(filtered, bsz);
     }
 
     return -ENOTSUP;
@@ -575,11 +576,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_geometry) {
         return drv->bdrv_probe_geometry(bs, geo);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_geometry(bs->file->bs, geo);
+    } else if (filtered) {
+        return bdrv_probe_geometry(filtered, geo);
     }
 
     return -ENOTSUP;
@@ -2336,7 +2338,7 @@ static bool bdrv_inherits_from_recursive(BlockDriverState *child,
 }
 
 /*
- * Sets the backing file link of a BDS. A new reference is created; callers
+ * Sets the bs->backing link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
@@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
         bdrv_inherits_from_recursive(backing_hd, bs);
 
-    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
+    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
         return;
     }
 
@@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
     /*
      * Find the "actual" backing file by skipping all links that point
      * to an implicit node, if any (e.g. a commit filter node).
+     * We cannot use any of the bdrv_skip_*() functions here because
+     * those return the first explicit node, while we are looking for
+     * its overlay here.
      */
     overlay_bs = bs;
-    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
-        overlay_bs = backing_bs(overlay_bs);
+    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
+        overlay_bs = bdrv_filtered_bs(overlay_bs);
     }
 
     /* If we want to replace the backing file we need some extra checks */
-    if (new_backing_bs != backing_bs(overlay_bs)) {
+    if (new_backing_bs != child_bs(overlay_bs->backing)) {
         /* Check for implicit nodes between bs and its backing file */
         if (bs != overlay_bs) {
             error_setg(errp, "Cannot change backing link if '%s' has "
@@ -3482,8 +3487,8 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
             return -EPERM;
         }
         /* Check if the backing link that we want to replace is frozen */
-        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
-                                         errp)) {
+        if (bdrv_is_backing_chain_frozen(overlay_bs,
+                                         child_bs(overlay_bs->backing), errp)) {
             return -EPERM;
         }
         reopen_state->replace_backing_bs = true;
@@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
      * its metadata. Otherwise the 'backing' option can be omitted.
      */
     if (drv->supports_backing && reopen_state->backing_missing &&
-        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
+        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
         error_setg(errp, "backing is missing for '%s'",
                    reopen_state->bs->node_name);
         ret = -EINVAL;
@@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
      * from bdrv_set_backing_hd()) has the new values.
      */
     if (reopen_state->replace_backing_bs) {
-        BlockDriverState *old_backing_bs = backing_bs(bs);
+        BlockDriverState *old_backing_bs = child_bs(bs->backing);
         assert(!old_backing_bs || !old_backing_bs->implicit);
         /* Abort the permission update on the backing bs we're detaching */
         if (old_backing_bs) {
@@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs)
 {
-    while (active && bs != backing_bs(active)) {
-        active = backing_bs(active);
+    while (active && bs != bdrv_filtered_bs(active)) {
+        active = bdrv_filtered_bs(active);
     }
 
     return active;
@@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
 {
     BlockDriverState *i;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing && i->backing->frozen) {
             error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
                        i->backing->name, i->node_name,
-                       backing_bs(i)->node_name);
+                       i->backing->bs->node_name);
             return true;
         }
     }
@@ -4254,7 +4259,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
         return -EPERM;
     }
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing) {
             i->backing->frozen = true;
         }
@@ -4272,7 +4277,7 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
 {
     BlockDriverState *i;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing) {
             assert(i->backing->frozen);
             i->backing->frozen = false;
@@ -4342,9 +4347,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
      * other intermediate nodes have been dropped.
      * If 'top' is an implicit node (e.g. "commit_top") we should skip
      * it because no one inherits from it. We use explicit_top for that. */
-    while (explicit_top && explicit_top->implicit) {
-        explicit_top = backing_bs(explicit_top);
-    }
+    explicit_top = bdrv_skip_implicit_filters(explicit_top);
     update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
 
     /* success - we can delete the intermediate states, and link top->base */
@@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
 
 bool bdrv_is_encrypted(BlockDriverState *bs)
 {
-    if (bs->backing && bs->backing->bs->encrypted) {
+    BlockDriverState *filtered = bdrv_filtered_bs(bs);
+    if (bs->encrypted) {
+        return true;
+    }
+    if (filtered && bdrv_is_encrypted(filtered)) {
         return true;
     }
-    return bs->encrypted;
+    return false;
 }
 
 const char *bdrv_get_format_name(BlockDriverState *bs)
@@ -4794,7 +4801,21 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
 {
     while (top && top != base) {
-        top = backing_bs(top);
+        top = bdrv_filtered_bs(top);
+    }
+
+    return top != NULL;
+}
+
+/*
+ * Same as bdrv_chain_contains(), but skip implicitly added R/W filter
+ * nodes and do not move past explicitly added R/W filters.
+ */
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
+{
+    top = bdrv_skip_implicit_filters(top);
+    while (top && top != base) {
+        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
     }
 
     return top != NULL;
@@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
 
 int bdrv_has_zero_init(BlockDriverState *bs)
 {
+    BlockDriverState *filtered;
+
     if (!bs->drv) {
         return 0;
     }
 
     /* If BS is a copy on write image, it is initialized to
        the contents of the base image, which may not be zeroes.  */
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return 0;
     }
     if (bs->drv->bdrv_has_zero_init) {
         return bs->drv->bdrv_has_zero_init(bs);
     }
-    if (bs->file && bs->drv->is_filter) {
-        return bdrv_has_zero_init(bs->file->bs);
+
+    filtered = bdrv_filtered_rw_bs(bs);
+    if (filtered) {
+        return bdrv_has_zero_init(filtered);
     }
 
     /* safe default */
@@ -4890,7 +4915,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return false;
     }
 
@@ -4924,8 +4949,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
         return -ENOMEDIUM;
     }
     if (!drv->bdrv_get_info) {
-        if (bs->file && drv->is_filter) {
-            return bdrv_get_info(bs->file->bs, bdi);
+        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
+        if (filtered) {
+            return bdrv_get_info(filtered, bdi);
         }
         return -ENOTSUP;
     }
@@ -5028,7 +5054,17 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 
     is_protocol = path_has_protocol(backing_file);
 
-    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
+    /*
+     * This is largely a legacy function, so skip any filters here
+     * (because filters do not have normal filenames, so they cannot
+     * match anyway; and allowing json:{} filenames is a bit out of
+     * scope).
+     */
+    for (curr_bs = bdrv_skip_rw_filters(bs);
+         bdrv_filtered_cow_child(curr_bs) != NULL;
+         curr_bs = bdrv_backing_chain_next(curr_bs))
+    {
+        BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
 
         /* If either of the filename paths is actually a protocol, then
          * compare unmodified paths; otherwise make paths relative */
@@ -5036,7 +5072,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
             /* Also check against the full backing filename for the image */
@@ -5046,7 +5082,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
                 bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
                 g_free(backing_file_full_ret);
                 if (equal) {
-                    retval = curr_bs->backing->bs;
+                    retval = bs_below;
                     break;
                 }
             }
@@ -5072,7 +5108,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             g_free(filename_tmp);
 
             if (strcmp(backing_file_full, filename_full) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
         }
@@ -6237,3 +6273,107 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
 
     return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
 }
+
+/*
+ * Return the child that @bs acts as an overlay for, and from which data may be
+ * copied in COW or COR operations.  Usually this is the backing file.
+ */
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing;
+}
+
+/*
+ * If @bs acts as a pass-through filter for one of its children,
+ * return that child.  "Pass-through" means that write operations to
+ * @bs are forwarded to that child instead of triggering COW.
+ */
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (!bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing ?: bs->file;
+}
+
+/*
+ * Return any filtered child, regardless of how it reacts to write
+ * accesses and whether data is copied onto this BDS through COR.
+ */
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
+{
+    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
+    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
+
+    /* There can only be one filtered child at a time */
+    assert(!(cow_child && rw_child));
+
+    return cow_child ?: rw_child;
+}
+
+static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
+                                           bool stop_on_explicit_filter)
+{
+    BdrvChild *filtered;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    while (!(stop_on_explicit_filter && !bs->implicit)) {
+        filtered = bdrv_filtered_rw_child(bs);
+        if (!filtered) {
+            break;
+        }
+        bs = filtered->bs;
+    }
+    /*
+     * Note that this treats nodes with bs->drv == NULL as not being
+     * R/W filters (bs->drv == NULL should be replaced by something
+     * else anyway).
+     * The advantage of this behavior is that this function will thus
+     * always return a non-NULL value (given a non-NULL @bs).
+     */
+
+    return bs;
+}
+
+/*
+ * Return the first BDS that has not been added implicitly or that
+ * does not have an RW-filtered child down the chain starting from @bs
+ * (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, true);
+}
+
+/*
+ * Return the first BDS that does not have an RW-filtered child down
+ * the chain starting from @bs (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, false);
+}
+
+/*
+ * For a backing chain, return the first non-filter backing image.
+ */
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
+{
+    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
+}
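To illustrate how the new helpers divide up a node's children, here is a standalone C sketch. The types `Driver` and `Node` and the functions `cow_child()`, `rw_child()`, `skip_filters()` and `backing_chain_next()` are hypothetical, heavily simplified stand-ins for `BlockDriver`, `BlockDriverState` and the `bdrv_*` helpers above. BdrvChild indirection, reference counting and permissions are left out. The sketch assumes only what the patch shows: a non-filter node's COW child is its `backing` child, while a filter's R/W-filtered child is `backing` if present, else `file`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for BlockDriver/BlockDriverState. */
typedef struct Driver {
    bool is_filter;
} Driver;

typedef struct Node {
    const Driver *drv;
    struct Node *backing;   /* models bs->backing->bs (NULL if no child) */
    struct Node *file;      /* models bs->file->bs (NULL if no child) */
    bool implicit;
} Node;

/* COW child: only non-filter nodes have one, and it is always 'backing'. */
static Node *cow_child(Node *n)
{
    if (!n || !n->drv || n->drv->is_filter) {
        return NULL;
    }
    return n->backing;
}

/* R/W-filtered child: only filters have one; 'backing' wins over 'file'. */
static Node *rw_child(Node *n)
{
    if (!n || !n->drv || !n->drv->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* Descend through R/W filters, optionally stopping at the first explicit node. */
static Node *skip_filters(Node *n, bool stop_on_explicit_filter)
{
    if (!n) {
        return NULL;
    }
    while (!(stop_on_explicit_filter && !n->implicit)) {
        Node *child = rw_child(n);
        if (!child) {
            break;
        }
        n = child;
    }
    return n;
}

/* First non-filter image below @n in its backing chain, or NULL. */
static Node *backing_chain_next(Node *n)
{
    return skip_filters(cow_child(skip_filters(n, false)), false);
}
```

For a chain `commit_top (implicit filter) -> throttle (explicit filter) -> overlay -> base`, this model gives `skip_filters(commit_top, true) == throttle`, `skip_filters(commit_top, false) == overlay` and `backing_chain_next(commit_top) == base`.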
diff --git a/block/backup.c b/block/backup.c
index 9988753249..9c08353b23 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t len;
     BlockDriverInfo bdi;
     BackupBlockJob *job = NULL;
+    bool target_does_cow;
     int ret;
 
     assert(bs);
@@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
+    target_does_cow = bdrv_filtered_cow_child(target);
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
+    if (ret == -ENOTSUP && !target_does_cow) {
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
@@ -681,14 +683,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                     "this default, the backup may be unusable",
                     BACKUP_CLUSTER_SIZE_DEFAULT);
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
+    } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
             "which has no backing file");
         error_append_hint(errp,
             "Aborting, since this may create an unusable destination image\n");
         goto error;
-    } else if (ret < 0 && target->backing) {
+    } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
     } else {
diff --git a/block/block-backend.c b/block/block-backend.c
index f78e82a707..aa9a1d84a6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2089,11 +2089,17 @@ int blk_commit_all(void)
         AioContext *aio_context = blk_get_aio_context(blk);
 
         aio_context_acquire(aio_context);
-        if (blk_is_inserted(blk) && blk->root->bs->backing) {
-            int ret = bdrv_commit(blk->root->bs);
-            if (ret < 0) {
-                aio_context_release(aio_context);
-                return ret;
+        if (blk_is_inserted(blk)) {
+            BlockDriverState *non_filter;
+
+            /* Legacy function, so skip implicit filters */
+            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
+            if (bdrv_filtered_cow_child(non_filter)) {
+                int ret = bdrv_commit(non_filter);
+                if (ret < 0) {
+                    aio_context_release(aio_context);
+                    return ret;
+                }
             }
         }
         aio_context_release(aio_context);
diff --git a/block/commit.c b/block/commit.c
index 02eab34925..252007fd57 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -113,7 +113,7 @@ static void commit_abort(Job *job)
      * something to base, the intermediate images aren't valid any more. */
     bdrv_child_try_set_perm(s->commit_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
+    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
                       &error_abort);
 
     bdrv_unref(s->commit_top_bs);
@@ -324,10 +324,16 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->commit_top_bs = commit_top_bs;
     bdrv_unref(commit_top_bs);
 
-    /* Block all nodes between top and base, because they will
-     * disappear from the chain after this operation. */
+    /*
+     * Block all nodes between top and base, because they will
+     * disappear from the chain after this operation.
+     * Note that this assumes that the user is fine with removing all
+     * nodes (including R/W filters) between top and base.  Ensuring
+     * this is the responsibility of the interface (i.e. whoever calls
+     * commit_start()).
+     */
     assert(bdrv_chain_contains(top, base));
-    for (iter = top; iter != base; iter = backing_bs(iter)) {
+    for (iter = top; iter != base; iter = bdrv_filtered_bs(iter)) {
         /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
          * at s->base (if writes are blocked for a node, they are also blocked
          * for its backing file). The other options would be a second filter
@@ -414,19 +420,22 @@ int bdrv_commit(BlockDriverState *bs)
     if (!drv)
         return -ENOMEDIUM;
 
-    if (!bs->backing) {
+    backing_file_bs = bdrv_filtered_cow_bs(bs);
+
+    if (!backing_file_bs) {
         return -ENOTSUP;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
-        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
+        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
+    {
         return -EBUSY;
     }
 
-    ro = bs->backing->bs->read_only;
+    ro = backing_file_bs->read_only;
 
     if (ro) {
-        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
+        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
             return -EACCES;
         }
     }
@@ -441,8 +450,6 @@ int bdrv_commit(BlockDriverState *bs)
     }
 
     /* Insert commit_top block node above backing, so we can write to it */
-    backing_file_bs = backing_bs(bs);
-
     commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
                                          &local_err);
     if (commit_top_bs == NULL) {
@@ -528,15 +535,13 @@ ro_cleanup:
     qemu_vfree(buf);
 
     blk_unref(backing);
-    if (backing_file_bs) {
-        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
-    }
+    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
     bdrv_unref(commit_top_bs);
     blk_unref(src);
 
     if (ro) {
         /* ignoring error return here */
-        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
+        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
     }
 
     return ret;
diff --git a/block/io.c b/block/io.c
index dfc153b8d8..83c2b6b46a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
+    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
     Error *local_err = NULL;
 
+    /*
+     * FIXME: There should be a function for this, and in fact a
+     * follow-up patch will add one.
+     */
+    storage_bs =
+        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
+
     memset(&bs->bl, 0, sizeof(bs->bl));
 
     if (!drv) {
@@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
                                 drv->bdrv_aio_preadv) ? 1 : 512;
 
     /* Take some limits from the children as a default */
-    if (bs->file) {
-        bdrv_refresh_limits(bs->file->bs, &local_err);
+    if (storage_bs) {
+        bdrv_refresh_limits(storage_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
+        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
     } else {
         bs->bl.min_mem_alignment = 512;
         bs->bl.opt_mem_alignment = getpagesize();
@@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_iov = IOV_MAX;
     }
 
-    if (bs->backing) {
-        bdrv_refresh_limits(bs->backing->bs, &local_err);
+    if (cow_bs) {
+        bdrv_refresh_limits(cow_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
+        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
     }
 
     /* Then let the driver override it */
@@ -2139,11 +2148,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
     } else if (want_zero) {
+        BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
+
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (bs->backing) {
-            BlockDriverState *bs2 = bs->backing->bs;
-            int64_t size2 = bdrv_getlength(bs2);
+        } else if (cow_bs) {
+            int64_t size2 = bdrv_getlength(cow_bs);
 
             if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
@@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     bool first = true;
 
     assert(bs != base);
-    for (p = bs; p != base; p = backing_bs(p)) {
+    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
@@ -2294,7 +2304,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
-    return bdrv_block_status_above(bs, backing_bs(bs),
+    return bdrv_block_status_above(bs, bdrv_filtered_bs(bs),
                                    offset, bytes, pnum, map, file);
 }
 
@@ -2304,9 +2314,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     int ret;
     int64_t dummy;
 
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
-                                         bytes, pnum ? pnum : &dummy, NULL,
-                                         NULL);
+    ret = bdrv_common_block_status_above(bs, bdrv_filtered_bs(bs), false,
+                                         offset, bytes, pnum ? pnum : &dummy,
+                                         NULL, NULL);
     if (ret < 0) {
         return ret;
     }
@@ -2360,7 +2370,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
             n = pnum_inter;
         }
 
-        intermediate = backing_bs(intermediate);
+        intermediate = bdrv_filtered_bs(intermediate);
     }
 
     *pnum = n;
@@ -3135,8 +3145,9 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
     }
 
     if (!drv->bdrv_co_truncate) {
-        if (bs->file && drv->is_filter) {
-            ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
+        BdrvChild *filtered = bdrv_filtered_rw_child(bs);
+        if (filtered) {
+            ret = bdrv_co_truncate(filtered, offset, prealloc, errp);
             goto out;
         }
         error_setg(errp, "Image format driver does not support resize");
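The `want_zero` branch of `bdrv_co_block_status()` now gets the image to look through from `bdrv_filtered_cow_bs()`. As a result, only a real COW backing file (never a filter's child) decides whether an unallocated range beyond its end reads as zeroes. The following is a minimal sketch of that decision. It assumes a hypothetical `Image` type whose precomputed `length` and `unallocated_is_zero` flag stand in for `bdrv_getlength()` and `bdrv_unallocated_blocks_are_zero()`. `unallocated_reads_as_zero()` is likewise an illustrative name, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified image: not QEMU's BlockDriverState. */
typedef struct Image {
    int64_t length;           /* bytes; negative models a getlength error */
    const struct Image *cow;  /* COW backing image, NULL if none */
    bool unallocated_is_zero; /* models bdrv_unallocated_blocks_are_zero();
                               * in QEMU it is false whenever a COW child
                               * exists */
} Image;

/*
 * Does an unallocated range starting at @offset read as zeroes?
 * Follows the shape of the want_zero branch above; any case that
 * branch does not cover is reported as "not known to be zero".
 */
static bool unallocated_reads_as_zero(const Image *img, int64_t offset)
{
    assert(offset >= 0);

    if (img->unallocated_is_zero) {
        return true;
    }
    if (img->cow) {
        int64_t size2 = img->cow->length;

        /* Beyond the end of the backing image there is nothing to inherit */
        return size2 >= 0 && offset >= size2;
    }
    return false;
}
```

If the backing image's length cannot be determined, the sketch (like the code it models) does not claim zeroes.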
diff --git a/block/mirror.c b/block/mirror.c
index 8b2404051f..80cef587f0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
                             &error_abort);
     if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
+        if (bdrv_backing_chain_next(target_bs) != backing) {
+            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,
+                                &local_err);
             if (local_err) {
                 error_report_err(local_err);
                 ret = -EPERM;
@@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
     block_job_remove_all_bdrv(bjob);
     bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     /* We just changed the BDS the job BB refers to (with either or both of the
      * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     } else {
         s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
-    if (backing_filename[0] && !target_bs->backing &&
+    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&
         s->granularity < s->target_cluster_size) {
         s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
@@ -1083,7 +1084,7 @@ static void mirror_complete(Job *job, Error **errp)
     if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
         int ret;
 
-        assert(!target->backing);
+        assert(!bdrv_filtered_cow_child(target));
         ret = bdrv_open_backing_file(target, NULL, "backing", errp);
         if (ret < 0) {
             return;
@@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
      * any jobs in them must be blocked */
     if (target_is_backing) {
         BlockDriverState *iter;
-        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
+        for (iter = bdrv_filtered_bs(bs); iter != target;
+             iter = bdrv_filtered_bs(iter))
+        {
             /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
              * ourselves at s->base (if writes are blocked for a node, they are
              * also blocked for its backing file). The other options would be a
@@ -1691,7 +1694,7 @@ fail:
 
     bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     bdrv_unref(mirror_top_bs);
 }
@@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorCopyMode copy_mode, Error **errp)
 {
     bool is_none_mode;
-    BlockDriverState *base;
+    BlockDriverState *base = NULL;
 
     if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         error_setg(errp, "Sync mode 'incremental' not supported");
         return;
     }
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
-    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
+    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
     mirror_start_job(job_id, bs, creation_flags, target, replaces,
                      speed, granularity, buf_size, backing_mode,
                      on_source_error, on_target_error, unmap, NULL, NULL,
diff --git a/block/qapi.c b/block/qapi.c
index 110d05dc57..478c6f5e0d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -149,9 +149,13 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
             return NULL;
         }
 
-        if (bs0->drv && bs0->backing) {
+        if (bs0->drv && bdrv_filtered_child(bs0)) {
+            /*
+             * Put any filtered child here (for backwards compatibility to when
+             * we put bs0->backing here, which might be any filtered child).
+             */
             info->backing_file_depth++;
-            bs0 = bs0->backing->bs;
+            bs0 = bdrv_filtered_bs(bs0);
             (*p_image_info)->has_backing_image = true;
             p_image_info = &((*p_image_info)->backing_image);
         } else {
@@ -160,9 +164,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 
         /* Skip automatically inserted nodes that the user isn't aware of for
          * query-block (blk != NULL), but not for query-named-block-nodes */
-        while (blk && bs0->drv && bs0->implicit) {
-            bs0 = backing_bs(bs0);
-            assert(bs0);
+        if (blk) {
+            bs0 = bdrv_skip_implicit_filters(bs0);
         }
     }
 
@@ -347,9 +350,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
     BlockDriverState *bs = blk_bs(blk);
     char *qdev;
 
-    /* Skip automatically inserted nodes that the user isn't aware of */
-    while (bs && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
+    if (bs) {
+        /* Skip automatically inserted nodes that the user isn't aware of */
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     info->device = g_strdup(blk_name(blk));
@@ -506,6 +509,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
+    BlockDriverState *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -518,9 +522,8 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
     /* Skip automatically inserted nodes that the user isn't aware of in
      * a BlockBackend-level command. Stay at the exact node for a node-level
      * command. */
-    while (blk_level && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
-        assert(bs);
+    if (blk_level) {
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     if (bdrv_get_node_name(bs)[0]) {
@@ -535,9 +538,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
         s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
     }
 
-    if (blk_level && bs->backing) {
+    cow_bs = bdrv_filtered_cow_bs(bs);
+    if (blk_level && cow_bs) {
         s->has_backing = true;
-        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
+        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
     }
 
     return s;
diff --git a/block/stream.c b/block/stream.c
index bfaebb861a..23d5c890e0 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
     BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
     int ret = 0;
@@ -72,7 +73,7 @@ static int stream_prepare(Job *job)
     bdrv_unfreeze_backing_chain(bs, base);
     s->chain_frozen = false;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(unfiltered)) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -80,7 +81,7 @@ static int stream_prepare(Job *job)
                 base_fmt = base->drv->format_name;
             }
         }
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        ret = bdrv_change_backing_file(unfiltered, base_id, base_fmt);
         bdrv_set_backing_hd(bs, base, &local_err);
         if (local_err) {
             error_report_err(local_err);
@@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
     int64_t n = 0; /* bytes */
     void *buf;
 
-    if (!bs->backing) {
+    if (!bdrv_filtered_child(bs)) {
         goto out;
     }
 
@@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), base,
+            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
                                           offset, n, &n);
 
             /* Finish early if end of backing file has been reached */
@@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so block writes
      * and resizes. */
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
+         iter = bdrv_filtered_bs(iter))
+    {
         block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                            BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
                            &error_abort);
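The two chain checks in block.c now differ. `bdrv_chain_contains()` follows any filtered child. `bdrv_legacy_chain_contains()` only skips implicit filters and otherwise follows COW links, so it stops at an explicitly added R/W filter. The following sketch models both checks with a hypothetical `Node` type. Its single `child` pointer plays the role of the COW child for a format node and the filtered child for a filter. `filtered()`, `filtered_cow()`, `skip_implicit()`, `chain_contains()` and `legacy_chain_contains()` are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: 'child' is the COW child of a format node or the
 * R/W-filtered child of a filter node. */
typedef struct Node {
    bool is_filter;
    bool implicit;
    struct Node *child;
} Node;

/* Models bdrv_filtered_bs(): any filtered child. */
static Node *filtered(Node *n)
{
    return n ? n->child : NULL;
}

/* Models bdrv_filtered_cow_bs(): only non-filter nodes have a COW child. */
static Node *filtered_cow(Node *n)
{
    return (n && !n->is_filter) ? n->child : NULL;
}

/* Models bdrv_skip_implicit_filters(): descend while on an implicit filter. */
static Node *skip_implicit(Node *n)
{
    while (n && n->implicit && n->is_filter && n->child) {
        n = n->child;
    }
    return n;
}

/* Models bdrv_chain_contains(): walk through every filtered child. */
static bool chain_contains(Node *top, Node *base)
{
    while (top && top != base) {
        top = filtered(top);
    }
    return top != NULL;
}

/* Models bdrv_legacy_chain_contains(): COW links plus implicit filters only. */
static bool legacy_chain_contains(Node *top, Node *base)
{
    top = skip_implicit(top);
    while (top && top != base) {
        top = skip_implicit(filtered_cow(top));
    }
    return top != NULL;
}
```

Suppose an explicit throttle filter sits between an overlay and its base. `chain_contains()` still finds the base, but `legacy_chain_contains()` does not. The block-stream comment in blockdev.c relies on exactly this distinction.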
diff --git a/blockdev.c b/blockdev.c
index 4775a07d93..bb71b8368d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
             return;
         }
 
-        bs = blk_bs(blk);
+        bs = bdrv_skip_implicit_filters(blk_bs(blk));
         aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
 
@@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState *common,
         goto out;
     }
 
-    if (state->new_bs->backing != NULL) {
+    if (bdrv_filtered_cow_child(state->new_bs)) {
         error_setg(errp, "The snapshot already has a backing image");
         goto out;
     }
@@ -3202,6 +3202,13 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         if (!base_bs) {
             goto out;
         }
+        /*
+         * Streaming copies data through COR, so all of the filters
+         * between the target and the base are considered.  Therefore,
+         * we can use bdrv_chain_contains() and do not have to use
+         * bdrv_legacy_chain_contains() (which does not go past
+         * explicitly added filters).
+         */
         if (bs == base_bs || !bdrv_chain_contains(bs, base_bs)) {
             error_setg(errp, "Node '%s' is not a backing image of '%s'",
                        base_node, device);
@@ -3213,7 +3220,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
+    for (iter = bs; iter && iter != base_bs; iter = bdrv_filtered_bs(iter)) {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
             goto out;
         }
@@ -3370,7 +3377,9 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
 
     assert(bdrv_get_aio_context(base_bs) == aio_context);
 
-    for (iter = top_bs; iter != backing_bs(base_bs); iter = backing_bs(iter)) {
+    for (iter = top_bs; iter != bdrv_filtered_bs(base_bs);
+         iter = bdrv_filtered_bs(iter))
+    {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
             goto out;
         }
@@ -3381,6 +3390,11 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
         error_setg(errp, "cannot commit an image into itself");
         goto out;
     }
+    if (!bdrv_legacy_chain_contains(top_bs, base_bs)) {
+        /* We have to disallow this until the user can give explicit consent */
+        error_setg(errp, "Cannot commit through explicit filter nodes");
+        goto out;
+    }
 
     if (top_bs == bs) {
         if (has_backing_file) {
@@ -3472,7 +3486,13 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     /* See if we have a backing HD we can use to create our new image
      * on top of. */
     if (backup->sync == MIRROR_SYNC_MODE_TOP) {
-        source = backing_bs(bs);
+        /*
+         * Backup will not replace the source by the target, so none
+         * of the filters skipped here will be removed (in contrast to
+         * mirror).  Therefore, we can skip all of them when looking
+         * for the first COW relationship.
+         */
+        source = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
         if (!source) {
             backup->sync = MIRROR_SYNC_MODE_FULL;
         }
@@ -3492,9 +3512,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     if (backup->mode != NEW_IMAGE_MODE_EXISTING) {
         assert(backup->format);
         if (source) {
-            bdrv_refresh_filename(source);
-            bdrv_img_create(backup->target, backup->format, source->filename,
-                            source->drv->format_name, NULL,
+            /* Implicit filters should not appear in the filename */
+            BlockDriverState *explicit_backing =
+                bdrv_skip_implicit_filters(source);
+
+            bdrv_refresh_filename(explicit_backing);
+            bdrv_img_create(backup->target, backup->format,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name, NULL,
                             size, flags, false, &local_err);
         } else {
             bdrv_img_create(backup->target, backup->format, NULL, NULL, NULL,
@@ -3752,7 +3777,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
+    if (!bdrv_backing_chain_next(bs) && sync == MIRROR_SYNC_MODE_TOP) {
         sync = MIRROR_SYNC_MODE_FULL;
     }
 
@@ -3801,7 +3826,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
 
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *source, *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode;
@@ -3810,6 +3835,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int flags;
     int64_t size;
     const char *format = arg->format;
+    const char *replaces_node_name = NULL;
 
     bs = qmp_get_root_bs(arg->device, errp);
     if (!bs) {
@@ -3821,6 +3847,16 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         return;
     }
 
+    /*
+     * If the user has not instructed us otherwise, we should let the
+     * block job run from @bs (thus taking into account all filters on
+     * it) but replace @unfiltered_bs when it finishes (thus not
+     * removing those filters).
+     * (And if there are any explicit filters, we should assume the
+     *  user knows how to use the @replaces option.)
+     */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
@@ -3834,8 +3870,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
-    source = backing_bs(bs);
+    source = bdrv_filtered_cow_bs(unfiltered_bs);
     if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
+        if (bdrv_filtered_bs(unfiltered_bs)) {
+            /* @unfiltered_bs is an explicit filter */
+            error_setg(errp, "Cannot perform sync=top mirror through an "
+                       "explicitly added filter node on the source");
+            goto out;
+        }
         arg->sync = MIRROR_SYNC_MODE_FULL;
     }
     if (arg->sync == MIRROR_SYNC_MODE_NONE) {
@@ -3854,6 +3896,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
                              " named node of the graph");
             goto out;
         }
+        replaces_node_name = arg->replaces;
+    } else if (unfiltered_bs != bs) {
+        replaces_node_name = unfiltered_bs->node_name;
     }
 
     if (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
@@ -3873,6 +3918,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         bdrv_img_create(arg->target, format,
                         NULL, NULL, NULL, size, flags, false, &local_err);
     } else {
+        /* Implicit filters should not appear in the filename */
+        BlockDriverState *explicit_backing = bdrv_skip_implicit_filters(source);
+
         switch (arg->mode) {
         case NEW_IMAGE_MODE_EXISTING:
             break;
@@ -3880,8 +3928,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
             /* create new image with backing file */
             bdrv_refresh_filename(source);
             bdrv_img_create(arg->target, format,
-                            source->filename,
-                            source->drv->format_name,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name,
                             NULL, size, flags, false, &local_err);
             break;
         default:
@@ -3913,7 +3961,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     bdrv_set_aio_context(target_bs, aio_context);
 
     blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
-                           arg->has_replaces, arg->replaces, arg->sync,
+                           !!replaces_node_name, replaces_node_name, arg->sync,
                            backing_mode, arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
                            arg->has_buf_size, arg->buf_size,
@@ -3949,7 +3997,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                          bool has_auto_dismiss, bool auto_dismiss,
                          Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode = MIRROR_LEAVE_BACKING_CHAIN;
@@ -3960,6 +4008,16 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
         return;
     }
 
+    /*
+     * Same as in qmp_drive_mirror(): We want to run the job from @bs,
+     * but we want to replace @unfiltered_bs on completion.
+     */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+    if (!has_replaces && unfiltered_bs != bs) {
+        replaces = unfiltered_bs->node_name;
+        has_replaces = true;
+    }
+
     target_bs = bdrv_lookup_bs(target, target, errp);
     if (!target_bs) {
         return;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index d1bb863cb6..f99f753fba 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
         const char *drive_name = bdrv_get_device_or_node_name(bs);
 
         /* skip automatically inserted nodes */
-        while (bs && bs->drv && bs->implicit) {
-            bs = backing_bs(bs);
-        }
+        bs = bdrv_skip_implicit_filters(bs);
 
         for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
              bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
diff --git a/nbd/server.c b/nbd/server.c
index e21bd501dc..e41ae89dbe 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
     if (bitmap) {
         BdrvDirtyBitmap *bm = NULL;
 
-        while (true) {
+        while (bs) {
             bm = bdrv_find_dirty_bitmap(bs, bitmap);
-            if (bm != NULL || bs->backing == NULL) {
+            if (bm != NULL) {
                 break;
             }
 
-            bs = bs->backing->bs;
+            bs = bdrv_filtered_bs(bs);
         }
 
         if (bm == NULL) {
diff --git a/qemu-img.c b/qemu-img.c
index aa6f81f1ea..bcfbb743fc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -982,7 +982,7 @@ static int img_commit(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     qemu_progress_init(progress, 1.f);
     qemu_progress_print(0.f, 100);
@@ -999,7 +999,7 @@ static int img_commit(int argc, char **argv)
         /* This is different from QMP, which by default uses the deepest file in
          * the backing chain (i.e., the very base); however, the traditional
          * behavior of qemu-img commit is using the immediate backing file. */
-        base_bs = backing_bs(bs);
+        base_bs = bdrv_filtered_cow_bs(bs);
         if (!base_bs) {
             error_setg(&local_err, "Image does not have a backing file");
             goto done;
@@ -1616,19 +1616,18 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
 
     if (s->sector_next_status <= sector_num) {
         int64_t count = n * BDRV_SECTOR_SIZE;
+        BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
+        BlockDriverState *base;
 
         if (s->target_has_backing) {
-
-            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
-                                    (sector_num - src_cur_offset) *
-                                    BDRV_SECTOR_SIZE,
-                                    count, &count, NULL, NULL);
+            base = bdrv_backing_chain_next(src_bs);
         } else {
-            ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
-                                          (sector_num - src_cur_offset) *
-                                          BDRV_SECTOR_SIZE,
-                                          count, &count, NULL, NULL);
+            base = NULL;
         }
+        ret = bdrv_block_status_above(src_bs, base,
+                                      (sector_num - src_cur_offset) *
+                                      BDRV_SECTOR_SIZE,
+                                      count, &count, NULL, NULL);
         if (ret < 0) {
             error_report("error while reading block status of sector %" PRId64
                          ": %s", sector_num, strerror(-ret));
@@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
          * s.target_backing_sectors has to be negative, which it will
          * be automatically).  The backing file length is used only
          * for optimizations, so such a case is not fatal. */
-        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
+        s.target_backing_sectors =
+            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
     } else {
         s.target_backing_sectors = -1;
     }
@@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 
     depth = 0;
     for (;;) {
+        bs = bdrv_skip_rw_filters(bs);
         ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
         if (ret < 0) {
             return ret;
@@ -2805,7 +2806,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
-        bs = backing_bs(bs);
+        bs = bdrv_filtered_cow_bs(bs);
         if (bs == NULL) {
             ret = 0;
             break;
@@ -2944,7 +2945,7 @@ static int img_map(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     if (output_format == OFORMAT_HUMAN) {
         printf("%-16s%-16s%-16s%s\n", "Offset", "Length", "Mapped to", "File");
diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
index 3deb3cfb94..1d61f7e224 100644
--- a/tests/qemu-iotests/184.out
+++ b/tests/qemu-iotests/184.out
@@ -27,6 +27,11 @@ Testing:
             "iops_rd": 0,
             "detect_zeroes": "off",
             "image": {
+                "backing-image": {
+                    "virtual-size": 1073741824,
+                    "filename": "null-co://",
+                    "format": "null-co"
+                },
                 "virtual-size": 1073741824,
                 "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
                 "format": "throttle"
@@ -34,7 +39,7 @@ Testing:
             "iops_wr": 0,
             "ro": false,
             "node-name": "throttle0",
-            "backing_file_depth": 0,
+            "backing_file_depth": 1,
             "drv": "throttle",
             "iops": 0,
             "bps_wr": 0,
diff --git a/tests/qemu-iotests/204.out b/tests/qemu-iotests/204.out
index f3a10fbe90..684774d763 100644
--- a/tests/qemu-iotests/204.out
+++ b/tests/qemu-iotests/204.out
@@ -59,5 +59,6 @@ Offset          Length          File
 0x900000        0x2400000       TEST_DIR/t.IMGFMT
 0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
 0x6a00000       0x400000        TEST_DIR/t.IMGFMT
+0x6e00000       0x1200000       TEST_DIR/t.IMGFMT.base
 No errors were found on the image.
 *** done
-- 
2.20.1


* [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

What bs->file and bs->backing mean depends on the node.  For filter
nodes, both refer to a node that eventually receives all R/W accesses.
For format nodes, bs->file holds both metadata and data, while
bs->backing never receives writes; writes are COWed to bs->file
instead.  Usually.

In any case, with our current, limited vocabulary it is not trivial to
tell what exactly a given child means.  It is better to introduce some
functions that guarantee a specific meaning:

- bdrv_filtered_cow_child() will return the child that receives requests
  filtered through COW.  That is, reads may or may not be forwarded
  (depending on the overlay's allocation status), but writes never go to
  this child.

- bdrv_filtered_rw_child() will return the child that receives requests
  filtered through some very plain process.  Reads and writes issued to
  the parent will go to the child as well (although timing, etc. may be
  modified).

- All drivers except quorum (which is fairly opaque to the generic block
  layer anyway) have at most one of these children: all read requests
  must be served from the filtered_rw_child (if it exists), so if there
  were a filtered_cow_child in addition, it would never receive any
  requests at all.
  (The closest exception is mirror, where all requests are passed on to
  the source, but with write-blocking, write requests are "COWed" to the
  target.  That just means that the target is a special child that
  cannot be introspected by the generic block layer functions, and that
  the source is a filtered_rw_child.)
  Therefore, we can also add bdrv_filtered_child(), which returns that
  one child (or NULL if there is no filtered child).
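The classification above can be made concrete with a toy model.  This is
a minimal sketch under stated assumptions, not the code from this patch:
ToyNode and the toy_* functions are hypothetical stand-ins for
BlockDriverState and the new helpers, assuming that a filter forwards to
whichever of file/backing it has, and that a non-filter's backing child
is its COW child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for BlockDriverState. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    const char *name;
    bool is_filter;   /* like BlockDriver.is_filter */
    ToyNode *file;    /* like child_bs(bs->file), may be NULL */
    ToyNode *backing; /* like child_bs(bs->backing), may be NULL */
};

/* COW child: a non-filter's backing child; writes never go there. */
static ToyNode *toy_filtered_cow_child(ToyNode *bs)
{
    if (!bs || bs->is_filter) {
        return NULL;
    }
    return bs->backing;
}

/* R/W child: a filter forwards reads and writes to its only child. */
static ToyNode *toy_filtered_rw_child(ToyNode *bs)
{
    if (!bs || !bs->is_filter) {
        return NULL;
    }
    assert(!(bs->file && bs->backing)); /* a filter uses one or the other */
    return bs->file ? bs->file : bs->backing;
}

/* At most one of the two exists, so "the" filtered child is well defined. */
static ToyNode *toy_filtered_child(ToyNode *bs)
{
    ToyNode *cow = toy_filtered_cow_child(bs);
    ToyNode *rw = toy_filtered_rw_child(bs);

    assert(!(cow && rw));
    return cow ? cow : rw;
}
```

In this model, a mirror_top-like filter whose only child hangs off
bs->backing has a non-NULL backing pointer but no COW child, which is
exactly the case where looking at bs->backing alone is ambiguous.  A
format node's bs->file (its protocol node) is never a filtered child.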

Also, many places in the current block layer should skip filters
(either all filters or only implicitly added ones, depending on the
case) when walking a chain of block nodes.  They currently do not; this
patch makes them do so.
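The two kinds of skipping, and the "next COW node" walk built on top of
them, can be sketched in the same hypothetical toy model.  The names
mirror bdrv_skip_implicit_filters(), bdrv_skip_rw_filters() and
bdrv_backing_chain_next() as declared in block_int.h by this patch, but
the bodies here are an assumption for illustration, not the patch's
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    const char *name;
    bool is_filter;
    bool implicit;    /* like bs->implicit: inserted by a job, e.g. mirror_top */
    ToyNode *file;
    ToyNode *backing;
};

static ToyNode *toy_filtered_cow_child(ToyNode *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

static ToyNode *toy_filtered_rw_child(ToyNode *bs)
{
    if (!bs || !bs->is_filter) {
        return NULL;
    }
    return bs->file ? bs->file : bs->backing;
}

/* Skip only filters the user did not add themselves. */
static ToyNode *toy_skip_implicit_filters(ToyNode *bs)
{
    while (bs && bs->implicit && toy_filtered_rw_child(bs)) {
        bs = toy_filtered_rw_child(bs);
    }
    return bs;
}

/* Skip every R/W filter, implicit or explicit. */
static ToyNode *toy_skip_rw_filters(ToyNode *bs)
{
    while (bs && toy_filtered_rw_child(bs)) {
        bs = toy_filtered_rw_child(bs);
    }
    return bs;
}

/* First non-filter node in the COW chain below @bs, or NULL. */
static ToyNode *toy_backing_chain_next(ToyNode *bs)
{
    return toy_skip_rw_filters(toy_filtered_cow_child(toy_skip_rw_filters(bs)));
}
```

With mirror_top (implicit) over throttle (explicit) over a COW overlay,
skipping implicit filters stops at throttle, while skipping all R/W
filters reaches the overlay.  Which one a caller wants depends on
whether it may look past, or later remove, filters the user added.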

One example is qemu-img map, which should skip filters and look only at
the COW elements in the graph.  The change to iotest 204's reference
output shows this: with blkdebug on top of a COW node, qemu-img map used
to disregard the rest of the backing chain; with this patch, the
allocation in the base image is reported correctly.
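The new lookup order in qemu-img's get_block_status() (skip R/W filters
before querying a node, then descend only through the COW child) can be
sketched with a toy allocation map.  ToyNode, alloc_mask and
toy_map_cluster() are hypothetical; a real block status query works on
byte ranges and returns flags, not a single bit per cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState with a per-cluster allocation map. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    const char *name;
    bool is_filter;
    unsigned alloc_mask;  /* bit i set: cluster i is allocated in this node */
    ToyNode *file;
    ToyNode *backing;
};

static ToyNode *toy_filtered_cow_child(ToyNode *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

static ToyNode *toy_filtered_rw_child(ToyNode *bs)
{
    if (!bs || !bs->is_filter) {
        return NULL;
    }
    return bs->file ? bs->file : bs->backing;
}

static ToyNode *toy_skip_rw_filters(ToyNode *bs)
{
    while (bs && toy_filtered_rw_child(bs)) {
        bs = toy_filtered_rw_child(bs);
    }
    return bs;
}

/*
 * Return the node whose data a read of @cluster (< 32) would see, or
 * NULL if no node in the chain has it allocated.
 */
static ToyNode *toy_map_cluster(ToyNode *bs, unsigned cluster)
{
    for (;;) {
        bs = toy_skip_rw_filters(bs);    /* filters own no data: look through */
        if (!bs) {
            return NULL;
        }
        if (bs->alloc_mask & (1u << cluster)) {
            return bs;
        }
        bs = toy_filtered_cow_child(bs); /* unallocated: fall through to COW */
    }
}
```

Placing a blkdebug-like filter on top of the overlay no longer ends the
walk: clusters that are unallocated in the overlay are still attributed
to the base image.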

Note, however, that sometimes we do want to access bs->backing
directly.  This is the case whenever the operation in question is not
about the COW child, but about the "backing" child, be it COW or not:
for example, in bdrv_open_backing_file(), or whenever we have to deal
with the special behavior of @backing as a blockdev option, namely that
it does not default to null like all other child references do.

Finally, the query functions (query-block and query-named-block-nodes)
are modified to return any filtered child under "backing", not just
bs->backing or COW children.  This is so that filters do not interrupt
the reported backing chain.  This changes the output of iotest 184, as
the throttled node now appears as a backing child.
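The effect on iotest 184 can be reproduced with a toy depth count that
follows any filtered child, COW or R/W.  This is a hypothetical sketch
of the counting idea only; the actual change lives in block/qapi.c and
also fills in the nested "backing-image" information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    const char *name;
    bool is_filter;
    ToyNode *file;
    ToyNode *backing;
};

/* The single filtered child: R/W child for filters, COW child otherwise. */
static ToyNode *toy_filtered_child(ToyNode *bs)
{
    if (!bs) {
        return NULL;
    }
    if (bs->is_filter) {
        return bs->file ? bs->file : bs->backing;
    }
    return bs->backing;
}

/* Count every filtered child below @bs, so filters do not cut the chain. */
static int toy_backing_file_depth(ToyNode *bs)
{
    int depth = 0;

    for (bs = toy_filtered_child(bs); bs; bs = toy_filtered_child(bs)) {
        depth++;
    }
    return depth;
}
```

A throttle node on top of null-co thus reports a depth of 1 instead of
0, matching the "backing_file_depth" change in 184.out.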

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json           |   4 +
 include/block/block.h          |   1 +
 include/block/block_int.h      |  40 +++++--
 block.c                        | 210 +++++++++++++++++++++++++++------
 block/backup.c                 |   8 +-
 block/block-backend.c          |  16 ++-
 block/commit.c                 |  33 +++---
 block/io.c                     |  45 ++++---
 block/mirror.c                 |  21 ++--
 block/qapi.c                   |  30 +++--
 block/stream.c                 |  13 +-
 blockdev.c                     |  88 +++++++++++---
 migration/block-dirty-bitmap.c |   4 +-
 nbd/server.c                   |   6 +-
 qemu-img.c                     |  29 ++---
 tests/qemu-iotests/184.out     |   7 +-
 tests/qemu-iotests/204.out     |   1 +
 17 files changed, 411 insertions(+), 145 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..dbd9286e4a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2502,6 +2502,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
diff --git a/include/block/block.h b/include/block/block.h
index c7a26199aa..2005664f14 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -467,6 +467,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
                                  const char *node_name,
                                  Error **errp);
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base);
 BlockDriverState *bdrv_next_node(BlockDriverState *bs);
 BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 01e855a066..b22b1164f8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -90,9 +90,11 @@ struct BlockDriver {
     int instance_size;
 
     /* set to true if the BlockDriver is a block filter. Block filters pass
-     * certain callbacks that refer to data (see block.c) to their bs->file if
-     * the driver doesn't implement them. Drivers that do not wish to forward
-     * must implement them and return -ENOTSUP.
+     * certain callbacks that refer to data (see block.c) to their bs->file
+     * or bs->backing (whichever one exists) if the driver doesn't implement
+     * them. Drivers that do not wish to forward must implement them and return
+     * -ENOTSUP.
+     * Note that filters are not allowed to modify data.
      */
     bool is_filter;
     /* for snapshots block filter like Quorum can implement the
@@ -906,11 +908,6 @@ typedef enum BlockMirrorBackingMode {
     MIRROR_LEAVE_BACKING_CHAIN,
 } BlockMirrorBackingMode;
 
-static inline BlockDriverState *backing_bs(BlockDriverState *bs)
-{
-    return bs->backing ? bs->backing->bs : NULL;
-}
-
 
 /* Essential block drivers which must always be statically linked into qemu, and
  * which therefore can be accessed without using bdrv_find_format() */
@@ -1243,4 +1240,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
 
 int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
+
+static inline BlockDriverState *child_bs(BdrvChild *child)
+{
+    return child ? child->bs : NULL;
+}
+
+static inline BlockDriverState *bdrv_filtered_cow_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_cow_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_rw_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_rw_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filtered_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index 16615bc876..e8f6febda0 100644
--- a/block.c
+++ b/block.c
@@ -556,11 +556,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_blocksizes) {
         return drv->bdrv_probe_blocksizes(bs, bsz);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_blocksizes(bs->file->bs, bsz);
+    } else if (filtered) {
+        return bdrv_probe_blocksizes(filtered, bsz);
     }
 
     return -ENOTSUP;
@@ -575,11 +576,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
 
     if (drv && drv->bdrv_probe_geometry) {
         return drv->bdrv_probe_geometry(bs, geo);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_geometry(bs->file->bs, geo);
+    } else if (filtered) {
+        return bdrv_probe_geometry(filtered, geo);
     }
 
     return -ENOTSUP;
@@ -2336,7 +2338,7 @@ static bool bdrv_inherits_from_recursive(BlockDriverState *child,
 }
 
 /*
- * Sets the backing file link of a BDS. A new reference is created; callers
+ * Sets the bs->backing link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
@@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
         bdrv_inherits_from_recursive(backing_hd, bs);
 
-    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
+    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
         return;
     }
 
@@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
     /*
      * Find the "actual" backing file by skipping all links that point
      * to an implicit node, if any (e.g. a commit filter node).
+     * We cannot use any of the bdrv_skip_*() functions here because
+     * those return the first explicit node, while we are looking for
+     * its overlay here.
      */
     overlay_bs = bs;
-    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
-        overlay_bs = backing_bs(overlay_bs);
+    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
+        overlay_bs = bdrv_filtered_bs(overlay_bs);
     }
 
     /* If we want to replace the backing file we need some extra checks */
-    if (new_backing_bs != backing_bs(overlay_bs)) {
+    if (new_backing_bs != child_bs(overlay_bs->backing)) {
         /* Check for implicit nodes between bs and its backing file */
         if (bs != overlay_bs) {
             error_setg(errp, "Cannot change backing link if '%s' has "
@@ -3482,8 +3487,8 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
             return -EPERM;
         }
         /* Check if the backing link that we want to replace is frozen */
-        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
-                                         errp)) {
+        if (bdrv_is_backing_chain_frozen(overlay_bs,
+                                         child_bs(overlay_bs->backing), errp)) {
             return -EPERM;
         }
         reopen_state->replace_backing_bs = true;
@@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
      * its metadata. Otherwise the 'backing' option can be omitted.
      */
     if (drv->supports_backing && reopen_state->backing_missing &&
-        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
+        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
         error_setg(errp, "backing is missing for '%s'",
                    reopen_state->bs->node_name);
         ret = -EINVAL;
@@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
      * from bdrv_set_backing_hd()) has the new values.
      */
     if (reopen_state->replace_backing_bs) {
-        BlockDriverState *old_backing_bs = backing_bs(bs);
+        BlockDriverState *old_backing_bs = child_bs(bs->backing);
         assert(!old_backing_bs || !old_backing_bs->implicit);
         /* Abort the permission update on the backing bs we're detaching */
         if (old_backing_bs) {
@@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs)
 {
-    while (active && bs != backing_bs(active)) {
-        active = backing_bs(active);
+    while (active && bs != bdrv_filtered_bs(active)) {
+        active = bdrv_filtered_bs(active);
     }
 
     return active;
@@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
 {
     BlockDriverState *i;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing && i->backing->frozen) {
             error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
                        i->backing->name, i->node_name,
-                       backing_bs(i)->node_name);
+                       i->backing->bs->node_name);
             return true;
         }
     }
@@ -4254,7 +4259,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
         return -EPERM;
     }
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing) {
             i->backing->frozen = true;
         }
@@ -4272,7 +4277,7 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
 {
     BlockDriverState *i;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
+    for (i = bs; i != base; i = child_bs(i->backing)) {
         if (i->backing) {
             assert(i->backing->frozen);
             i->backing->frozen = false;
@@ -4342,9 +4347,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
      * other intermediate nodes have been dropped.
      * If 'top' is an implicit node (e.g. "commit_top") we should skip
      * it because no one inherits from it. We use explicit_top for that. */
-    while (explicit_top && explicit_top->implicit) {
-        explicit_top = backing_bs(explicit_top);
-    }
+    explicit_top = bdrv_skip_implicit_filters(explicit_top);
     update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
 
     /* success - we can delete the intermediate states, and link top->base */
@@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
 
 bool bdrv_is_encrypted(BlockDriverState *bs)
 {
-    if (bs->backing && bs->backing->bs->encrypted) {
+    BlockDriverState *filtered = bdrv_filtered_bs(bs);
+    if (bs->encrypted) {
+        return true;
+    }
+    if (filtered && bdrv_is_encrypted(filtered)) {
         return true;
     }
-    return bs->encrypted;
+    return false;
 }
 
 const char *bdrv_get_format_name(BlockDriverState *bs)
@@ -4794,7 +4801,21 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
 {
     while (top && top != base) {
-        top = backing_bs(top);
+        top = bdrv_filtered_bs(top);
+    }
+
+    return top != NULL;
+}
+
+/*
+ * Same as bdrv_chain_contains(), but skip implicitly added R/W filter
+ * nodes and do not move past explicitly added R/W filters.
+ */
+bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
+{
+    top = bdrv_skip_implicit_filters(top);
+    while (top && top != base) {
+        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
     }
 
     return top != NULL;
@@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
 
 int bdrv_has_zero_init(BlockDriverState *bs)
 {
+    BlockDriverState *filtered;
+
     if (!bs->drv) {
         return 0;
     }
 
     /* If BS is a copy on write image, it is initialized to
        the contents of the base image, which may not be zeroes.  */
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return 0;
     }
     if (bs->drv->bdrv_has_zero_init) {
         return bs->drv->bdrv_has_zero_init(bs);
     }
-    if (bs->file && bs->drv->is_filter) {
-        return bdrv_has_zero_init(bs->file->bs);
+
+    filtered = bdrv_filtered_rw_bs(bs);
+    if (filtered) {
+        return bdrv_has_zero_init(filtered);
     }
 
     /* safe default */
@@ -4890,7 +4915,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(bs)) {
         return false;
     }
 
@@ -4924,8 +4949,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
         return -ENOMEDIUM;
     }
     if (!drv->bdrv_get_info) {
-        if (bs->file && drv->is_filter) {
-            return bdrv_get_info(bs->file->bs, bdi);
+        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
+        if (filtered) {
+            return bdrv_get_info(filtered, bdi);
         }
         return -ENOTSUP;
     }
@@ -5028,7 +5054,17 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 
     is_protocol = path_has_protocol(backing_file);
 
-    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
+    /*
+     * Being largely a legacy function, skip any filters here
+     * (because filters do not have normal filenames, so they cannot
+     * match anyway; and allowing json:{} filenames is a bit out of
+     * scope).
+     */
+    for (curr_bs = bdrv_skip_rw_filters(bs);
+         bdrv_filtered_cow_child(curr_bs) != NULL;
+         curr_bs = bdrv_backing_chain_next(curr_bs))
+    {
+        BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
 
         /* If either of the filename paths is actually a protocol, then
          * compare unmodified paths; otherwise make paths relative */
@@ -5036,7 +5072,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
             /* Also check against the full backing filename for the image */
@@ -5046,7 +5082,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
                 bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
                 g_free(backing_file_full_ret);
                 if (equal) {
-                    retval = curr_bs->backing->bs;
+                    retval = bs_below;
                     break;
                 }
             }
@@ -5072,7 +5108,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             g_free(filename_tmp);
 
             if (strcmp(backing_file_full, filename_full) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
         }
@@ -6237,3 +6273,107 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
 
     return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
 }
+
+/*
+ * Return the child that @bs acts as an overlay for, and from which data may be
+ * copied in COW or COR operations.  Usually this is the backing file.
+ */
+BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing;
+}
+
+/*
+ * If @bs acts as a pass-through filter for one of its children,
+ * return that child.  "Pass-through" means that write operations to
+ * @bs are forwarded to that child instead of triggering COW.
+ */
+BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (!bs->drv->is_filter) {
+        return NULL;
+    }
+
+    return bs->backing ?: bs->file;
+}
+
+/*
+ * Return any filtered child, independently of how it reacts to write
+ * accesses and whether data is copied onto this BDS through COR.
+ */
+BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
+{
+    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
+    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
+
+    /* There can only be one filtered child at a time */
+    assert(!(cow_child && rw_child));
+
+    return cow_child ?: rw_child;
+}
+
+static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
+                                           bool stop_on_explicit_filter)
+{
+    BdrvChild *filtered;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    while (!(stop_on_explicit_filter && !bs->implicit)) {
+        filtered = bdrv_filtered_rw_child(bs);
+        if (!filtered) {
+            break;
+        }
+        bs = filtered->bs;
+    }
+    /*
+     * Note that this treats nodes with bs->drv == NULL as not being
+     * R/W filters (bs->drv == NULL should be replaced by something
+     * else anyway).
+     * The advantage of this behavior is that this function will thus
+     * always return a non-NULL value (given a non-NULL @bs).
+     */
+
+    return bs;
+}
+
+/*
+ * Return the first BDS that has not been added implicitly or that
+ * does not have an RW-filtered child down the chain starting from @bs
+ * (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, true);
+}
+
+/*
+ * Return the first BDS that does not have an RW-filtered child down
+ * the chain starting from @bs (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bs, false);
+}
+
+/*
+ * For a backing chain, return the first non-filter backing image.
+ */
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
+{
+    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
+}
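The new helpers in block.c separate two kinds of filtered child. A non-filter node can have a COW child (its backing file), and an R/W filter can have a pass-through child (backing if present, else file). The standalone sketch below models that logic on a hypothetical, stripped-down Node struct. It is not QEMU's BlockDriverState, and cow_child, rw_child, filtered_child, skip_filters and backing_chain_next are illustrative stand-ins for the bdrv_* functions. It assumes is_filter and implicit are the only node properties that matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, stripped-down stand-in for BlockDriverState: only the
 * properties the new helpers inspect are modeled.
 */
typedef struct Node {
    const char *name;
    bool is_filter;        /* models bs->drv->is_filter */
    bool implicit;         /* models bs->implicit */
    struct Node *backing;  /* models bs->backing->bs (NULL if no child) */
    struct Node *file;     /* models bs->file->bs (NULL if no child) */
} Node;

/* Mirrors bdrv_filtered_cow_child(): only non-filters have a COW child. */
static Node *cow_child(const Node *n)
{
    return (n && !n->is_filter) ? n->backing : NULL;
}

/* Mirrors bdrv_filtered_rw_child(): filters forward I/O to backing or file. */
static Node *rw_child(const Node *n)
{
    if (!n || !n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* Mirrors bdrv_filtered_child(): a node has at most one filtered child. */
static Node *filtered_child(const Node *n)
{
    Node *cow = cow_child(n);
    Node *rw = rw_child(n);

    assert(!(cow && rw));
    return cow ? cow : rw;
}

/*
 * Mirrors bdrv_skip_filters(): descend through R/W filters, optionally
 * stopping at the first node that was not added implicitly.
 */
static Node *skip_filters(Node *n, bool stop_on_explicit_filter)
{
    if (!n) {
        return NULL;
    }
    while (!(stop_on_explicit_filter && !n->implicit)) {
        Node *child = rw_child(n);
        if (!child) {
            break;
        }
        n = child;
    }
    return n;
}

/* Mirrors bdrv_backing_chain_next(): skip filters, follow COW, skip again. */
static Node *backing_chain_next(Node *n)
{
    return skip_filters(cow_child(skip_filters(n, false)), false);
}
```

Consider the chain mirror_top (implicit filter, via backing) -> throttle (explicit filter, via file) -> overlay -> cor (explicit filter, via file) -> base. For it, skip_filters(&mirror_top, true) stops at throttle, skip_filters(&mirror_top, false) reaches overlay, and backing_chain_next(&mirror_top) returns base.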
diff --git a/block/backup.c b/block/backup.c
index 9988753249..9c08353b23 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t len;
     BlockDriverInfo bdi;
     BackupBlockJob *job = NULL;
+    bool target_does_cow;
     int ret;
 
     assert(bs);
@@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
+    target_does_cow = bdrv_filtered_cow_child(target);
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
+    if (ret == -ENOTSUP && !target_does_cow) {
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
@@ -681,14 +683,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                     "this default, the backup may be unusable",
                     BACKUP_CLUSTER_SIZE_DEFAULT);
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
+    } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
             "which has no backing file");
         error_append_hint(errp,
             "Aborting, since this may create an unusable destination image\n");
         goto error;
-    } else if (ret < 0 && target->backing) {
+    } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
         job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
     } else {
diff --git a/block/block-backend.c b/block/block-backend.c
index f78e82a707..aa9a1d84a6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2089,11 +2089,17 @@ int blk_commit_all(void)
         AioContext *aio_context = blk_get_aio_context(blk);
 
         aio_context_acquire(aio_context);
-        if (blk_is_inserted(blk) && blk->root->bs->backing) {
-            int ret = bdrv_commit(blk->root->bs);
-            if (ret < 0) {
-                aio_context_release(aio_context);
-                return ret;
+        if (blk_is_inserted(blk)) {
+            BlockDriverState *non_filter;
+
+            /* Legacy function, so skip implicit filters */
+            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
+            if (bdrv_filtered_cow_child(non_filter)) {
+                int ret = bdrv_commit(non_filter);
+                if (ret < 0) {
+                    aio_context_release(aio_context);
+                    return ret;
+                }
             }
         }
         aio_context_release(aio_context);
diff --git a/block/commit.c b/block/commit.c
index 02eab34925..252007fd57 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -113,7 +113,7 @@ static void commit_abort(Job *job)
      * something to base, the intermediate images aren't valid any more. */
     bdrv_child_try_set_perm(s->commit_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
+    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
                       &error_abort);
 
     bdrv_unref(s->commit_top_bs);
@@ -324,10 +324,16 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->commit_top_bs = commit_top_bs;
     bdrv_unref(commit_top_bs);
 
-    /* Block all nodes between top and base, because they will
-     * disappear from the chain after this operation. */
+    /*
+     * Block all nodes between top and base, because they will
+     * disappear from the chain after this operation.
+     * Note that this assumes that the user is fine with removing all
+     * nodes (including R/W filters) between top and base.  Ensuring
+     * this is the responsibility of the interface (i.e. whoever calls
+     * commit_start()).
+     */
     assert(bdrv_chain_contains(top, base));
-    for (iter = top; iter != base; iter = backing_bs(iter)) {
+    for (iter = top; iter != base; iter = bdrv_filtered_bs(iter)) {
         /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
          * at s->base (if writes are blocked for a node, they are also blocked
          * for its backing file). The other options would be a second filter
@@ -414,19 +420,22 @@ int bdrv_commit(BlockDriverState *bs)
     if (!drv)
         return -ENOMEDIUM;
 
-    if (!bs->backing) {
+    backing_file_bs = bdrv_filtered_cow_bs(bs);
+
+    if (!backing_file_bs) {
         return -ENOTSUP;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
-        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
+        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
+    {
         return -EBUSY;
     }
 
-    ro = bs->backing->bs->read_only;
+    ro = backing_file_bs->read_only;
 
     if (ro) {
-        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
+        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
             return -EACCES;
         }
     }
@@ -441,8 +450,6 @@ int bdrv_commit(BlockDriverState *bs)
     }
 
     /* Insert commit_top block node above backing, so we can write to it */
-    backing_file_bs = backing_bs(bs);
-
     commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
                                          &local_err);
     if (commit_top_bs == NULL) {
@@ -528,15 +535,13 @@ ro_cleanup:
     qemu_vfree(buf);
 
     blk_unref(backing);
-    if (backing_file_bs) {
-        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
-    }
+    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
     bdrv_unref(commit_top_bs);
     blk_unref(src);
 
     if (ro) {
         /* ignoring error return here */
-        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
+        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
     }
 
     return ret;
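commit_start() now blocks every node reached through any filtered child between top and base. Separately, qmp_block_commit() rejects the operation unless bdrv_legacy_chain_contains() succeeds. The following minimal sketch shows how the two containment checks differ. It again uses a hypothetical Node struct rather than the real BlockDriverState, and chain_contains, legacy_chain_contains and skip_implicit are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal model of a block node; not QEMU's real struct. */
typedef struct Node {
    bool is_filter;        /* models bs->drv->is_filter */
    bool implicit;         /* models bs->implicit */
    struct Node *backing;  /* models bs->backing->bs */
    struct Node *file;     /* models bs->file->bs */
} Node;

/* COW link: only non-filters have one (their backing child). */
static Node *cow_bs(Node *n)
{
    return (n && !n->is_filter) ? n->backing : NULL;
}

/* Pass-through link: only filters have one (backing, else file). */
static Node *rw_bs(Node *n)
{
    if (!n || !n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* Any filtered child, like bdrv_filtered_bs(); at most one can exist. */
static Node *filtered_bs(Node *n)
{
    Node *cow = cow_bs(n);
    Node *rw = rw_bs(n);

    assert(!(cow && rw));
    return cow ? cow : rw;
}

/* Skip implicitly added R/W filters only, like bdrv_skip_implicit_filters(). */
static Node *skip_implicit(Node *n)
{
    while (n && n->implicit && rw_bs(n)) {
        n = rw_bs(n);
    }
    return n;
}

/* Like bdrv_chain_contains(): follow every filtered child. */
static bool chain_contains(Node *top, Node *base)
{
    while (top && top != base) {
        top = filtered_bs(top);
    }
    return top != NULL;
}

/* Like bdrv_legacy_chain_contains(): COW links plus implicit filters only. */
static bool legacy_chain_contains(Node *top, Node *base)
{
    top = skip_implicit(top);
    while (top && top != base) {
        top = skip_implicit(cow_bs(top));
    }
    return top != NULL;
}
```

Take the chain top -> throttle (explicit filter, via file) -> mid -> base. Here chain_contains(&top, &base) is true but legacy_chain_contains(&top, &base) is false, because the legacy walk can reach an explicitly added filter but never steps past it. If throttle is marked implicit instead, both checks succeed.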
diff --git a/block/io.c b/block/io.c
index dfc153b8d8..83c2b6b46a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
+    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
     Error *local_err = NULL;
 
+    /*
+     * FIXME: There should be a function for this, and in fact there
+     * will be as of a follow-up patch.
+     */
+    storage_bs =
+        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
+
     memset(&bs->bl, 0, sizeof(bs->bl));
 
     if (!drv) {
@@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
                                 drv->bdrv_aio_preadv) ? 1 : 512;
 
     /* Take some limits from the children as a default */
-    if (bs->file) {
-        bdrv_refresh_limits(bs->file->bs, &local_err);
+    if (storage_bs) {
+        bdrv_refresh_limits(storage_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
+        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
     } else {
         bs->bl.min_mem_alignment = 512;
         bs->bl.opt_mem_alignment = getpagesize();
@@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_iov = IOV_MAX;
     }
 
-    if (bs->backing) {
-        bdrv_refresh_limits(bs->backing->bs, &local_err);
+    if (cow_bs) {
+        bdrv_refresh_limits(cow_bs, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
-        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
+        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
     }
 
     /* Then let the driver override it */
@@ -2139,11 +2148,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
     } else if (want_zero) {
+        BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
+
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (bs->backing) {
-            BlockDriverState *bs2 = bs->backing->bs;
-            int64_t size2 = bdrv_getlength(bs2);
+        } else if (cow_bs) {
+            int64_t size2 = bdrv_getlength(cow_bs);
 
             if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
@@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     bool first = true;
 
     assert(bs != base);
-    for (p = bs; p != base; p = backing_bs(p)) {
+    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
@@ -2294,7 +2304,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
-    return bdrv_block_status_above(bs, backing_bs(bs),
+    return bdrv_block_status_above(bs, bdrv_filtered_bs(bs),
                                    offset, bytes, pnum, map, file);
 }
 
@@ -2304,9 +2314,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     int ret;
     int64_t dummy;
 
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
-                                         bytes, pnum ? pnum : &dummy, NULL,
-                                         NULL);
+    ret = bdrv_common_block_status_above(bs, bdrv_filtered_bs(bs), false,
+                                         offset, bytes, pnum ? pnum : &dummy,
+                                         NULL, NULL);
     if (ret < 0) {
         return ret;
     }
@@ -2360,7 +2370,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
             n = pnum_inter;
         }
 
-        intermediate = backing_bs(intermediate);
+        intermediate = bdrv_filtered_bs(intermediate);
     }
 
     *pnum = n;
@@ -3135,8 +3145,9 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
     }
 
     if (!drv->bdrv_co_truncate) {
-        if (bs->file && drv->is_filter) {
-            ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
+        BdrvChild *filtered = bdrv_filtered_rw_child(bs);
+        if (filtered) {
+            ret = bdrv_co_truncate(filtered, offset, prealloc, errp);
             goto out;
         }
         error_setg(errp, "Image format driver does not support resize");
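With want_zero set, bdrv_co_block_status() reports an unallocated range as zero in two cases. Either the node says unallocated blocks read as zeroes, or the range starts at or beyond the end of the COW backing node, which the patch now finds through bdrv_filtered_cow_bs() instead of bs->backing. The sketch below isolates that decision as a pure function. Several details are assumptions made for illustration and are not QEMU's API: the STATUS_* flags, the refine_status() name, the unallocated_is_zero parameter (standing in for bdrv_unallocated_blocks_are_zero()), and passing the backing length as a plain integer, with -1 meaning no COW child or unknown length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only (not QEMU's BDRV_BLOCK_*). */
enum {
    STATUS_DATA      = 1 << 0,
    STATUS_ZERO      = 1 << 1,
    STATUS_ALLOCATED = 1 << 2,
};

/*
 * Sketch of the want_zero post-processing: @ret is the driver's answer for
 * a range starting at @offset; @cow_len is the length of the COW backing
 * node, or -1 if there is none or its length could not be determined.
 */
static int refine_status(int ret, bool want_zero, bool unallocated_is_zero,
                         int64_t offset, int64_t cow_len)
{
    assert(offset >= 0);

    if (ret & (STATUS_DATA | STATUS_ZERO)) {
        /* Anything the node itself describes counts as allocated. */
        ret |= STATUS_ALLOCATED;
    } else if (want_zero) {
        if (unallocated_is_zero) {
            ret |= STATUS_ZERO;
        } else if (cow_len >= 0 && offset >= cow_len) {
            /* Reads past the end of a shorter backing node yield zeroes. */
            ret |= STATUS_ZERO;
        }
    }
    return ret;
}
```

Because only the COW child is consulted, a node whose sole child is a pass-through (R/W filter) child never takes the past-EOF branch in this model.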
diff --git a/block/mirror.c b/block/mirror.c
index 8b2404051f..80cef587f0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
                             &error_abort);
     if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
+        if (bdrv_backing_chain_next(target_bs) != backing) {
+            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,
+                                &local_err);
             if (local_err) {
                 error_report_err(local_err);
                 ret = -EPERM;
@@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
     block_job_remove_all_bdrv(bjob);
     bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     /* We just changed the BDS the job BB refers to (with either or both of the
      * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     } else {
         s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
-    if (backing_filename[0] && !target_bs->backing &&
+    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&
         s->granularity < s->target_cluster_size) {
         s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
@@ -1083,7 +1084,7 @@ static void mirror_complete(Job *job, Error **errp)
     if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
         int ret;
 
-        assert(!target->backing);
+        assert(!bdrv_filtered_cow_child(target));
         ret = bdrv_open_backing_file(target, NULL, "backing", errp);
         if (ret < 0) {
             return;
@@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
      * any jobs in them must be blocked */
     if (target_is_backing) {
         BlockDriverState *iter;
-        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
+        for (iter = bdrv_filtered_bs(bs); iter != target;
+             iter = bdrv_filtered_bs(iter))
+        {
             /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
              * ourselves at s->base (if writes are blocked for a node, they are
              * also blocked for its backing file). The other options would be a
@@ -1691,7 +1694,7 @@ fail:
 
     bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
                             &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     bdrv_unref(mirror_top_bs);
 }
@@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorCopyMode copy_mode, Error **errp)
 {
     bool is_none_mode;
-    BlockDriverState *base;
+    BlockDriverState *base = NULL;
 
     if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         error_setg(errp, "Sync mode 'incremental' not supported");
         return;
     }
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
-    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
+    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
     mirror_start_job(job_id, bs, creation_flags, target, replaces,
                      speed, granularity, buf_size, backing_mode,
                      on_source_error, on_target_error, unmap, NULL, NULL,
diff --git a/block/qapi.c b/block/qapi.c
index 110d05dc57..478c6f5e0d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -149,9 +149,13 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
             return NULL;
         }
 
-        if (bs0->drv && bs0->backing) {
+        if (bs0->drv && bdrv_filtered_child(bs0)) {
+            /*
+             * Put any filtered child here (for backwards compatibility to when
+             * we put bs0->backing here, which might be any filtered child).
+             */
             info->backing_file_depth++;
-            bs0 = bs0->backing->bs;
+            bs0 = bdrv_filtered_bs(bs0);
             (*p_image_info)->has_backing_image = true;
             p_image_info = &((*p_image_info)->backing_image);
         } else {
@@ -160,9 +164,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 
         /* Skip automatically inserted nodes that the user isn't aware of for
          * query-block (blk != NULL), but not for query-named-block-nodes */
-        while (blk && bs0->drv && bs0->implicit) {
-            bs0 = backing_bs(bs0);
-            assert(bs0);
+        if (blk) {
+            bs0 = bdrv_skip_implicit_filters(bs0);
         }
     }
 
@@ -347,9 +350,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
     BlockDriverState *bs = blk_bs(blk);
     char *qdev;
 
-    /* Skip automatically inserted nodes that the user isn't aware of */
-    while (bs && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
+    if (bs) {
+        /* Skip automatically inserted nodes that the user isn't aware of */
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     info->device = g_strdup(blk_name(blk));
@@ -506,6 +509,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
+    BlockDriverState *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -518,9 +522,8 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
     /* Skip automatically inserted nodes that the user isn't aware of in
      * a BlockBackend-level command. Stay at the exact node for a node-level
      * command. */
-    while (blk_level && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
-        assert(bs);
+    if (blk_level) {
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     if (bdrv_get_node_name(bs)[0]) {
@@ -535,9 +538,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
         s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
     }
 
-    if (blk_level && bs->backing) {
+    cow_bs = bdrv_filtered_cow_bs(bs);
+    if (blk_level && cow_bs) {
         s->has_backing = true;
-        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
+        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
     }
 
     return s;
diff --git a/block/stream.c b/block/stream.c
index bfaebb861a..23d5c890e0 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
     BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
     int ret = 0;
@@ -72,7 +73,7 @@ static int stream_prepare(Job *job)
     bdrv_unfreeze_backing_chain(bs, base);
     s->chain_frozen = false;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(unfiltered)) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -80,7 +81,7 @@ static int stream_prepare(Job *job)
                 base_fmt = base->drv->format_name;
             }
         }
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        ret = bdrv_change_backing_file(unfiltered, base_id, base_fmt);
         bdrv_set_backing_hd(bs, base, &local_err);
         if (local_err) {
             error_report_err(local_err);
@@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
     int64_t n = 0; /* bytes */
     void *buf;
 
-    if (!bs->backing) {
+    if (!bdrv_filtered_child(bs)) {
         goto out;
     }
 
@@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), base,
+            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
                                           offset, n, &n);
 
             /* Finish early if end of backing file has been reached */
@@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so block writes
      * and resizes. */
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
+         iter = bdrv_filtered_bs(iter))
+    {
         block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                            BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
                            &error_abort);
diff --git a/blockdev.c b/blockdev.c
index 4775a07d93..bb71b8368d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
             return;
         }
 
-        bs = blk_bs(blk);
+        bs = bdrv_skip_implicit_filters(blk_bs(blk));
         aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
 
@@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState *common,
         goto out;
     }
 
-    if (state->new_bs->backing != NULL) {
+    if (bdrv_filtered_cow_child(state->new_bs)) {
         error_setg(errp, "The snapshot already has a backing image");
         goto out;
     }
@@ -3202,6 +3202,13 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         if (!base_bs) {
             goto out;
         }
+        /*
+         * Streaming copies data through COR, so all of the filters
+         * between the target and the base are considered.  Therefore,
+         * we can use bdrv_chain_contains() and do not have to use
+         * bdrv_legacy_chain_contains() (which does not go past
+         * explicitly added filters).
+         */
         if (bs == base_bs || !bdrv_chain_contains(bs, base_bs)) {
             error_setg(errp, "Node '%s' is not a backing image of '%s'",
                        base_node, device);
@@ -3213,7 +3220,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
+    for (iter = bs; iter && iter != base_bs; iter = bdrv_filtered_bs(iter)) {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
             goto out;
         }
@@ -3370,7 +3377,9 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
 
     assert(bdrv_get_aio_context(base_bs) == aio_context);
 
-    for (iter = top_bs; iter != backing_bs(base_bs); iter = backing_bs(iter)) {
+    for (iter = top_bs; iter != bdrv_filtered_bs(base_bs);
+         iter = bdrv_filtered_bs(iter))
+    {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
             goto out;
         }
@@ -3381,6 +3390,11 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
         error_setg(errp, "cannot commit an image into itself");
         goto out;
     }
+    if (!bdrv_legacy_chain_contains(top_bs, base_bs)) {
+        /* We have to disallow this until the user can give explicit consent */
+        error_setg(errp, "Cannot commit through explicit filter nodes");
+        goto out;
+    }
 
     if (top_bs == bs) {
         if (has_backing_file) {
@@ -3472,7 +3486,13 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     /* See if we have a backing HD we can use to create our new image
      * on top of. */
     if (backup->sync == MIRROR_SYNC_MODE_TOP) {
-        source = backing_bs(bs);
+        /*
+         * Backup will not replace the source by the target, so none
+         * of the filters skipped here will be removed (in contrast to
+         * mirror).  Therefore, we can skip all of them when looking
+         * for the first COW relationship.
+         */
+        source = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
         if (!source) {
             backup->sync = MIRROR_SYNC_MODE_FULL;
         }
@@ -3492,9 +3512,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
     if (backup->mode != NEW_IMAGE_MODE_EXISTING) {
         assert(backup->format);
         if (source) {
-            bdrv_refresh_filename(source);
-            bdrv_img_create(backup->target, backup->format, source->filename,
-                            source->drv->format_name, NULL,
+            /* Implicit filters should not appear in the filename */
+            BlockDriverState *explicit_backing =
+                bdrv_skip_implicit_filters(source);
+
+            bdrv_refresh_filename(explicit_backing);
+            bdrv_img_create(backup->target, backup->format,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name, NULL,
                             size, flags, false, &local_err);
         } else {
             bdrv_img_create(backup->target, backup->format, NULL, NULL, NULL,
@@ -3752,7 +3777,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
+    if (!bdrv_backing_chain_next(bs) && sync == MIRROR_SYNC_MODE_TOP) {
         sync = MIRROR_SYNC_MODE_FULL;
     }
 
@@ -3801,7 +3826,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
 
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *source, *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode;
@@ -3810,6 +3835,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int flags;
     int64_t size;
     const char *format = arg->format;
+    const char *replaces_node_name = NULL;
 
     bs = qmp_get_root_bs(arg->device, errp);
     if (!bs) {
@@ -3821,6 +3847,16 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         return;
     }
 
+    /*
+     * If the user has not instructed us otherwise, we should let the
+     * block job run from @bs (thus taking into account all filters on
+     * it) but replace @unfiltered_bs when it finishes (thus not
+     * removing those filters).
+     * (And if there are any explicit filters, we should assume the
+     *  user knows how to use the @replaces option.)
+     */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
@@ -3834,8 +3870,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
-    source = backing_bs(bs);
+    source = bdrv_filtered_cow_bs(unfiltered_bs);
     if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
+        if (bdrv_filtered_bs(unfiltered_bs)) {
+            /* @unfiltered_bs is an explicit filter */
+            error_setg(errp, "Cannot perform sync=top mirror through an "
+                       "explicitly added filter node on the source");
+            goto out;
+        }
         arg->sync = MIRROR_SYNC_MODE_FULL;
     }
     if (arg->sync == MIRROR_SYNC_MODE_NONE) {
@@ -3854,6 +3896,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
                              " named node of the graph");
             goto out;
         }
+        replaces_node_name = arg->replaces;
+    } else if (unfiltered_bs != bs) {
+        replaces_node_name = unfiltered_bs->node_name;
     }
 
     if (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
@@ -3873,6 +3918,9 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         bdrv_img_create(arg->target, format,
                         NULL, NULL, NULL, size, flags, false, &local_err);
     } else {
+        /* Implicit filters should not appear in the filename */
+        BlockDriverState *explicit_backing = bdrv_skip_implicit_filters(source);
+
         switch (arg->mode) {
         case NEW_IMAGE_MODE_EXISTING:
             break;
@@ -3880,8 +3928,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
             /* create new image with backing file */
             bdrv_refresh_filename(source);
             bdrv_img_create(arg->target, format,
-                            source->filename,
-                            source->drv->format_name,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name,
                             NULL, size, flags, false, &local_err);
             break;
         default:
@@ -3913,7 +3961,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     bdrv_set_aio_context(target_bs, aio_context);
 
     blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
-                           arg->has_replaces, arg->replaces, arg->sync,
+                           !!replaces_node_name, replaces_node_name, arg->sync,
                            backing_mode, arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
                            arg->has_buf_size, arg->buf_size,
@@ -3949,7 +3997,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                          bool has_auto_dismiss, bool auto_dismiss,
                          Error **errp)
 {
-    BlockDriverState *bs;
+    BlockDriverState *bs, *unfiltered_bs;
     BlockDriverState *target_bs;
     AioContext *aio_context;
     BlockMirrorBackingMode backing_mode = MIRROR_LEAVE_BACKING_CHAIN;
@@ -3960,6 +4008,16 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
         return;
     }
 
+    /*
+     * Same as in qmp_drive_mirror(): We want to run the job from @bs,
+     * but we want to replace @unfiltered_bs on completion.
+     */
+    unfiltered_bs = bdrv_skip_implicit_filters(bs);
+    if (!has_replaces && unfiltered_bs != bs) {
+        replaces = unfiltered_bs->node_name;
+        has_replaces = true;
+    }
+
     target_bs = bdrv_lookup_bs(target, target, errp);
     if (!target_bs) {
         return;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index d1bb863cb6..f99f753fba 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
         const char *drive_name = bdrv_get_device_or_node_name(bs);
 
         /* skip automatically inserted nodes */
-        while (bs && bs->drv && bs->implicit) {
-            bs = backing_bs(bs);
-        }
+        bs = bdrv_skip_implicit_filters(bs);
 
         for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
              bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
diff --git a/nbd/server.c b/nbd/server.c
index e21bd501dc..e41ae89dbe 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
     if (bitmap) {
         BdrvDirtyBitmap *bm = NULL;
 
-        while (true) {
+        while (bs) {
             bm = bdrv_find_dirty_bitmap(bs, bitmap);
-            if (bm != NULL || bs->backing == NULL) {
+            if (bm != NULL) {
                 break;
             }
 
-            bs = bs->backing->bs;
+            bs = bdrv_filtered_bs(bs);
         }
 
         if (bm == NULL) {
diff --git a/qemu-img.c b/qemu-img.c
index aa6f81f1ea..bcfbb743fc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -982,7 +982,7 @@ static int img_commit(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     qemu_progress_init(progress, 1.f);
     qemu_progress_print(0.f, 100);
@@ -999,7 +999,7 @@ static int img_commit(int argc, char **argv)
         /* This is different from QMP, which by default uses the deepest file in
          * the backing chain (i.e., the very base); however, the traditional
          * behavior of qemu-img commit is using the immediate backing file. */
-        base_bs = backing_bs(bs);
+        base_bs = bdrv_filtered_cow_bs(bs);
         if (!base_bs) {
             error_setg(&local_err, "Image does not have a backing file");
             goto done;
@@ -1616,19 +1616,18 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
 
     if (s->sector_next_status <= sector_num) {
         int64_t count = n * BDRV_SECTOR_SIZE;
+        BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
+        BlockDriverState *base;
 
         if (s->target_has_backing) {
-
-            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
-                                    (sector_num - src_cur_offset) *
-                                    BDRV_SECTOR_SIZE,
-                                    count, &count, NULL, NULL);
+            base = bdrv_backing_chain_next(src_bs);
         } else {
-            ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
-                                          (sector_num - src_cur_offset) *
-                                          BDRV_SECTOR_SIZE,
-                                          count, &count, NULL, NULL);
+            base = NULL;
         }
+        ret = bdrv_block_status_above(src_bs, base,
+                                      (sector_num - src_cur_offset) *
+                                      BDRV_SECTOR_SIZE,
+                                      count, &count, NULL, NULL);
         if (ret < 0) {
             error_report("error while reading block status of sector %" PRId64
                          ": %s", sector_num, strerror(-ret));
@@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
          * s.target_backing_sectors has to be negative, which it will
          * be automatically).  The backing file length is used only
          * for optimizations, so such a case is not fatal. */
-        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
+        s.target_backing_sectors =
+            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
     } else {
         s.target_backing_sectors = -1;
     }
@@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 
     depth = 0;
     for (;;) {
+        bs = bdrv_skip_rw_filters(bs);
         ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
         if (ret < 0) {
             return ret;
@@ -2805,7 +2806,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
-        bs = backing_bs(bs);
+        bs = bdrv_filtered_cow_bs(bs);
         if (bs == NULL) {
             ret = 0;
             break;
@@ -2944,7 +2945,7 @@ static int img_map(int argc, char **argv)
     if (!blk) {
         return 1;
     }
-    bs = blk_bs(blk);
+    bs = bdrv_skip_implicit_filters(blk_bs(blk));
 
     if (output_format == OFORMAT_HUMAN) {
         printf("%-16s%-16s%-16s%s\n", "Offset", "Length", "Mapped to", "File");
diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
index 3deb3cfb94..1d61f7e224 100644
--- a/tests/qemu-iotests/184.out
+++ b/tests/qemu-iotests/184.out
@@ -27,6 +27,11 @@ Testing:
             "iops_rd": 0,
             "detect_zeroes": "off",
             "image": {
+                "backing-image": {
+                    "virtual-size": 1073741824,
+                    "filename": "null-co://",
+                    "format": "null-co"
+                },
                 "virtual-size": 1073741824,
                 "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
                 "format": "throttle"
@@ -34,7 +39,7 @@ Testing:
             "iops_wr": 0,
             "ro": false,
             "node-name": "throttle0",
-            "backing_file_depth": 0,
+            "backing_file_depth": 1,
             "drv": "throttle",
             "iops": 0,
             "bps_wr": 0,
diff --git a/tests/qemu-iotests/204.out b/tests/qemu-iotests/204.out
index f3a10fbe90..684774d763 100644
--- a/tests/qemu-iotests/204.out
+++ b/tests/qemu-iotests/204.out
@@ -59,5 +59,6 @@ Offset          Length          File
 0x900000        0x2400000       TEST_DIR/t.IMGFMT
 0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
 0x6a00000       0x400000        TEST_DIR/t.IMGFMT
+0x6e00000       0x1200000       TEST_DIR/t.IMGFMT.base
 No errors were found on the image.
 *** done
-- 
2.20.1



* [Qemu-devel] [PATCH v4 03/11] block: Storage child access function
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

For completeness' sake, add a function for accessing a node's storage
child, too.  For filters, this is their filtered child; for non-filters,
this is bs->file.
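The selection rule can be illustrated with a minimal, self-contained C model. `Node`, `Child` and the flag `filters_via_backing` are hypothetical simplifications, not QEMU's real `BlockDriverState`/`BdrvChild`. In QEMU, where a filter keeps its filtered child is decided by `bdrv_filtered_rw_child()` from the previous patch, and this sketch only approximates that. It also writes the GNU `?:` shorthand used in the patch as a plain conditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for QEMU's BdrvChild and
 * BlockDriverState.  Only the fields needed to illustrate the
 * storage-child rule are modeled.
 */
typedef struct Node Node;

typedef struct Child {
    Node *bs;                  /* the node this edge points to */
} Child;

struct Node {
    const char *name;
    bool is_filter;            /* models bs->drv->is_filter */
    bool filters_via_backing;  /* assumption: where a filter keeps its child */
    Child *file;               /* models bs->file */
    Child *backing;            /* models bs->backing */
};

/* Rough model of bdrv_filtered_rw_child(): only filters have one. */
static Child *filtered_rw_child(Node *bs)
{
    if (!bs->is_filter) {
        return NULL;
    }
    return bs->filters_via_backing ? bs->backing : bs->file;
}

/*
 * Model of bdrv_storage_child(): the filtered child for filters,
 * bs->file for everything else.
 */
static Child *storage_child(Node *bs)
{
    Child *c = filtered_rw_child(bs);
    return c ? c : bs->file;
}

/* Model of bdrv_storage_bs(): like child_bs(), it tolerates NULL. */
static Node *storage_bs(Node *bs)
{
    Child *c = storage_child(bs);
    return c ? c->bs : NULL;
}

/*
 * Count the nodes reached by repeatedly following the storage child,
 * the same way the converted bdrv_debug_*() loops walk down the graph.
 */
static int storage_depth(Node *bs)
{
    int depth = 0;
    while (bs) {
        depth++;
        bs = storage_bs(bs);
    }
    return depth;
}
```

Consider a filter that keeps its child in ->backing and sits on top of a format node. Its `file` field is NULL, so the old `bs->file ? bs->file->bs : NULL` pattern would stop at the filter. `storage_bs()` instead continues to the format node and then to that node's protocol node.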

Some places are deliberately left unconverted:
- BDS opening/closing functions where bs->file is handled specially
  (which is basically wrong, but at least simplifies probing)
- bdrv_co_block_status_from_file(), because its name implies that it
  points to ->file
- bdrv_snapshot_goto() in one place unrefs bs->file.  Such a
  modification is not covered by this patch and is therefore just
  safeguarded by an additional assert(), but otherwise kept as-is.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  6 +++++
 block.c                   | 53 ++++++++++++++++++++++++++++-----------
 block/io.c                | 22 +++++++---------
 block/qapi.c              |  7 +++---
 block/snapshot.c          | 40 ++++++++++++++++-------------
 5 files changed, 81 insertions(+), 47 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index b22b1164f8..d0309e6307 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1243,6 +1243,7 @@ int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
 BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
 BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
+BdrvChild *bdrv_storage_child(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
 BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
@@ -1267,4 +1268,9 @@ static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
     return child_bs(bdrv_filtered_child(bs));
 }
 
+static inline BlockDriverState *bdrv_storage_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_storage_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index e8f6febda0..89cb6de4c3 100644
--- a/block.c
+++ b/block.c
@@ -4404,15 +4404,21 @@ exit:
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
+
     if (!drv) {
         return -ENOMEDIUM;
     }
+
     if (drv->bdrv_get_allocated_file_size) {
         return drv->bdrv_get_allocated_file_size(bs);
     }
-    if (bs->file) {
-        return bdrv_get_allocated_file_size(bs->file->bs);
+
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
+        return bdrv_get_allocated_file_size(storage_bs);
     }
+
     return -ENOTSUP;
 }
 
@@ -4982,7 +4988,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
                           const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
@@ -4995,7 +5001,7 @@ int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
 int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_remove_breakpoint) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_remove_breakpoint) {
@@ -5008,7 +5014,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 {
     while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
@@ -5021,7 +5027,7 @@ int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_storage_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {
@@ -6142,14 +6148,23 @@ void bdrv_refresh_filename(BlockDriverState *bs)
         bs->exact_filename[0] = '\0';
 
         drv->bdrv_refresh_filename(bs);
-    } else if (bs->file) {
-        /* Try to reconstruct valid information from the underlying file */
+    } else if (bdrv_storage_child(bs)) {
+        /*
+         * Try to reconstruct valid information from the underlying
+         * file -- this only works for format nodes (filter nodes
+         * cannot be probed and as such must be selected by the user
+         * either through an options dict, or through a special
+         * filename which the filter driver must construct in its
+         * .bdrv_refresh_filename() implementation).
+         */
+        BlockDriverState *storage_bs = bdrv_storage_bs(bs);
 
         bs->exact_filename[0] = '\0';
 
         /*
          * We can use the underlying file's filename if:
          * - it has a filename,
+         * - the current BDS is not a filter,
          * - the file is a protocol BDS, and
          * - opening that file (as this BDS's format) will automatically create
          *   the BDS tree we have right now, that is:
@@ -6158,11 +6173,10 @@ void bdrv_refresh_filename(BlockDriverState *bs)
          *   - no non-file child of this BDS has been overridden by the user
          *   Both of these conditions are represented by generate_json_filename.
          */
-        if (bs->file->bs->exact_filename[0] &&
-            bs->file->bs->drv->bdrv_file_open &&
-            !generate_json_filename)
+        if (storage_bs->exact_filename[0] && storage_bs->drv->bdrv_file_open &&
+            !drv->is_filter && !generate_json_filename)
         {
-            strcpy(bs->exact_filename, bs->file->bs->exact_filename);
+            strcpy(bs->exact_filename, storage_bs->exact_filename);
         }
     }
 
@@ -6179,6 +6193,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
 
     if (!drv) {
         error_setg(errp, "Node '%s' is ejected", bs->node_name);
@@ -6189,8 +6204,9 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
         return drv->bdrv_dirname(bs, errp);
     }
 
-    if (bs->file) {
-        return bdrv_dirname(bs->file->bs, errp);
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
+        return bdrv_dirname(storage_bs, errp);
     }
 
     bdrv_refresh_filename(bs);
@@ -6324,6 +6340,15 @@ BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
     return cow_child ?: rw_child;
 }
 
+/*
+ * Return the child that stores the data that is allocated on this
+ * node.  This may or may not include metadata.
+ */
+BdrvChild *bdrv_storage_child(BlockDriverState *bs)
+{
+    return bdrv_filtered_rw_child(bs) ?: bs->file;
+}
+
 static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
                                            bool stop_on_explicit_filter)
 {
diff --git a/block/io.c b/block/io.c
index 83c2b6b46a..5c33ecc080 100644
--- a/block/io.c
+++ b/block/io.c
@@ -118,17 +118,10 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
-    BlockDriverState *storage_bs;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
     Error *local_err = NULL;
 
-    /*
-     * FIXME: There should be a function for this, and in fact there
-     * will be as of a follow-up patch.
-     */
-    storage_bs =
-        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
-
     memset(&bs->bl, 0, sizeof(bs->bl));
 
     if (!drv) {
@@ -2390,6 +2383,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     int ret = -ENOTSUP;
 
     bdrv_inc_in_flight(bs);
@@ -2402,8 +2396,8 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         } else {
             ret = drv->bdrv_save_vmstate(bs, qiov, pos);
         }
-    } else if (bs->file) {
-        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+    } else if (storage_bs) {
+        ret = bdrv_co_rw_vmstate(storage_bs, qiov, pos, is_read);
     }
 
     bdrv_dec_in_flight(bs);
@@ -2530,6 +2524,7 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
 
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
+    BlockDriverState *storage_bs;
     int current_gen;
     int ret = 0;
 
@@ -2559,7 +2554,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
     }
 
     /* Write back cached data to the OS even with cache=unsafe */
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_OS);
     if (bs->drv->bdrv_co_flush_to_os) {
         ret = bs->drv->bdrv_co_flush_to_os(bs);
         if (ret < 0) {
@@ -2577,7 +2572,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_DISK);
     if (!bs->drv) {
         /* bs->drv->bdrv_co_flush() might have ejected the BDS
          * (even in case of apparent success) */
@@ -2622,7 +2617,8 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
      * in the case of cache=unsafe, so there are no useless flushes.
      */
 flush_parent:
-    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+    storage_bs = bdrv_storage_bs(bs);
+    ret = storage_bs ? bdrv_co_flush(storage_bs) : 0;
 out:
     /* Notify any pending flushes that we have completed */
     if (ret == 0) {
diff --git a/block/qapi.c b/block/qapi.c
index 478c6f5e0d..e026d27077 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -509,7 +509,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
-    BlockDriverState *cow_bs;
+    BlockDriverState *storage_bs, *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -533,9 +533,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 
     s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
 
-    if (bs->file) {
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         s->has_parent = true;
-        s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
+        s->parent = bdrv_query_bds_stats(storage_bs, blk_level);
     }
 
     cow_bs = bdrv_filtered_cow_bs(bs);
diff --git a/block/snapshot.c b/block/snapshot.c
index f2f48f926a..3032cd0341 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -154,8 +154,9 @@ int bdrv_can_snapshot(BlockDriverState *bs)
     }
 
     if (!drv->bdrv_snapshot_create) {
-        if (bs->file != NULL) {
-            return bdrv_can_snapshot(bs->file->bs);
+        BlockDriverState *storage_bs = bdrv_storage_bs(bs);
+        if (storage_bs) {
+            return bdrv_can_snapshot(storage_bs);
         }
         return 0;
     }
@@ -167,14 +168,15 @@ int bdrv_snapshot_create(BlockDriverState *bs,
                          QEMUSnapshotInfo *sn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_create) {
         return drv->bdrv_snapshot_create(bs, sn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_create(bs->file->bs, sn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_create(storage_bs, sn_info);
     }
     return -ENOTSUP;
 }
@@ -184,6 +186,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
                        Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
     int ret, open_ret;
 
     if (!drv) {
@@ -204,39 +207,40 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         return ret;
     }
 
-    if (bs->file) {
-        BlockDriverState *file;
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         QDict *options = qdict_clone_shallow(bs->options);
         QDict *file_options;
         Error *local_err = NULL;
 
-        file = bs->file->bs;
         /* Prevent it from getting deleted when detached from bs */
-        bdrv_ref(file);
+        bdrv_ref(storage_bs);
 
         qdict_extract_subqdict(options, &file_options, "file.");
         qobject_unref(file_options);
-        qdict_put_str(options, "file", bdrv_get_node_name(file));
+        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
 
         if (drv->bdrv_close) {
             drv->bdrv_close(bs);
         }
+
+        assert(bs->file->bs == storage_bs);
         bdrv_unref_child(bs, bs->file);
         bs->file = NULL;
 
-        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
         open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
         qobject_unref(options);
         if (open_ret < 0) {
-            bdrv_unref(file);
+            bdrv_unref(storage_bs);
             bs->drv = NULL;
             /* A bdrv_snapshot_goto() error takes precedence */
             error_propagate(errp, local_err);
             return ret < 0 ? ret : open_ret;
         }
 
-        assert(bs->file->bs == file);
-        bdrv_unref(file);
+        assert(bs->file->bs == storage_bs);
+        bdrv_unref(storage_bs);
         return ret;
     }
 
@@ -272,6 +276,7 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
                          Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     int ret;
 
     if (!drv) {
@@ -288,8 +293,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 
     if (drv->bdrv_snapshot_delete) {
         ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
-    } else if (bs->file) {
-        ret = bdrv_snapshot_delete(bs->file->bs, snapshot_id, name, errp);
+    } else if (storage_bs) {
+        ret = bdrv_snapshot_delete(storage_bs, snapshot_id, name, errp);
     } else {
         error_setg(errp, "Block format '%s' used by device '%s' "
                    "does not support internal snapshot deletion",
@@ -305,14 +310,15 @@ int bdrv_snapshot_list(BlockDriverState *bs,
                        QEMUSnapshotInfo **psn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_list) {
         return drv->bdrv_snapshot_list(bs, psn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_list(bs->file->bs, psn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_list(storage_bs, psn_info);
     }
     return -ENOTSUP;
 }
-- 
2.20.1

-        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+    } else if (storage_bs) {
+        ret = bdrv_co_rw_vmstate(storage_bs, qiov, pos, is_read);
     }
 
     bdrv_dec_in_flight(bs);
@@ -2530,6 +2524,7 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
 
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
+    BlockDriverState *storage_bs;
     int current_gen;
     int ret = 0;
 
@@ -2559,7 +2554,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
     }
 
     /* Write back cached data to the OS even with cache=unsafe */
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_OS);
     if (bs->drv->bdrv_co_flush_to_os) {
         ret = bs->drv->bdrv_co_flush_to_os(bs);
         if (ret < 0) {
@@ -2577,7 +2572,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
+    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_DISK);
     if (!bs->drv) {
         /* bs->drv->bdrv_co_flush() might have ejected the BDS
          * (even in case of apparent success) */
@@ -2622,7 +2617,8 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
      * in the case of cache=unsafe, so there are no useless flushes.
      */
 flush_parent:
-    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+    storage_bs = bdrv_storage_bs(bs);
+    ret = storage_bs ? bdrv_co_flush(storage_bs) : 0;
 out:
     /* Notify any pending flushes that we have completed */
     if (ret == 0) {
diff --git a/block/qapi.c b/block/qapi.c
index 478c6f5e0d..e026d27077 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -509,7 +509,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
-    BlockDriverState *cow_bs;
+    BlockDriverState *storage_bs, *cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -533,9 +533,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 
     s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
 
-    if (bs->file) {
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         s->has_parent = true;
-        s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
+        s->parent = bdrv_query_bds_stats(storage_bs, blk_level);
     }
 
     cow_bs = bdrv_filtered_cow_bs(bs);
diff --git a/block/snapshot.c b/block/snapshot.c
index f2f48f926a..3032cd0341 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -154,8 +154,9 @@ int bdrv_can_snapshot(BlockDriverState *bs)
     }
 
     if (!drv->bdrv_snapshot_create) {
-        if (bs->file != NULL) {
-            return bdrv_can_snapshot(bs->file->bs);
+        BlockDriverState *storage_bs = bdrv_storage_bs(bs);
+        if (storage_bs) {
+            return bdrv_can_snapshot(storage_bs);
         }
         return 0;
     }
@@ -167,14 +168,15 @@ int bdrv_snapshot_create(BlockDriverState *bs,
                          QEMUSnapshotInfo *sn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_create) {
         return drv->bdrv_snapshot_create(bs, sn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_create(bs->file->bs, sn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_create(storage_bs, sn_info);
     }
     return -ENOTSUP;
 }
@@ -184,6 +186,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
                        Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs;
     int ret, open_ret;
 
     if (!drv) {
@@ -204,39 +207,40 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         return ret;
     }
 
-    if (bs->file) {
-        BlockDriverState *file;
+    storage_bs = bdrv_storage_bs(bs);
+    if (storage_bs) {
         QDict *options = qdict_clone_shallow(bs->options);
         QDict *file_options;
         Error *local_err = NULL;
 
-        file = bs->file->bs;
         /* Prevent it from getting deleted when detached from bs */
-        bdrv_ref(file);
+        bdrv_ref(storage_bs);
 
         qdict_extract_subqdict(options, &file_options, "file.");
         qobject_unref(file_options);
-        qdict_put_str(options, "file", bdrv_get_node_name(file));
+        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
 
         if (drv->bdrv_close) {
             drv->bdrv_close(bs);
         }
+
+        assert(bs->file->bs == storage_bs);
         bdrv_unref_child(bs, bs->file);
         bs->file = NULL;
 
-        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
         open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
         qobject_unref(options);
         if (open_ret < 0) {
-            bdrv_unref(file);
+            bdrv_unref(storage_bs);
             bs->drv = NULL;
             /* A bdrv_snapshot_goto() error takes precedence */
             error_propagate(errp, local_err);
             return ret < 0 ? ret : open_ret;
         }
 
-        assert(bs->file->bs == file);
-        bdrv_unref(file);
+        assert(bs->file->bs == storage_bs);
+        bdrv_unref(storage_bs);
         return ret;
     }
 
@@ -272,6 +276,7 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
                          Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     int ret;
 
     if (!drv) {
@@ -288,8 +293,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 
     if (drv->bdrv_snapshot_delete) {
         ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
-    } else if (bs->file) {
-        ret = bdrv_snapshot_delete(bs->file->bs, snapshot_id, name, errp);
+    } else if (storage_bs) {
+        ret = bdrv_snapshot_delete(storage_bs, snapshot_id, name, errp);
     } else {
         error_setg(errp, "Block format '%s' used by device '%s' "
                    "does not support internal snapshot deletion",
@@ -305,14 +310,15 @@ int bdrv_snapshot_list(BlockDriverState *bs,
                        QEMUSnapshotInfo **psn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *storage_bs = bdrv_storage_bs(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_list) {
         return drv->bdrv_snapshot_list(bs, psn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_list(bs->file->bs, psn_info);
+    if (storage_bs) {
+        return bdrv_snapshot_list(storage_bs, psn_info);
     }
     return -ENOTSUP;
 }
-- 
2.20.1




* [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*()
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

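With bdrv_filtered_rw_bs(), bdrv_co_block_status() can easily handle
the default filter behavior itself, so filter drivers no longer need to
set .bdrv_co_block_status to bdrv_co_block_status_from_file() or
bdrv_co_block_status_from_backing().

blkdebug wants an additional assertion, so it keeps its own
implementation; bdrv_co_block_status_from_file() simply needs to be
inlined there.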

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h | 22 -----------------
 block/blkdebug.c          |  7 ++++--
 block/blklogwrites.c      |  1 -
 block/commit.c            |  1 -
 block/copy-on-read.c      |  2 --
 block/io.c                | 51 +++++++++++++--------------------------
 block/mirror.c            |  1 -
 block/throttle.c          |  1 -
 8 files changed, 22 insertions(+), 64 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index d0309e6307..76c7c0a111 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1187,28 +1187,6 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared);
 
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their file.
- */
-int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
-                                                bool want_zero,
-                                                int64_t offset,
-                                                int64_t bytes,
-                                                int64_t *pnum,
-                                                int64_t *map,
-                                                BlockDriverState **file);
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their backing file.
- */
-int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
-                                                   bool want_zero,
-                                                   int64_t offset,
-                                                   int64_t bytes,
-                                                   int64_t *pnum,
-                                                   int64_t *map,
-                                                   BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index efd9441625..7950ae729c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -637,8 +637,11 @@ static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs,
                                                  BlockDriverState **file)
 {
     assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
-    return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
-                                          pnum, map, file);
+    assert(bs->file && bs->file->bs);
+    *pnum = bytes;
+    *map = offset;
+    *file = bs->file->bs;
+    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static void blkdebug_close(BlockDriverState *bs)
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index eb2b4901a5..1eb4a5c613 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -518,7 +518,6 @@ static BlockDriver bdrv_blk_log_writes = {
     .bdrv_co_pwrite_zeroes  = blk_log_writes_co_pwrite_zeroes,
     .bdrv_co_flush_to_disk  = blk_log_writes_co_flush_to_disk,
     .bdrv_co_pdiscard       = blk_log_writes_co_pdiscard,
-    .bdrv_co_block_status   = bdrv_co_block_status_from_file,
 
     .is_filter              = true,
     .strong_runtime_opts    = blk_log_writes_strong_runtime_opts,
diff --git a/block/commit.c b/block/commit.c
index 252007fd57..c366ee9655 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -254,7 +254,6 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 static BlockDriver bdrv_commit_top = {
     .format_name                = "commit_top",
     .bdrv_co_preadv             = bdrv_commit_top_preadv,
-    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
     .bdrv_child_perm            = bdrv_commit_top_child_perm,
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 53972b1da3..fe9260163c 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -150,8 +150,6 @@ static BlockDriver bdrv_copy_on_read = {
     .bdrv_eject                         = cor_eject,
     .bdrv_lock_medium                   = cor_lock_medium,
 
-    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
-
     .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
 
     .has_variable_length                = true,
diff --git a/block/io.c b/block/io.c
index 5c33ecc080..8d124bae5c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1993,36 +1993,6 @@ typedef struct BdrvCoBlockStatusData {
     bool done;
 } BdrvCoBlockStatusData;
 
-int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
-                                                bool want_zero,
-                                                int64_t offset,
-                                                int64_t bytes,
-                                                int64_t *pnum,
-                                                int64_t *map,
-                                                BlockDriverState **file)
-{
-    assert(bs->file && bs->file->bs);
-    *pnum = bytes;
-    *map = offset;
-    *file = bs->file->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
-}
-
-int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
-                                                   bool want_zero,
-                                                   int64_t offset,
-                                                   int64_t bytes,
-                                                   int64_t *pnum,
-                                                   int64_t *map,
-                                                   BlockDriverState **file)
-{
-    assert(bs->backing && bs->backing->bs);
-    *pnum = bytes;
-    *map = offset;
-    *file = bs->backing->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
-}
-
 /*
  * Returns the allocation status of the specified sectors.
  * Drivers not implementing the functionality are assumed to not support
@@ -2063,6 +2033,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     BlockDriverState *local_file = NULL;
     int64_t aligned_offset, aligned_bytes;
     uint32_t align;
+    bool has_filtered_child;
 
     assert(pnum);
     *pnum = 0;
@@ -2088,7 +2059,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 
     /* Must be non-NULL or bdrv_getlength() would have failed */
     assert(bs->drv);
-    if (!bs->drv->bdrv_co_block_status) {
+    has_filtered_child = bs->drv->is_filter && bdrv_filtered_rw_child(bs);
+    if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
         *pnum = bytes;
         ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
         if (offset + bytes == total_size) {
@@ -2109,9 +2081,20 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     aligned_offset = QEMU_ALIGN_DOWN(offset, align);
     aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
 
-    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
-                                        aligned_bytes, pnum, &local_map,
-                                        &local_file);
+    if (bs->drv->bdrv_co_block_status) {
+        ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+                                            aligned_bytes, pnum, &local_map,
+                                            &local_file);
+    } else {
+        /* Default code for filters */
+
+        local_file = bdrv_filtered_rw_bs(bs);
+        assert(local_file);
+
+        *pnum = aligned_bytes;
+        local_map = aligned_offset;
+        ret = BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+    }
     if (ret < 0) {
         *pnum = 0;
         goto out;
diff --git a/block/mirror.c b/block/mirror.c
index 80cef587f0..2e521c726a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1487,7 +1487,6 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_co_pwrite_zeroes      = bdrv_mirror_top_pwrite_zeroes,
     .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
     .bdrv_co_flush              = bdrv_mirror_top_flush,
-    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
 
diff --git a/block/throttle.c b/block/throttle.c
index f64dcc27b9..b6922e734f 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -259,7 +259,6 @@ static BlockDriver bdrv_throttle = {
     .bdrv_reopen_prepare                =   throttle_reopen_prepare,
     .bdrv_reopen_commit                 =   throttle_reopen_commit,
     .bdrv_reopen_abort                  =   throttle_reopen_abort,
-    .bdrv_co_block_status               =   bdrv_co_block_status_from_file,
 
     .bdrv_co_drain_begin                =   throttle_co_drain_begin,
     .bdrv_co_drain_end                  =   throttle_co_drain_end,
-- 
2.20.1


* [Qemu-devel] [PATCH v4 05/11] block: Fix check_to_replace_node()
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

Currently, check_to_replace_node() only allows mirror to replace a node
in the chain of the source node, and only if it is the first non-filter
node below the source.  (Technically, the idea is precisely to allow
replacing a quorum child by mirroring from quorum.)

There are (probably) two reasons for this:
(1) We do not want to create loops.
(2) @replaces and @device should have exactly the same content, so that
    replacing one with the other does not change the visible data.

This has two issues:
(1) It is overly restrictive.  It is completely fine for @replaces to be
    a filter.
(2) It is not restrictive enough.  You can create loops with this as
    follows:

$ qemu-img create -f qcow2 /tmp/source.qcow2 64M
$ qemu-system-x86_64 -qmp stdio
{"execute": "qmp_capabilities"}
{"execute": "object-add",
 "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
{"execute": "blockdev-add",
 "arguments": {
     "node-name": "source",
     "driver": "throttle",
     "throttle-group": "tg0",
     "file": {
         "node-name": "filtered",
         "driver": "qcow2",
         "file": {
             "driver": "file",
             "filename": "/tmp/source.qcow2"
         } } } }
{"execute": "drive-mirror",
 "arguments": {
     "job-id": "mirror",
     "device": "source",
     "target": "/tmp/target.qcow2",
     "format": "qcow2",
     "node-name": "target",
     "sync" :"none",
     "replaces": "filtered"
 } }
{"execute": "block-job-complete", "arguments": {"device": "mirror"}}

QEMU then crashes with a stack overflow caused by the loop this creates:
target's backing file is source, so once target replaces filtered, it
points to itself through source.

(blockdev-mirror can be broken similarly.)

So let us make the checks for the two conditions above explicit, which
makes the whole function exactly as restrictive as it needs to be.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h |  1 +
 block.c               | 83 +++++++++++++++++++++++++++++++++++++++----
 blockdev.c            | 34 ++++++++++++++++--
 3 files changed, 110 insertions(+), 8 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 2005664f14..2878198892 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -402,6 +402,7 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate);
 
 /* check if a named node can be replaced when doing drive-mirror */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+                                        BlockDriverState *backing_bs,
                                         const char *node_name, Error **errp);
 
 /* async block I/O */
diff --git a/block.c b/block.c
index 89cb6de4c3..820244f52e 100644
--- a/block.c
+++ b/block.c
@@ -5916,7 +5916,59 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
     return false;
 }
 
+static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
+{
+    BdrvChild *c;
+
+    if (!parent) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &parent->children, next) {
+        if (c->bs == child || is_child_of(child, c->bs)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/*
+ * Return true if there are only filters in [@top, @base).  Note that
+ * this may include quorum (which bdrv_chain_contains() cannot
+ * handle).
+ */
+static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base)
+{
+    BdrvChild *c;
+
+    if (!top) {
+        return false;
+    }
+
+    if (top == base) {
+        return true;
+    }
+
+    if (!top->drv->is_filter) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &top->children, next) {
+        if (is_filtered_child(c->bs, base)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/*
+ * @parent_bs is mirror's source BDS, @backing_bs is the BDS which
+ * will be attached to the target when mirror completes.
+ */
 BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
+                                        BlockDriverState *backing_bs,
                                         const char *node_name, Error **errp)
 {
     BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
@@ -5935,13 +5987,32 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
         goto out;
     }
 
-    /* We don't want arbitrary node of the BDS chain to be replaced only the top
-     * most non filter in order to prevent data corruption.
-     * Another benefit is that this tests exclude backing files which are
-     * blocked by the backing blockers.
+    /*
+     * If to_replace_bs is (recursively) a child of backing_bs,
+     * replacing it may create a loop.  We cannot allow that.
      */
-    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
-        error_setg(errp, "Only top most non filter can be replaced");
+    if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, backing_bs)) {
+        error_setg(errp, "Replacing this node would result in a loop");
+        to_replace_bs = NULL;
+        goto out;
+    }
+
+    /*
+     * Mirror is designed in such a way that when it completes, the
+     * source BDS is seamlessly replaced.  It is therefore not allowed
+     * to replace a BDS where this condition would be violated, as that
+     * would defeat the purpose of mirror and could lead to data
+     * corruption.
+     * Therefore, between parent_bs and to_replace_bs there may be
+     * only filters (and the one on top must be a filter, too), so
+     * their data always stays in sync and mirror can complete and
+     * replace to_replace_bs without any possible corruptions.
+     */
+    if (!is_filtered_child(parent_bs, to_replace_bs) &&
+        !is_filtered_child(to_replace_bs, parent_bs))
+    {
+        error_setg(errp, "The node to be replaced must be connected to the "
+                   "source through filter nodes only");
         to_replace_bs = NULL;
         goto out;
     }
diff --git a/blockdev.c b/blockdev.c
index bb71b8368d..53d17cc05e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3782,7 +3782,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     }
 
     if (has_replaces) {
-        BlockDriverState *to_replace_bs;
+        BlockDriverState *to_replace_bs, *backing_bs;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -3792,7 +3792,37 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
-        to_replace_bs = check_to_replace_node(bs, replaces, errp);
+        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
+            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
+        {
+            /*
+             * While we do not quite know what OPEN_BACKING_CHAIN
+             * (used for mode=existing) will yield, it is probably
+             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
+             * because that is our best guess.
+             */
+            switch (sync) {
+            case MIRROR_SYNC_MODE_FULL:
+                backing_bs = NULL;
+                break;
+
+            case MIRROR_SYNC_MODE_TOP:
+                backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
+                break;
+
+            case MIRROR_SYNC_MODE_NONE:
+                backing_bs = bs;
+                break;
+
+            default:
+                abort();
+            }
+        } else {
+            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
+            backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target));
+        }
+
+        to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, errp);
         if (!to_replace_bs) {
             return;
         }
-- 
2.20.1

^ permalink raw reply	[flat|nested] 48+ messages in thread


* [Qemu-devel] [PATCH v4 06/11] iotests: Add tests for mirror @replaces loops
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

This adds two tests for cases where our old check_to_replace_node()
function failed to detect that executing this job with these parameters
would result in a cyclic graph.
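The loop these tests look for can be illustrated with a small C sketch
that simulates completing the job and then searches the resulting graph
for a cycle with a depth-first search.  It is a simplification, not
QEMU's graph code: replace_node() just redirects every parent of the
replaced node (except the target itself) to the target, whereas the real
bdrv_replace_node() applies additional rules, and all names here are
hypothetical.  The graph follows test_blockdev_mirror_loop: source ->
middle -> bottom via their file children, with target optionally using
middle as its backing node.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8

/* Toy graph: edge[p][c] is true if node p has node c as a child. */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
} Graph;

/* Simplified replacement: every parent of @old, except @new_node itself,
 * is redirected to @new_node. */
static void replace_node(Graph *g, int old, int new_node)
{
    for (int p = 0; p < g->n; p++) {
        if (p != new_node && g->edge[p][old]) {
            g->edge[p][old] = false;
            g->edge[p][new_node] = true;
        }
    }
}

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = done. */
static bool dfs_finds_cycle(const Graph *g, int v, int *state)
{
    state[v] = 1;
    for (int c = 0; c < g->n; c++) {
        if (!g->edge[v][c]) {
            continue;
        }
        if (state[c] == 1) {
            return true; /* back edge: @c is an ancestor of @v */
        }
        if (state[c] == 0 && dfs_finds_cycle(g, c, state)) {
            return true;
        }
    }
    state[v] = 2;
    return false;
}

static bool graph_has_cycle(const Graph *g)
{
    int state[MAX_NODES] = { 0 };

    assert(g->n <= MAX_NODES);
    for (int v = 0; v < g->n; v++) {
        if (state[v] == 0 && dfs_finds_cycle(g, v, state)) {
            return true;
        }
    }
    return false;
}

enum { SOURCE, MIDDLE, BOTTOM, BOTTOM_FILE, TARGET, TARGET_FILE, NODE_COUNT };

/* test_blockdev_mirror_loop's graph, then "complete" with replaces=bottom. */
bool blockdev_mirror_case_loops(bool target_backed_by_middle)
{
    Graph g;

    memset(&g, 0, sizeof(g));
    g.n = NODE_COUNT;
    g.edge[SOURCE][MIDDLE] = true;       /* throttle -> throttle */
    g.edge[MIDDLE][BOTTOM] = true;       /* throttle -> format node */
    g.edge[BOTTOM][BOTTOM_FILE] = true;  /* format -> protocol node */
    g.edge[TARGET][TARGET_FILE] = true;
    if (target_backed_by_middle) {
        g.edge[TARGET][MIDDLE] = true;   /* target's backing child */
    }

    replace_node(&g, BOTTOM, TARGET);
    return graph_has_cycle(&g);
}
```

With middle as target's backing node, the search finds middle -> target
-> middle; without that backing link the graph stays acyclic.  A
recursive walk without the colouring would never terminate on the
cyclic graph, which is the kind of stack overflow described in patch 05.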

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/041     | 124 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 26bf1701eb..0c1432f189 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -1067,5 +1067,129 @@ class TestOrphanedSource(iotests.QMPTestCase):
                              target='dest-ro')
         self.assert_qmp(result, 'error/class', 'GenericError')
 
+# Various tests for the @replaces option (independent of quorum)
+class TestReplaces(iotests.QMPTestCase):
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+    def test_drive_mirror_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'source',
+                    'driver': 'throttle',
+                    'throttle-group': 'tg',
+                    'file': {
+                        'node-name': 'filtered',
+                        'driver': iotests.imgfmt,
+                        'file': {
+                            'driver': 'file',
+                            'filename': test_img
+                        }
+                    }
+                })
+        self.assert_qmp(result, 'return', {})
+
+        # Mirror from @source to @target in sync=none, so that @source
+        # will be @target's backing file; but replace @filtered.
+        # Then, @target's backing file will be @source, whose backing
+        # file is now @target instead of @filtered.  That is a loop.
+        # (But apart from the loop, replacing @filtered instead of
+        # @source is fine, because both are just filtered versions of
+        # each other.)
+        result = self.vm.qmp('drive-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target=target_img,
+                             format=iotests.imgfmt,
+                             node_name='target',
+                             sync='none',
+                             replaces='filtered')
+        if 'error' in result:
+            # This is the correct result
+            self.assert_qmp(result, 'error/class', 'GenericError')
+        else:
+            # This is wrong, but let's run it to the bitter conclusion
+            self.complete_and_wait(drive='mirror')
+            # Fail for good measure, although qemu should have crashed
+            # anyway
+            self.fail('Loop creation was successful')
+
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_blockdev_mirror_loop(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, target_img, '1M')
+
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'source',
+                    'driver': 'throttle',
+                    'throttle-group': 'tg',
+                    'file': {
+                        'node-name': 'middle',
+                        'driver': 'throttle',
+                        'throttle-group': 'tg',
+                        'file': {
+                            'node-name': 'bottom',
+                            'driver': iotests.imgfmt,
+                            'file': {
+                                'driver': 'file',
+                                'filename': test_img
+                            }
+                        }
+                    }
+                })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                    'node-name': 'target',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': target_img
+                    },
+                    'backing': 'middle'
+                })
+
+        # Mirror from @source to @target.  With blockdev-mirror, the
+        # current (old) backing file is retained (which is @middle).
+        # By replacing @bottom, @middle's file will be @target, whose
+        # backing file is @middle again.  That is a loop.
+        # (But apart from the loop, replacing @bottom instead of
+        # @source is fine, because both are just filtered versions of
+        # each other.)
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='full',
+                             replaces='bottom')
+        if 'error' in result:
+            # This is the correct result
+            self.assert_qmp(result, 'error/class', 'GenericError')
+        else:
+            # This is wrong, but let's run it to the bitter conclusion
+            self.complete_and_wait(drive='mirror')
+            # Fail for good measure, although qemu should have crashed
+            # anyway
+            self.fail('Loop creation was successful')
+
+        os.remove(test_img)
+        os.remove(target_img)
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index e071d0b261..2c448b4239 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-........................................................................................
+..........................................................................................
 ----------------------------------------------------------------------
-Ran 88 tests
+Ran 90 tests
 
 OK
-- 
2.20.1

^ permalink raw reply	[flat|nested] 48+ messages in thread


* [Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay); other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Since bs->backing->bs->filename already exists, let us make
BDS.backing_file mean the former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
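A rough sketch of why only the first and last spellings match: the name
given by the user is resolved relative to the directory of the image
being inspected (here foo/top.qcow2), just like the backing file string
from its header, and both results are compared as absolute paths.  The C
code below is a hypothetical, purely lexical model of that comparison;
combine(), absolutize() and backing_name_matches() are illustrative
names only.  QEMU's bdrv_find_backing_image() canonicalizes with
realpath() instead, which additionally fails when the resolved file
(e.g. foo/foo/mid.qcow2) does not exist.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PATH_LEN 256

/* Resolve @rel against the directory containing @image (a lexical,
 * simplified take on path_combine()); absolute paths pass through. */
static void combine(char *out, const char *image, const char *rel)
{
    const char *slash = strrchr(image, '/');

    if (rel[0] == '/' || !slash) {
        snprintf(out, PATH_LEN, "%s", rel);
    } else {
        snprintf(out, PATH_LEN, "%.*s/%s", (int)(slash - image), image, rel);
    }
}

/* Prefix relative paths with @cwd; a stand-in for realpath() that does
 * not touch the filesystem and does not resolve "." or "..". */
static void absolutize(char *out, const char *cwd, const char *path)
{
    if (path[0] == '/') {
        snprintf(out, PATH_LEN, "%s", path);
    } else {
        snprintf(out, PATH_LEN, "%s/%s", cwd, path);
    }
}

/* Does @user_name name the backing file recorded in @overlay's header? */
bool backing_name_matches(const char *cwd, const char *overlay,
                          const char *header_backing, const char *user_name)
{
    char tmp[PATH_LEN], wanted[PATH_LEN], actual[PATH_LEN];

    combine(tmp, overlay, user_name);
    absolutize(wanted, cwd, tmp);
    combine(tmp, overlay, header_backing);
    absolutize(actual, cwd, tmp);
    return strcmp(wanted, actual) == 0;
}
```

With cwd "/work" standing in for $PWD, overlay "foo/top.qcow2" and
header string "mid.qcow2", the names "mid.qcow2" and
"/work/foo/mid.qcow2" both resolve to /work/foo/mid.qcow2, while
"foo/mid.qcow2" resolves to /work/foo/foo/mid.qcow2 and does not match.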

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that at runtime, bs->backing_file usually
contained the path to the image relative to qemu's CWD (or an absolute
path), this patch changes QAPI's backing_file to always report
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
a change to the reference output of iotest 191.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h  | 19 ++++++++++++++-----
 block.c                    | 35 ++++++++++++++++++++++++++++-------
 block/qapi.c               |  7 ++++---
 qemu-img.c                 | 12 ++++++++++--
 tests/qemu-iotests/191.out |  1 -
 tests/qemu-iotests/228     |  6 +++---
 tests/qemu-iotests/228.out |  6 +++---
 tests/qemu-iotests/245     |  4 +++-
 8 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 76c7c0a111..69524bc712 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -760,11 +760,20 @@ struct BlockDriverState {
     bool walking_aio_notifiers; /* to make removal during iteration safe */
 
     char filename[PATH_MAX];
-    char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
-                                    this file image */
-    /* The backing filename indicated by the image header; if we ever
-     * open this file, then this is replaced by the resulting BDS's
-     * filename (i.e. after a bdrv_refresh_filename() run). */
+    /*
+     * If not empty, this image is a diff in relation to backing_file.
+     * Note that this is the name given in the image header and
+     * therefore may or may not be equal to .backing->bs->filename.
+     * If this field contains a relative path, it is to be resolved
+     * relative to the overlay's location.
+     */
+    char backing_file[PATH_MAX];
+    /*
+     * The backing filename indicated by the image header.  Contrary
+     * to backing_file, if we ever open this file, auto_backing_file
+     * is replaced by the resulting BDS's filename (i.e. after a
+     * bdrv_refresh_filename() run).
+     */
     char auto_backing_file[PATH_MAX];
     char backing_format[16]; /* if non-zero and backing_file exists */
 
diff --git a/block.c b/block.c
index 820244f52e..a4c2dda039 100644
--- a/block.c
+++ b/block.c
@@ -78,6 +78,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
                                            const BdrvChildRole *child_role,
                                            Error **errp);
 
+static bool bdrv_backing_overridden(BlockDriverState *bs);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -1046,10 +1048,6 @@ static void bdrv_backing_attach(BdrvChild *c)
     bdrv_refresh_filename(backing_hd);
 
     parent->open_flags &= ~BDRV_O_NO_BACKING;
-    pstrcpy(parent->backing_file, sizeof(parent->backing_file),
-            backing_hd->filename);
-    pstrcpy(parent->backing_format, sizeof(parent->backing_format),
-            backing_hd->drv ? backing_hd->drv->format_name : "");
 
     bdrv_op_block_all(backing_hd, parent->backing_blocker);
     /* Otherwise we won't be able to commit or stream */
@@ -5048,6 +5046,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     char *backing_file_full = NULL;
     char *filename_tmp = NULL;
     int is_protocol = 0;
+    bool filenames_refreshed = false;
     BlockDriverState *curr_bs = NULL;
     BlockDriverState *retval = NULL;
 
@@ -5072,9 +5071,31 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     {
         BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
 
-        /* If either of the filename paths is actually a protocol, then
-         * compare unmodified paths; otherwise make paths relative */
-        if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+        if (bdrv_backing_overridden(curr_bs)) {
+            /*
+             * If the backing file was overridden, we can only compare
+             * directly against the backing node's filename.
+             */
+
+            if (!filenames_refreshed) {
+                /*
+                 * This will automatically refresh all of the
+                 * filenames in the rest of the backing chain, so we
+                 * only need to do this once.
+                 */
+                bdrv_refresh_filename(bs_below);
+                filenames_refreshed = true;
+            }
+
+            if (strcmp(backing_file, bs_below->filename) == 0) {
+                retval = bs_below;
+                break;
+            }
+        } else if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+            /*
+             * If either of the filename paths is actually a protocol, then
+             * compare unmodified paths; otherwise make paths relative.
+             */
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
diff --git a/block/qapi.c b/block/qapi.c
index e026d27077..d0c895808a 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -43,7 +43,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
                                         BlockDriverState *bs, Error **errp)
 {
     ImageInfo **p_image_info;
-    BlockDriverState *bs0;
+    BlockDriverState *bs0, *backing;
     BlockDeviceInfo *info;
 
     if (!bs->drv) {
@@ -72,9 +72,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
         info->node_name = g_strdup(bs->node_name);
     }
 
-    if (bs->backing_file[0]) {
+    backing = bdrv_filtered_cow_bs(bs);
+    if (backing) {
         info->has_backing_file = true;
-        info->backing_file = g_strdup(bs->backing_file);
+        info->backing_file = g_strdup(backing->filename);
     }
 
     info->detect_zeroes = bs->detect_zeroes;
diff --git a/qemu-img.c b/qemu-img.c
index bcfbb743fc..836e22061c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3305,7 +3305,7 @@ static int img_rebase(int argc, char **argv)
 
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
-        char backing_name[PATH_MAX];
+        char *backing_name;
         QDict *options = NULL;
 
         if (bs->backing_format[0] != '\0') {
@@ -3319,16 +3319,24 @@ static int img_rebase(int argc, char **argv)
             }
             qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
         }
-        bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
+        backing_name = bdrv_get_full_backing_filename(bs, &local_err);
+        if (local_err) {
+            error_reportf_err(local_err,
+                              "Could not resolve old backing file name: ");
+            ret = -1;
+            goto out;
+        }
         blk_old_backing = blk_new_open(backing_name, NULL,
                                        options, src_flags, &local_err);
         if (!blk_old_backing) {
             error_reportf_err(local_err,
                               "Could not open old backing file '%s': ",
                               backing_name);
+            g_free(backing_name);
             ret = -1;
             goto out;
         }
+        g_free(backing_name);
 
         if (out_baseimg[0]) {
             const char *overlay_filename;
diff --git a/tests/qemu-iotests/191.out b/tests/qemu-iotests/191.out
index 3fc92bb56e..0b3c216b0c 100644
--- a/tests/qemu-iotests/191.out
+++ b/tests/qemu-iotests/191.out
@@ -605,7 +605,6 @@ wrote 65536/65536 bytes at offset 1048576
                     "backing-filename": "TEST_DIR/t.IMGFMT.base",
                     "dirty-flag": false
                 },
-                "backing-filename-format": "IMGFMT",
                 "virtual-size": 67108864,
                 "filename": "TEST_DIR/t.IMGFMT.ovl3",
                 "cluster-size": 65536,
diff --git a/tests/qemu-iotests/228 b/tests/qemu-iotests/228
index 9a50afd205..a1f3187212 100755
--- a/tests/qemu-iotests/228
+++ b/tests/qemu-iotests/228
@@ -34,7 +34,7 @@ def log_node_info(node):
 
     log('bs->filename: ' + node['image']['filename'],
         filters=[filter_testfiles, filter_imgfmt])
-    log('bs->backing_file: ' + node['backing_file'],
+    log('bs->backing_file: ' + node['image']['full-backing-filename'],
         filters=[filter_testfiles, filter_imgfmt])
 
     if 'backing-image' in node['image']:
@@ -70,8 +70,8 @@ with iotests.FilePath('base.img') as base_img_path, \
                 },
                 filters=[filter_qmp_testfiles, filter_qmp_imgfmt])
 
-    # Filename should be plain, and the backing filename should not
-    # contain the "file:" prefix
+    # Filename should be plain, and the backing node filename should
+    # not contain the "file:" prefix
     log_node_info(vm.node_info('node0'))
 
     vm.qmp_log('blockdev-del', node_name='node0')
diff --git a/tests/qemu-iotests/228.out b/tests/qemu-iotests/228.out
index 4217df24fe..8c82009abe 100644
--- a/tests/qemu-iotests/228.out
+++ b/tests/qemu-iotests/228.out
@@ -4,7 +4,7 @@
 {"return": {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
@@ -41,7 +41,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {"return": {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
@@ -55,7 +55,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {"return": {}}
 
 bs->filename: json:{"backing": {"driver": "null-co"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top.img"}}
-bs->backing_file: null-co://
+bs->backing_file: TEST_DIR/PID-base.img
 bs->backing->bs->filename: null-co://
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index a04c6235c1..1b191d4da7 100644
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -722,7 +722,9 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 
         # Detach hd2 from hd0.
         self.reopen(opts, {'backing': None})
-        self.reopen(opts, {}, "backing is missing for 'hd0'")
+
+        # Without a backing file, we can omit 'backing' again
+        self.reopen(opts)
 
         # Remove both hd0 and hd2
         result = self.vm.qmp('blockdev-del', conv_keys = True, node_name = 'hd0')
-- 
2.20.1


* [Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.
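To illustrate the meaning this patch settles on, here is a minimal C sketch of resolving a header-provided backing file name relative to the overlay's location rather than relative to the CWD. The function name resolve_backing_name() is hypothetical and not part of QEMU; protocol prefixes and path canonicalization are deliberately ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU's API: resolve a backing file name read
 * from an overlay's image header.  A relative name is interpreted relative
 * to the directory containing the overlay, not relative to the CWD.
 * Protocol prefixes ("file:", "nbd:", ...) and canonicalization are ignored.
 */
static void resolve_backing_name(const char *overlay, const char *backing,
                                 char *out, size_t out_size)
{
    const char *slash = strrchr(overlay, '/');

    assert(out_size > 0);

    if (backing[0] == '/' || slash == NULL) {
        /*
         * Absolute names are used as-is; if the overlay has no directory
         * component, it lives in the CWD and the name needs no prefix.
         */
        snprintf(out, out_size, "%s", backing);
        return;
    }

    /* Keep the overlay's directory (including the '/') and append the name */
    snprintf(out, out_size, "%.*s%s", (int)(slash - overlay + 1), overlay,
             backing);
}
```

For example, an overlay foo/top.qcow2 whose header says mid.qcow2 refers to foo/mid.qcow2, whereas treating the same string as CWD-relative would point at ./mid.qcow2.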

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
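The "after" transcript can be approximated with a small model of the comparison used for plain (non-protocol) file names: the name the user passed and the name from the overlay's header are both resolved relative to the overlay, made absolute, and compared. This is a sketch under assumptions: all function names are hypothetical, the CWD is passed in explicitly, plain string comparison stands in for the path canonicalization the real code performs, and only the post-patch behavior is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PATH_MAX 512

/* Resolve @name relative to the directory that contains @overlay */
static void relative_to_overlay(const char *overlay, const char *name,
                                char *out, size_t size)
{
    const char *slash = strrchr(overlay, '/');

    if (name[0] == '/' || !slash) {
        snprintf(out, size, "%s", name);
    } else {
        snprintf(out, size, "%.*s%s", (int)(slash - overlay + 1), overlay,
                 name);
    }
}

/* Turn a CWD-relative path into an absolute one, given an explicit @cwd */
static void make_absolute(const char *cwd, const char *path,
                          char *out, size_t size)
{
    if (path[0] == '/') {
        snprintf(out, size, "%s", path);
    } else {
        snprintf(out, size, "%s/%s", cwd, path);
    }
}

/*
 * Hypothetical model: does @user_name (e.g. qemu-img commit -b) name the
 * backing file that @overlay's header calls @header_backing?
 */
static bool backing_name_matches(const char *cwd, const char *overlay,
                                 const char *header_backing,
                                 const char *user_name)
{
    char tmp[SKETCH_PATH_MAX];
    char user_full[SKETCH_PATH_MAX];
    char header_full[SKETCH_PATH_MAX];

    assert(cwd[0] == '/');

    relative_to_overlay(overlay, user_name, tmp, sizeof(tmp));
    make_absolute(cwd, tmp, user_full, sizeof(user_full));

    relative_to_overlay(overlay, header_backing, tmp, sizeof(tmp));
    make_absolute(cwd, tmp, header_full, sizeof(header_full));

    return strcmp(user_full, header_full) == 0;
}
```

With a CWD of /work, both mid.qcow2 and /work/foo/mid.qcow2 match the header entry of foo/top.qcow2, while foo/mid.qcow2 resolves to foo/foo/mid.qcow2 and does not, which mirrors the output above.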

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
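The new branch in bdrv_find_backing_image() can be modeled roughly as follows. This is a simplified sketch, not QEMU code: struct sketch_node and find_backing() are hypothetical stand-ins for BlockDriverState and the real lookup, the non-overridden case compares header names literally instead of resolving them, and the one-time bdrv_refresh_filename() call is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a BlockDriverState */
struct sketch_node {
    const char *filename;          /* like bs->filename */
    const char *backing_file;      /* name from the image header ("" if none) */
    bool backing_overridden;       /* user attached a different backing node */
    struct sketch_node *backing;   /* next node in the backing chain */
};

/*
 * Walk the chain below @top looking for the backing node the user named.
 * If a node's backing was overridden, its header name says nothing about
 * the node actually attached, so compare against that node's filename.
 */
static struct sketch_node *find_backing(struct sketch_node *top,
                                        const char *name)
{
    struct sketch_node *curr;

    assert(name != NULL);

    for (curr = top; curr && curr->backing; curr = curr->backing) {
        const char *candidate = curr->backing_overridden
                                ? curr->backing->filename
                                : curr->backing_file;

        if (strcmp(name, candidate) == 0) {
            return curr->backing;
        }
    }
    return NULL;
}
```

Once top.qcow2's backing is overridden with other.qcow2, looking up the header name base.qcow2 no longer finds anything, while other.qcow2 is found by its node filename.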

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that at runtime, bs->backing_file usually
contained the path to the image relative to qemu's CWD (or an absolute
path), this patch changes QAPI's backing_file to always report
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
a change to the reference output of iotest 191.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file, even if it used to have a backing node at some
point.
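The rule the iotest 245 change relies on can be written as a small predicate. This is a hypothetical model of the condition described above (omitting @backing is refused only when there is a default or a current backing file), not QEMU's actual reopen code; every name here is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: must the user pass 'backing' when reopening a node?
 * Only if the driver supports backing files, the option was omitted, and
 * the node has either a current backing node or a default one named in
 * its image header.
 */
static bool reopen_needs_backing_option(bool driver_supports_backing,
                                        bool backing_option_given,
                                        const void *current_backing_node,
                                        const char *header_backing_file)
{
    assert(header_backing_file != NULL);

    if (!driver_supports_backing || backing_option_given) {
        return false;
    }
    return current_backing_node != NULL || header_backing_file[0] != '\0';
}
```

Before this patch, attaching a backing node copied its filename into backing_file, so after detaching it the header string was still non-empty and the second case above applied; now it stays empty for an image without a header backing file.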

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h  | 19 ++++++++++++++-----
 block.c                    | 35 ++++++++++++++++++++++++++++-------
 block/qapi.c               |  7 ++++---
 qemu-img.c                 | 12 ++++++++++--
 tests/qemu-iotests/191.out |  1 -
 tests/qemu-iotests/228     |  6 +++---
 tests/qemu-iotests/228.out |  6 +++---
 tests/qemu-iotests/245     |  4 +++-
 8 files changed, 65 insertions(+), 25 deletions(-)


* [Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

This patch adds some tests on how commit copes with filter nodes.
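As an aid to reading the new test cases, here is a hypothetical C model of the chain they build (top-filter, cow-3, cow-2, cow-1, bottom-filter, cow-0) and of the rule the tests expect, namely that block-commit rejects any range passing through an explicit filter node. It reflects the expected test outcomes only and is not how QEMU implements the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one node in the chain, listed top to bottom */
struct chain_entry {
    const char *node_name;
    bool is_filter;
};

/*
 * Returns true if committing from index @top down into index @base
 * (smaller index = closer to the active layer) would pass through an
 * explicit filter, i.e. if any node in [top, base) is a filter.
 */
static bool commit_crosses_filter(const struct chain_entry *chain,
                                  size_t top, size_t base)
{
    size_t i;

    assert(top < base);

    for (i = top; i < base; i++) {
        if (chain[i].is_filter) {
            return true;
        }
    }
    return false;
}
```

Committing cow-2 into cow-1 (indices 2 and 3) crosses no filter, as in test_filterless_commit; cow-1 into cow-0 (3 and 5) crosses bottom-filter, and an active commit starting at index 0 includes top-filter, matching the two error cases.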

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 130 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index b81133a474..dc3fe57fbd 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -394,5 +394,135 @@ class TestReopenOverlay(ImageCommitTestCase):
     def test_reopen_overlay(self):
         self.run_commit_test(self.img1, self.img0)
 
+class TestCommitWithFilters(iotests.QMPTestCase):
+    img0 = os.path.join(iotests.test_dir, '0.img')
+    img1 = os.path.join(iotests.test_dir, '1.img')
+    img2 = os.path.join(iotests.test_dir, '2.img')
+    img3 = os.path.join(iotests.test_dir, '3.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img0, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img1, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img2, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img3, '1M')
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                'node-name': 'top-filter',
+                'driver': 'throttle',
+                'throttle-group': 'tg',
+                'file': {
+                    'node-name': 'cow-3',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': self.img3
+                    },
+                    'backing': {
+                        'node-name': 'cow-2',
+                        'driver': iotests.imgfmt,
+                        'file': {
+                            'driver': 'file',
+                            'filename': self.img2
+                        },
+                        'backing': {
+                            'node-name': 'cow-1',
+                            'driver': iotests.imgfmt,
+                            'file': {
+                                'driver': 'file',
+                                'filename': self.img1
+                            },
+                            'backing': {
+                                'node-name': 'bottom-filter',
+                                'driver': 'throttle',
+                                'throttle-group': 'tg',
+                                'file': {
+                                    'node-name': 'cow-0',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img0
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
+            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img3)
+        os.remove(self.img2)
+        os.remove(self.img1)
+        os.remove(self.img0)
+
+    # Filters make for funny filenames, so we cannot just use
+    # self.imgX for the block-commit parameters
+    def get_filename(self, node):
+        return self.vm.node_info(node)['image']['filename']
+
+    def test_filterless_commit(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-2'),
+                             base=self.get_filename('cow-1'))
+        self.assert_qmp(result, 'return', {})
+        self.wait_until_completed(drive='commit')
+
+    def test_commit_through_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-1'),
+                             base=self.get_filename('cow-0'))
+        # Cannot commit through explicitly added filters (yet,
+        # although in the future we probably want to make users use
+        # blockdev-copy for this)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_with_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             base=self.get_filename('cow-2'))
+        # Not specifying @top means an active commit, which includes the
+        # filter on top (and that is not allowed right now)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_without_filter(self):
+        cow3_name = self.get_filename('cow-3')
+
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=cow3_name,
+                             base=self.get_filename('cow-2'))
+        # This is how you'd want to specify committing img3 into img2
+        # disregarding the filter on top of img3 -- but that does not
+        # work, because you can only specify names of backing files
+        # (and img3 is not a backing file).  The solution for this
+        # would be for block-commit to accept node names.
+        # Note that even if it did work, the above command would
+        # result in a non-active commit, because img3 is not the top
+        # node.  That would be wrong, because img3 can still be written
+        # to, so it should be an active commit, but that is a different
+        # story.
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc',
+                        'Top image file %s not found' % cow3_name)
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 802ffaa0c0..220a5fa82c 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...........................................
+...............................................
 ----------------------------------------------------------------------
-Ran 43 tests
+Ran 47 tests
 
 OK
-- 
2.20.1


* [Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases
@ 2019-04-10 20:20   ` Max Reitz
From: Max Reitz @ 2019-04-10 20:20 UTC
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

This patch adds some tests on how commit copes with filter nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 130 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index b81133a474..dc3fe57fbd 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -394,5 +394,135 @@ class TestReopenOverlay(ImageCommitTestCase):
     def test_reopen_overlay(self):
         self.run_commit_test(self.img1, self.img0)
 
+class TestCommitWithFilters(iotests.QMPTestCase):
+    img0 = os.path.join(iotests.test_dir, '0.img')
+    img1 = os.path.join(iotests.test_dir, '1.img')
+    img2 = os.path.join(iotests.test_dir, '2.img')
+    img3 = os.path.join(iotests.test_dir, '3.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img0, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img1, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img2, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img3, '1M')
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                'node-name': 'top-filter',
+                'driver': 'throttle',
+                'throttle-group': 'tg',
+                'file': {
+                    'node-name': 'cow-3',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': self.img3
+                    },
+                    'backing': {
+                        'node-name': 'cow-2',
+                        'driver': iotests.imgfmt,
+                        'file': {
+                            'driver': 'file',
+                            'filename': self.img2
+                        },
+                        'backing': {
+                            'node-name': 'cow-1',
+                            'driver': iotests.imgfmt,
+                            'file': {
+                                'driver': 'file',
+                                'filename': self.img1
+                            },
+                            'backing': {
+                                'node-name': 'bottom-filter',
+                                'driver': 'throttle',
+                                'throttle-group': 'tg',
+                                'file': {
+                                    'node-name': 'cow-0',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img0
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
+            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img3)
+        os.remove(self.img2)
+        os.remove(self.img1)
+        os.remove(self.img0)
+
+    # Filters make for funny filenames, so we cannot just use
+    # self.imgX for the block-commit parameters
+    def get_filename(self, node):
+        return self.vm.node_info(node)['image']['filename']
+
+    def test_filterless_commit(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-2'),
+                             base=self.get_filename('cow-1'))
+        self.assert_qmp(result, 'return', {})
+        self.wait_until_completed(drive='commit')
+
+    def test_commit_through_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=self.get_filename('cow-1'),
+                             base=self.get_filename('cow-0'))
+        # Cannot commit through explicitly added filters (yet,
+        # although in the future we probably want to make users use
+        # blockdev-copy for this)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_with_filter(self):
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             base=self.get_filename('cow-2'))
+        # Not specifying @top means an active commit, which would
+        # include the filter on top (not allowed right now)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Cannot commit through explicit filter nodes')
+
+    def test_filtered_active_commit_without_filter(self):
+        cow3_name = self.get_filename('cow-3')
+
+        self.assert_no_active_block_jobs()
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top=cow3_name,
+                             base=self.get_filename('cow-2'))
+        # This is how you'd want to specify committing img3 into img2
+        # disregarding the filter on top of img3 -- but that does not
+        # work, because you can only specify names of backing files
+        # (and img3 is not a backing file).  The solution for this
+        # would be for block-commit to accept node names.
+        # Note that even if it did work, the above command would
+        # result in a non-active commit, because img3 is not the top
+        # node.  Which is wrong, because img3 can still be written to,
+        # so it should be an active commit, but that is a different
+        # story.
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc',
+                        'Top image file %s not found' % cow3_name)
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 802ffaa0c0..220a5fa82c 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...........................................
+...............................................
 ----------------------------------------------------------------------
-Ran 43 tests
+Ran 47 tests
 
 OK
-- 
2.20.1
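The commit tests above depend on how a chain containing explicit filters is walked. A filter node passes all I/O through to its `file` child, while a format node reaches the next COW layer through `backing`. The following rough Python model covers that walk and the "Cannot commit through explicit filter nodes" rule. It rests on three assumptions: only `throttle` and `copy-on-read` count as filters, every format node uses `qcow2`, and nodes are identified by node name. The real block-commit takes file names, which is what test_filtered_active_commit_without_filter trips over. `fmt`, `throttle`, `walk_chain` and `commit_error` are hypothetical helpers, not QEMU code.

```python
# Toy model of the TestCommitWithFilters graph; not QEMU code.
# Assumption: only these drivers are treated as filters here.
FILTER_DRIVERS = {'throttle', 'copy-on-read'}


def fmt(name, filename, backing=None):
    """Options for a format node (assumed qcow2) with an optional backing."""
    opts = {'node-name': name, 'driver': 'qcow2',
            'file': {'driver': 'file', 'filename': filename}}
    if backing is not None:
        opts['backing'] = backing
    return opts


def throttle(name, child):
    """Options for a throttle filter node on top of `child`."""
    return {'node-name': name, 'driver': 'throttle',
            'throttle-group': 'tg', 'file': child}


# Same shape as the nested blockdev-add dict in setUp()
GRAPH = throttle('top-filter',
                 fmt('cow-3', 'img3',
                     fmt('cow-2', 'img2',
                         fmt('cow-1', 'img1',
                             throttle('bottom-filter',
                                      fmt('cow-0', 'img0'))))))


def walk_chain(opts):
    """Return [(node_name, is_filter), ...] from the top node downwards."""
    chain = []
    node = opts
    while node is not None:
        is_filter = node['driver'] in FILTER_DRIVERS
        chain.append((node['node-name'], is_filter))
        # A filter forwards all I/O to 'file'; for a format node, 'file'
        # only holds its own data, and the next COW layer is 'backing'.
        node = node.get('file') if is_filter else node.get('backing')
    return chain


def commit_error(chain, top, base):
    """Return an error string for committing top..base, or None if allowed."""
    names = [name for name, _ in chain]
    if top not in names or base not in names:
        return 'node not found in chain'
    top_idx, base_idx = names.index(top), names.index(base)
    if base_idx <= top_idx:
        return 'base must be below top'
    # Every node being committed (top down to, but excluding, base)
    # must not be an explicit filter.
    if any(is_filter for _, is_filter in chain[top_idx:base_idx]):
        return 'Cannot commit through explicit filter nodes'
    return None
```

With `chain = walk_chain(GRAPH)`, committing `cow-2` into `cow-1` is allowed. Committing `cow-1` into `cow-0` is refused because `bottom-filter` lies in between. An active commit (`top` being `top-filter`) is refused as well, matching the three QMP results the tests expect.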

* [Qemu-devel] [PATCH v4 09/11] iotests: Add filter mirror test cases
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

This patch adds some test cases for how mirroring relates to filters.  One
of them tests what happens when you mirror off a filtered COW node; the
other two use the mirror filter node as basically our only example of an
implicitly created filter node so far (besides the commit filter).
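The two mirror filter tests check a single rule: an implicit filter, meaning one inserted by blockdev-mirror without a user-supplied node-name, is skipped when query-block reports the node attached to the device. A named filter is reported as is. A toy Python model of that rule follows. `Node`, `insert_mirror_filter`, `reported_node` and the `'#mirror-filter'` name are all hypothetical; QEMU generates its own internal names and does this in C.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Minimal stand-in for a block node; not QEMU's BlockDriverState."""
    name: str
    implicit: bool = False          # created internally, no user node-name
    child: Optional['Node'] = None  # the filtered child, for filter nodes


def insert_mirror_filter(source, filter_node_name=None):
    """Put a mirror filter above `source`, as blockdev-mirror does.

    Without filter_node_name the filter is implicit; the name used for it
    here ('#mirror-filter') is purely illustrative.
    """
    if filter_node_name is None:
        return Node('#mirror-filter', implicit=True, child=source)
    return Node(filter_node_name, child=source)


def reported_node(root):
    """Skip implicit filters, as query-block is expected to in the tests."""
    node = root
    while node.implicit and node.child is not None:
        node = node.child
    return node
```

In this model, `reported_node(insert_mirror_filter(source))` yields `source` (test_implicit_mirror_filter). Passing `'mirror-filter'` as the name yields the filter itself (test_explicit_mirror_filter).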

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041     | 146 ++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 0c1432f189..c2b5299f62 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -20,8 +20,9 @@
 
 import time
 import os
+import json
 import iotests
-from iotests import qemu_img, qemu_io
+from iotests import qemu_img, qemu_img_pipe, qemu_io
 
 backing_img = os.path.join(iotests.test_dir, 'backing.img')
 target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
@@ -1191,5 +1192,148 @@ class TestReplaces(iotests.QMPTestCase):
         os.remove(test_img)
         os.remove(target_img)
 
+# Tests for mirror with filters (and how the mirror filter behaves, as
+# an example of an implicit filter)
+class TestFilters(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, test_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, target_img)
+
+        qemu_io('-c', 'write -P 1 0 512k', backing_img)
+        qemu_io('-c', 'write -P 2 512k 512k', test_img)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'target',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': target_img
+                                },
+                                'backing': None
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        self.filterless_chain = {
+                'node-name': 'source',
+                'driver': iotests.imgfmt,
+                'file': {
+                    'driver': 'file',
+                    'filename': test_img
+                },
+                'backing': {
+                    'node-name': 'backing',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': backing_img
+                    }
+                }
+            }
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+        os.remove(test_img)
+        os.remove(target_img)
+        os.remove(backing_img)
+
+    def test_cor(self):
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'filter',
+                                'driver': 'copy-on-read',
+                                'file': self.filterless_chain
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='filter',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.qmp('blockdev-del', node_name='target')
+
+        target_map = qemu_img_pipe('map', '--output=json', target_img)
+        target_map = json.loads(target_map)
+
+        assert target_map[0]['start'] == 0
+        assert target_map[0]['length'] == 512 * 1024
+        assert target_map[0]['depth'] == 1
+
+        assert target_map[1]['start'] == 512 * 1024
+        assert target_map[1]['length'] == 512 * 1024
+        assert target_map[1]['depth'] == 0
+
+    def test_implicit_mirror_filter(self):
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        # The mirror filter is now an implicit node, so it should be
+        # invisible when querying the backing chain
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'source'
+
+        image_info = device_info['inserted']['image']
+        assert image_info['filename'] == test_img
+        assert image_info['backing-image']['filename'] == backing_img
+
+        self.complete_and_wait('mirror')
+
+    def test_explicit_mirror_filter(self):
+        # Same test as above, but this time we give the mirror filter
+        # a node-name so it will not be invisible
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top',
+                             filter_node_name='mirror-filter')
+        self.assert_qmp(result, 'return', {})
+
+        # With a node-name given to it, the mirror filter should now
+        # be visible
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'mirror-filter'
+
+        self.complete_and_wait('mirror')
+
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index 2c448b4239..ffc779b4d1 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-..........................................................................................
+.............................................................................................
 ----------------------------------------------------------------------
-Ran 90 tests
+Ran 93 tests
 
 OK
-- 
2.20.1
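test_cor checks the target's `qemu-img map --output=json` output with six separate assertions. With sync='top', the first 512 KiB should still be served by the target's backing file (depth 1). The second 512 KiB, which was written to the source's top layer, should now be allocated in the target (depth 0). The check could be condensed as in the sketch below. `map_layout` and `EXPECTED_TARGET_LAYOUT` are hypothetical names, not part of iotests.

```python
import json

KiB = 1024


def map_layout(map_output):
    """Reduce `qemu-img map --output=json` text to (start, length, depth).

    Other keys in each entry (e.g. 'data', 'zero', 'offset') are ignored.
    """
    return [(e['start'], e['length'], e['depth'])
            for e in json.loads(map_output)]


# Expected target layout after the sync='top' mirror in test_cor
EXPECTED_TARGET_LAYOUT = [(0, 512 * KiB, 1), (512 * KiB, 512 * KiB, 0)]
```

Inside the test this could become `self.assertEqual(map_layout(qemu_img_pipe('map', '--output=json', target_img)), EXPECTED_TARGET_LAYOUT)`. Unlike the per-field asserts, comparing whole lists would also catch an unexpected third map entry.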

* [Qemu-devel] [PATCH v4 10/11] iotests: Add test for commit in sub directory
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

Add a test for committing an overlay in a subdirectory to one of the
images in its backing chain, using both relative and absolute filenames.
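The expected results in 020.out come down to one rule: a relative `-b` name is matched relative to the overlay's directory (the way backing file names are stored), not relative to the current working directory. An absolute name is matched as is. The small Python model below reproduces those results. It is a sketch under that assumption and does not claim to mirror qemu-img's C implementation. `resolve_base`, `in_backing_chain` and the `/tmp/test` paths are illustrative.

```python
import posixpath


def resolve_base(cwd, overlay, base_arg):
    """Resolve a `-b` argument the way the new 020 cases expect.

    Assumption for this sketch: an absolute name is used as-is, and a
    relative name is interpreted relative to the overlay's directory,
    not relative to cwd.
    """
    overlay_abs = posixpath.normpath(posixpath.join(cwd, overlay))
    if posixpath.isabs(base_arg):
        return posixpath.normpath(base_arg)
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(overlay_abs), base_arg))


def in_backing_chain(cwd, overlay, chain, base_arg):
    """True if base_arg names one of the (absolute) images in chain."""
    return resolve_base(cwd, overlay, base_arg) in chain
```

Take cwd `/tmp/test` and overlay `subdir/t.qcow2`. Here `t.qcow2.mid` resolves to `/tmp/test/subdir/t.qcow2.mid` and is found. `subdir/t.qcow2.mid` resolves to `/tmp/test/subdir/subdir/t.qcow2.mid` and is not found, which matches the "Did not find" line in the expected output.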

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/020     | 36 ++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/020.out | 10 ++++++++++
 2 files changed, 46 insertions(+)

diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 71fa753b4e..cfcbc0cf45 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -31,6 +31,11 @@ _cleanup()
 	_cleanup_test_img
     rm -f "$TEST_IMG.base"
     rm -f "$TEST_IMG.orig"
+
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT.base"
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT.mid"
+    rm -f "$TEST_DIR/subdir/t.$IMGFMT"
+    rmdir "$TEST_DIR/subdir" &> /dev/null
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -134,6 +139,37 @@ $QEMU_IO -c 'writev 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG commit "$TEST_IMG"
 _cleanup
 
+
+echo
+echo 'Testing commit in sub-directory with relative filenames'
+echo
+
+pushd "$TEST_DIR" > /dev/null
+
+mkdir subdir
+
+TEST_IMG="subdir/t.$IMGFMT.base" _make_test_img 1M
+TEST_IMG="subdir/t.$IMGFMT.mid" _make_test_img -b "t.$IMGFMT.base"
+TEST_IMG="subdir/t.$IMGFMT" _make_test_img -b "t.$IMGFMT.mid"
+
+# Should work
+$QEMU_IMG commit -b "t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+# Might theoretically work, but does not in practice (we have to
+# decide between this and the above; and since we always represent
+# backing file names as relative to the overlay, we go for the above)
+$QEMU_IMG commit -b "subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT" 2>&1 | \
+    _filter_imgfmt
+
+# This should work as well
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+popd > /dev/null
+
+# Now let's try with just absolute filenames
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" \
+    "$TEST_DIR/subdir/t.$IMGFMT"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/020.out b/tests/qemu-iotests/020.out
index 4b722b2dd0..228c37dded 100644
--- a/tests/qemu-iotests/020.out
+++ b/tests/qemu-iotests/020.out
@@ -1094,4 +1094,14 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=json:{'driv
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Block job failed: No space left on device
+
+Testing commit in sub-directory with relative filenames
+
+Formatting 'subdir/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'subdir/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.base
+Formatting 'subdir/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.mid
+Image committed.
+qemu-img: Did not find 'subdir/t.IMGFMT.mid' in the backing chain of 'subdir/t.IMGFMT'
+Image committed.
+Image committed.
 *** done
-- 
2.20.1

* [Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Eric Blake

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 61 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |  4 +--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index dc3fe57fbd..155c1d8c23 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -524,5 +524,66 @@ class TestCommitWithFilters(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/desc',
                         'Top image file %s not found' % cow3_name)
 
+class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
+    img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
+    img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
+    img_top = os.path.join(iotests.test_dir, 'top.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_a, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_b, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', self.img_base_a,
+                 self.img_top)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        # Use base_b instead of base_a as the backing of top
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'top',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': self.img_top
+                                },
+                                'backing': {
+                                    'node-name': 'base',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img_base_b
+                                    }
+                                }
+                            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img_top)
+        os.remove(self.img_base_a)
+        os.remove(self.img_base_b)
+
+    def test_commit_to_a(self):
+        # Try committing to base_a (which should fail, as top's
+        # backing image is base_b instead)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_a)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_commit_to_b(self):
+        # Try committing to base_b (which should work, since that is
+        # actually top's backing image)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_b)
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.event_wait('BLOCK_JOB_READY')
+        self.vm.qmp('block-job-complete', device='commit')
+        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 220a5fa82c..bbcbc202a0 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...............................................
+.................................................
 ----------------------------------------------------------------------
-Ran 47 tests
+Ran 49 tests
 
 OK
-- 
2.20.1
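TestCommitWithOverriddenBacking checks that block-commit resolves `base` against the backing chain actually in use. Here, top.img's header names base_a.img as its backing file, but blockdev-add overrides that with base_b.img. The sketch below models this precedence in Python, assuming a runtime override (including an explicit `None`, as with `'backing': None`) always wins over the header. `effective_backing_chain` and `can_commit_to` are hypothetical helpers, not QEMU functions.

```python
def effective_backing_chain(top, header_backing, overrides):
    """List the images below `top`, nearest first.

    header_backing: image -> backing file name recorded in its header
    overrides:      image -> backing given at runtime (e.g. the 'backing'
                    option of blockdev-add); None means "no backing"
    A runtime override takes precedence over the header.
    """
    chain = []
    node = top
    while True:
        if node in overrides:
            nxt = overrides[node]
        else:
            nxt = header_backing.get(node)
        if nxt is None:
            return chain
        if nxt == top or nxt in chain:
            raise ValueError('backing chain loop at %s' % nxt)
        chain.append(nxt)
        node = nxt


def can_commit_to(top, base, header_backing, overrides):
    """Commit should only accept a base from the chain actually in use."""
    return base in effective_backing_chain(top, header_backing, overrides)
```

With `header_backing = {'top.img': 'base_a.img'}` and `overrides = {'top.img': 'base_b.img'}`, the effective chain is `['base_b.img']`. Committing to base_a.img is therefore rejected and committing to base_b.img is accepted, which matches test_commit_to_a and test_commit_to_b.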

* [Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing
@ 2019-04-10 20:20   ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-10 20:20 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 61 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |  4 +--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index dc3fe57fbd..155c1d8c23 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -524,5 +524,66 @@ class TestCommitWithFilters(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/desc',
                         'Top image file %s not found' % cow3_name)
 
+class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
+    img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
+    img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
+    img_top = os.path.join(iotests.test_dir, 'top.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_a, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_b, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', self.img_base_a, \
+                 self.img_top)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        # Use base_b instead of base_a as the backing of top
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'top',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': self.img_top
+                                },
+                                'backing': {
+                                    'node-name': 'base',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img_base_b
+                                    }
+                                }
+                            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img_top)
+        os.remove(self.img_base_a)
+        os.remove(self.img_base_b)
+
+    def test_commit_to_a(self):
+        # Try committing to base_a (which should fail, as top's
+        # backing image is base_b instead)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_a)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_commit_to_b(self):
+        # Try committing to base_b (which should work, since that is
+        # actually top's backing image)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_b)
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.event_wait('BLOCK_JOB_READY')
+        self.vm.qmp('block-job-complete', device='commit')
+        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 220a5fa82c..bbcbc202a0 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...............................................
+.................................................
 ----------------------------------------------------------------------
-Ran 47 tests
+Ran 49 tests
 
 OK
-- 
2.20.1




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-10 20:20   ` Max Reitz
  (?)
@ 2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
  2019-04-17 16:22     ` Max Reitz
  2019-05-31 16:26     ` Max Reitz
  -1 siblings, 2 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-16 10:02 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.04.2019 23:20, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node.  For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses.  For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file.  Usually.
> 
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression.  It is better to introduce
> some functions that actually guarantee a meaning:
> 
> - bdrv_filtered_cow_child() will return the child that receives requests
>    filtered through COW.  That is, reads may or may not be forwarded
>    (depending on the overlay's allocation status), but writes never go to
>    this child.
> 
> - bdrv_filtered_rw_child() will return the child that receives requests
>    filtered through some very plain process.  Reads and writes issued to
>    the parent will go to the child as well (although timing, etc. may be
>    modified).
> 
> - All drivers but quorum (but quorum is pretty opaque to the general
>    block layer anyway) always only have one of these children: All read
>    requests must be served from the filtered_rw_child (if it exists), so
>    if there was a filtered_cow_child in addition, it would not receive
>    any requests at all.
>    (The closest here is mirror, where all requests are passed on to the
>    source, but with write-blocking, write requests are "COWed" to the
>    target.  But that just means that the target is a special child that
>    cannot be introspected by the generic block layer functions, and that
>    source is a filtered_rw_child.)
>    Therefore, we can also add bdrv_filtered_child() which returns that
>    one child (or NULL, if there is no filtered child).
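As a rough illustration of the three accessors described in the list above, here is a self-contained C model. `Node`, `NodeKind` and the `filtered_*()` helpers are hypothetical stand-ins invented for this sketch, not QEMU's `BlockDriverState`/`BdrvChild` API. In particular, it assumes filters can be recognized by a `kind` field, whereas the real functions consult per-driver information, and it does not model protocol children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a block node. */
typedef enum { NODE_FORMAT, NODE_FILTER } NodeKind;

typedef struct Node {
    const char *name;
    NodeKind kind;
    struct Node *file;    /* the 'file' child, or NULL */
    struct Node *backing; /* the 'backing' child, or NULL */
} Node;

/* COW child: only format nodes have one, and it is their 'backing' link.
 * Reads may be forwarded to it; writes never are. */
static Node *filtered_cow_child(const Node *n)
{
    return (n && n->kind == NODE_FORMAT) ? n->backing : NULL;
}

/* R/W-filtered child: a filter passes all reads and writes down to
 * whichever of 'file' or 'backing' it uses. */
static Node *filtered_rw_child(const Node *n)
{
    if (!n || n->kind != NODE_FILTER) {
        return NULL;
    }
    return n->file ? n->file : n->backing;
}

/* A node has at most one of the two, so return whichever exists. */
static Node *filtered_child(const Node *n)
{
    Node *rw = filtered_rw_child(n);
    return rw ? rw : filtered_cow_child(n);
}
```

In this model, a throttle filter attached via 'file' and a commit filter attached via 'backing' both report their child through `filtered_rw_child()`, while a format overlay reports its backing image only through `filtered_cow_child()`.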
> 
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain.  They do not do that currently, but this
> patch makes them.
> 
> One example for this is qemu-img map, which should skip filters and only
> look at the COW elements in the graph.  The change to iotest 204's
> reference output shows how using blkdebug on top of a COW node used to
> make qemu-img map disregard the rest of the backing chain, but with this
> patch, the allocation in the base image is reported correctly.
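The qemu-img map case can be sketched with a similar hypothetical model: a walk that follows only literal 'backing' links never gets past a blkdebug node (whose child is attached as 'file'), whereas a walk that first skips R/W filters and then steps through COW links reaches the base image. This is an illustration of the idea under simplifying assumptions (invented `Node` type, filters identified by a `kind` field, protocol children omitted), not the code in qemu-img.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_FORMAT, NODE_FILTER } NodeKind;

/* Hypothetical, heavily simplified block node. */
typedef struct Node {
    const char *name;
    NodeKind kind;
    struct Node *file;
    struct Node *backing;
} Node;

/* Follow R/W filters (via 'file' or 'backing') down to the first node
 * that is not a filter; a filter without a child is returned as-is. */
static const Node *skip_rw_filters(const Node *n)
{
    while (n && n->kind == NODE_FILTER) {
        const Node *child = n->file ? n->file : n->backing;
        if (!child) {
            break;
        }
        n = child;
    }
    return n;
}

/* Old behaviour: only the literal 'backing' link defines the chain. */
static bool reaches_via_backing(const Node *top, const Node *target)
{
    for (const Node *n = top; n; n = n->backing) {
        if (n == target) {
            return true;
        }
    }
    return false;
}

/* New behaviour: skip filters, then step through COW ('backing') links,
 * skipping any filters found along the way. */
static bool reaches_via_cow_chain(const Node *top, const Node *target)
{
    for (const Node *n = skip_rw_filters(top); n;
         n = skip_rw_filters(n->backing)) {
        if (n == target) {
            return true;
        }
    }
    return false;
}
```

With blkdebug on top of an overlay whose backing file is base, `reaches_via_backing(&blkdebug, &base)` is false, while `reaches_via_cow_chain(&blkdebug, &base)` is true, mirroring the changed 204 reference output.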
> 
> Furthermore, a note should be made that sometimes we do want to access
> bs->backing directly.  This is whenever the operation in question is not
> about accessing the COW child, but the "backing" child, be it COW or
> not.  This is the case in functions such as bdrv_open_backing_file() or
> whenever we have to deal with the special behavior of @backing as a
> blockdev option, which is that it does not default to null like all
> other child references do.
> 
> Finally, the query functions (query-block and query-named-block-nodes)
> are modified to return any filtered child under "backing", not just
> bs->backing or COW children.  This is so that filters do not interrupt
> the reported backing chain.  This changes the output of iotest 184, as
> the throttled node now appears as a backing child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json           |   4 +
>   include/block/block.h          |   1 +
>   include/block/block_int.h      |  40 +++++--
>   block.c                        | 210 +++++++++++++++++++++++++++------
>   block/backup.c                 |   8 +-
>   block/block-backend.c          |  16 ++-
>   block/commit.c                 |  33 +++---
>   block/io.c                     |  45 ++++---
>   block/mirror.c                 |  21 ++--
>   block/qapi.c                   |  30 +++--
>   block/stream.c                 |  13 +-
>   blockdev.c                     |  88 +++++++++++---
>   migration/block-dirty-bitmap.c |   4 +-
>   nbd/server.c                   |   6 +-
>   qemu-img.c                     |  29 ++---
>   tests/qemu-iotests/184.out     |   7 +-
>   tests/qemu-iotests/204.out     |   1 +
>   17 files changed, 411 insertions(+), 145 deletions(-)

Really huge... Didn't you consider converting it file by file?

[..]

> diff --git a/block.c b/block.c
> index 16615bc876..e8f6febda0 100644
> --- a/block.c
> +++ b/block.c

[..]

>   
> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>       /*
>        * Find the "actual" backing file by skipping all links that point
>        * to an implicit node, if any (e.g. a commit filter node).
> +     * We cannot use any of the bdrv_skip_*() functions here because
> +     * those return the first explicit node, while we are looking for
> +     * its overlay here.
>        */
>       overlay_bs = bs;
> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
> -        overlay_bs = backing_bs(overlay_bs);
> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {

So you don't want to skip implicit filters with a 'file' child? Then why not use
child_bs(overlay_bs->backing), as in the following if condition?

Could we instead treat backing-based filters the same as file-based ones, so that
file-based filters can be used in backing-chain-related scenarios (like the upcoming
copy-on-read filter for stream)? That is, extend the backing-chain concept to include
filters with a file child?


> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>       }
>   
>       /* If we want to replace the backing file we need some extra checks */
> -    if (new_backing_bs != backing_bs(overlay_bs)) {
> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>           /* Check for implicit nodes between bs and its backing file */
>           if (bs != overlay_bs) {
>               error_setg(errp, "Cannot change backing link if '%s' has "

[..]

> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>                                       BlockDriverState *bs)
>   {
> -    while (active && bs != backing_bs(active)) {
> -        active = backing_bs(active);
> +    while (active && bs != bdrv_filtered_bs(active)) {

Hmm, and here you actually support backing chains with file-child-based filters in them...

> +        active = bdrv_filtered_bs(active);
>       }
>   
>       return active;
> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>   {
>       BlockDriverState *i;
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {

...and here you don't.

>           if (i->backing && i->backing->frozen) {
>               error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>                          i->backing->name, i->node_name,
> -                       backing_bs(i)->node_name);
> +                       i->backing->bs->node_name);
>               return true;
>           }
>       }

[..]

> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
> +                                           bool stop_on_explicit_filter)
> +{
> +    BdrvChild *filtered;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {

You could save some characters and drop the double negation with

bool skip_explicit
...
while (skip_explicit || bs->implicit) {
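The two loop conditions agree by De Morgan's law, with skip_explicit standing for !stop_on_explicit_filter. A small exhaustive check (the helper names are made up for this sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Loop condition as written in the patch. */
static bool cond_patch(bool stop_on_explicit_filter, bool implicit)
{
    return !(stop_on_explicit_filter && !implicit);
}

/* Suggested condition; skip_explicit is the inverted parameter. */
static bool cond_suggested(bool skip_explicit, bool implicit)
{
    return skip_explicit || implicit;
}

/* Compare both forms for all four input combinations. */
static bool conditions_equivalent(void)
{
    for (int stop = 0; stop <= 1; stop++) {
        for (int implicit = 0; implicit <= 1; implicit++) {
            if (cond_patch(stop, implicit) !=
                cond_suggested(!stop, implicit)) {
                return false;
            }
        }
    }
    return true;
}
```

Either form works; the choice is purely about readability and which polarity the parameter name should have.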


> +        filtered = bdrv_filtered_rw_child(bs);
> +        if (!filtered) {
> +            break;
> +        }
> +        bs = filtered->bs;
> +    }
> +    /*
> +     * Note that this treats nodes with bs->drv == NULL as not being
> +     * R/W filters (bs->drv == NULL should be replaced by something
> +     * else anyway).
> +     * The advantage of this behavior is that this function will thus
> +     * always return a non-NULL value (given a non-NULL @bs).
> +     */
> +
> +    return bs;
> +}
> +
> +/*
> + * Return the first BDS that has not been added implicitly or that
> + * does not have an RW-filtered child down the chain starting from @bs
> + * (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, true);
> +}
> +
> +/*
> + * Return the first BDS that does not have an RW-filtered child down
> + * the chain starting from @bs (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, false);
> +}
> +
> +/*
> + * For a backing chain, return the first non-filter backing image.

Or the second, if we start from a filter.

> + */
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
> +{
> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
> +}
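To see what bdrv_backing_chain_next() returns when the chain starts with a filter, here is a model under simplifying assumptions: a hypothetical `Node` type, filters identified by a `kind` field, and filters that may use either 'file' or 'backing'. The helpers imitate, but are not, the functions in the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_FORMAT, NODE_FILTER } NodeKind;

typedef struct Node {
    const char *name;
    NodeKind kind;
    struct Node *file;
    struct Node *backing;
} Node;

static Node *filtered_rw_child(Node *n)
{
    if (!n || n->kind != NODE_FILTER) {
        return NULL;
    }
    return n->file ? n->file : n->backing;
}

static Node *filtered_cow_child(Node *n)
{
    return (n && n->kind == NODE_FORMAT) ? n->backing : NULL;
}

/* Like bdrv_skip_rw_filters(): non-NULL input gives non-NULL output. */
static Node *skip_rw_filters(Node *n)
{
    Node *child;
    while ((child = filtered_rw_child(n)) != NULL) {
        n = child;
    }
    return n;
}

/* First non-filter backing image of the first non-filter image. */
static Node *backing_chain_next(Node *n)
{
    return skip_rw_filters(filtered_cow_child(skip_rw_filters(n)));
}
```

For a chain throttle -> top -> (backing) cor -> base, both `backing_chain_next(&throttle)` and `backing_chain_next(&top)` yield base, which is why "the first non-filter backing image" is only accurate once the leading filters have been skipped.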



-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
@ 2019-04-17 16:22     ` Max Reitz
  2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
  2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
  2019-05-31 16:26     ` Max Reitz
  1 sibling, 2 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-17 16:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
[..]
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json           |   4 +
>>   include/block/block.h          |   1 +
>>   include/block/block_int.h      |  40 +++++--
>>   block.c                        | 210 +++++++++++++++++++++++++++------
>>   block/backup.c                 |   8 +-
>>   block/block-backend.c          |  16 ++-
>>   block/commit.c                 |  33 +++---
>>   block/io.c                     |  45 ++++---
>>   block/mirror.c                 |  21 ++--
>>   block/qapi.c                   |  30 +++--
>>   block/stream.c                 |  13 +-
>>   blockdev.c                     |  88 +++++++++++---
>>   migration/block-dirty-bitmap.c |   4 +-
>>   nbd/server.c                   |   6 +-
>>   qemu-img.c                     |  29 ++---
>>   tests/qemu-iotests/184.out     |   7 +-
>>   tests/qemu-iotests/204.out     |   1 +
>>   17 files changed, 411 insertions(+), 145 deletions(-)
> 
> Really huge... Didn't you consider converting it file by file?

Frankly, no, I just didn’t consider it.

Hm.  I don’t know, 30-patch series always look so frightening.

>> diff --git a/block.c b/block.c
>> index 16615bc876..e8f6febda0 100644
>> --- a/block.c
>> +++ b/block.c
> 
> [..]
> 
>>   
>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>       /*
>>        * Find the "actual" backing file by skipping all links that point
>>        * to an implicit node, if any (e.g. a commit filter node).
>> +     * We cannot use any of the bdrv_skip_*() functions here because
>> +     * those return the first explicit node, while we are looking for
>> +     * its overlay here.
>>        */
>>       overlay_bs = bs;
>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>> -        overlay_bs = backing_bs(overlay_bs);
>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
> 
> So you don't want to skip implicit filters with a 'file' child? Then why not use
> child_bs(overlay_bs->backing), as in the following if condition?

I think it was an artifact of writing the patch.  I started with
bdrv_filtered_bs() and then realized this depends on ->backing,
actually.  There was no functional difference, so I left it as it was.

But you’re right, it is clearer to use child_bs(overlay_bs->backing)
instead.

> Could we instead treat backing-based filters the same as file-based ones, so that
> file-based filters can be used in backing-chain-related scenarios (like the upcoming
> copy-on-read filter for stream)? That is, extend the backing-chain concept to include
> filters with a file child?

If I understand you correctly, that’s basically the purpose of this
series and especially this patch here.  As far as it is possible and
reasonable, I want filters that use bs->backing and bs->file to behave
the same.

However, there are cases where this is not possible, and
bdrv_reopen_parse_backing() is one of them.  bs->backing and bs->file
correspond to QAPI names, namely 'backing' and 'file'.  Where that
distinction is already visible to the user, we cannot change it now.

We definitely cannot make file-based filters use bs->backing now because
you can create them over QAPI and they use 'file' as their child name.
Can we make backing-based filters use bs->file?  Seems more likely,
because all of them are implicit nodes, so the user usually doesn’t see
them.  But usually isn’t always; they do become user-visible once the
user specifies a node-name for mirror or commit.

I found it more reasonable to introduce new functions that explicitly
express what kind of child they expect and then apply them everywhere as
I saw fit, instead of making the mirror/commit filter drivers use
bs->file and hope it works; not least because I’d still have to go
through the whole block layer and check every instance of bs->backing to
see whether it really needs bs->backing or whether it should use either
of bs->backing or bs->file.
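The overlay search in bdrv_reopen_parse_backing() is one place where only the QAPI 'backing' link matters. A simplified model of that loop and of the "implicit backing file" check that follows it; the `Node` type and both helper names are hypothetical, and the real function performs further checks not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    bool implicit;         /* added implicitly, e.g. a commit filter */
    struct Node *backing;  /* the QAPI 'backing' child, or NULL */
} Node;

/* Skip 'backing' links that point to implicit nodes; return the overlay
 * of the "actual" backing file (which may be bs itself). */
static Node *find_backing_overlay(Node *bs)
{
    assert(bs != NULL);
    Node *overlay = bs;
    while (overlay->backing && overlay->backing->implicit) {
        overlay = overlay->backing;
    }
    return overlay;
}

/* Replacing the backing file is refused if implicit nodes sit between
 * bs and its actual backing file. */
static bool may_replace_backing(Node *bs, Node *new_backing)
{
    Node *overlay = find_backing_overlay(bs);
    if (new_backing != overlay->backing && bs != overlay) {
        return false;
    }
    return true;
}
```

Because the loop looks at `->backing` only, an implicit filter attached through 'file' would not be skipped; that is the intended behaviour here, since the option being reopened is literally 'backing'.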

>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>       }
>>   
>>       /* If we want to replace the backing file we need some extra checks */
>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>           /* Check for implicit nodes between bs and its backing file */
>>           if (bs != overlay_bs) {
>>               error_setg(errp, "Cannot change backing link if '%s' has "
> 
> [..]
> 
>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>                                       BlockDriverState *bs)
>>   {
>> -    while (active && bs != backing_bs(active)) {
>> -        active = backing_bs(active);
>> +    while (active && bs != bdrv_filtered_bs(active)) {
> 
> Hmm, and here you actually support backing chains with file-child-based filters in them...

Yes, because this is not about the QAPI 'backing' link.  This function
should continue to work even if there are filters in the backing chain.
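A sketch of the difference, using a hypothetical simplified `Node` type (not QEMU's): a find-overlay walk that steps through any filtered child still finds the overlay when a filter attached via 'file' sits in the chain, while a walk over literal 'backing' links gives up at that filter.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_FORMAT, NODE_FILTER } NodeKind;

typedef struct Node {
    const char *name;
    NodeKind kind;
    struct Node *file;
    struct Node *backing;
} Node;

/* Any filtered child: R/W child of a filter or COW child of a format node. */
static Node *filtered_child(Node *n)
{
    assert(n != NULL);
    if (n->kind == NODE_FILTER) {
        return n->file ? n->file : n->backing;
    }
    return n->backing;
}

/* Patched behaviour: find the node whose filtered child is bs. */
static Node *find_overlay(Node *active, Node *bs)
{
    while (active && bs != filtered_child(active)) {
        active = filtered_child(active);
    }
    return active;
}

/* Old behaviour: only literal 'backing' links are followed. */
static Node *find_overlay_backing_only(Node *active, Node *bs)
{
    while (active && bs != active->backing) {
        active = active->backing;
    }
    return active;
}
```

With throttle (via 'file') on top of top, which is backed by base, `find_overlay(&throttle, &base)` returns top, whereas `find_overlay_backing_only(&throttle, &base)` returns NULL.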

>> +        active = bdrv_filtered_bs(active);
>>       }
>>   
>>       return active;
>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>   {
>>       BlockDriverState *i;
>>   
>> -    for (i = bs; i != base; i = backing_bs(i)) {
>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
> 
> ...and here you don't.

Yes, because this function is about the QAPI 'backing' link.
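A simplified model of that check, walking only the QAPI 'backing' links between @bs and @base. The `Node` type is hypothetical, and unlike the function in the patch, this sketch also stops if it runs off the end of the chain instead of assuming that @base is reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    struct Node *backing;  /* QAPI 'backing' child, or NULL */
    bool backing_frozen;   /* whether the 'backing' link is frozen */
} Node;

/* Return true (and print a message) if any 'backing' link between bs and
 * base is frozen. */
static bool backing_chain_frozen(const Node *bs, const Node *base)
{
    for (const Node *i = bs; i && i != base; i = i->backing) {
        if (i->backing && i->backing_frozen) {
            fprintf(stderr, "Cannot change 'backing' link from '%s' to '%s'\n",
                    i->name, i->backing->name);
            return true;
        }
    }
    return false;
}
```

The links examined are exactly the ones a user could try to change through the 'backing' option, which is why this walk does not step through 'file' children.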

>>           if (i->backing && i->backing->frozen) {
>>               error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>>                          i->backing->name, i->node_name,
>> -                       backing_bs(i)->node_name);
>> +                       i->backing->bs->node_name);
>>               return true;
>>           }
>>       }
> 
> [..]
> 
>> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
>> +                                           bool stop_on_explicit_filter)
>> +{
>> +    BdrvChild *filtered;
>> +
>> +    if (!bs) {
>> +        return NULL;
>> +    }
>> +
>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> 
> You could save some characters and drop the double negation with
> 
> bool skip_explicit
> ...
> while (skip_explicit || bs->implicit) {

But is it really simpler?

>> +        filtered = bdrv_filtered_rw_child(bs);
>> +        if (!filtered) {
>> +            break;
>> +        }
>> +        bs = filtered->bs;
>> +    }
>> +    /*
>> +     * Note that this treats nodes with bs->drv == NULL as not being
>> +     * R/W filters (bs->drv == NULL should be replaced by something
>> +     * else anyway).
>> +     * The advantage of this behavior is that this function will thus
>> +     * always return a non-NULL value (given a non-NULL @bs).
>> +     */
>> +
>> +    return bs;
>> +}
>> +
>> +/*
>> + * Return the first BDS that has not been added implicitly or that
>> + * does not have an RW-filtered child down the chain starting from @bs
>> + * (including @bs itself).
>> + */
>> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
>> +{
>> +    return bdrv_skip_filters(bs, true);
>> +}
>> +
>> +/*
>> + * Return the first BDS that does not have an RW-filtered child down
>> + * the chain starting from @bs (including @bs itself).
>> + */
>> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
>> +{
>> +    return bdrv_skip_filters(bs, false);
>> +}
>> +
>> +/*
>> + * For a backing chain, return the first non-filter backing image.
> 
> Or the second, if we start from a filter.

Hm, in a sense.  Maybe:

> For a backing chain, return the first non-filter backing image of the
> first non-filter image.

?

>> + */
>> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
>> +{
>> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
>> +}
> 
> 
> 





* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-17 16:22     ` Max Reitz
@ 2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
  2019-04-24 15:23         ` Max Reitz
  2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-18  8:36 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

17.04.2019 19:22, Max Reitz wrote:
> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
[..]
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    qapi/block-core.json           |   4 +
>>>    include/block/block.h          |   1 +
>>>    include/block/block_int.h      |  40 +++++--
>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>    block/backup.c                 |   8 +-
>>>    block/block-backend.c          |  16 ++-
>>>    block/commit.c                 |  33 +++---
>>>    block/io.c                     |  45 ++++---
>>>    block/mirror.c                 |  21 ++--
>>>    block/qapi.c                   |  30 +++--
>>>    block/stream.c                 |  13 +-
>>>    blockdev.c                     |  88 +++++++++++---
>>>    migration/block-dirty-bitmap.c |   4 +-
>>>    nbd/server.c                   |   6 +-
>>>    qemu-img.c                     |  29 ++---
>>>    tests/qemu-iotests/184.out     |   7 +-
>>>    tests/qemu-iotests/204.out     |   1 +
>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>
>> Really huge... Didn't you consider converting it file by file?
> 
> Frankly, no, I just didn’t consider it.
> 
> Hm.  I don’t know, 30-patch series always look so frightening.
> 
>>> diff --git a/block.c b/block.c
>>> index 16615bc876..e8f6febda0 100644
>>> --- a/block.c
>>> +++ b/block.c
>>
>> [..]
>>
>>>    
>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>        /*
>>>         * Find the "actual" backing file by skipping all links that point
>>>         * to an implicit node, if any (e.g. a commit filter node).
>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>> +     * those return the first explicit node, while we are looking for
>>> +     * its overlay here.
>>>         */
>>>        overlay_bs = bs;
>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>> -        overlay_bs = backing_bs(overlay_bs);
>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>
>> So you don't want to skip implicit filters with a 'file' child? Then why not use
>> child_bs(overlay_bs->backing), as in the following if condition?
> 
> I think it was an artifact of writing the patch.  I started with
> bdrv_filtered_bs() and then realized this depends on ->backing,
> actually.  There was no functional difference, so I left it as it was.
> 
> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
> instead.
> 
>> Could we instead treat backing-based filters the same as file-based ones, so that
>> file-based filters can be used in backing-chain-related scenarios (like the upcoming
>> copy-on-read filter for stream)? That is, extend the backing-chain concept to include
>> filters with a file child?
> 
> If I understand you correctly, that’s basically the purpose of this
> series and especially this patch here.  As far as it is possible and
> reasonable, I want filters that use bs->backing and bs->file to behave
> the same.
> 
> However, there are cases where this is not possible, and
> bdrv_reopen_parse_backing() is one of them.  bs->backing and bs->file
> correspond to QAPI names, namely 'backing' and 'file'.  Where that
> distinction is already visible to the user, we cannot change it now.
> 
> We definitely cannot make file-based filters use bs->backing now because
> you can create them over QAPI and they use 'file' as their child name.
> Can we make backing-based filters use bs->file?  Seems more likely,
> because all of them are implicit nodes, so the user usually doesn’t see
> them.  But usually isn’t always; they do become user-visible once the
> user specifies a node-name for mirror or commit.
> 
> I found it more reasonable to introduce new functions that explicitly
> express what kind of child they expect and then apply them everywhere as
> I saw fit, instead of making the mirror/commit filter drivers use
> bs->file and hope it works; not least because I’d still have to go
> through the whole block layer and check every instance of bs->backing to
> see whether it really needs bs->backing or whether it should use either
> of bs->backing or bs->file.
> 
>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>        }
>>>    
>>>        /* If we want to replace the backing file we need some extra checks */
>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>            /* Check for implicit nodes between bs and its backing file */
>>>            if (bs != overlay_bs) {
>>>                error_setg(errp, "Cannot change backing link if '%s' has "
>>
>> [..]
>>
>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>    BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>                                        BlockDriverState *bs)
>>>    {
>>> -    while (active && bs != backing_bs(active)) {
>>> -        active = backing_bs(active);
>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>
>> Hmm, and here you actually support backing chains with file-child-based filters in them...
> 
> Yes, because this is not about the QAPI 'backing' link.  This function
> should continue to work even if there are filters in the backing chain.
> 
>>> +        active = bdrv_filtered_bs(active);
>>>        }
>>>    
>>>        return active;
>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>    {
>>>        BlockDriverState *i;
>>>    
>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>
>> And here you don't...
> 
> Yes, because this function is about the QAPI 'backing' link.

Why? What would be bad if we just treated backing and file children equally for
filters? Some scenarios that didn't work before would start working, and I don't
think any existing ones would break.

I mean, if we tell users that a "backing chain" may include the file child of
filter nodes, what will break?

> 
>>>            if (i->backing && i->backing->frozen) {
>>>                error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>>>                           i->backing->name, i->node_name,
>>> -                       backing_bs(i)->node_name);
>>> +                       i->backing->bs->node_name);
>>>                return true;
>>>            }
>>>        }
>>
>> [..]
>>
>>> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
>>> +                                           bool stop_on_explicit_filter)
>>> +{
>>> +    BdrvChild *filtered;
>>> +
>>> +    if (!bs) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>>
>> You could save some characters and extra operators with:
>>
>> bool skip_explicit
>> ...
>> while (skip_explicit || bs->implicit) {
> 
> But is it really simpler?

Hmm, I thought it was. Anyway, I'm OK with either variant.
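The two loop conditions discussed here can be checked mechanically. By De Morgan's law, !(stop && !implicit) is the same as !stop || implicit, so renaming the parameter to skip_explicit (meaning !stop_on_explicit_filter) gives skip_explicit || implicit. The helper names below are hypothetical; this sketch only compares the boolean expressions, not the surrounding loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Loop condition as written in the patch's bdrv_skip_filters(). */
static bool cond_original(bool stop_on_explicit_filter, bool implicit)
{
    return !(stop_on_explicit_filter && !implicit);
}

/* Suggested form; the parameter is inverted:
 * skip_explicit == !stop_on_explicit_filter. */
static bool cond_suggested(bool skip_explicit, bool implicit)
{
    return skip_explicit || implicit;
}

/* Exhaustively compares both conditions over all four input combinations. */
static bool conditions_equivalent(void)
{
    for (int stop = 0; stop <= 1; stop++) {
        for (int implicit = 0; implicit <= 1; implicit++) {
            if (cond_original(stop, implicit) !=
                cond_suggested(!stop, implicit)) {
                return false;
            }
        }
    }
    return true;
}
```

Both forms express the same rule: when stopping on explicit filters, an explicit node ends the loop, e.g. cond_original(true, false) is false.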

> 
>>> +        filtered = bdrv_filtered_rw_child(bs);
>>> +        if (!filtered) {
>>> +            break;
>>> +        }
>>> +        bs = filtered->bs;
>>> +    }
>>> +    /*
>>> +     * Note that this treats nodes with bs->drv == NULL as not being
>>> +     * R/W filters (bs->drv == NULL should be replaced by something
>>> +     * else anyway).
>>> +     * The advantage of this behavior is that this function will thus
>>> +     * always return a non-NULL value (given a non-NULL @bs).
>>> +     */
>>> +
>>> +    return bs;
>>> +}
>>> +
>>> +/*
>>> + * Return the first BDS that has not been added implicitly or that
>>> + * does not have an RW-filtered child down the chain starting from @bs
>>> + * (including @bs itself).
>>> + */
>>> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
>>> +{
>>> +    return bdrv_skip_filters(bs, true);
>>> +}
>>> +
>>> +/*
>>> + * Return the first BDS that does not have an RW-filtered child down
>>> + * the chain starting from @bs (including @bs itself).
>>> + */
>>> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
>>> +{
>>> +    return bdrv_skip_filters(bs, false);
>>> +}
>>> +
>>> +/*
>>> + * For a backing chain, return the first non-filter backing image.
>>
>> Or the second, if we start from a filter.
> 
> Hm, in a sense.  Maybe:
> 
>> For a backing chain, return the first non-filter backing image of the
>> first non-filter image.
> 
> ?
> 

Yes, this sounds good to me.
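To make the agreed comment wording concrete, here is a sketch of how bdrv_backing_chain_next() composes its helpers, on a hypothetical toy node type (Node, rw_child and cow_child are illustrative stand-ins, not QEMU's BlockDriverState or BdrvChild): skip R/W filters, take the COW child, then skip R/W filters again. Starting from a filter, the result is the second non-filter image encountered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node, not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool implicit;           /* added implicitly, e.g. a mirror/commit filter */
    struct Node *rw_child;   /* filtered R/W child (set only for filters) */
    struct Node *cow_child;  /* COW backing child (set only for format nodes) */
} Node;

/* Same loop shape as the patch's bdrv_skip_filters(). */
static Node *skip_filters(Node *bs, bool stop_on_explicit_filter)
{
    if (!bs) {
        return NULL;
    }
    while (!(stop_on_explicit_filter && !bs->implicit)) {
        if (!bs->rw_child) {
            break;
        }
        bs = bs->rw_child;
    }
    return bs;
}

static Node *skip_rw_filters(Node *bs)
{
    return skip_filters(bs, false);
}

/* First non-filter backing image of the first non-filter image. */
static Node *backing_chain_next(Node *bs)
{
    Node *first = skip_rw_filters(bs);
    return skip_rw_filters(first ? first->cow_child : NULL);
}

/* Example graph: throttle -> top =(COW)=> blkdebug -> base */
static Node base = { .name = "base" };
static Node blkdebug = { .name = "blkdebug", .rw_child = &base };
static Node top = { .name = "top", .cow_child = &blkdebug };
static Node throttle = { .name = "throttle", .rw_child = &top };
```

In this example graph, backing_chain_next(&throttle) and backing_chain_next(&top) both yield base, while base itself has no next image.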


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-17 16:22     ` Max Reitz
  2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
@ 2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
  2019-04-24 16:36         ` Max Reitz
  1 sibling, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-19 10:23 UTC
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

17.04.2019 19:22, Max Reitz wrote:
> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
>>> What bs->file and bs->backing mean depends on the node.  For filter
>>> nodes, both signify a node that will eventually receive all R/W
>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>> bs->backing will not receive writes -- instead, writes are COWed to
>>> bs->file.  Usually.
>>>
>>> In any case, it is not trivial to guess what a child means exactly with
>>> our currently limited form of expression.  It is better to introduce
>>> some functions that actually guarantee a meaning:
>>>
>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>     (depending on the overlay's allocation status), but writes never go to
>>>     this child.
>>>
>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>     filtered through some very plain process.  Reads and writes issued to
>>>     the parent will go to the child as well (although timing, etc. may be
>>>     modified).
>>>
>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>     block layer anyway) always only have one of these children: All read
>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>     if there was a filtered_cow_child in addition, it would not receive
>>>     any requests at all.
>>>     (The closest here is mirror, where all requests are passed on to the
>>>     source, but with write-blocking, write requests are "COWed" to the
>>>     target.  But that just means that the target is a special child that
>>>     cannot be introspected by the generic block layer functions, and that
>>>     source is a filtered_rw_child.)
>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>     one child (or NULL, if there is no filtered child).
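A minimal sketch of the invariant described above, on a hypothetical toy node type (not QEMU's structures): apart from quorum, a node has at most one filtered child, so a single filtered_child() accessor can return whichever one exists. write_target() is an illustrative helper showing that, in this model, writes pass through R/W filters but stop at a COW overlay; protocol children such as a format node's file are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; only the two kinds of filtered children are modeled. */
typedef struct Node {
    const char *name;
    struct Node *rw_child;   /* receives all reads and writes (filters) */
    struct Node *cow_child;  /* receives some reads, never writes (COW backing) */
} Node;

/* Toy counterpart of the proposed bdrv_filtered_child():
 * the one filtered child, or NULL. */
static Node *filtered_child(Node *bs)
{
    /* Having both would leave the COW child without any requests. */
    assert(!(bs->rw_child && bs->cow_child));
    return bs->rw_child ? bs->rw_child : bs->cow_child;
}

/* Follows R/W filters only; the result is the node that absorbs a write. */
static Node *write_target(Node *bs)
{
    while (bs->rw_child) {
        bs = bs->rw_child;
    }
    return bs;
}

/* Example graph: throttle -> top =(COW)=> base */
static Node base = { .name = "base" };
static Node top = { .name = "top", .cow_child = &base };
static Node throttle = { .name = "throttle", .rw_child = &top };
```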
>>>
>>> Also, many places in the current block layer should be skipping filters
>>> (all filters or just the ones added implicitly, it depends) when going
>>> through a block node chain.  They do not do that currently, but this
>>> patch makes them.
>>>
>>> One example for this is qemu-img map, which should skip filters and only
>>> look at the COW elements in the graph.  The change to iotest 204's
>>> reference output shows how using blkdebug on top of a COW node used to
>>> make qemu-img map disregard the rest of the backing chain, but with this
>>> patch, the allocation in the base image is reported correctly.
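As a rough illustration of why skipping filters matters for qemu-img map, the sketch below walks only the COW elements of a graph and skips R/W filters at every step, so a blkdebug node on top no longer hides the base image. It uses a hypothetical toy model: the allocated bitmask and cluster_source() are illustrative and are not QEMU's block-status API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    const char *name;
    struct Node *rw_child;   /* filtered R/W child (filters only) */
    struct Node *cow_child;  /* COW backing child (format nodes only) */
    uint32_t allocated;      /* bit n set: cluster n is allocated in this node */
} Node;

static Node *skip_rw_filters(Node *bs)
{
    while (bs && bs->rw_child) {
        bs = bs->rw_child;
    }
    return bs;
}

/* Returns the COW layer that provides cluster @n, or NULL if none allocates it. */
static Node *cluster_source(Node *bs, unsigned n)
{
    assert(n < 32);
    for (bs = skip_rw_filters(bs); bs; bs = skip_rw_filters(bs->cow_child)) {
        if (bs->allocated & ((uint32_t)1 << n)) {
            return bs;
        }
    }
    return NULL;
}

/* Example: blkdebug on top of a COW node: blkdebug -> top =(COW)=> base */
static Node base = { .name = "base", .allocated = 1u << 1 };
static Node top = { .name = "top", .cow_child = &base, .allocated = 1u << 0 };
static Node blkdebug = { .name = "blkdebug", .rw_child = &top };
```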
>>>
>>> Furthermore, a note should be made that sometimes we do want to access
>>> bs->backing directly.  This is whenever the operation in question is not
>>> about accessing the COW child, but the "backing" child, be it COW or
>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>> whenever we have to deal with the special behavior of @backing as a
>>> blockdev option, which is that it does not default to null like all
>>> other child references do.
>>>
>>> Finally, the query functions (query-block and query-named-block-nodes)
>>> are modified to return any filtered child under "backing", not just
>>> bs->backing or COW children.  This is so that filters do not interrupt
>>> the reported backing chain.  This changes the output of iotest 184, as
>>> the throttled node now appears as a backing child.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    qapi/block-core.json           |   4 +
>>>    include/block/block.h          |   1 +
>>>    include/block/block_int.h      |  40 +++++--
>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>    block/backup.c                 |   8 +-
>>>    block/block-backend.c          |  16 ++-
>>>    block/commit.c                 |  33 +++---
>>>    block/io.c                     |  45 ++++---
>>>    block/mirror.c                 |  21 ++--
>>>    block/qapi.c                   |  30 +++--
>>>    block/stream.c                 |  13 +-
>>>    blockdev.c                     |  88 +++++++++++---
>>>    migration/block-dirty-bitmap.c |   4 +-
>>>    nbd/server.c                   |   6 +-
>>>    qemu-img.c                     |  29 ++---
>>>    tests/qemu-iotests/184.out     |   7 +-
>>>    tests/qemu-iotests/204.out     |   1 +
>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>
>> Really huge... didn't you consider converting it file by file?
> 
> Frankly, no, I just didn’t consider it.
> 
> Hm.  I don’t know, 30-patch series always look so frightening.
> 
>>> diff --git a/block.c b/block.c
>>> index 16615bc876..e8f6febda0 100644
>>> --- a/block.c
>>> +++ b/block.c
>>
>> [..]
>>
>>>    
>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>        /*
>>>         * Find the "actual" backing file by skipping all links that point
>>>         * to an implicit node, if any (e.g. a commit filter node).
>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>> +     * those return the first explicit node, while we are looking for
>>> +     * its overlay here.
>>>         */
>>>        overlay_bs = bs;
>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>> -        overlay_bs = backing_bs(overlay_bs);
>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>
>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>> child_bs(overlay_bs->backing), like in the following if condition?
> 
> I think it was an artifact of writing the patch.  I started with
> bdrv_filtered_bs() and then realized this depends on ->backing,
> actually.  There was no functional difference so I left it as it was.
> 
> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
> instead.
> 
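A sketch of the overlay search with the change agreed on here, i.e. following child_bs(overlay_bs->backing) only: implicit nodes are skipped only when they hang off a 'backing' link, so an implicit node under 'file' is left alone. The toy Node type, find_actual_overlay() and the example graph (an implicit commit filter between an overlay and its backing file) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    bool implicit;
    struct Node *backing;  /* child attached under the 'backing' name, or NULL */
    struct Node *file;     /* child attached under the 'file' name, or NULL */
} Node;

/* Skips implicit nodes reached through 'backing' links only;
 * 'file' links are never followed. */
static Node *find_actual_overlay(Node *bs)
{
    Node *overlay = bs;
    while (overlay->backing && overlay->backing->implicit) {
        overlay = overlay->backing;
    }
    return overlay;
}

/* active -(backing)-> commit_filter (implicit) -(backing)-> mid -(backing)-> base */
static Node base = { .name = "base" };
static Node mid = { .name = "mid", .backing = &base };
static Node commit_filter = { .name = "commit_filter", .implicit = true,
                              .backing = &mid };
static Node active = { .name = "active", .backing = &commit_filter };

/* An implicit node attached under 'file' is not skipped. */
static Node implicit_via_file = { .name = "implicit_via_file", .implicit = true };
static Node file_parent = { .name = "file_parent", .file = &implicit_via_file };
```

Here find_actual_overlay(&active) returns the commit filter rather than &active, which is the situation the subsequent if (bs != overlay_bs) check in bdrv_reopen_parse_backing() rejects when the backing link is to be changed.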
>> Could we instead make backing-based filters equal to file-based ones, to make it possible
>> to use file-based filters in backing-chain-related scenarios (like the upcoming copy-on-read
>> filter for stream)? That is, to expand the backing-chain concept to include filters with a file child?
> 
> If I understand you correctly, that’s basically the purpose of this
> series and especially this patch here.  As far as it is possible and
> reasonable, I want filters that use bs->backing and bs->file to behave
> the same.
> 
> However, there are cases where this is not possible and
> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
> correspond to QAPI names, namely 'backing' and 'file'.  If that
> distinction was already visible to the user, we cannot change it now.
> 
> We definitely cannot make file-based filters use bs->backing now because
> you can create them over QAPI and they use 'file' as their child name.
> Can we make backing-based filters use bs->file?  Seems more likely,
> because all of them are implicit nodes, so the user usually doesn’t see
> them.  But usually isn’t always; they do become user-visible once the
> user specifies a node-name for mirror or commit.
> 
> I found it more reasonable to introduce new functions that explicitly
> express what kind of child they expect and then apply them everywhere as
> I saw fit, instead of making the mirror/commit filter drivers use
> bs->file and hope it works; not least because I’d still have to go
> through the whole block layer and check every instance of bs->backing to
> see whether it really needs bs->backing or whether it should use either
> of bs->backing or bs->file.
> 
>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>        }
>>>    
>>>        /* If we want to replace the backing file we need some extra checks */
>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>            /* Check for implicit nodes between bs and its backing file */
>>>            if (bs != overlay_bs) {
>>>                error_setg(errp, "Cannot change backing link if '%s' has "
>>
>> [..]
>>
>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>    BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>                                        BlockDriverState *bs)
>>>    {
>>> -    while (active && bs != backing_bs(active)) {
>>> -        active = backing_bs(active);
>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>
>> Hmm, and here you actually do support a backing chain with file-child-based filters in it...
> 
> Yes, because this is not about the QAPI 'backing' link.  This function
> should continue to work even if there are filters in the backing chain.

This is a generic function for finding an overlay in a backing chain, and it may be used
from different places; for example, it is used in Andrey's series about a filter for
block-stream.

It is also used by qmp_block_commit(). Isn't that about QAPI?
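To illustrate Vladimir's point about bdrv_find_overlay() and file-based filters, here is a sketch on a hypothetical toy model (Node, is_filter and both helpers are illustrative, not QEMU code) that compares the patch's walk through filtered children with a walk over 'backing' links only. With a file-based filter in the chain (for example, a copy-on-read node attached through 'file'), only the former finds the overlay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    bool is_filter;        /* true: backing/file is a filtered R/W child */
    struct Node *backing;
    struct Node *file;
} Node;

/* Rough analogue of bdrv_filtered_bs(): a filter's child under either name,
 * or a format node's backing (COW) child. */
static Node *filtered_bs(Node *bs)
{
    if (!bs) {
        return NULL;
    }
    if (bs->is_filter) {
        return bs->backing ? bs->backing : bs->file;
    }
    return bs->backing;
}

/* Walk as in the patch: through any filtered child. */
static Node *find_overlay(Node *active, Node *bs)
{
    while (active && bs != filtered_bs(active)) {
        active = filtered_bs(active);
    }
    return active;
}

/* Walk over 'backing' links only, as before the patch. */
static Node *find_overlay_backing_only(Node *active, Node *bs)
{
    while (active && bs != active->backing) {
        active = active->backing;
    }
    return active;
}

/* Example: active -(backing)-> cor (file-based filter) -(file)-> base */
static Node base = { .name = "base" };
static Node cor = { .name = "cor", .is_filter = true, .file = &base };
static Node active = { .name = "active", .backing = &cor };
```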

> 
>>> +        active = bdrv_filtered_bs(active);
>>>        }
>>>    
>>>        return active;
>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>    {
>>>        BlockDriverState *i;
>>>    
>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>
>> And here you don't...
> 
> Yes, because this function is about the QAPI 'backing' link.

And this is again a generic function that may be used in the same places as
bdrv_find_overlay(), and it is used in the series about the block-stream filter, too.
So, for further development, we'll have to keep in mind all these differences between
generic block layer functions: which of them support .file children inside a backing
chain and which do not...

> 
>>>            if (i->backing && i->backing->frozen) {
>>>                error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>>>                           i->backing->name, i->node_name,
>>> -                       backing_bs(i)->node_name);
>>> +                       i->backing->bs->node_name);
>>>                return true;
>>>            }
>>>        }
>>
>> [..]
>>
>>> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
>>> +                                           bool stop_on_explicit_filter)
>>> +{
>>> +    BdrvChild *filtered;
>>> +
>>> +    if (!bs) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>>
>> You could save some characters and extra operators with:
>>
>> bool skip_explicit
>> ...
>> while (skip_explicit || bs->implicit) {
> 
> But is it really simpler?
> 
>>> +        filtered = bdrv_filtered_rw_child(bs);
>>> +        if (!filtered) {
>>> +            break;
>>> +        }
>>> +        bs = filtered->bs;
>>> +    }
>>> +    /*
>>> +     * Note that this treats nodes with bs->drv == NULL as not being
>>> +     * R/W filters (bs->drv == NULL should be replaced by something
>>> +     * else anyway).
>>> +     * The advantage of this behavior is that this function will thus
>>> +     * always return a non-NULL value (given a non-NULL @bs).
>>> +     */
>>> +
>>> +    return bs;
>>> +}
>>> +
>>> +/*
>>> + * Return the first BDS that has not been added implicitly or that
>>> + * does not have an RW-filtered child down the chain starting from @bs
>>> + * (including @bs itself).
>>> + */
>>> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
>>> +{
>>> +    return bdrv_skip_filters(bs, true);
>>> +}
>>> +
>>> +/*
>>> + * Return the first BDS that does not have an RW-filtered child down
>>> + * the chain starting from @bs (including @bs itself).
>>> + */
>>> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
>>> +{
>>> +    return bdrv_skip_filters(bs, false);
>>> +}
>>> +
>>> +/*
>>> + * For a backing chain, return the first non-filter backing image.
>>
>> Or the second, if we start from a filter.
> 
> Hm, in a sense.  Maybe:
> 
>> For a backing chain, return the first non-filter backing image of the
>> first non-filter image.
> 
> ?
> 
>>> + */
>>> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
>>> +{
>>> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
>>> +}
>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
@ 2019-04-24 15:23         ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-04-24 15:23 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 18.04.19 10:36, Vladimir Sementsov-Ogievskiy wrote:
> 17.04.2019 19:22, Max Reitz wrote:
>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.04.2019 23:20, Max Reitz wrote:
>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>> nodes, both signify a node that will eventually receive all R/W
>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>> bs->file.  Usually.
>>>>
>>>> In any case, it is not trivial to guess what a child means exactly with
>>>> our currently limited form of expression.  It is better to introduce
>>>> some functions that actually guarantee a meaning:
>>>>
>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>>     (depending on the overlay's allocation status), but writes never go to
>>>>     this child.
>>>>
>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>     filtered through some very plain process.  Reads and writes issued to
>>>>     the parent will go to the child as well (although timing, etc. may be
>>>>     modified).
>>>>
>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>     block layer anyway) always only have one of these children: All read
>>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>>     if there was a filtered_cow_child in addition, it would not receive
>>>>     any requests at all.
>>>>     (The closest here is mirror, where all requests are passed on to the
>>>>     source, but with write-blocking, write requests are "COWed" to the
>>>>     target.  But that just means that the target is a special child that
>>>>     cannot be introspected by the generic block layer functions, and that
>>>>     source is a filtered_rw_child.)
>>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>>     one child (or NULL, if there is no filtered child).
>>>>
>>>> Also, many places in the current block layer should be skipping filters
>>>> (all filters or just the ones added implicitly, it depends) when going
>>>> through a block node chain.  They do not do that currently, but this
>>>> patch makes them.
>>>>
>>>> One example for this is qemu-img map, which should skip filters and only
>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>> reference output shows how using blkdebug on top of a COW node used to
>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>> patch, the allocation in the base image is reported correctly.
>>>>
>>>> Furthermore, a note should be made that sometimes we do want to access
>>>> bs->backing directly.  This is whenever the operation in question is not
>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>> whenever we have to deal with the special behavior of @backing as a
>>>> blockdev option, which is that it does not default to null like all
>>>> other child references do.
>>>>
>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>> are modified to return any filtered child under "backing", not just
>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>> the throttled node now appears as a backing child.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    qapi/block-core.json           |   4 +
>>>>    include/block/block.h          |   1 +
>>>>    include/block/block_int.h      |  40 +++++--
>>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>>    block/backup.c                 |   8 +-
>>>>    block/block-backend.c          |  16 ++-
>>>>    block/commit.c                 |  33 +++---
>>>>    block/io.c                     |  45 ++++---
>>>>    block/mirror.c                 |  21 ++--
>>>>    block/qapi.c                   |  30 +++--
>>>>    block/stream.c                 |  13 +-
>>>>    blockdev.c                     |  88 +++++++++++---
>>>>    migration/block-dirty-bitmap.c |   4 +-
>>>>    nbd/server.c                   |   6 +-
>>>>    qemu-img.c                     |  29 ++---
>>>>    tests/qemu-iotests/184.out     |   7 +-
>>>>    tests/qemu-iotests/204.out     |   1 +
>>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>>
>>> Really huge... didn't you consider converting it file by file?
>>
>> Frankly, no, I just didn’t consider it.
>>
>> Hm.  I don’t know, 30-patch series always look so frightening.
>>
>>>> diff --git a/block.c b/block.c
>>>> index 16615bc876..e8f6febda0 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>
>>> [..]
>>>
>>>>    
>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>        /*
>>>>         * Find the "actual" backing file by skipping all links that point
>>>>         * to an implicit node, if any (e.g. a commit filter node).
>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>> +     * those return the first explicit node, while we are looking for
>>>> +     * its overlay here.
>>>>         */
>>>>        overlay_bs = bs;
>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>
>>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>>> child_bs(overlay_bs->backing), like in the following if condition?
>>
>> I think it was an artifact of writing the patch.  I started with
>> bdrv_filtered_bs() and then realized this depends on ->backing,
>> actually.  There was no functional difference so I left it as it was.
>>
>> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
>> instead.
>>
>>> Could we instead make backing-based filters equal to file-based ones, to make it possible
>>> to use file-based filters in backing-chain-related scenarios (like the upcoming copy-on-read
>>> filter for stream)? That is, to expand the backing-chain concept to include filters with a file child?
>>
>> If I understand you correctly, that’s basically the purpose of this
>> series and especially this patch here.  As far as it is possible and
>> reasonable, I want filters that use bs->backing and bs->file to behave
>> the same.
>>
>> However, there are cases where this is not possible and
>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>> distinction was already visible to the user, we cannot change it now.
>>
>> We definitely cannot make file-based filters use bs->backing now because
>> you can create them over QAPI and they use 'file' as their child name.
>> Can we make backing-based filters use bs->file?  Seems more likely,
>> because all of them are implicit nodes, so the user usually doesn’t see
>> them.  But usually isn’t always; they do become user-visible once the
>> user specifies a node-name for mirror or commit.
>>
>> I found it more reasonable to introduce new functions that explicitly
>> express what kind of child they expect and then apply them everywhere as
>> I saw fit, instead of making the mirror/commit filter drivers use
>> bs->file and hope it works; not least because I’d still have to go
>> through the whole block layer and check every instance of bs->backing to
>> see whether it really needs bs->backing or whether it should use either
>> of bs->backing or bs->file.
>>
>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>        }
>>>>    
>>>>        /* If we want to replace the backing file we need some extra checks */
>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>            /* Check for implicit nodes between bs and its backing file */
>>>>            if (bs != overlay_bs) {
>>>>                error_setg(errp, "Cannot change backing link if '%s' has "
>>>
>>> [..]
>>>
>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>    BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>                                        BlockDriverState *bs)
>>>>    {
>>>> -    while (active && bs != backing_bs(active)) {
>>>> -        active = backing_bs(active);
>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>
>>> Hmm, and here you actually do support a backing chain with file-child-based filters in it...
>>
>> Yes, because this is not about the QAPI 'backing' link.  This function
>> should continue to work even if there are filters in the backing chain.
>>
>>>> +        active = bdrv_filtered_bs(active);
>>>>        }
>>>>    
>>>>        return active;
>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>    {
>>>>        BlockDriverState *i;
>>>>    
>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>
>>> And here you don't...
>>
>> Yes, because this function is about the QAPI 'backing' link.
> 
> Why? What would be bad if we just treated backing and file children equally for
> filters? Some scenarios that didn't work before would start working, and I don't
> think any existing ones would break.

So you mean we should use bdrv_filtered_bs() everywhere?

> I mean, if we tell users that a "backing chain" may include the file child of
> filter nodes, what will break?

Hm, let me try to answer this case here, and maybe address the other cases
in my reply to your other mail.

bdrv_is_backing_chain_frozen() is called by:
- bdrv_set_backing_hd()
- bdrv_reopen_parse_backing()
- bdrv_freeze_backing_chain()

Disregarding the last one, these are functions that specifically handle
the 'backing' child (as it is visible to the user through
query-named-block-nodes etc.) -- more on that in reply to your other mail.

Well, it doesn’t matter for bdrv_set_backing_hd(), because this one
specifically uses bs->backing->bs as the @base.  Same for
bdrv_reopen_parse_backing().

OK, so I can’t disregard the last one because it is the only relevant
caller where child_bs(i->backing) vs. bdrv_filtered_bs(i) makes a
difference.  So the actual question is whether
bdrv_freeze_backing_chain() should include non-'backing' children, and I
think it should indeed.  It's used by the block jobs which are supposed
to support filters in the backing chain, so bdrv_freeze_backing_chain()
should walk through filters (and freeze their links).  Consequently,
bdrv_is_backing_chain_frozen() has to do the same.
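A sketch of the conclusion above on a hypothetical toy model (Child, Node, filtered_child() and both checkers are illustrative, not QEMU's BdrvChild API): if the freeze check follows only 'backing' links, a frozen 'file' link of a filter in the chain goes unnoticed, whereas walking through filtered children catches it. Both loops here additionally stop when the chain ends without reaching @base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Node;

typedef struct Child {
    struct Node *bs;
    bool frozen;
} Child;

typedef struct Node {
    const char *name;
    bool is_filter;
    Child *backing;
    Child *file;
} Node;

/* A filter's child under either name, else a format node's backing child. */
static Child *filtered_child(Node *bs)
{
    if (bs->is_filter) {
        return bs->backing ? bs->backing : bs->file;
    }
    return bs->backing;
}

/* Checks every filtered link from @bs down to (not including) @base. */
static bool is_chain_frozen(Node *bs, Node *base)
{
    Node *i = bs;
    while (i && i != base) {
        Child *c = filtered_child(i);
        if (!c) {
            break;  /* end of chain */
        }
        if (c->frozen) {
            return true;
        }
        i = c->bs;
    }
    return false;
}

/* Same check, but following 'backing' links only. */
static bool is_chain_frozen_backing_only(Node *bs, Node *base)
{
    Node *i = bs;
    while (i && i != base) {
        if (i->backing && i->backing->frozen) {
            return true;
        }
        i = i->backing ? i->backing->bs : NULL;
    }
    return false;
}

/* Example: top -(backing)-> cor (filter) -(file, frozen)-> base */
static Node base = { .name = "base" };
static Child cor_file = { .bs = &base, .frozen = true };
static Node cor = { .name = "cor", .is_filter = true, .file = &cor_file };
static Child top_backing = { .bs = &cor, .frozen = false };
static Node top = { .name = "top", .backing = &top_backing };
```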

So you’re right, in this case, we should use bdrv_filtered_bs(i) and not
child_bs(i->backing).  But I still think there are cases where we
continue to have to use child_bs(i->backing); see the other mail I still
have to write.  (Maybe while writing it, I will come to the conclusion that I
was just completely wrong.  Who knows.)

Max




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
@ 2019-04-24 16:36         ` Max Reitz
  2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-04-24 16:36 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
> 17.04.2019 19:22, Max Reitz wrote:
>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.04.2019 23:20, Max Reitz wrote:
>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>> nodes, both signify a node that will eventually receive all R/W
>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>> bs->file.  Usually.
>>>>
>>>> In any case, it is not trivial to guess what a child means exactly with
>>>> our currently limited form of expression.  It is better to introduce
>>>> some functions that actually guarantee a meaning:
>>>>
>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>>     (depending on the overlay's allocation status), but writes never go to
>>>>     this child.
>>>>
>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>     filtered through some very plain process.  Reads and writes issued to
>>>>     the parent will go to the child as well (although timing, etc. may be
>>>>     modified).
>>>>
>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>     block layer anyway) always only have one of these children: All read
>>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>>     if there was a filtered_cow_child in addition, it would not receive
>>>>     any requests at all.
>>>>     (The closest here is mirror, where all requests are passed on to the
>>>>     source, but with write-blocking, write requests are "COWed" to the
>>>>     target.  But that just means that the target is a special child that
>>>>     cannot be introspected by the generic block layer functions, and that
>>>>     source is a filtered_rw_child.)
>>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>>     one child (or NULL, if there is no filtered child).
>>>>
>>>> Also, many places in the current block layer should be skipping filters
>>>> (all filters or just the ones added implicitly, it depends) when going
>>>> through a block node chain.  They do not do that currently, but this
>>>> patch makes them.
>>>>
>>>> One example for this is qemu-img map, which should skip filters and only
>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>> reference output shows how using blkdebug on top of a COW node used to
>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>> patch, the allocation in the base image is reported correctly.
>>>>
>>>> Furthermore, a note should be made that sometimes we do want to access
>>>> bs->backing directly.  This is whenever the operation in question is not
>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>> whenever we have to deal with the special behavior of @backing as a
>>>> blockdev option, which is that it does not default to null like all
>>>> other child references do.
>>>>
>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>> are modified to return any filtered child under "backing", not just
>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>> the throttled node now appears as a backing child.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    qapi/block-core.json           |   4 +
>>>>    include/block/block.h          |   1 +
>>>>    include/block/block_int.h      |  40 +++++--
>>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>>    block/backup.c                 |   8 +-
>>>>    block/block-backend.c          |  16 ++-
>>>>    block/commit.c                 |  33 +++---
>>>>    block/io.c                     |  45 ++++---
>>>>    block/mirror.c                 |  21 ++--
>>>>    block/qapi.c                   |  30 +++--
>>>>    block/stream.c                 |  13 +-
>>>>    blockdev.c                     |  88 +++++++++++---
>>>>    migration/block-dirty-bitmap.c |   4 +-
>>>>    nbd/server.c                   |   6 +-
>>>>    qemu-img.c                     |  29 ++---
>>>>    tests/qemu-iotests/184.out     |   7 +-
>>>>    tests/qemu-iotests/204.out     |   1 +
>>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>>
>>> really huge... didn't you consider converting file by file?
>>
>> Frankly, no, I just didn’t consider it.
>>
>> Hm.  I don’t know, 30-patch series always look so frightening.
>>
>>>> diff --git a/block.c b/block.c
>>>> index 16615bc876..e8f6febda0 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>
>>> [..]
>>>
>>>>    
>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>        /*
>>>>         * Find the "actual" backing file by skipping all links that point
>>>>         * to an implicit node, if any (e.g. a commit filter node).
>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>> +     * those return the first explicit node, while we are looking for
>>>> +     * its overlay here.
>>>>         */
>>>>        overlay_bs = bs;
>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>
>>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>>> child_bs(overlay_bs->backing), like in the following if condition?
>>
>> I think it was an artifact of writing the patch.  I started with
>> bdrv_filtered_bs() and then realized this depends on ->backing,
>> actually.  There was no functional difference so I left it as it was.
>>
>> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
>> instead.
>>
>>> Could we instead make backing-based filters equivalent to file-based ones, to make it possible
>>> to use file-based filters in backing-chain-related scenarios (like the upcoming copy-on-read
>>> filter for stream)? That is, to expand the backing-chain concept to include filters with a file child?
>>
>> If I understand you correctly, that’s basically the purpose of this
>> series and especially this patch here.  As far as it is possible and
>> reasonable, I want filters that use bs->backing and bs->file to behave the
>> same.
>>
>> However, there are cases where this is not possible and
>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>> distinction was already visible to the user, we cannot change it now.
>>
>> We definitely cannot make file-based filters use bs->backing now because
>> you can create them over QAPI and they use 'file' as their child name.
>> Can we make backing-based filters use bs->file?  Seems more likely,
>> because all of them are implicit nodes, so the user usually doesn’t see
>> them.  But usually isn’t always; they do become user-visible once the
>> user specifies a node-name for mirror or commit.
>>
>> I found it more reasonable to introduce new functions that explicitly
>> express what kind of child they expect and then apply them everywhere as
>> I saw fit, instead of making the mirror/commit filter drivers use
>> bs->file and hope it works; not least because I’d still have to go
>> through the whole block layer and check every instance of bs->backing to
>> see whether it really needs bs->backing or whether it should use either
>> of bs->backing or bs->file.
>>
>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>        }
>>>>    
>>>>        /* If we want to replace the backing file we need some extra checks */
>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>            /* Check for implicit nodes between bs and its backing file */
>>>>            if (bs != overlay_bs) {
>>>>                error_setg(errp, "Cannot change backing link if '%s' has "
>>>
>>> [..]
>>>
>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>    BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>                                        BlockDriverState *bs)
>>>>    {
>>>> -    while (active && bs != backing_bs(active)) {
>>>> -        active = backing_bs(active);
>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>
>>> hmm, and here you actually support backing chains with file-child-based filters in them..
>>
>> Yes, because this is not about the QAPI 'backing' link.  This function
>> should continue to work even if there are filters in the backing chain.
> 
> this is a generic function to find the overlay in a backing chain, and it may be used from different places;
> for example, it is used in Andrey's series about a filter for block-stream.

Well, all places that use it accept backing chains with filters inside
of them.

> It is used from qmp_block_commit; isn't that about QAPI?

By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
wrote in my other mail that you could use query-named-block-nodes to see
that graph; apparently you can’t.  So besides x-debug-query-block-graph,
we still don’t have any facility to query the block graph?  I don’t know
what to say.

Anyway, you can still construct the graph with blockdev-add, so it is
user-visible.  And in that block graph, there is a 'backing' link, and
there is a 'file' link -- this is what I mean by "QAPI link".

We have commands that are abstract and don’t work on specific graph
links.  For instance, block-commit commits across a backing chain, so it
doesn’t matter whether the graph link is called 'backing' or whatever,
what is important is that it’s a COW link.  But we should also ignore
filters on the way, so this patch makes block-commit and others use
those more abstract child access functions.

But whenever it is about exactly the "file" or the "backing" link, we
have to use bs->file and bs->backing, respectively.  That's just how it
currently is.
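
To make the distinction concrete, here is a minimal, self-contained C sketch. `Node`, `Child` and all helper names are hypothetical stand-ins, not QEMU's `BlockDriverState`/`BdrvChild` API: `filtered_bs()` loosely mirrors the abstract view of bdrv_filtered_bs() (a filter's single child via whichever link it uses, or a format node's COW child), while `find_overlay_backing_only()` follows only the literal 'backing' link, like the old backing_bs() loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for BdrvChild and
 * BlockDriverState; the real QEMU structures look different. */
typedef struct Node Node;
typedef struct Child {
    Node *bs;
} Child;

struct Node {
    bool is_filter;   /* e.g. throttle, copy-on-read, mirror/commit top */
    Child *file;      /* the user-visible 'file' link, if any */
    Child *backing;   /* the user-visible 'backing' link, if any */
};

static Node *child_node(const Child *c)
{
    return c ? c->bs : NULL;
}

/* Abstract view: a filter forwards to its only child, whichever link it
 * uses; a format node's filtered child is its COW (backing) child. */
static Node *filtered_bs(const Node *bs)
{
    if (!bs) {
        return NULL;
    }
    if (bs->is_filter) {
        return bs->backing ? child_node(bs->backing) : child_node(bs->file);
    }
    return child_node(bs->backing);
}

/* Abstract walk: filters of either kind do not break the chain. */
static Node *find_overlay(Node *active, const Node *bs)
{
    while (active && bs != filtered_bs(active)) {
        active = filtered_bs(active);
    }
    return active;
}

/* Literal walk: only a link that is really called 'backing' counts. */
static Node *find_overlay_backing_only(Node *active, const Node *bs)
{
    while (active && bs != child_node(active->backing)) {
        active = child_node(active->backing);
    }
    return active;
}
```

For `top --backing--> throttle --file--> base`, `find_overlay(&top, &base)` yields the throttle node, whereas the backing-only walk falls off the chain at the throttle node and returns NULL.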

>>>> +        active = bdrv_filtered_bs(active);
>>>>        }
>>>>    
>>>>        return active;
>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>    {
>>>>        BlockDriverState *i;
>>>>    
>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>
>>> and here don't..
>>
>> Yes, because this function is about the QAPI 'backing' link.
> 
> And this is again a generic thing that may be used in the same places as bdrv_find_overlay,

But it isn’t.

> and it is used in the series about the block-stream filter too. So, for further development
> we'll have to keep in mind all these differences between generic block layer functions:
> which support .file children inside backing chains and which do not...

I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
wrote in my other (previous) mail).
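
Assuming the correction (described in the other mail, not quoted here) is that bdrv_is_backing_chain_frozen() should walk the same filtered chain as bdrv_find_overlay(), the check has to inspect whichever link it actually traverses. A hypothetical sketch; the `frozen` flag on the simplified `Child` type merely stands in for a frozen child link:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; not QEMU's real types. */
typedef struct Node Node;
typedef struct Child {
    Node *bs;
    bool frozen;      /* link must not be changed, e.g. while a job runs */
} Child;

struct Node {
    bool is_filter;
    Child *file;
    Child *backing;
};

/* The link the abstract chain walk follows from @bs (may be NULL). */
static Child *filtered_child(const Node *bs)
{
    if (bs->is_filter) {
        return bs->backing ? bs->backing : bs->file;
    }
    return bs->backing;
}

/* True if any link between @bs and @base is frozen, following filter
 * links of either kind instead of only 'backing' links. */
static bool chain_frozen(const Node *bs, const Node *base)
{
    for (const Node *i = bs; i && i != base; ) {
        Child *c = filtered_child(i);
        if (!c) {
            break;        /* ran off the end of the chain */
        }
        if (c->frozen) {
            return true;
        }
        i = c->bs;
    }
    return false;
}
```

In `top --backing--> COR --file--> base` with only COR's 'file' link frozen, a walk from top to base reports the chain as frozen, while a walk that stops at COR does not.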

But for example bdrv_set_backing_hd() always has to use bs->backing,
because that’s what it’s about (and I do change its descriptive comment
to reflect that, so you don’t need to keep it in mind).  Same for
bdrv_open_backing_file().

Hm, what other cases are there...

bdrv_reopen_parse_backing(): Fundamentally, this too is about the
user-visible "backing" link (as specified through x-blockdev-reopen).
But the loop it contains is more difficult to translate than I had
thought.  At some point, there needs to be a bs->backing link, because
that is what this function is about, but it should also skip all
implicit filters in the way, I think.  So e.g. this should be recognized:

bs  ---backing-->  COR ---file-->  base

@overlay_bs should be COR, I think...?  I mean, as long as COR is an
implicit node.  So the loop really should use bdrv_filtered_bs()
everywhere, and then the same afterwards.  I think that we should also
ensure that @bs can support a ->backing child, but how would I check
that?  Maybe it’s safe to just omit such a check...

But then another issue comes in: The link to replace (in the above case
from "COR" to "base") is no longer necessarily a backing link.  So
bdrv_reopen_commit() has to be capable of replacing both bs->backing and
bs->file.
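
A hypothetical sketch of such a loop, using bdrv_filtered_bs()-like semantics throughout. The simplified `Node`/`Child` model, the `implicit` flag and the helper names are illustrative assumptions, not QEMU code. For `bs --backing--> COR --file--> base` with an implicit COR node, the walk stops at COR, and the link that would have to be replaced is COR's 'file' link rather than a 'backing' link:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; not QEMU's real types. */
typedef struct Node Node;
typedef struct Child {
    Node *bs;
} Child;

struct Node {
    bool is_filter;
    bool implicit;    /* inserted implicitly, e.g. by a block job */
    Child *file;
    Child *backing;
};

/* The link the abstract chain walk follows from @bs (may be NULL). */
static Child *filtered_child(const Node *bs)
{
    if (bs->is_filter) {
        return bs->backing ? bs->backing : bs->file;
    }
    return bs->backing;
}

static Node *filtered_bs(const Node *bs)
{
    Child *c = filtered_child(bs);
    return c ? c->bs : NULL;
}

/* Skip implicit nodes below @bs and return the last node before the
 * first explicit one, i.e. the overlay whose link would be replaced. */
static Node *reopen_find_overlay(Node *bs)
{
    Node *overlay = bs;
    while (filtered_bs(overlay) && filtered_bs(overlay)->implicit) {
        overlay = filtered_bs(overlay);
    }
    return overlay;
}
```

Since `filtered_child(overlay)` can be either `->backing` or `->file`, whatever later replaces that link has to handle both kinds.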

Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
replaces bs->backing.  So it seems to just fail on the implicit nodes
that bdrv_reopen_parse_backing() took care to skip...


OK, what else...  bdrv_reopen_prepare() checks
reopen_state->bs->backing, which I claim is correct because while there
may be implicit filters in the chain, the first link has to be a
->backing link.

bdrv_backing_overridden() has to query bs->backing because this function
is used when it is about a specific characteristic of the backing link:
There is a non-null default (given by the image header), so if the
current bs->backing matches this default, you do not have to specify the
backing filename in either blockdev-add or a filename.  Same in
bdrv_refresh_filename().
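
A simplified, hypothetical model of that default-versus-override check (the `header_backing` and `filename` fields are stand-ins, not QEMU's actual structure layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model; not QEMU's real types. */
typedef struct Node Node;
typedef struct Child {
    Node *bs;
} Child;

struct Node {
    const char *filename;
    const char *header_backing;  /* backing file named in the image header, "" if none */
    Child *backing;              /* the literal 'backing' link */
};

/* The 'backing' option only has to be spelled out (in blockdev-add
 * options or a filename) if the attached node differs from the default
 * that the image header would select. */
static bool backing_overridden(const Node *bs)
{
    if (bs->backing) {
        return strcmp(bs->header_backing, bs->backing->bs->filename) != 0;
    }
    /* No backing child although the header names one: it was overridden
     * (e.g. by explicitly setting backing to null). */
    return bs->header_backing[0] != '\0';
}
```

Attaching a node whose filename matches the header's default needs no explicit 'backing' option; attaching a different node, or having no backing child while the header names one, counts as an override.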


I hope that was all...?

Max



* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-24 16:36         ` Max Reitz
@ 2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
  2019-05-07 13:15             ` Max Reitz
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-07  9:32 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

24.04.2019 19:36, Max Reitz wrote:
> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>> 17.04.2019 19:22, Max Reitz wrote:
>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>> bs->file.  Usually.
>>>>>
>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>> our currently limited form of expression.  It is better to introduce
>>>>> some functions that actually guarantee a meaning:
>>>>>
>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>      (depending on the overlay's allocation status), but writes never go to
>>>>>      this child.
>>>>>
>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>      filtered through some very plain process.  Reads and writes issued to
>>>>>      the parent will go to the child as well (although timing, etc. may be
>>>>>      modified).
>>>>>
>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>      block layer anyway) always only have one of these children: All read
>>>>>      requests must be served from the filtered_rw_child (if it exists), so
>>>>>      if there was a filtered_cow_child in addition, it would not receive
>>>>>      any requests at all.
>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>      target.  But that just means that the target is a special child that
>>>>>      cannot be introspected by the generic block layer functions, and that
>>>>>      source is a filtered_rw_child.)
>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>      one child (or NULL, if there is no filtered child).
>>>>>
>>>>> Also, many places in the current block layer should be skipping filters
>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>> through a block node chain.  They do not do that currently, but this
>>>>> patch makes them do so.
>>>>>
>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>> patch, the allocation in the base image is reported correctly.
>>>>>
>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>> blockdev option, which is that it does not default to null like all
>>>>> other child references do.
>>>>>
>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>> are modified to return any filtered child under "backing", not just
>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>> the throttled node now appears as a backing child.
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>     qapi/block-core.json           |   4 +
>>>>>     include/block/block.h          |   1 +
>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>     block/backup.c                 |   8 +-
>>>>>     block/block-backend.c          |  16 ++-
>>>>>     block/commit.c                 |  33 +++---
>>>>>     block/io.c                     |  45 ++++---
>>>>>     block/mirror.c                 |  21 ++--
>>>>>     block/qapi.c                   |  30 +++--
>>>>>     block/stream.c                 |  13 +-
>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>     nbd/server.c                   |   6 +-
>>>>>     qemu-img.c                     |  29 ++---
>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>
>>>> really huge... didn't you consider converting file by file?
>>>
>>> Frankly, no, I just didn’t consider it.
>>>
>>> Hm.  I don’t know, 30-patch series always look so frightening.
>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 16615bc876..e8f6febda0 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>
>>>> [..]
>>>>
>>>>>     
>>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>         /*
>>>>>          * Find the "actual" backing file by skipping all links that point
>>>>>          * to an implicit node, if any (e.g. a commit filter node).
>>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>>> +     * those return the first explicit node, while we are looking for
>>>>> +     * its overlay here.
>>>>>          */
>>>>>         overlay_bs = bs;
>>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>
>>>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>>>> child_bs(overlay_bs->backing), like in the following if condition?
>>>
>>> I think it was an artifact of writing the patch.  I started with
>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>> actually.  There was no functional difference so I left it as it was.
>>>
>>> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
>>> instead.
>>>
>>>> Could we instead make backing-based filters equivalent to file-based ones, to make it possible
>>>> to use file-based filters in backing-chain-related scenarios (like the upcoming copy-on-read
>>>> filter for stream)? That is, to expand the backing-chain concept to include filters with a file child?
>>>
>>> If I understand you correctly, that’s basically the purpose of this
>>> series and especially this patch here.  As far as it is possible and
>>> reasonable, I want filters that use bs->backing and bs->file to behave the
>>> same.
>>>
>>> However, there are cases where this is not possible and
>>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>>> distinction was already visible to the user, we cannot change it now.
>>>
>>> We definitely cannot make file-based filters use bs->backing now because
>>> you can create them over QAPI and they use 'file' as their child name.
>>> Can we make backing-based filters use bs->file?  Seems more likely,
>>> because all of them are implicit nodes, so the user usually doesn’t see
>>> them.  But usually isn’t always; they do become user-visible once the
>>> user specifies a node-name for mirror or commit.
>>>
>>> I found it more reasonable to introduce new functions that explicitly
>>> express what kind of child they expect and then apply them everywhere as
>>> I saw fit, instead of making the mirror/commit filter drivers use
>>> bs->file and hope it works; not least because I’d still have to go
>>> through the whole block layer and check every instance of bs->backing to
>>> see whether it really needs bs->backing or whether it should use either
>>> of bs->backing or bs->file.
>>>
>>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>         }
>>>>>     
>>>>>         /* If we want to replace the backing file we need some extra checks */
>>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>>             /* Check for implicit nodes between bs and its backing file */
>>>>>             if (bs != overlay_bs) {
>>>>>                 error_setg(errp, "Cannot change backing link if '%s' has "
>>>>
>>>> [..]
>>>>
>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>     BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>                                         BlockDriverState *bs)
>>>>>     {
>>>>> -    while (active && bs != backing_bs(active)) {
>>>>> -        active = backing_bs(active);
>>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>>
>>>> hmm, and here you actually support backing chains with file-child-based filters in them..
>>>
>>> Yes, because this is not about the QAPI 'backing' link.  This function
>>> should continue to work even if there are filters in the backing chain.
>>
>> this is a generic function to find the overlay in a backing chain, and it may be used from different places;
>> for example, it is used in Andrey's series about a filter for block-stream.
> 
> Well, all places that use it accept backing chains with filters inside
> of them.
> 
>> It is used from qmp_block_commit; isn't that about QAPI?
> 
> By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
> wrote in my other mail that you could use query-named-block-nodes to see
> that graph; apparently you can’t.  So besides x-debug-query-block-graph,
> we still don’t have any facility to query the block graph?  I don’t know
> what to say.
> 
> Anyway, you can still construct the graph with blockdev-add, so it is
> user-visible.  And in that block graph, there is a 'backing' link, and
> there is a 'file' link -- this is what I mean by "QAPI link".
> 
> We have commands that are abstract and don’t work on specific graph
> links.  For instance, block-commit commits across a backing chain, so it
> doesn’t matter whether the graph link is called 'backing' or whatever,
> what is important is that it’s a COW link.  But we should also ignore
> filters on the way, so this patch makes block-commit and others use
> those more abstract child access functions.
> 
> But whenever it is about exactly the "file" or the "backing" link, we
> have to use bs->file and bs->backing, respectively.  That's just how it
> currently is.
> 
>>>>> +        active = bdrv_filtered_bs(active);
>>>>>         }
>>>>>     
>>>>>         return active;
>>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>>     {
>>>>>         BlockDriverState *i;
>>>>>     
>>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>
>>>> and here don't..
>>>
>>> Yes, because this function is about the QAPI 'backing' link.
>>
>> And this is again a generic thing that may be used in the same places as bdrv_find_overlay,
> 
> But it isn’t.
> 
>> and it is used in the series about the block-stream filter too. So, for further development
>> we'll have to keep in mind all these differences between generic block layer functions:
>> which support .file children inside backing chains and which do not...
> 
> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
> wrote in my other (previous) mail).
> 
> But for example bdrv_set_backing_hd() always has to use bs->backing,
> because that’s what it’s about (and I do change its descriptive comment
> to reflect that, so you don’t need to keep it in mind).  Same for
> bdrv_open_backing_file().
> 
> Hm, what other cases are there...
> 
> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
> user-visible "backing" link (as specified through x-blockdev-reopen).
> But the loop it contains is more difficult to translate than I had
> thought.  At some point, there needs to be a bs->backing link, because
> that is what this function is about, but it should also skip all
> implicit filters in the way, I think.  So e.g. this should be recognized:
> 
> bs  ---backing-->  COR ---file-->  base
> 
> @overlay_bs should be COR, I think...?  I mean, as long as COR is an
> implicit node.  So the loop really should use bdrv_filtered_bs()
> everywhere, and then the same afterwards.  I think that we should also
> ensure that @bs can support a ->backing child, but how would I check
> that?  Maybe it’s safe to just omit such a check...
> 
> But then another issue comes in: The link to replace (in the above case
> from "COR" to "base") is no longer necessarily a backing link.  So
> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
> bs->file.
> 
> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
> and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
> bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
> replaces bs->backing.  So it seems to just fail on the implicit nodes
> that bdrv_reopen_parse_backing() took care to skip...
> 
> 
> OK, what else...  bdrv_reopen_prepare() checks
> reopen_state->bs->backing, which I claim is correct because while there
> may be implicit filters in the chain, the first link has to be a
> ->backing link.

[Sorry for the long delay.]
Are you working on the next version or waiting for more reviews?

Why should the first link be a backing link? We want to skip all implicit filters, including
file-child-based ones, in the following call to bdrv_reopen_parse_backing(). So, don't we
want something like bdrv_backing_chain_next() here? But then the question is: could
reopen_state->bs be a filter itself...
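
A hypothetical sketch of the kind of helper suggested here: follow the first filtered link of @bs (which is 'file' if @bs is itself a file-based filter), then skip all implicit filters of either kind. `skip_implicit_filters()` and `backing_chain_next()` are illustrative names on a simplified model; the real bdrv_backing_chain_next() in this series may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; not QEMU's real types. */
typedef struct Node Node;
typedef struct Child {
    Node *bs;
} Child;

struct Node {
    bool is_filter;
    bool implicit;
    Child *file;
    Child *backing;
};

/* A filter's only child (via either link), or a format node's COW child. */
static Node *filtered_bs(const Node *bs)
{
    Child *c;

    if (!bs) {
        return NULL;
    }
    c = bs->is_filter ? (bs->backing ? bs->backing : bs->file) : bs->backing;
    return c ? c->bs : NULL;
}

/* Skip implicit filters, whether they use 'backing' or 'file'. */
static Node *skip_implicit_filters(Node *bs)
{
    while (bs && bs->is_filter && bs->implicit) {
        bs = filtered_bs(bs);
    }
    return bs;
}

/* The first explicit node below @bs in the (filtered) backing chain. */
static Node *backing_chain_next(Node *bs)
{
    return skip_implicit_filters(filtered_bs(bs));
}
```

With `top --backing--> mirror_top --backing--> COR --file--> base`, where both filters are implicit, `backing_chain_next(&top)` returns base; if COR is made explicit, it returns COR instead.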

> 
> bdrv_backing_overridden() has to query bs->backing because this function
> is used when it is about a specific characteristic of the backing link:
> There is a non-null default (given by the image header), so if the
> current bs->backing matches this default, you do not have to specify the
> backing filename in either blockdev-add or a filename.  Same in
> bdrv_refresh_filename().
> 
> 
> I hope that was all...?
> 
> Max
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
@ 2019-05-07 13:15             ` Max Reitz
  2019-05-07 13:33               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-05-07 13:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 07.05.19 11:32, Vladimir Sementsov-Ogievskiy wrote:
> 24.04.2019 19:36, Max Reitz wrote:
>> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.04.2019 19:22, Max Reitz wrote:
>>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>> bs->file.  Usually.
>>>>>>
>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>> our currently limited form of expression.  It is better to introduce
>>>>>> some functions that actually guarantee a meaning:
>>>>>>
>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>>      (depending on the overlay's allocation status), but writes never go to
>>>>>>      this child.
>>>>>>
>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>      filtered through some very plain process.  Reads and writes issued to
>>>>>>      the parent will go to the child as well (although timing, etc. may be
>>>>>>      modified).
>>>>>>
>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>      block layer anyway) always only have one of these children: All read
>>>>>>      requests must be served from the filtered_rw_child (if it exists), so
>>>>>>      if there was a filtered_cow_child in addition, it would not receive
>>>>>>      any requests at all.
>>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>>      target.  But that just means that the target is a special child that
>>>>>>      cannot be introspected by the generic block layer functions, and that
>>>>>>      source is a filtered_rw_child.)
>>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>>      one child (or NULL, if there is no filtered child).
>>>>>>
>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>> through a block node chain.  They do not do that currently, but this
>>>>>> patch makes them do so.
>>>>>>
>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>>
>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>> blockdev option, which is that it does not default to null like all
>>>>>> other child references do.
>>>>>>
>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>> are modified to return any filtered child under "backing", not just
>>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>>> the throttled node now appears as a backing child.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     qapi/block-core.json           |   4 +
>>>>>>     include/block/block.h          |   1 +
>>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>>     block/backup.c                 |   8 +-
>>>>>>     block/block-backend.c          |  16 ++-
>>>>>>     block/commit.c                 |  33 +++---
>>>>>>     block/io.c                     |  45 ++++---
>>>>>>     block/mirror.c                 |  21 ++--
>>>>>>     block/qapi.c                   |  30 +++--
>>>>>>     block/stream.c                 |  13 +-
>>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>>     nbd/server.c                   |   6 +-
>>>>>>     qemu-img.c                     |  29 ++---
>>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>
>>>>> really huge... didn't you consider converting file by file?
>>>>
>>>> Frankly, no, I just didn’t consider it.
>>>>
>>>> Hm.  I don’t know, 30-patch series always look so frightening.
>>>>
>>>>>> diff --git a/block.c b/block.c
>>>>>> index 16615bc876..e8f6febda0 100644
>>>>>> --- a/block.c
>>>>>> +++ b/block.c
>>>>>
>>>>> [..]
>>>>>
>>>>>>     
>>>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>>         /*
>>>>>>          * Find the "actual" backing file by skipping all links that point
>>>>>>          * to an implicit node, if any (e.g. a commit filter node).
>>>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>>>> +     * those return the first explicit node, while we are looking for
>>>>>> +     * its overlay here.
>>>>>>          */
>>>>>>         overlay_bs = bs;
>>>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>>
>>>>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>>>>> child_bs(overlay_bs->backing), like in the following if condition?
>>>>
>>>> I think it was an artifact of writing the patch.  I started with
>>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>>> actually.  There was no functional difference so I left it as it was.
>>>>
>>>> But you’re right, it is clearer to use child_bs(overlay_bs->backing)
>>>> instead.
>>>>
>>>>> Could we instead make backing-based filters equivalent to file-based ones, to make it possible
>>>>> to use file-based filters in backing-chain-related scenarios (like the upcoming copy-on-read
>>>>> filter for stream)? That is, to expand the backing-chain concept to include filters with a file child?
>>>>
>>>> If I understand you correctly, that’s basically the purpose of this
>>>> series and especially this patch here.  As far as it is possible and
>>>> reasonable, I want filters that use bs->backing and bs->file to behave the
>>>> same.
>>>>
>>>> However, there are cases where this is not possible and
>>>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>>>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>>>> distinction was already visible to the user, we cannot change it now.
>>>>
>>>> We definitely cannot make file-based filters use bs->backing now because
>>>> you can create them over QAPI and they use 'file' as their child name.
>>>> Can we make backing-based filters use bs->file?  Seems more likely,
>>>> because all of them are implicit nodes, so the user usually doesn’t see
>>>> them.  But usually isn’t always; they do become user-visible once the
>>>> user specifies a node-name for mirror or commit.
>>>>
>>>> I found it more reasonable to introduce new functions that explicitly
>>>> express what kind of child they expect and then apply them everywhere as
>>>> I saw fit, instead of making the mirror/commit filter drivers use
>>>> bs->file and hope it works; not least because I’d still have to go
>>>> through the whole block layer and check every instance of bs->backing to
>>>> see whether it really needs bs->backing or whether it should use either
>>>> of bs->backing or bs->file.
>>>>
>>>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>>         }
>>>>>>     
>>>>>>         /* If we want to replace the backing file we need some extra checks */
>>>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>>>             /* Check for implicit nodes between bs and its backing file */
>>>>>>             if (bs != overlay_bs) {
>>>>>>                 error_setg(errp, "Cannot change backing link if '%s' has "
>>>>>
>>>>> [..]
>>>>>
>>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>>     BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>>                                         BlockDriverState *bs)
>>>>>>     {
>>>>>> -    while (active && bs != backing_bs(active)) {
>>>>>> -        active = backing_bs(active);
>>>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>>>
>>>>> hmm and here you actually support backing-chain with file-child-based filters in it..
>>>>
>>>> Yes, because this is not about the QAPI 'backing' link.  This function
>>>> should continue to work even if there are filters in the backing chain.
>>>
>>> This is a generic function to find an overlay in a backing chain, and it may be used from different places;
>>> for example, it is used in Andrey's series about a filter for block-stream.
>>
>> Well, all places that use it accept backing chains with filters inside
>> of them.
>>
>>> It is used from qmp_block_commit, isn't it about QAPI?
>>
>> By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
>> wrote in my other mail that you could use query-named-block-nodes to see
>> that graph; apparently you can’t.  So besides x-debug-query-block-graph,
>> we still don’t have any facility to query the block graph?  I don’t know
>> what to say.
>>
>> Anyway, you can still construct the graph with blockdev-add, so it is
>> user-visible.  And in that block graph, there is a 'backing' link, and
>> there is a 'file' link -- this is what I mean with "QAPI link".
>>
>> We have commands that are abstract and don’t work on specific graph
>> links.  For instance, block-commit commits across a backing chain, so it
>> doesn’t matter whether the graph link is called 'backing' or whatever,
>> what is important is that it’s a COW link.  But we should also ignore
>> filters on the way, so this patch makes block-commit and others use
>> those more abstract child access functions.
>>
>> But whenever it is about exactly the "file" or the "backing" link, we
>> have to use bs->file and bs->backing, respectively.  That's just how it
>> currently is.
>>
>>>>>> +        active = bdrv_filtered_bs(active);
>>>>>>         }
>>>>>>     
>>>>>>         return active;
>>>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>>>     {
>>>>>>         BlockDriverState *i;
>>>>>>     
>>>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>>
>>>>> and here don't..
>>>>
>>>> Yes, because this function is about the QAPI 'backing' link.
>>>
>>> And this is again a generic thing that may be used in the same places as bdrv_find_overlay,
>>
>> But it isn’t.
>>
>>> and it is used in the series about the block-stream filter too. So, for further development
>>> we'll have to keep in mind all these differences between generic block layer functions:
>>> which support .file children inside a backing chain and which do not...
>>
>> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
>> wrote in my other (previous) mail).
>>
>> But for example bdrv_set_backing_hd() always has to use bs->backing,
>> because that’s what it’s about (and I do change its descriptive comment
>> to reflect that, so you don’t need to keep it in mind).  Same for
>> bdrv_open_backing_file().
>>
>> Hm, what other cases are there...
>>
>> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
>> user-visible "backing" link (as specified through x-blockdev-reopen).
>> But the loop it contains is more difficult to translate than I had
>> thought.  At some point, there needs to be a bs->backing link, because
>> that is what this function is about, but it should also skip all
>> implicit filters in the way, I think.  So e.g. this should be recognized:
>>
>> bs  ---backing-->  COR ---file-->  base
>>
>> @overlay_bs should be COR, I think...?  I mean, as long as COR is an
>> implicit node.  So the loop really should use bdrv_filtered_bs()
>> everywhere, and then the same afterwards.  I think that we should also
>> ensure that @bs can support a ->backing child, but how would I check
>> that?  Maybe it’s safe to just omit such a check...
>>
>> But then another issue comes in: The link to replace (in the above case
>> from "COR" to "base") is no longer necessarily a backing link.  So
>> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
>> bs->file.
>>
>> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
>> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
>> and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
>> bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
>> replaces bs->backing.  So it seems to just fail on the implicit nodes
>> that bdrv_reopen_parse_backing() took care to skip...
>>
>>
>> OK, what else...  bdrv_reopen_prepare() checks
>> reopen_state->bs->backing, which I claim is correct because while there
>> may be implicit filters in the chain, the first link has to be a
>> ->backing link.
> 
> [sorry for a long delay]
> Are you working on the next version or waiting for more reviews?

I haven’t worked on the next version yet, but that’s just because other
things were more important, not because of reviews.

> Why should the first link be backing? We want to skip all implicit filters, including
> file-child-based ones, in the following call to bdrv_reopen_parse_backing(). So, don't we
> want something like bdrv_backing_chain_next() here? But then there is a question: could
> reopen_state->bs be a filter itself...

Because this function is about the 'backing' option.  As I explained
above, this must correspond to a bs->backing child.  If there is an
implicit filter, it will still be under bs->backing.
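For illustration, a minimal sketch of the lookup discussed above, assuming a toy node model (the `Node`/`Link` structs and `reopen_backing_overlay()` are hypothetical stand-ins, not QEMU's BlockDriverState/BdrvChild API). The first hop must be the 'backing' link; after that, implicit filters are skipped through whichever child they use, so that `bs ---backing--> COR ---file--> base` yields COR as the overlay:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy stand-ins for BdrvChild / BlockDriverState. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

struct Node {
    const char *name;
    bool is_filter;  /* models bs->drv->is_filter */
    bool implicit;   /* models bs->implicit */
    Link *backing;
    Link *file;
};

/* Format nodes filter through ->backing (COW); filters via ->backing, else ->file. */
static Link *filtered_link(Node *bs)
{
    if (!bs->is_filter) {
        return bs->backing;
    }
    return bs->backing ? bs->backing : bs->file;
}

/*
 * Find the overlay of the "actual" backing node for the 'backing' reopen
 * option.  The walk only starts if @bs has a 'backing' link, because the
 * option is about that link; implicit filters below it are then skipped
 * regardless of whether they use a backing or a file child.
 */
static Node *reopen_backing_overlay(Node *bs)
{
    Node *overlay = bs;

    if (!bs->backing) {
        return bs;
    }
    while (filtered_link(overlay) && filtered_link(overlay)->bs->implicit) {
        overlay = filtered_link(overlay)->bs;
    }
    return overlay;
}
```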

Max

>> bdrv_backing_overridden() has to query bs->backing because this function
>> is used when it is about a specific characteristic of the backing link:
>> There is a non-null default (given by the image header), so if the
>> current bs->backing matches this default, you do not have to specify the
>> backing filename in either blockdev-add or a filename.  Same in
>> bdrv_refresh_filename().
>>
>>
>> I hope that was all...?
>>
>> Max
>>
> 
> 




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-10 20:20   ` Max Reitz
@ 2019-05-07 13:30   ` Vladimir Sementsov-Ogievskiy
  2019-05-07 15:13     ` Max Reitz
  -1 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-07 13:30 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.04.2019 23:20, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node.  For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses.  For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file.  Usually.
> 
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression.  It is better to introduce
> some functions that actually guarantee a meaning:
> 
> - bdrv_filtered_cow_child() will return the child that receives requests
>    filtered through COW.  That is, reads may or may not be forwarded
>    (depending on the overlay's allocation status), but writes never go to
>    this child.
> 
> - bdrv_filtered_rw_child() will return the child that receives requests
>    filtered through some very plain process.  Reads and writes issued to
>    the parent will go to the child as well (although timing, etc. may be
>    modified).
> 
> - All drivers but quorum (but quorum is pretty opaque to the general
>    block layer anyway) always only have one of these children: All read
>    requests must be served from the filtered_rw_child (if it exists), so
>    if there was a filtered_cow_child in addition, it would not receive
>    any requests at all.
>    (The closest here is mirror, where all requests are passed on to the
>    source, but with write-blocking, write requests are "COWed" to the
>    target.  But that just means that the target is a special child that
>    cannot be introspected by the generic block layer functions, and that
>    source is a filtered_rw_child.)
>    Therefore, we can also add bdrv_filtered_child() which returns that
>    one child (or NULL, if there is no filtered child).
> 
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain.  They do not do that currently, but this
> patch makes them do so.
> 
> One example for this is qemu-img map, which should skip filters and only
> look at the COW elements in the graph.  The change to iotest 204's
> reference output shows how using blkdebug on top of a COW node used to
> make qemu-img map disregard the rest of the backing chain, but with this
> patch, the allocation in the base image is reported correctly.
> 
> Furthermore, a note should be made that sometimes we do want to access
> bs->backing directly.  This is whenever the operation in question is not
> about accessing the COW child, but the "backing" child, be it COW or
> not.  This is the case in functions such as bdrv_open_backing_file() or
> whenever we have to deal with the special behavior of @backing as a
> blockdev option, which is that it does not default to null like all
> other child references do.
> 
> Finally, the query functions (query-block and query-named-block-nodes)
> are modified to return any filtered child under "backing", not just
> bs->backing or COW children.  This is so that filters do not interrupt
> the reported backing chain.  This changes the output of iotest 184, as
> the throttled node now appears as a backing child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json           |   4 +
>   include/block/block.h          |   1 +
>   include/block/block_int.h      |  40 +++++--
>   block.c                        | 210 +++++++++++++++++++++++++++------
>   block/backup.c                 |   8 +-
>   block/block-backend.c          |  16 ++-
>   block/commit.c                 |  33 +++---
>   block/io.c                     |  45 ++++---
>   block/mirror.c                 |  21 ++--
>   block/qapi.c                   |  30 +++--
>   block/stream.c                 |  13 +-
>   blockdev.c                     |  88 +++++++++++---
>   migration/block-dirty-bitmap.c |   4 +-
>   nbd/server.c                   |   6 +-
>   qemu-img.c                     |  29 ++---
>   tests/qemu-iotests/184.out     |   7 +-
>   tests/qemu-iotests/204.out     |   1 +
>   17 files changed, 411 insertions(+), 145 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7ccbfff9d0..dbd9286e4a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2502,6 +2502,10 @@
>   # On successful completion the image file is updated to drop the backing file
>   # and the BLOCK_JOB_COMPLETED event is emitted.
>   #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.
> +#

Is this necessary? Why can't we keep it as is, modifying exactly the device node? Maybe the
user wants to use a filter in the stream process, throttling for example.

>   # @job-id: identifier for the newly-created block job. If
>   #          omitted, the device name will be used. (Since 2.7)
>   #
> diff --git a/include/block/block.h b/include/block/block.h
> index c7a26199aa..2005664f14 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -467,6 +467,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
>                                    const char *node_name,
>                                    Error **errp);
>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
> +bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base);
>   BlockDriverState *bdrv_next_node(BlockDriverState *bs);
>   BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
>   
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 01e855a066..b22b1164f8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -90,9 +90,11 @@ struct BlockDriver {
>       int instance_size;
>   
>       /* set to true if the BlockDriver is a block filter. Block filters pass
> -     * certain callbacks that refer to data (see block.c) to their bs->file if
> -     * the driver doesn't implement them. Drivers that do not wish to forward
> -     * must implement them and return -ENOTSUP.
> +     * certain callbacks that refer to data (see block.c) to their bs->file
> +     * or bs->backing (whichever one exists) if the driver doesn't implement
> +     * them. Drivers that do not wish to forward must implement them and return
> +     * -ENOTSUP.
> +     * Note that filters are not allowed to modify data.
>        */
>       bool is_filter;
>       /* for snapshots block filter like Quorum can implement the
> @@ -906,11 +908,6 @@ typedef enum BlockMirrorBackingMode {
>       MIRROR_LEAVE_BACKING_CHAIN,
>   } BlockMirrorBackingMode;
>   
> -static inline BlockDriverState *backing_bs(BlockDriverState *bs)
> -{
> -    return bs->backing ? bs->backing->bs : NULL;
> -}
> -
>   
>   /* Essential block drivers which must always be statically linked into qemu, and
>    * which therefore can be accessed without using bdrv_find_format() */
> @@ -1243,4 +1240,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
>   
>   int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
>   
> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
> +BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
> +
> +static inline BlockDriverState *child_bs(BdrvChild *child)
> +{
> +    return child ? child->bs : NULL;
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_cow_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_cow_child(bs));
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_rw_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_rw_child(bs));
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_child(bs));
> +}
> +
>   #endif /* BLOCK_INT_H */
> diff --git a/block.c b/block.c
> index 16615bc876..e8f6febda0 100644
> --- a/block.c
> +++ b/block.c
> @@ -556,11 +556,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>   int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>   
>       if (drv && drv->bdrv_probe_blocksizes) {
>           return drv->bdrv_probe_blocksizes(bs, bsz);
> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_blocksizes(bs->file->bs, bsz);
> +    } else if (filtered) {
> +        return bdrv_probe_blocksizes(filtered, bsz);
>       }

OK: add support for backing-filters

>   
>       return -ENOTSUP;
> @@ -575,11 +576,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>   
>       if (drv && drv->bdrv_probe_geometry) {
>           return drv->bdrv_probe_geometry(bs, geo);
> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_geometry(bs->file->bs, geo);
> +    } else if (filtered) {
> +        return bdrv_probe_geometry(filtered, geo);
>       }


OK: add support for backing-filters (short for backing-child-based filters; likewise,
file-filters = file-child-based filters)
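A minimal sketch of the forwarding pattern these hunks use, assuming a toy model (`Node`, `Link`, `Driver`, `filtered_rw_bs()` and `probe_blocksize()` are hypothetical, loosely modelled on bdrv_probe_blocksizes()): use the driver's callback if it has one, otherwise recurse into the filtered R/W child, preferring ->backing over ->file as bdrv_filtered_rw_child() does, so both backing-filters and file-filters forward:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

typedef struct Driver {
    bool is_filter;
    int (*probe_blocksize)(Node *bs, int *size);  /* optional callback */
} Driver;

struct Node {
    const Driver *drv;
    Link *backing;
    Link *file;
    int blocksize;  /* what a leaf node would report */
};

/* Filters forward via ->backing if present, else ->file; others do not forward. */
static Node *filtered_rw_bs(Node *bs)
{
    Link *child;

    if (!bs->drv || !bs->drv->is_filter) {
        return NULL;
    }
    child = bs->backing ? bs->backing : bs->file;
    return child ? child->bs : NULL;
}

static int probe_blocksize(Node *bs, int *size)
{
    Node *filtered = filtered_rw_bs(bs);

    if (bs->drv && bs->drv->probe_blocksize) {
        return bs->drv->probe_blocksize(bs, size);
    } else if (filtered) {
        return probe_blocksize(filtered, size);
    }
    return -ENOTSUP;
}

/* Example callback for a leaf driver that knows its block size. */
static int leaf_probe_blocksize(Node *bs, int *size)
{
    *size = bs->blocksize;
    return 0;
}
```

Note that a non-filter node with a file child (the `format` node in the test) still returns -ENOTSUP, because only filters forward.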

>   
>       return -ENOTSUP;
> @@ -2336,7 +2338,7 @@ static bool bdrv_inherits_from_recursive(BlockDriverState *child,
>   }
>   
>   /*
> - * Sets the backing file link of a BDS. A new reference is created; callers
> + * Sets the bs->backing link of a BDS. A new reference is created; callers
>    * which don't need their own reference any more must call bdrv_unref().
>    */
>   void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>       bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>           bdrv_inherits_from_recursive(backing_hd, bs);
>   
> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {

If we support file-filters in a frozen backing chain, could it go through the file child here?
Hmm, only in the case where we are going to set the backing hd for a file-filter... Hmm, could a filter
have both file and backing children? Your new API doesn't restrict it, and chooses backing as the
default in this case in bdrv_filtered_rw_child(), so I assume you allow for that possibility.

Here we don't want to check the chain; we want to check exactly the backing link, so it should be
something like

if (bs->backing && bs->backing->frozen) {
    error_setg(errp, "backing link exists and is frozen");
    return;
}


Hmm, on the other hand, if we have a frozen backing chain going through a file child, we must not
add a backing child to the node with the file child, as that would change the backing chain (which
by default goes through backing).

Anyway, we don't need to check the whole backing chain, as we may find some other frozen backing
subchain far away from bs. So, we possibly want to check

if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
    error_setg(errp, "filtered child exists and is frozen");
    return;
}


....

Also, we'll need to check for a frozen file child when we want to replace it.
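A minimal sketch of the per-link check suggested above, assuming a toy model (`Node`, `Link`, `link_is_frozen()` and `backing_link_replaceable()` are hypothetical; only the `frozen` flag mirrors BdrvChild). Only the links directly below the node are consulted, so a frozen subchain further down does not block the change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
    bool frozen;
} Link;

struct Node {
    bool is_filter;
    Link *backing;
    Link *file;
};

static bool link_is_frozen(const Link *child)
{
    return child && child->frozen;
}

/* Filters forward via ->backing if present, else ->file; format nodes via ->backing. */
static Link *filtered_link(Node *bs)
{
    if (!bs->is_filter) {
        return bs->backing;
    }
    return bs->backing ? bs->backing : bs->file;
}

/*
 * May @bs get a new backing link?  Checks the current backing link and the
 * filtered link a new backing child would take over (e.g. the frozen 'file'
 * link of a file-based filter).  Links further down are not examined.
 */
static bool backing_link_replaceable(Node *bs)
{
    return !link_is_frozen(bs->backing) && !link_is_frozen(filtered_link(bs));
}
```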


>           return;
>       }
>   
> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>       /*
>        * Find the "actual" backing file by skipping all links that point
>        * to an implicit node, if any (e.g. a commit filter node).
> +     * We cannot use any of the bdrv_skip_*() functions here because
> +     * those return the first explicit node, while we are looking for
> +     * its overlay here.
>        */
>       overlay_bs = bs;
> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
> -        overlay_bs = backing_bs(overlay_bs);
> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>       }

Agreed that we somehow want to support implicit file-filters here too.

>   
>       /* If we want to replace the backing file we need some extra checks */
> -    if (new_backing_bs != backing_bs(overlay_bs)) {
> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>           /* Check for implicit nodes between bs and its backing file */
>           if (bs != overlay_bs) {
>               error_setg(errp, "Cannot change backing link if '%s' has "
> @@ -3482,8 +3487,8 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>               return -EPERM;
>           }
>           /* Check if the backing link that we want to replace is frozen */
> -        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
> -                                         errp)) {
> +        if (bdrv_is_backing_chain_frozen(overlay_bs,
> +                                         child_bs(overlay_bs->backing), errp)) {

Again, I think we need bdrv_is_child_frozen() to check such things.

>               return -EPERM;
>           }
>           reopen_state->replace_backing_bs = true;
> @@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>        * its metadata. Otherwise the 'backing' option can be omitted.
>        */
>       if (drv->supports_backing && reopen_state->backing_missing &&
> -        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
> +        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {

and if we skip implicit filters in bdrv_backing_chain_next(), shouldn't we skip them
here too?

>           error_setg(errp, "backing is missing for '%s'",
>                      reopen_state->bs->node_name);
>           ret = -EINVAL;
> @@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>        * from bdrv_set_backing_hd()) has the new values.
>        */
>       if (reopen_state->replace_backing_bs) {
> -        BlockDriverState *old_backing_bs = backing_bs(bs);
> +        BlockDriverState *old_backing_bs = child_bs(bs->backing);
>           assert(!old_backing_bs || !old_backing_bs->implicit);
>           /* Abort the permission update on the backing bs we're detaching */
>           if (old_backing_bs) {
> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>                                       BlockDriverState *bs)
>   {
> -    while (active && bs != backing_bs(active)) {
> -        active = backing_bs(active);
> +    while (active && bs != bdrv_filtered_bs(active)) {

Need to adjust the function's comment then, as we may find a file-based overlay, not only a backing one.
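A minimal sketch of why the comment needs adjusting, assuming a toy model (`Node`, `Link`, `filtered_bs()` and `find_overlay()` are hypothetical, mirroring the patched loop). With `top ---backing--> throttle ---file--> base`, the overlay found for base is throttle, and its link to base is 'file':

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

struct Node {
    bool is_filter;
    Link *backing;
    Link *file;
};

/* Any filtered child: COW backing for format nodes, backing-or-file for filters. */
static Node *filtered_bs(Node *bs)
{
    Link *child = bs->is_filter ? (bs->backing ? bs->backing : bs->file)
                                : bs->backing;
    return child ? child->bs : NULL;
}

/*
 * Return the node directly above @bs in the chain starting at @active, or
 * NULL if @bs is not in that chain.  The returned node's link to @bs may
 * be a 'file' link if the node is a file-based filter.
 */
static Node *find_overlay(Node *active, Node *bs)
{
    while (active && bs != filtered_bs(active)) {
        active = filtered_bs(active);
    }
    return active;
}
```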

> +        active = bdrv_filtered_bs(active);
>       }
>   
>       return active;
> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>   {
>       BlockDriverState *i;
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing && i->backing->frozen) {
>               error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>                          i->backing->name, i->node_name,
> -                       backing_bs(i)->node_name);
> +                       i->backing->bs->node_name);
>               return true;
>           }
>       }
> @@ -4254,7 +4259,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
>           return -EPERM;
>       }
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing) {
>               i->backing->frozen = true;
>           }
> @@ -4272,7 +4277,7 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
>   {
>       BlockDriverState *i;
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing) {
>               assert(i->backing->frozen);
>               i->backing->frozen = false;
> @@ -4342,9 +4347,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
>        * other intermediate nodes have been dropped.
>        * If 'top' is an implicit node (e.g. "commit_top") we should skip
>        * it because no one inherits from it. We use explicit_top for that. */
> -    while (explicit_top && explicit_top->implicit) {
> -        explicit_top = backing_bs(explicit_top);
> -    }
> +    explicit_top = bdrv_skip_implicit_filters(explicit_top);
>       update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
>   
>       /* success - we can delete the intermediate states, and link top->base */
> @@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
>   
>   bool bdrv_is_encrypted(BlockDriverState *bs)
>   {
> -    if (bs->backing && bs->backing->bs->encrypted) {
> +    BlockDriverState *filtered = bdrv_filtered_bs(bs);
> +    if (bs->encrypted) {
> +        return true;
> +    }
> +    if (filtered && bdrv_is_encrypted(filtered)) {
>           return true;
>       }
> -    return bs->encrypted;
> +    return false;
>   }

one backing child -> recursion through extended backing chain
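A minimal sketch of the recursive check, assuming a toy model (`Node`, `Link`, `filtered_bs()` and `is_encrypted()` are hypothetical stand-ins for the patched bdrv_is_encrypted()): a node counts as encrypted if it or any node down its filtered chain is:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

struct Node {
    bool is_filter;
    bool encrypted;
    Link *backing;
    Link *file;
};

/* Any filtered child: COW backing for format nodes, backing-or-file for filters. */
static Node *filtered_bs(Node *bs)
{
    Link *child = bs->is_filter ? (bs->backing ? bs->backing : bs->file)
                                : bs->backing;
    return child ? child->bs : NULL;
}

static bool is_encrypted(Node *bs)
{
    Node *filtered = filtered_bs(bs);

    if (bs->encrypted) {
        return true;
    }
    return filtered && is_encrypted(filtered);
}
```

With `top ---backing--> throttle ---file--> base` and only base encrypted, the pre-patch one-level check (`bs->backing && bs->backing->bs->encrypted`) would have looked only at the unencrypted throttle node and returned false for top; the recursive version reaches base.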

>   
>   const char *bdrv_get_format_name(BlockDriverState *bs)
> @@ -4794,7 +4801,21 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
>   {
>       while (top && top != base) {
> -        top = backing_bs(top);
> +        top = bdrv_filtered_bs(top);
> +    }
> +
> +    return top != NULL;
> +}

support file-filters

> +
> +/*
> + * Same as bdrv_chain_contains(), but skip implicitly added R/W filter
> + * nodes and do not move past explicitly added R/W filters.
> + */
> +bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
> +{
> +    top = bdrv_skip_implicit_filters(top);
> +    while (top && top != base) {
> +        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
>       }

ok
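A minimal sketch contrasting the two chain checks, assuming a toy model (all names here are hypothetical analogues of the patch's functions). With `top ---backing--> throttle(explicit) ---file--> mid ---backing--> commit_top(implicit) ---backing--> base`, the generic walk reaches base from top, while the legacy walk stops at the explicit throttle filter but still skips the implicit commit_top:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

struct Node {
    bool is_filter;
    bool implicit;
    Link *backing;
    Link *file;
};

static Node *child_bs(Link *l) { return l ? l->bs : NULL; }

/* COW child: only non-filters have one, and it is ->backing. */
static Node *cow_bs(Node *bs)
{
    return (!bs || bs->is_filter) ? NULL : child_bs(bs->backing);
}

/* R/W child: only filters have one, ->backing preferred over ->file. */
static Node *rw_bs(Node *bs)
{
    return (bs && bs->is_filter) ? child_bs(bs->backing ? bs->backing : bs->file)
                                 : NULL;
}

static Node *filtered_bs(Node *bs)
{
    Node *cow = cow_bs(bs);
    return cow ? cow : rw_bs(bs);
}

static Node *skip_implicit_filters(Node *bs)
{
    while (bs && bs->implicit && rw_bs(bs)) {
        bs = rw_bs(bs);
    }
    return bs;
}

static bool chain_contains(Node *top, Node *base)
{
    while (top && top != base) {
        top = filtered_bs(top);
    }
    return top != NULL;
}

static bool legacy_chain_contains(Node *top, Node *base)
{
    top = skip_implicit_filters(top);
    while (top && top != base) {
        top = skip_implicit_filters(cow_bs(top));
    }
    return top != NULL;
}
```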

>   
>       return top != NULL;
> @@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>   
>   int bdrv_has_zero_init(BlockDriverState *bs)
>   {
> +    BlockDriverState *filtered;
> +
>       if (!bs->drv) {
>           return 0;
>       }
>   
>       /* If BS is a copy on write image, it is initialized to
>          the contents of the base image, which may not be zeroes.  */
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(bs)) {
>           return 0;
>       }
>       if (bs->drv->bdrv_has_zero_init) {
>           return bs->drv->bdrv_has_zero_init(bs);
>       }
> -    if (bs->file && bs->drv->is_filter) {
> -        return bdrv_has_zero_init(bs->file->bs);
> +
> +    filtered = bdrv_filtered_rw_bs(bs);
> +    if (filtered) {
> +        return bdrv_has_zero_init(filtered);
>       }

add recursion for filters
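A minimal sketch of the decision order in the patched bdrv_has_zero_init(), assuming a toy model (`Node`, `Link`, `rw_bs()`, `has_zero_init()` and the `zero_init` encoding are hypothetical): a COW overlay never reports zero-init, a driver answer wins next, then R/W filters defer to their child, and anything else falls back to the safe default of 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU's real structures. */
typedef struct Node Node;
typedef struct Link {
    Node *bs;
} Link;

struct Node {
    bool is_filter;
    int zero_init;  /* hypothetical: >= 0 if the driver answers, -1 if no callback */
    Link *backing;
    Link *file;
};

/* R/W child: only filters have one, ->backing preferred over ->file. */
static Node *rw_bs(Node *bs)
{
    Link *child;

    if (!bs->is_filter) {
        return NULL;
    }
    child = bs->backing ? bs->backing : bs->file;
    return child ? child->bs : NULL;
}

static int has_zero_init(Node *bs)
{
    Node *filtered;

    /* A COW overlay starts out with its backing file's contents. */
    if (!bs->is_filter && bs->backing) {
        return 0;
    }
    if (bs->zero_init >= 0) {
        return bs->zero_init;
    }
    filtered = rw_bs(bs);
    if (filtered) {
        return has_zero_init(filtered);
    }
    return 0;  /* safe default */
}
```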

>   
>       /* safe default */
> @@ -4890,7 +4915,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
>   {
>       BlockDriverInfo bdi;
>   
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(bs)) {
>           return false;
>       }
>   
> @@ -4924,8 +4949,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>           return -ENOMEDIUM;
>       }
>       if (!drv->bdrv_get_info) {
> -        if (bs->file && drv->is_filter) {
> -            return bdrv_get_info(bs->file->bs, bdi);
> +        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
> +        if (filtered) {
> +            return bdrv_get_info(filtered, bdi);
>           }
>           return -ENOTSUP;
>       }
> @@ -5028,7 +5054,17 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>   
>       is_protocol = path_has_protocol(backing_file);
>   
> -    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
> +    /*
> +     * Being largely a legacy function, skip any filters here
> +     * (because filters do not have normal filenames, so they cannot
> +     * match anyway; and allowing json:{} filenames is a bit out of
> +     * scope).
> +     */
> +    for (curr_bs = bdrv_skip_rw_filters(bs);
> +         bdrv_filtered_cow_child(curr_bs) != NULL;
> +         curr_bs = bdrv_backing_chain_next(curr_bs))
> +    {
> +        BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
>   
>           /* If either of the filename paths is actually a protocol, then
>            * compare unmodified paths; otherwise make paths relative */
> @@ -5036,7 +5072,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>               char *backing_file_full_ret;
>   
>               if (strcmp(backing_file, curr_bs->backing_file) == 0) {
> -                retval = curr_bs->backing->bs;
> +                retval = bs_below;
>                   break;
>               }
>               /* Also check against the full backing filename for the image */
> @@ -5046,7 +5082,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>                   bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
>                   g_free(backing_file_full_ret);
>                   if (equal) {
> -                    retval = curr_bs->backing->bs;
> +                    retval = bs_below;
>                       break;
>                   }
>               }
> @@ -5072,7 +5108,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>               g_free(filename_tmp);
>   
>               if (strcmp(backing_file_full, filename_full) == 0) {
> -                retval = curr_bs->backing->bs;
> +                retval = bs_below;
>                   break;
>               }
>           }
> @@ -6237,3 +6273,107 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
>   
>       return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
>   }
> +
> +/*
> + * Return the child that @bs acts as an overlay for, and from which data may be
> + * copied in COW or COR operations.  Usually this is the backing file.
> + */
> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (bs->drv->is_filter) {
> +        return NULL;
> +    }
> +
> +    return bs->backing;
> +}
> +
> +/*
> + * If @bs acts as a pass-through filter for one of its children,
> + * return that child.  "Pass-through" means that write operations to
> + * @bs are forwarded to that child instead of triggering COW.
> + */
> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (!bs->drv->is_filter) {
> +        return NULL;
> +    }
> +
> +    return bs->backing ?: bs->file;
> +}
> +
> +/*
> + * Return any filtered child, independently of how it reacts to write
> + * accesses and whether data is copied onto this BDS through COR.
> + */
> +BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
> +{
> +    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
> +    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
> +
> +    /* There can only be one filtered child at a time */
> +    assert(!(cow_child && rw_child));
> +
> +    return cow_child ?: rw_child;
> +}
> +
> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
> +                                           bool stop_on_explicit_filter)
> +{
> +    BdrvChild *filtered;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> +        filtered = bdrv_filtered_rw_child(bs);
> +        if (!filtered) {
> +            break;
> +        }
> +        bs = filtered->bs;
> +    }
> +    /*
> +     * Note that this treats nodes with bs->drv == NULL as not being
> +     * R/W filters (bs->drv == NULL should be replaced by something
> +     * else anyway).
> +     * The advantage of this behavior is that this function will thus
> +     * always return a non-NULL value (given a non-NULL @bs).
> +     */
> +
> +    return bs;
> +}
> +
> +/*
> + * Return the first BDS that has not been added implicitly or that
> + * does not have an RW-filtered child down the chain starting from @bs
> + * (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, true);
> +}
> +
> +/*
> + * Return the first BDS that does not have an RW-filtered child down
> + * the chain starting from @bs (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, false);
> +}
> +
> +/*
> + * For a backing chain, return the first non-filter backing image.
> + */
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
> +{
> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
> +}
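The accessors in this hunk can be exercised outside QEMU. Below is a minimal standalone sketch that mirrors the logic of bdrv_filtered_cow_child(), bdrv_filtered_rw_child(), bdrv_filtered_child(), bdrv_skip_filters() and bdrv_backing_chain_next() on a small example graph. It assumes heavily simplified stand-ins: `Driver`, `Node`, `Child` and all of the lowercase function names are hypothetical, not QEMU API, and only the fields the patch reads are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for QEMU's BlockDriver,
 * BdrvChild and BlockDriverState: only the fields the accessors read.
 */
typedef struct Driver {
    bool is_filter;
} Driver;

typedef struct Node Node;

typedef struct Child {
    Node *bs;
} Child;

struct Node {
    const Driver *drv;
    Child *backing;
    Child *file;
    bool implicit;
};

static Node *child_node(Child *c)
{
    return c ? c->bs : NULL;
}

/* COW child: only non-filter nodes have one, and it is ->backing. */
static Child *filtered_cow_child(Node *bs)
{
    if (!bs || !bs->drv || bs->drv->is_filter) {
        return NULL;
    }
    return bs->backing;
}

/* R/W child: only filters have one, either ->backing or ->file. */
static Child *filtered_rw_child(Node *bs)
{
    if (!bs || !bs->drv || !bs->drv->is_filter) {
        return NULL;
    }
    return bs->backing ? bs->backing : bs->file;
}

/* Whichever of the two exists; a node never has both. */
static Child *filtered_child(Node *bs)
{
    Child *cow = filtered_cow_child(bs);
    Child *rw = filtered_rw_child(bs);

    assert(!(cow && rw));
    return cow ? cow : rw;
}

/* Follow R/W filter links; optionally stop at the first explicit node. */
static Node *skip_filters(Node *bs, bool stop_on_explicit_filter)
{
    if (!bs) {
        return NULL;
    }
    while (!(stop_on_explicit_filter && !bs->implicit)) {
        Child *filtered = filtered_rw_child(bs);
        if (!filtered) {
            break;
        }
        bs = filtered->bs;
    }
    return bs;
}

/* First non-filter image further down the backing chain. */
static Node *backing_chain_next(Node *bs)
{
    return skip_filters(child_node(filtered_cow_child(skip_filters(bs, false))),
                        false);
}

/*
 * Example graph:
 *   throttle (explicit filter) --file--> top (format)
 *   top --backing--> cor (implicit filter) --file--> base (format)
 */
static const Driver format_drv = { .is_filter = false };
static const Driver filter_drv = { .is_filter = true };

static Node n_base = { &format_drv, NULL, NULL, false };
static Child c_base = { &n_base };
static Node n_cor = { &filter_drv, NULL, &c_base, true };
static Child c_cor = { &n_cor };
static Node n_top = { &format_drv, &c_cor, NULL, false };
static Child c_top = { &n_top };
static Node n_throttle = { &filter_drv, NULL, &c_top, false };
```

In this graph, skipping only implicit filters stops right at the explicit throttle node, whereas the backing_chain_next()-style walk passes through both filters and lands on `n_base`.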
> diff --git a/block/backup.c b/block/backup.c
> index 9988753249..9c08353b23 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       int64_t len;
>       BlockDriverInfo bdi;
>       BackupBlockJob *job = NULL;
> +    bool target_does_cow;
>       int ret;
>   
>       assert(bs);
> @@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       /* If there is no backing file on the target, we cannot rely on COW if our
>        * backup cluster size is smaller than the target cluster size. Even for
>        * targets with a backing file, try to avoid COW if possible. */
> +    target_does_cow = bdrv_filtered_cow_child(target);

So you have excluded the false-positive case where the target is a backing-based filter. I think
we had better skip filters here:

target_does_cow = bdrv_filtered_cow_child(bdrv_skip_rw_filters(target))
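A minimal sketch of the difference, assuming hypothetical simplified types (`Node`, `Child` and the helper names are illustrative, not QEMU API): if the backup target is itself an R/W filter sitting on a COW overlay, checking only the target node reports "no COW", while skipping filters first finds the overlay's backing child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for QEMU's node and child types. */
typedef struct Node Node;

typedef struct Child {
    Node *bs;
} Child;

struct Node {
    bool is_filter;   /* stands in for bs->drv->is_filter */
    Child *backing;
    Child *file;
};

/* Like bdrv_filtered_cow_child(): only non-filters have a COW child. */
static Child *cow_child(Node *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

/* Like bdrv_skip_rw_filters(): descend through filters to a non-filter. */
static Node *skip_rw_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        Child *c = bs->backing ? bs->backing : bs->file;
        if (!c) {
            break;
        }
        bs = c->bs;
    }
    return bs;
}

/* v4 patch: inspects only the target node itself. */
static bool target_does_cow_v4(Node *target)
{
    return cow_child(target) != NULL;
}

/* Review suggestion: skip R/W filters before looking for a COW child. */
static bool target_does_cow_suggested(Node *target)
{
    return cow_child(skip_rw_filters(target)) != NULL;
}

/* Backup target: an explicit filter on top of an overlay with a backing file. */
static Node n_backing = { false, NULL, NULL };
static Child c_backing = { &n_backing };
static Node n_overlay = { false, &c_backing, NULL };
static Child c_overlay = { &n_overlay };
static Node n_filter = { true, NULL, &c_overlay };
```

Both variants agree when the target is the overlay itself; they differ only when a filter sits on top.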

>       ret = bdrv_get_info(target, &bdi);
> -    if (ret == -ENOTSUP && !target->backing) {
> +    if (ret == -ENOTSUP && !target_does_cow) {
>           /* Cluster size is not defined */
>           warn_report("The target block device doesn't provide "
>                       "information about the block size and it doesn't have a "
> @@ -681,14 +683,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                       "this default, the backup may be unusable",
>                       BACKUP_CLUSTER_SIZE_DEFAULT);
>           job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
> -    } else if (ret < 0 && !target->backing) {
> +    } else if (ret < 0 && !target_does_cow) {
>           error_setg_errno(errp, -ret,
>               "Couldn't determine the cluster size of the target image, "
>               "which has no backing file");
>           error_append_hint(errp,
>               "Aborting, since this may create an unusable destination image\n");
>           goto error;
> -    } else if (ret < 0 && target->backing) {
> +    } else if (ret < 0 && target_does_cow) {
>           /* Not fatal; just trudge on ahead. */
>           job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
>       } else {
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f78e82a707..aa9a1d84a6 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2089,11 +2089,17 @@ int blk_commit_all(void)
>           AioContext *aio_context = blk_get_aio_context(blk);
>   
>           aio_context_acquire(aio_context);
> -        if (blk_is_inserted(blk) && blk->root->bs->backing) {
> -            int ret = bdrv_commit(blk->root->bs);
> -            if (ret < 0) {
> -                aio_context_release(aio_context);
> -                return ret;
> +        if (blk_is_inserted(blk)) {
> +            BlockDriverState *non_filter;
> +
> +            /* Legacy function, so skip implicit filters */
> +            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
> +            if (bdrv_filtered_cow_child(non_filter)) {
> +                int ret = bdrv_commit(non_filter);
> +                if (ret < 0) {
> +                    aio_context_release(aio_context);
> +                    return ret;
> +                }
>               }
>           }
>           aio_context_release(aio_context);
> diff --git a/block/commit.c b/block/commit.c
> index 02eab34925..252007fd57 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -113,7 +113,7 @@ static void commit_abort(Job *job)
>        * something to base, the intermediate images aren't valid any more. */
>       bdrv_child_try_set_perm(s->commit_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
> +    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
>                         &error_abort);
>   
>       bdrv_unref(s->commit_top_bs);
> @@ -324,10 +324,16 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>       s->commit_top_bs = commit_top_bs;
>       bdrv_unref(commit_top_bs);
>   
> -    /* Block all nodes between top and base, because they will
> -     * disappear from the chain after this operation. */
> +    /*
> +     * Block all nodes between top and base, because they will
> +     * disappear from the chain after this operation.
> +     * Note that this assumes that the user is fine with removing all
> +     * nodes (including R/W filters) between top and base.  Assuring
> +     * this is the responsibility of the interface (i.e. whoever calls
> +     * commit_start()).
> +     */
>       assert(bdrv_chain_contains(top, base));
> -    for (iter = top; iter != base; iter = backing_bs(iter)) {
> +    for (iter = top; iter != base; iter = bdrv_filtered_bs(iter)) {
>           /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
>            * at s->base (if writes are blocked for a node, they are also blocked
>            * for its backing file). The other options would be a second filter
> @@ -414,19 +420,22 @@ int bdrv_commit(BlockDriverState *bs)
>       if (!drv)
>           return -ENOMEDIUM;
>   
> -    if (!bs->backing) {
> +    backing_file_bs = bdrv_filtered_cow_bs(bs);
> +
> +    if (!backing_file_bs) {
>           return -ENOTSUP;
>       }
>   
>       if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
> +    {
>           return -EBUSY;
>       }
>   
> -    ro = bs->backing->bs->read_only;
> +    ro = backing_file_bs->read_only;
>   
>       if (ro) {
> -        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
> +        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
>               return -EACCES;
>           }
>       }
> @@ -441,8 +450,6 @@ int bdrv_commit(BlockDriverState *bs)
>       }
>   
>       /* Insert commit_top block node above backing, so we can write to it */
> -    backing_file_bs = backing_bs(bs);
> -
>       commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
>                                            &local_err);
>       if (commit_top_bs == NULL) {
> @@ -528,15 +535,13 @@ ro_cleanup:
>       qemu_vfree(buf);
>   
>       blk_unref(backing);
> -    if (backing_file_bs) {
> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> -    }
> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
>       bdrv_unref(commit_top_bs);
>       blk_unref(src);
>   
>       if (ro) {
>           /* ignoring error return here */
> -        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
> +        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
>       }
>   
>       return ret;
> diff --git a/block/io.c b/block/io.c
> index dfc153b8d8..83c2b6b46a 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *storage_bs;
> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
>       Error *local_err = NULL;
>   
> +    /*
> +     * FIXME: There should be a function for this, and in fact there
> +     * will be as of a follow-up patch.
> +     */
> +    storage_bs =
> +        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
> +
>       memset(&bs->bl, 0, sizeof(bs->bl));
>   
>       if (!drv) {
> @@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>                                   drv->bdrv_aio_preadv) ? 1 : 512;
>   
>       /* Take some limits from the children as a default */
> -    if (bs->file) {
> -        bdrv_refresh_limits(bs->file->bs, &local_err);
> +    if (storage_bs) {
> +        bdrv_refresh_limits(storage_bs, &local_err);
>           if (local_err) {
>               error_propagate(errp, local_err);
>               return;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
> +        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
>       } else {
>           bs->bl.min_mem_alignment = 512;
>           bs->bl.opt_mem_alignment = getpagesize();
> @@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>           bs->bl.max_iov = IOV_MAX;
>       }
>   
> -    if (bs->backing) {
> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
> +    if (cow_bs) {
> +        bdrv_refresh_limits(cow_bs, &local_err);
>           if (local_err) {
>               error_propagate(errp, local_err);
>               return;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
> +        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
>       }
>   
>       /* Then let the driver override it */
> @@ -2139,11 +2148,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>           ret |= BDRV_BLOCK_ALLOCATED;
>       } else if (want_zero) {
> +        BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
> +
>           if (bdrv_unallocated_blocks_are_zero(bs)) {
>               ret |= BDRV_BLOCK_ZERO;
> -        } else if (bs->backing) {
> -            BlockDriverState *bs2 = bs->backing->bs;
> -            int64_t size2 = bdrv_getlength(bs2);
> +        } else if (cow_bs) {
> +            int64_t size2 = bdrv_getlength(cow_bs);
>   
>               if (size2 >= 0 && offset >= size2) {
>                   ret |= BDRV_BLOCK_ZERO;
> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>       bool first = true;
>   
>       assert(bs != base);
> -    for (p = bs; p != base; p = backing_bs(p)) {
> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>                                      file);

Interesting: for filters that use bdrv_co_block_status_from_backing or
bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of
the underlying real node two or more times. It's not wrong, but obviously not optimal.
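A toy model makes the redundancy concrete. This is a sketch under simplifying assumptions, not QEMU code: the `Node` type and functions are hypothetical, a filter is assumed to forward the query to its child (as with BDRV_BLOCK_RAW), and the bottom format node is assumed to have no backing file. With k filters stacked above a format node, an unallocated region makes the format driver answer k + 1 times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model: a filter forwards status queries to its
 * filtered child, while a format node answers from its own metadata.
 */
typedef struct Node {
    struct Node *filtered;  /* next node down the chain */
    bool is_filter;
    bool allocated;         /* meaningful for format nodes only */
    int driver_calls;       /* how often this node's own driver answered */
} Node;

static bool node_block_status(Node *bs)
{
    if (bs->is_filter) {
        assert(bs->filtered != NULL);
        return node_block_status(bs->filtered);  /* recurse, like RAW */
    }
    bs->driver_calls++;
    return bs->allocated;
}

/* Same shape as the loop in bdrv_co_block_status_above(). */
static bool block_status_above(Node *bs, Node *base)
{
    for (Node *p = bs; p != base; p = p->filtered) {
        if (node_block_status(p)) {
            return true;  /* allocated: stop walking */
        }
    }
    return false;
}

/* Two filters stacked on a format node whose region is unallocated. */
static Node n_leaf = { NULL, false, false, 0 };
static Node n_f2 = { &n_leaf, true, false, 0 };
static Node n_f1 = { &n_f2, true, false, 0 };
```

The first iteration already reaches the leaf through both filters; the loop then visits each filter and the leaf again, so the leaf's driver is queried once per node in the chain.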


>           if (ret < 0) {
> @@ -2294,7 +2304,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
>   int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                         int64_t *pnum, int64_t *map, BlockDriverState **file)
>   {
> -    return bdrv_block_status_above(bs, backing_bs(bs),
> +    return bdrv_block_status_above(bs, bdrv_filtered_bs(bs),
>                                      offset, bytes, pnum, map, file);
>   }
>   
> @@ -2304,9 +2314,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>       int ret;
>       int64_t dummy;
>   
> -    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
> -                                         bytes, pnum ? pnum : &dummy, NULL,
> -                                         NULL);
> +    ret = bdrv_common_block_status_above(bs, bdrv_filtered_bs(bs), false,
> +                                         offset, bytes, pnum ? pnum : &dummy,
> +                                         NULL, NULL);
>       if (ret < 0) {
>           return ret;
>       }
> @@ -2360,7 +2370,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>               n = pnum_inter;
>           }
>   
> -        intermediate = backing_bs(intermediate);
> +        intermediate = bdrv_filtered_bs(intermediate);
>       }
>   
>       *pnum = n;
> @@ -3135,8 +3145,9 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
>       }
>   
>       if (!drv->bdrv_co_truncate) {
> -        if (bs->file && drv->is_filter) {
> -            ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
> +        BdrvChild *filtered = bdrv_filtered_rw_child(bs);
> +        if (filtered) {
> +            ret = bdrv_co_truncate(filtered, offset, prealloc, errp);
>               goto out;
>           }
>           error_setg(errp, "Image format driver does not support resize");
> diff --git a/block/mirror.c b/block/mirror.c
> index 8b2404051f..80cef587f0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
>                               &error_abort);
>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -        if (backing_bs(target_bs) != backing) {
> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
> +        if (bdrv_backing_chain_next(target_bs) != backing) {
> +            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,

hmm, here you support filters above target_bs ...

> +                                &local_err);
>               if (local_err) {
>                   error_report_err(local_err);
>                   ret = -EPERM;
> @@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
>       block_job_remove_all_bdrv(bjob);
>       bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>   
>       /* We just changed the BDS the job BB refers to (with either or both of the
>        * bdrv_replace_node() calls), so switch the BB back so the cleanup does
> @@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>       } else {
>           s->target_cluster_size = BDRV_SECTOR_SIZE;
>       }
> -    if (backing_filename[0] && !target_bs->backing &&
> +    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&

... but here you do not.

[stopped here for now]



-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-07 13:15             ` Max Reitz
@ 2019-05-07 13:33               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-07 13:33 UTC
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

07.05.2019 16:15, Max Reitz wrote:
> On 07.05.19 11:32, Vladimir Sementsov-Ogievskiy wrote:
>> 24.04.2019 19:36, Max Reitz wrote:
>>> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.04.2019 19:22, Max Reitz wrote:
>>>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>>> bs->file.  Usually.
>>>>>>>
>>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>>> our currently limited form of expression.  It is better to introduce
>>>>>>> some functions that actually guarantee a meaning:
>>>>>>>
>>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>>       filtered through COW.  That is, reads may or may not be forwarded
>>>>>>>       (depending on the overlay's allocation status), but writes never go to
>>>>>>>       this child.
>>>>>>>
>>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>>       filtered through some very plain process.  Reads and writes issued to
>>>>>>>       the parent will go to the child as well (although timing, etc. may be
>>>>>>>       modified).
>>>>>>>
>>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>>       block layer anyway) always only have one of these children: All read
>>>>>>>       requests must be served from the filtered_rw_child (if it exists), so
>>>>>>>       if there was a filtered_cow_child in addition, it would not receive
>>>>>>>       any requests at all.
>>>>>>>       (The closest here is mirror, where all requests are passed on to the
>>>>>>>       source, but with write-blocking, write requests are "COWed" to the
>>>>>>>       target.  But that just means that the target is a special child that
>>>>>>>       cannot be introspected by the generic block layer functions, and that
>>>>>>>       source is a filtered_rw_child.)
>>>>>>>       Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>>>       one child (or NULL, if there is no filtered child).
>>>>>>>
>>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>>> through a block node chain.  They do not do that currently, but this
>>>>>>> patch makes them.
>>>>>>>
>>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>>>
>>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>>> blockdev option, which is that it does not default to null like all
>>>>>>> other child references do.
>>>>>>>
>>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>>> are modified to return any filtered child under "backing", not just
>>>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>>>> the throttled node now appears as a backing child.
>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>> ---
>>>>>>>      qapi/block-core.json           |   4 +
>>>>>>>      include/block/block.h          |   1 +
>>>>>>>      include/block/block_int.h      |  40 +++++--
>>>>>>>      block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>>>      block/backup.c                 |   8 +-
>>>>>>>      block/block-backend.c          |  16 ++-
>>>>>>>      block/commit.c                 |  33 +++---
>>>>>>>      block/io.c                     |  45 ++++---
>>>>>>>      block/mirror.c                 |  21 ++--
>>>>>>>      block/qapi.c                   |  30 +++--
>>>>>>>      block/stream.c                 |  13 +-
>>>>>>>      blockdev.c                     |  88 +++++++++++---
>>>>>>>      migration/block-dirty-bitmap.c |   4 +-
>>>>>>>      nbd/server.c                   |   6 +-
>>>>>>>      qemu-img.c                     |  29 ++---
>>>>>>>      tests/qemu-iotests/184.out     |   7 +-
>>>>>>>      tests/qemu-iotests/204.out     |   1 +
>>>>>>>      17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>>
>>>>>> Really huge... didn't you consider converting it file by file?
>>>>>
>>>>> Frankly, no, I just didn’t consider it.
>>>>>
>>>>> Hm.  I don’t know, 30-patch series always look so frightening.
>>>>>
>>>>>>> diff --git a/block.c b/block.c
>>>>>>> index 16615bc876..e8f6febda0 100644
>>>>>>> --- a/block.c
>>>>>>> +++ b/block.c
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>>      
>>>>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>>>          /*
>>>>>>>           * Find the "actual" backing file by skipping all links that point
>>>>>>>           * to an implicit node, if any (e.g. a commit filter node).
>>>>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>>>>> +     * those return the first explicit node, while we are looking for
>>>>>>> +     * its overlay here.
>>>>>>>           */
>>>>>>>          overlay_bs = bs;
>>>>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>>>
>>>>>> So you don't want to skip implicit filters with a 'file' child? Then why not use
>>>>>> child_bs(overlay_bs->backing), like in the following if condition?
>>>>>
>>>>> I think it was an artifact of writing the patch.  I started with
>>>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>>>> actually.  There was no functional difference so I left it as it was.
>>>>>
>>>>> But you’re right, it is more clear to use child_bs(overlay_bs->backing)
>>>>> instead.
>>>>>
>>>>>> Could we instead make backing-based filters equivalent to file-based ones, so that file-based
>>>>>> filters can be used in backing-chain-related scenarios (like the upcoming copy-on-read filter
>>>>>> for stream)? That is, extend the backing-chain concept to include filters with a file child?
>>>>>
>>>>> If I understand you correctly, that’s basically the purpose of this
>>>>> series and especially this patch here.  As far as it is possible and
>>>>> reasonable, I want filters that use bs->backing and bs->file to behave
>>>>> the same.
>>>>>
>>>>> However, there are cases where this is not possible and
>>>>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>>>>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>>>>> distinction was already visible to the user, we cannot change it now.
>>>>>
>>>>> We definitely cannot make file-based filters use bs->backing now because
>>>>> you can create them over QAPI and they use 'file' as their child name.
>>>>> Can we make backing-based filters use bs->file?  Seems more likely,
>>>>> because all of them are implicit nodes, so the user usually doesn’t see
>>>>> them.  But usually isn’t always; they do become user-visible once the
>>>>> user specifies a node-name for mirror or commit.
>>>>>
>>>>> I found it more reasonable to introduce new functions that explicitly
>>>>> express what kind of child they expect and then apply them everywhere as
>>>>> I saw fit, instead of making the mirror/commit filter drivers use
>>>>> bs->file and hope it works; not least because I’d still have to go
>>>>> through the whole block layer and check every instance of bs->backing to
>>>>> see whether it really needs bs->backing or whether it should use either
>>>>> of bs->backing or bs->file.
>>>>>
>>>>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>>>          }
>>>>>>>      
>>>>>>>          /* If we want to replace the backing file we need some extra checks */
>>>>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>>>>>>>              /* Check for implicit nodes between bs and its backing file */
>>>>>>>              if (bs != overlay_bs) {
>>>>>>>                  error_setg(errp, "Cannot change backing link if '%s' has "
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>>>      BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>>>                                          BlockDriverState *bs)
>>>>>>>      {
>>>>>>> -    while (active && bs != backing_bs(active)) {
>>>>>>> -        active = backing_bs(active);
>>>>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>>>>
>>>>>> Hmm, and here you actually support backing chains with file-child-based filters in them...
>>>>>
>>>>> Yes, because this is not about the QAPI 'backing' link.  This function
>>>>> should continue to work even if there are filters in the backing chain.
>>>>
>>>> This is a generic function to find an overlay in a backing chain, and it may be used from different places;
>>>> for example, it is used in Andrey's series about a filter for block-stream.
>>>
>>> Well, all places that use it accept backing chains with filters inside
>>> of them.
>>>
>>>> It is used from qmp_block_commit; isn't that about QAPI?
>>>
>>> By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
>>> wrote in my other mail that you could use query-named-block-nodes to see
>>> that graph; apparently you can’t.  So besides x-debug-query-block-graph,
>>> we still don’t have any facility to query the block graph?  I don’t know
>>> what to say.
>>>
>>> Anyway, you can still construct the graph with blockdev-add, so it is
>>> user-visible.  And in that block graph, there is a 'backing' link, and
>>> there is a 'file' link -- this is what I mean with "QAPI link".
>>>
>>> We have commands that are abstract and don’t work on specific graph
>>> links.  For instance, block-commit commits across a backing chain, so it
>>> doesn’t matter whether the graph link is called 'backing' or whatever,
>>> what is important is that it’s a COW link.  But we should also ignore
>>> filters on the way, so this patch makes block-commit and others use
>>> those more abstract child access functions.
>>>
>>> But whenever it is about exactly the "file" or the "backing" link, we
>>> have to use bs->file and bs->backing, respectively.  That's just how it
>>> currently is.
>>>
>>>>>>> +        active = bdrv_filtered_bs(active);
>>>>>>>          }
>>>>>>>      
>>>>>>>          return active;
>>>>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>>>>      {
>>>>>>>          BlockDriverState *i;
>>>>>>>      
>>>>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>>>
>>>>>> ...and here you don't.
>>>>>
>>>>> Yes, because this function is about the QAPI 'backing' link.
>>>>
>>>> And this is again a generic thing that may be used in the same places as bdrv_find_overlay,
>>>
>>> But it isn’t.
>>>
>>>> and it is used in the series about the block-stream filter too. So for further development
>>>> we'll have to keep in mind all these differences between generic block layer functions:
>>>> which ones support .file children inside a backing chain and which do not...
>>>
>>> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
>>> wrote in my other (previous) mail).
>>>
>>> But for example bdrv_set_backing_hd() always has to use bs->backing,
>>> because that’s what it’s about (and I do change its descriptive comment
>>> to reflect that, so you don’t need to keep it in mind).  Same for
>>> bdrv_open_backing_file().
>>>
>>> Hm, what other cases are there...
>>>
>>> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
>>> user-visible "backing" link (as specified through x-blockdev-reopen).
>>> But the loop it contains is more difficult to translate than I had
>>> thought.  At some point, there needs to be a bs->backing link, because
>>> that is what this function is about, but it should also skip all
>>> implicit filters in the way, I think.  So e.g. this should be recognized:
>>>
>>> bs  ---backing-->  COR ---file-->  base
>>>
>>> @overlay_bs should be COR, I think...?  I mean, as long as COR is an
>>> implicit node.  So the loop really should use bdrv_filtered_bs()
>>> everywhere, and then the same afterwards.  I think that we should also
>>> ensure that @bs can support a ->backing child, but how would I check
>>> that?  Maybe it’s safe to just omit such a check...
>>>
>>> But then another issue comes in: The link to replace (in the above case
>>> from "COR" to "base") is no longer necessarily a backing link.  So
>>> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
>>> bs->file.
>>>
>>> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
>>> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
>>> and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
>>>    bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
>>> replaces bs->backing.  So it seems to just fail on the implicit nodes
>>> that bdrv_reopen_parse_backing() took care to skip...
>>>
>>>
>>> OK, what else...  bdrv_reopen_prepare() checks
>>> reopen_state->bs->backing, which I claim is correct because while there
>>> may be implicit filters in the chain, the first link has to be a
>>> ->backing link.
>>
>> [sorry for a long delay]
>> Are you working on the next version or waiting for more reviews?
> 
> I haven’t worked on the next version yet, but that’s just because other
> things were more important, not because of reviews.
> 
>> Why should the first link be backing? We want to skip all implicit filters, including
>> file-child-based ones, in the following call to bdrv_reopen_parse_backing(). So don't we
>> want something like bdrv_backing_chain_next() here? But then the question is: could
>> reopen_state->bs be a filter itself...
> 
> Because this function is about the 'backing' option.  As I explained
> above, this must correspond to a bs->backing child.  If there is an
> implicit filter, it will still be under bs->backing.
> 

Aha, OK, I understand.
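A minimal sketch of the overlay search being discussed, assuming hypothetical simplified types (`Node`, `Child` and `find_backing_overlay()` are illustrative, not QEMU API). Because the 'backing' option always corresponds to a bs->backing child, the walk starts with that link and then skips implicit nodes reached through further backing links (here a commit filter), so the link that would be replaced is child_bs(overlay_bs->backing). Like the loop in the patch, it only follows 'backing' links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for BlockDriverState and BdrvChild. */
typedef struct Node Node;

typedef struct Child {
    Node *bs;
} Child;

struct Node {
    Child *backing;
    bool implicit;
};

static Node *child_bs(Child *c)
{
    return c ? c->bs : NULL;
}

/*
 * Modeled on the loop in bdrv_reopen_parse_backing(): the first hop is
 * always bs->backing, and implicit nodes reached that way are skipped.
 */
static Node *find_backing_overlay(Node *bs)
{
    Node *overlay_bs = bs;

    assert(bs != NULL);
    while (overlay_bs->backing && child_bs(overlay_bs->backing)->implicit) {
        overlay_bs = child_bs(overlay_bs->backing);
    }
    return overlay_bs;
}

/* top --backing--> commit_top (implicit) --backing--> base */
static Node n_base = { NULL, false };
static Child c_base = { &n_base };
static Node n_commit_top = { &c_base, true };
static Child c_commit_top = { &n_commit_top };
static Node n_top = { &c_commit_top, false };
```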

> 
>>> bdrv_backing_overridden() has to query bs->backing because this function
>>> is used when it is about a specific characteristic of the backing link:
>>> There is a non-null default (given by the image header), so if the
>>> current bs->backing matches this default, you do not have to specify the
>>> backing filename in either blockdev-add or a filename.  Same in
>>> bdrv_refresh_filename().
>>>
>>>
>>> I hope that was all...?
>>>
>>> Max
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-07 13:30   ` Vladimir Sementsov-Ogievskiy
@ 2019-05-07 15:13     ` Max Reitz
  2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-05-07 15:13 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
>> What bs->file and bs->backing mean depends on the node.  For filter
>> nodes, both signify a node that will eventually receive all R/W
>> accesses.  For format nodes, bs->file contains metadata and data, and
>> bs->backing will not receive writes -- instead, writes are COWed to
>> bs->file.  Usually.
>>
>> In any case, it is not trivial to guess what a child means exactly with
>> our currently limited form of expression.  It is better to introduce
>> some functions that actually guarantee a meaning:
>>
>> - bdrv_filtered_cow_child() will return the child that receives requests
>>    filtered through COW.  That is, reads may or may not be forwarded
>>    (depending on the overlay's allocation status), but writes never go to
>>    this child.
>>
>> - bdrv_filtered_rw_child() will return the child that receives requests
>>    filtered through some very plain process.  Reads and writes issued to
>>    the parent will go to the child as well (although timing, etc. may be
>>    modified).
>>
>> - All drivers but quorum (but quorum is pretty opaque to the general
>>    block layer anyway) always only have one of these children: All read
>>    requests must be served from the filtered_rw_child (if it exists), so
>>    if there was a filtered_cow_child in addition, it would not receive
>>    any requests at all.
>>    (The closest here is mirror, where all requests are passed on to the
>>    source, but with write-blocking, write requests are "COWed" to the
>>    target.  But that just means that the target is a special child that
>>    cannot be introspected by the generic block layer functions, and that
>>    source is a filtered_rw_child.)
>>    Therefore, we can also add bdrv_filtered_child() which returns that
>>    one child (or NULL, if there is no filtered child).
>>
>> Also, many places in the current block layer should be skipping filters
>> (all filters or just the ones added implicitly, it depends) when going
>> through a block node chain.  They do not do that currently, but this
>> patch makes them do so.
>>
>> One example for this is qemu-img map, which should skip filters and only
>> look at the COW elements in the graph.  The change to iotest 204's
>> reference output shows how using blkdebug on top of a COW node used to
>> make qemu-img map disregard the rest of the backing chain, but with this
>> patch, the allocation in the base image is reported correctly.
>>
>> Furthermore, a note should be made that sometimes we do want to access
>> bs->backing directly.  This is whenever the operation in question is not
>> about accessing the COW child, but the "backing" child, be it COW or
>> not.  This is the case in functions such as bdrv_open_backing_file() or
>> whenever we have to deal with the special behavior of @backing as a
>> blockdev option, which is that it does not default to null like all
>> other child references do.
>>
>> Finally, the query functions (query-block and query-named-block-nodes)
>> are modified to return any filtered child under "backing", not just
>> bs->backing or COW children.  This is so that filters do not interrupt
>> the reported backing chain.  This changes the output of iotest 184, as
>> the throttled node now appears as a backing child.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json           |   4 +
>>   include/block/block.h          |   1 +
>>   include/block/block_int.h      |  40 +++++--
>>   block.c                        | 210 +++++++++++++++++++++++++++------
>>   block/backup.c                 |   8 +-
>>   block/block-backend.c          |  16 ++-
>>   block/commit.c                 |  33 +++---
>>   block/io.c                     |  45 ++++---
>>   block/mirror.c                 |  21 ++--
>>   block/qapi.c                   |  30 +++--
>>   block/stream.c                 |  13 +-
>>   blockdev.c                     |  88 +++++++++++---
>>   migration/block-dirty-bitmap.c |   4 +-
>>   nbd/server.c                   |   6 +-
>>   qemu-img.c                     |  29 ++---
>>   tests/qemu-iotests/184.out     |   7 +-
>>   tests/qemu-iotests/204.out     |   1 +
>>   17 files changed, 411 insertions(+), 145 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 7ccbfff9d0..dbd9286e4a 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2502,6 +2502,10 @@
>>   # On successful completion the image file is updated to drop the backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first non-filter
>> +# overlay node below it to point to base's backing node (or NULL if @base was
>> +# not specified) instead of modifying @device itself.
>> +#
> 
> Is it necessary, why we can't keep it as is, modifying exactly device node? May be,
> user wants to use filter in stream process, throttling for example.

That wouldn't make any sense.  Say you have this configuration:

throttle -> top -> base

Now you stream from base to throttle.  The data goes from base through
throttle to top.  You propose to then make throttle point to base:

throttle -> base

This will discard all the data in top.

Filters don’t store any data.  You need to keep the top data storing
image, i.e. the first non-filter overlay.
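To make the throttle -> top -> base example concrete, here is a minimal sketch on a toy graph. `Node`, `skip_filters()` and `finish_stream()` are hypothetical stand-ins, not QEMU's BlockDriverState API, and `new_backing` simply stands for whatever node the job decides the data-storing image should point to afterwards. The only point is that the backing link being rewritten belongs to the first non-filter overlay, so top (which holds the data) stays in the graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState: a filter has exactly one
 * child (file or backing), a format node may have both. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Follow filters downwards until the first node that stores data. */
static Node *skip_filters(Node *n)
{
    while (n && n->is_filter) {
        /* A filter passes data through from exactly one child. */
        assert(!(n->file && n->backing));
        n = n->backing ? n->backing : n->file;
    }
    return n;
}

/* Hypothetical end of a stream job started on @device: rewrite the
 * backing link of the first non-filter overlay, not of @device. */
static Node *finish_stream(Node *device, Node *new_backing)
{
    Node *overlay = skip_filters(device);

    if (overlay) {
        overlay->backing = new_backing;
    }
    return overlay;
}
```

With throttle -> top -> base, finish_stream(&throttle, ...) modifies and returns top, and throttle's link to top is left untouched.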

>>   # @job-id: identifier for the newly-created block job. If
>>   #          omitted, the device name will be used. (Since 2.7)
>>   #

[...]

>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>       bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>           bdrv_inherits_from_recursive(backing_hd, bs);
>>   
>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
> 
> If we support file-child filters in a frozen backing chain, could the chain go through the file child here?
> Hmm, only in the case when we are going to set a backing hd for a file-child filter... Hmm, can a filter have
> both file and backing children?

No.  A filter passes through data from its children, so it can only have
a single child, or it is quorum.

The file/backing combination is reserved for COW overlays.  file is
where the current layer’s data is, backing is the filtered child.
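A minimal sketch of that rule, assuming a toy `Node` type instead of BlockDriverState. The helper names only mirror the patch's bdrv_filtered_rw_child() / bdrv_filtered_cow_child() / bdrv_filtered_child(); they are hypothetical and not the real implementations. A filter's single child is the R/W-filtered one, while a format node's backing child is the COW-filtered one and its file child is not a filtered child at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Filters: the one child that sees every read and write. */
static Node *filtered_rw_bs(const Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    /* A filter passes data through from a single child. */
    assert(!(n->file && n->backing));
    return n->backing ? n->backing : n->file;
}

/* Format nodes: backing only receives reads (through COW); file holds
 * this layer's own data and metadata and is not a filtered child. */
static Node *filtered_cow_bs(const Node *n)
{
    return n->is_filter ? NULL : n->backing;
}

/* At most one of the two exists, so "the" filtered child is well defined. */
static Node *filtered_bs(const Node *n)
{
    Node *rw = filtered_rw_bs(n);
    return rw ? rw : filtered_cow_bs(n);
}
```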

> Your new API doesn't restrict it, and chooses backing as the default
> in this case in bdrv_filtered_rw_child(), so I assume you allow for that possibility.

I can add an assertion against it if you’d like.

> Here we don't want to check the whole chain; we want to check exactly the backing link, so it should be
> something like
> 
> if (bs->backing && bs->backing->frozen) {
>     error_setg(errp, "backing exists and is frozen!");
>     return;
> }
> 
> 
> Hmm, on the other hand, if we have a frozen backing chain going through a file child, we must not add
> a backing child to the node with the file child, as that would change the backing chain (which by default
> goes through backing)...
> 
> Anyway, we don't need to check the whole backing chain, as we may find some other frozen backing subchain
> far away from bs. So we probably want to check
> 
> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
>     error_setg(errp, "filtered child is frozen");
>     return;
> }
> 
> 
> ....
> 
> Also, we'll need to check for a frozen file child when we want to replace it.

I don’t quite understand.  It sounds to me like you’re saying we don’t
need to check the whole chain here but just the immediate child.  But
isn’t that true regardless of this series?

>>           return;
>>       }
>>   

[...]

>> @@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>>        * its metadata. Otherwise the 'backing' option can be omitted.
>>        */
>>       if (drv->supports_backing && reopen_state->backing_missing &&
>> -        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
>> +        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
> 
> And if we skip implicit filters in bdrv_backing_chain_next(), shouldn't we skip them
> here too?

This is a check whether it is mandatory for the user to specify the
'backing' key when reopening @bs.  It is mandatory if it currently has a
backing node, or if it should get a backing file by default because the
image header says so.

We don’t care about any node in particular.  The question is just “Is
there a backing node?”.  So I don’t see how skipping filters would
change anything.
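Restated as a standalone predicate (a hypothetical helper, not a function in block.c; the parameter names are made up for illustration), the condition only asks whether any backing node exists or is implied by the image header, so skipping implicit filters could not change its result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical restatement of the check in bdrv_reopen_prepare():
 * returns true if reopening must fail with "backing is missing". */
static bool backing_option_required(bool driver_supports_backing,
                                    bool backing_option_missing,
                                    bool has_backing_child,
                                    const char *header_backing_file)
{
    if (!driver_supports_backing || !backing_option_missing) {
        return false;
    }
    /* Whether bs->backing points to a filter or a format node does
     * not matter; only its existence (or a header default) does. */
    return has_backing_child ||
           (header_backing_file && header_backing_file[0] != '\0');
}
```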

>>           error_setg(errp, "backing is missing for '%s'",
>>                      reopen_state->bs->node_name);
>>           ret = -EINVAL;
>> @@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>        * from bdrv_set_backing_hd()) has the new values.
>>        */
>>       if (reopen_state->replace_backing_bs) {
>> -        BlockDriverState *old_backing_bs = backing_bs(bs);
>> +        BlockDriverState *old_backing_bs = child_bs(bs->backing);
>>           assert(!old_backing_bs || !old_backing_bs->implicit);
>>           /* Abort the permission update on the backing bs we're detaching */
>>           if (old_backing_bs) {
>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>                                       BlockDriverState *bs)
>>   {
>> -    while (active && bs != backing_bs(active)) {
>> -        active = backing_bs(active);
>> +    while (active && bs != bdrv_filtered_bs(active)) {
> 
> We need to adjust the function's comment then, as we may find a file-based overlay, not a backing one.

Yes, true.
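A toy sketch of what the loop finds once it follows the filtered child instead of backing_bs(), assuming a simplified `Node` type (not BlockDriverState) and a hypothetical `filtered_bs()` helper: the returned overlay may reach @bs through its file child (e.g. a COR filter), which is what the updated comment would need to say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Filter: its single child; format node: its COW (backing) child. */
static Node *filtered_bs(Node *n)
{
    if (n->is_filter) {
        assert(!(n->file && n->backing));
        return n->backing ? n->backing : n->file;
    }
    return n->backing;
}

/* Same loop shape as the patched bdrv_find_overlay(): returns the node
 * directly above @bs in @active's chain, or NULL if @bs is not in it. */
static Node *find_overlay(Node *active, Node *bs)
{
    while (active && bs != filtered_bs(active)) {
        active = filtered_bs(active);
    }
    return active;
}
```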

>> +        active = bdrv_filtered_bs(active);
>>       }
>>   
>>       return active;

[...]

>> @@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
>>   
>>   bool bdrv_is_encrypted(BlockDriverState *bs)
>>   {
>> -    if (bs->backing && bs->backing->bs->encrypted) {
>> +    BlockDriverState *filtered = bdrv_filtered_bs(bs);
>> +    if (bs->encrypted) {
>> +        return true;
>> +    }
>> +    if (filtered && bdrv_is_encrypted(filtered)) {
>>           return true;
>>       }
>> -    return bs->encrypted;
>> +    return false;
>>   }
> 
> This turns a check of one backing child into recursion through the whole extended backing chain.

Yes, but isn’t that what we want?
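A toy model of the patched logic (hypothetical `Node` type and helpers, not BlockDriverState) shows the difference: an encrypted node is now reported even when it sits several layers down, e.g. under blkdebug and a COW overlay, whereas the old code only looked one backing link down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    bool encrypted;
    struct Node *file;
    struct Node *backing;
} Node;

static Node *filtered_bs(Node *n)
{
    if (n->is_filter) {
        return n->backing ? n->backing : n->file;
    }
    return n->backing;
}

/* Mirrors the structure of the patched bdrv_is_encrypted(). */
static bool is_encrypted(Node *n)
{
    Node *filtered = filtered_bs(n);

    if (n->encrypted) {
        return true;
    }
    return filtered && is_encrypted(filtered);
}

/* Pre-patch behaviour, for comparison: one backing level only. */
static bool is_encrypted_old(Node *n)
{
    if (n->backing && n->backing->encrypted) {
        return true;
    }
    return n->encrypted;
}
```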

[...]

>> @@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>>   
>>   int bdrv_has_zero_init(BlockDriverState *bs)
>>   {
>> +    BlockDriverState *filtered;
>> +
>>       if (!bs->drv) {
>>           return 0;
>>       }
>>   
>>       /* If BS is a copy on write image, it is initialized to
>>          the contents of the base image, which may not be zeroes.  */
>> -    if (bs->backing) {
>> +    if (bdrv_filtered_cow_child(bs)) {
>>           return 0;
>>       }
>>       if (bs->drv->bdrv_has_zero_init) {
>>           return bs->drv->bdrv_has_zero_init(bs);
>>       }
>> -    if (bs->file && bs->drv->is_filter) {
>> -        return bdrv_has_zero_init(bs->file->bs);
>> +
>> +    filtered = bdrv_filtered_rw_bs(bs);
>> +    if (filtered) {
>> +        return bdrv_has_zero_init(filtered);
>>       }
> 
> This adds recursion for filters.

Not really, we had that before.
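The old code already recursed through bs->file for filters; the patch only asks the R/W-filtered child instead. A toy version of the resulting decision order, using a hypothetical `Node` with an optional driver callback (not QEMU's BlockDriver, and leaving out the `!bs->drv` check):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    /* Optional driver callback, playing the role of .bdrv_has_zero_init. */
    int (*has_zero_init_cb)(const struct Node *n);
    struct Node *file;
    struct Node *backing;
} Node;

static Node *filtered_rw_bs(const Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

static Node *filtered_cow_bs(const Node *n)
{
    return n->is_filter ? NULL : n->backing;
}

/* Example callback (hypothetical): a freshly created node reads as zeroes. */
static int zero_init_yes(const Node *n)
{
    (void)n;
    return 1;
}

static int has_zero_init(const Node *n)
{
    Node *rw;

    /* A COW overlay starts with its backing file's contents. */
    if (filtered_cow_bs(n)) {
        return 0;
    }
    if (n->has_zero_init_cb) {
        return n->has_zero_init_cb(n);
    }
    /* Filters store nothing themselves: ask the filtered node. */
    rw = filtered_rw_bs(n);
    if (rw) {
        return has_zero_init(rw);
    }
    return 0; /* safe default */
}
```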

>>   
>>       /* safe default */

[...]

>> diff --git a/block/backup.c b/block/backup.c
>> index 9988753249..9c08353b23 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>       int64_t len;
>>       BlockDriverInfo bdi;
>>       BackupBlockJob *job = NULL;
>> +    bool target_does_cow;
>>       int ret;
>>   
>>       assert(bs);
>> @@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>       /* If there is no backing file on the target, we cannot rely on COW if our
>>        * backup cluster size is smaller than the target cluster size. Even for
>>        * targets with a backing file, try to avoid COW if possible. */
>> +    target_does_cow = bdrv_filtered_cow_child(target);
> 
> So you excluded the false-positive case where the target is a backing filter. I think we'd better skip
> filters here:
> 
> target_does_cow = bdrv_filtered_cow_child(bdrv_skip_rw_filters(target));

Sounds correct, yes.

I suppose we then need a fix for compression, too, though.  Currently,
the code checks whether the target driver offers
.bdrv_co_pwritev_compressed().  First, we should do the same there and
skip filters.  But second, in block/io.c, we then need to also skip
filters for compressed writes (by issuing normal writes to them with the
BDRV_REQ_WRITE_COMPRESSED flag set).
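The target_does_cow suggestion, sketched on a toy graph (hypothetical `Node` type and helpers that only mirror bdrv_skip_rw_filters() and bdrv_filtered_cow_child(), not the real functions): a filter has no COW child of its own, so checking the target directly would report "no COW" for a throttle node sitting on an overlay that does have a backing file. Skipping R/W filters first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

static Node *filtered_rw_bs(Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

static Node *filtered_cow_bs(Node *n)
{
    return n->is_filter ? NULL : n->backing;
}

/* Descend through filters to the first node that stores data. */
static Node *skip_rw_filters(Node *n)
{
    while (n && n->is_filter) {
        n = filtered_rw_bs(n);
    }
    return n;
}

static bool target_does_cow(Node *target)
{
    Node *data_node = skip_rw_filters(target);
    return data_node && filtered_cow_bs(data_node) != NULL;
}
```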

>>       ret = bdrv_get_info(target, &bdi);
>> -    if (ret == -ENOTSUP && !target->backing) {
>> +    if (ret == -ENOTSUP && !target_does_cow) {
>>           /* Cluster size is not defined */
>>           warn_report("The target block device doesn't provide "
>>                       "information about the block size and it doesn't have a "

[...]

>> diff --git a/block/io.c b/block/io.c
>> index dfc153b8d8..83c2b6b46a 100644
>> --- a/block/io.c
>> +++ b/block/io.c

[...]

>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>       bool first = true;
>>   
>>       assert(bs != base);
>> -    for (p = bs; p != base; p = backing_bs(p)) {
>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>                                      file);
> 
> Interesting: for filters that use bdrv_co_block_status_from_backing and
> bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of the
> underlying real node two or more times. It's not wrong, but obviously not optimal.

Hm.  If @p is a filter, we could skip straight to *file.  Would that work?

[...]

>> diff --git a/block/mirror.c b/block/mirror.c
>> index 8b2404051f..80cef587f0 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
>>                               &error_abort);
>>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
>> -        if (backing_bs(target_bs) != backing) {
>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>> +        if (bdrv_backing_chain_next(target_bs) != backing) {
>> +            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,
> 
> hmm, here you support filters above target_bs ...
> 
>> +                                &local_err);
>>               if (local_err) {
>>                   error_report_err(local_err);
>>                   ret = -EPERM;
>> @@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
>>       block_job_remove_all_bdrv(bjob);
>>       bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>>                               &error_abort);
>> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
>> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>>   
>>       /* We just changed the BDS the job BB refers to (with either or both of the
>>        * bdrv_replace_node() calls), so switch the BB back so the cleanup does
>> @@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>       } else {
>>           s->target_cluster_size = BDRV_SECTOR_SIZE;
>>       }
>> -    if (backing_filename[0] && !target_bs->backing &&
>> +    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&
> 
> ... and here - not

Hm, yes, I think that is a mistake.  I’ll try to fix it.

> [stopped here for now]

Thanks so far! :-)

Max




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-07 15:13     ` Max Reitz
@ 2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
  2019-05-23 14:49         ` Max Reitz
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-17 11:50 UTC
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

07.05.2019 18:13, Max Reitz wrote:
> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
>>> What bs->file and bs->backing mean depends on the node.  For filter
>>> nodes, both signify a node that will eventually receive all R/W
>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>> bs->backing will not receive writes -- instead, writes are COWed to
>>> bs->file.  Usually.
>>>
>>> In any case, it is not trivial to guess what a child means exactly with
>>> our currently limited form of expression.  It is better to introduce
>>> some functions that actually guarantee a meaning:
>>>
>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>     (depending on the overlay's allocation status), but writes never go to
>>>     this child.
>>>
>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>     filtered through some very plain process.  Reads and writes issued to
>>>     the parent will go to the child as well (although timing, etc. may be
>>>     modified).
>>>
>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>     block layer anyway) always only have one of these children: All read
>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>     if there was a filtered_cow_child in addition, it would not receive
>>>     any requests at all.
>>>     (The closest here is mirror, where all requests are passed on to the
>>>     source, but with write-blocking, write requests are "COWed" to the
>>>     target.  But that just means that the target is a special child that
>>>     cannot be introspected by the generic block layer functions, and that
>>>     source is a filtered_rw_child.)
>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>     one child (or NULL, if there is no filtered child).
>>>
>>> Also, many places in the current block layer should be skipping filters
>>> (all filters or just the ones added implicitly, it depends) when going
>>> through a block node chain.  They do not do that currently, but this
>>> patch makes them do so.
>>>
>>> One example for this is qemu-img map, which should skip filters and only
>>> look at the COW elements in the graph.  The change to iotest 204's
>>> reference output shows how using blkdebug on top of a COW node used to
>>> make qemu-img map disregard the rest of the backing chain, but with this
>>> patch, the allocation in the base image is reported correctly.
>>>
>>> Furthermore, a note should be made that sometimes we do want to access
>>> bs->backing directly.  This is whenever the operation in question is not
>>> about accessing the COW child, but the "backing" child, be it COW or
>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>> whenever we have to deal with the special behavior of @backing as a
>>> blockdev option, which is that it does not default to null like all
>>> other child references do.
>>>
>>> Finally, the query functions (query-block and query-named-block-nodes)
>>> are modified to return any filtered child under "backing", not just
>>> bs->backing or COW children.  This is so that filters do not interrupt
>>> the reported backing chain.  This changes the output of iotest 184, as
>>> the throttled node now appears as a backing child.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    qapi/block-core.json           |   4 +
>>>    include/block/block.h          |   1 +
>>>    include/block/block_int.h      |  40 +++++--
>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>    block/backup.c                 |   8 +-
>>>    block/block-backend.c          |  16 ++-
>>>    block/commit.c                 |  33 +++---
>>>    block/io.c                     |  45 ++++---
>>>    block/mirror.c                 |  21 ++--
>>>    block/qapi.c                   |  30 +++--
>>>    block/stream.c                 |  13 +-
>>>    blockdev.c                     |  88 +++++++++++---
>>>    migration/block-dirty-bitmap.c |   4 +-
>>>    nbd/server.c                   |   6 +-
>>>    qemu-img.c                     |  29 ++---
>>>    tests/qemu-iotests/184.out     |   7 +-
>>>    tests/qemu-iotests/204.out     |   1 +
>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 7ccbfff9d0..dbd9286e4a 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -2502,6 +2502,10 @@
>>>    # On successful completion the image file is updated to drop the backing file
>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>    #
>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>> +# not specified) instead of modifying @device itself.
>>> +#
>>
>> Is it necessary, why we can't keep it as is, modifying exactly device node? May be,
>> user wants to use filter in stream process, throttling for example.
> 
> That wouldn't make any sense.  Say you have this configuration:
> 
> throttle -> top -> base
> 
> Now you stream from base to throttle.  The data goes from base through
> throttle to top.  You propose to then make throttle point to base:
> 
> throttle -> base
> 
> This will discard all the data in top.
> 
> Filters don’t store any data.  You need to keep the top data storing
> image, i.e. the first non-filter overlay.

Ah, yes, good reason.

> 
>>>    # @job-id: identifier for the newly-created block job. If
>>>    #          omitted, the device name will be used. (Since 2.7)
>>>    #
> 
> [...]
> 
>>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>        bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>>            bdrv_inherits_from_recursive(backing_hd, bs);
>>>    
>>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>>
>> If we support file-child filters in a frozen backing chain, could the chain go through the file child here?
>> Hmm, only in the case when we are going to set a backing hd for a file-child filter... Hmm, can a filter have
>> both file and backing children?
> 
> No.  A filter passes through data from its children, so it can only have
> a single child, or it is quorum.
> 
> The file/backing combination is reserved for COW overlays.  file is
> where the current layer’s data is, backing is the filtered child.

My backup-top filter has two children: backing and target. So I think we can state that a
filter must not have both file and backing children, but may have any other special
children it wants, which are invisible to the generic backing-child/file-child logic.

Maybe we need an assertion somewhere, like
assert(!(bs->drv->is_filter && bs->file && bs->backing));

> 
>> Your new API doesn't restrict it, and chooses backing as the default
>> in this case in bdrv_filtered_rw_child(), so I assume you allow for that possibility.
> 
> I can add an assertion against it if you’d like.

Yes, I think it's worth doing.

> 
>> Here we don't want to check the whole chain; we want to check exactly the backing link, so it should be
>> something like
>>
>> if (bs->backing && bs->backing->frozen) {
>>     error_setg(errp, "backing exists and is frozen!");
>>     return;
>> }
>>
>>
>> Hmm, on the other hand, if we have a frozen backing chain going through a file child, we must not add
>> a backing child to the node with the file child, as that would change the backing chain (which by default
>> goes through backing)...
>>
>> Anyway, we don't need to check the whole backing chain, as we may find some other frozen backing subchain
>> far away from bs. So we probably want to check
>>
>> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
>>     error_setg(errp, "filtered child is frozen");
>>     return;
>> }
>>
>>
>> ....
>>
>> Also, we'll need to check for a frozen file child when we want to replace it.
> 
> I don’t quite understand.  It sounds to me like you’re saying we don’t
> need to check the whole chain here but just the immediate child.  But
> isn’t that true regardless of this series?

If we forbid adding a backing child to a filter that has a file child, everything becomes simpler and seems to be correct.

Should we add a check for a frozen file child to bdrv_replace_child()?

> 
>>>            return;
>>>        }
>>>    
> 
> [...]
> 
>>> @@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>>>         * its metadata. Otherwise the 'backing' option can be omitted.
>>>         */
>>>        if (drv->supports_backing && reopen_state->backing_missing &&
>>> -        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
>>> +        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
>>
>> And if we skip implicit filters in bdrv_backing_chain_next(), shouldn't we skip them
>> here too?
> 
> This is a check whether it is mandatory for the user to specify the
> 'backing' key when reopening @bs.  It is mandatory if it currently has a
> backing node, or if it should get a backing file by default because the
> image header says so.
> 
> We don’t care about any node in particular.  The question is just “Is
> there a backing node?”.  So I don’t see how skipping filters would
> change anything.
> 
>>>            error_setg(errp, "backing is missing for '%s'",
>>>                       reopen_state->bs->node_name);
>>>            ret = -EINVAL;
>>> @@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>>         * from bdrv_set_backing_hd()) has the new values.
>>>         */
>>>        if (reopen_state->replace_backing_bs) {
>>> -        BlockDriverState *old_backing_bs = backing_bs(bs);
>>> +        BlockDriverState *old_backing_bs = child_bs(bs->backing);
>>>            assert(!old_backing_bs || !old_backing_bs->implicit);
>>>            /* Abort the permission update on the backing bs we're detaching */
>>>            if (old_backing_bs) {
>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>    BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>                                        BlockDriverState *bs)
>>>    {
>>> -    while (active && bs != backing_bs(active)) {
>>> -        active = backing_bs(active);
>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>
>> We need to adjust the function's comment then, as we may find a file-based overlay, not a backing one.
> 
> Yes, true.
> 
>>> +        active = bdrv_filtered_bs(active);
>>>        }
>>>    
>>>        return active;
> 
> [...]
> 
>>> @@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
>>>    
>>>    bool bdrv_is_encrypted(BlockDriverState *bs)
>>>    {
>>> -    if (bs->backing && bs->backing->bs->encrypted) {
>>> +    BlockDriverState *filtered = bdrv_filtered_bs(bs);
>>> +    if (bs->encrypted) {
>>> +        return true;
>>> +    }
>>> +    if (filtered && bdrv_is_encrypted(filtered)) {
>>>            return true;
>>>        }
>>> -    return bs->encrypted;
>>> +    return false;
>>>    }
>>
>> This turns a check of one backing child into recursion through the whole extended backing chain.
> 
> Yes, but isn’t that what we want?

It was just a note, I don't mind.

> 
> [...]
> 
>>> @@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>>>    
>>>    int bdrv_has_zero_init(BlockDriverState *bs)
>>>    {
>>> +    BlockDriverState *filtered;
>>> +
>>>        if (!bs->drv) {
>>>            return 0;
>>>        }
>>>    
>>>        /* If BS is a copy on write image, it is initialized to
>>>           the contents of the base image, which may not be zeroes.  */
>>> -    if (bs->backing) {
>>> +    if (bdrv_filtered_cow_child(bs)) {
>>>            return 0;
>>>        }
>>>        if (bs->drv->bdrv_has_zero_init) {
>>>            return bs->drv->bdrv_has_zero_init(bs);
>>>        }
>>> -    if (bs->file && bs->drv->is_filter) {
>>> -        return bdrv_has_zero_init(bs->file->bs);
>>> +
>>> +    filtered = bdrv_filtered_rw_bs(bs);
>>> +    if (filtered) {
>>> +        return bdrv_has_zero_init(filtered);
>>>        }
>>
>> This adds recursion for filters.
> 
> Not really, we had that before.
> 
>>>    
>>>        /* safe default */
> 
> [...]
> 
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 9988753249..9c08353b23 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>        int64_t len;
>>>        BlockDriverInfo bdi;
>>>        BackupBlockJob *job = NULL;
>>> +    bool target_does_cow;
>>>        int ret;
>>>    
>>>        assert(bs);
>>> @@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>        /* If there is no backing file on the target, we cannot rely on COW if our
>>>         * backup cluster size is smaller than the target cluster size. Even for
>>>         * targets with a backing file, try to avoid COW if possible. */
>>> +    target_does_cow = bdrv_filtered_cow_child(target);
>>
>> So you excluded the false-positive case where the target is a backing filter. I think we'd better skip
>> filters here:
>>
>> target_does_cow = bdrv_filtered_cow_child(bdrv_skip_rw_filters(target));
> 
> Sounds correct, yes.
> 
> I suppose we then need a fix for compression, too, though.  Currently,
> the code checks whether the target driver offers
> .bdrv_co_pwritev_compressed().  First, we should do the same there and
> skip filters.  But second, in block/io.c, we then need to also skip
> filters for compressed writes (by issuing normal writes to them with the
> BDRV_REQ_WRITE_COMPRESSED flag set).

Agreed

> 
>>>        ret = bdrv_get_info(target, &bdi);
>>> -    if (ret == -ENOTSUP && !target->backing) {
>>> +    if (ret == -ENOTSUP && !target_does_cow) {
>>>            /* Cluster size is not defined */
>>>            warn_report("The target block device doesn't provide "
>>>                        "information about the block size and it doesn't have a "
> 
> [...]
> 
>>> diff --git a/block/io.c b/block/io.c
>>> index dfc153b8d8..83c2b6b46a 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
> 
> [...]
> 
>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>>        bool first = true;
>>>    
>>>        assert(bs != base);
>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>            ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>>                                       file);
>>
>> Interesting: for filters that use bdrv_co_block_status_from_backing and
>> bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of the
>> underlying real node two or more times. It's not wrong, but obviously not optimal.
> 
> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?

No, as file may not be in the backing chain:

filter
    |
    v
qcow2 -> file
    |
    v
qcow2

So we shouldn't redirect the whole loop to file.

Maybe the correct solution would be to introduce an additional handler,
.bdrv_co_block_status_above, with different logic.
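The redundancy noted above can be illustrated with a toy model. The `Node` type, `node_status()` and `status_above()` are hypothetical and only loosely follow the loop in bdrv_co_block_status_above(); real block status also returns offsets, *file and more. A filter answers by forwarding to its child, and then the loop visits that same child again, so the child's driver is queried twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node, NOT QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    bool allocated;       /* toy: is the queried range allocated here? */
    struct Node *child;   /* filtered child (rw for filters, COW otherwise) */
    int queries;          /* how often this node's driver was asked */
} Node;

/* Toy per-node query: a filter forwards to its child, roughly as
 * bdrv_co_block_status_from_file/_from_backing do. */
static bool node_status(Node *n)
{
    if (n->is_filter) {
        return n->child && node_status(n->child);
    }
    n->queries++;
    return n->allocated;
}

/* Toy loop: walk the filtered chain until something is allocated.
 * @base must be NULL or a node in @bs's chain. */
static bool status_above(Node *bs, Node *base)
{
    for (Node *p = bs; p != base; p = p->child) {
        if (node_status(p)) {
            return true;
        }
    }
    return false;
}
```

With filter -> top (unallocated) -> base (allocated), top is asked twice for the same range: once through the filter and once when the loop reaches it.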

> 
> [...]
> 
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 8b2404051f..80cef587f0 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
>>>                                &error_abort);
>>>        if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>>            BlockDriverState *backing = s->is_none_mode ? src : s->base;
>>> -        if (backing_bs(target_bs) != backing) {
>>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>>> +        if (bdrv_backing_chain_next(target_bs) != backing) {
>>> +            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,
>>
>> hmm, here you support filters above target_bs ...
>>
>>> +                                &local_err);
>>>                if (local_err) {
>>>                    error_report_err(local_err);
>>>                    ret = -EPERM;
>>> @@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
>>>        block_job_remove_all_bdrv(bjob);
>>>        bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>>>                                &error_abort);
>>> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
>>> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>>>    
>>>        /* We just changed the BDS the job BB refers to (with either or both of the
>>>         * bdrv_replace_node() calls), so switch the BB back so the cleanup does
>>> @@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>        } else {
>>>            s->target_cluster_size = BDRV_SECTOR_SIZE;
>>>        }
>>> -    if (backing_filename[0] && !target_bs->backing &&
>>> +    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&
>>
>> ... and here - not
> 
> Hm, yes, I think that is a mistake.  I’ll try to fix it.
> 
>> [stopped here for now]
> 
> Thanks so far! :-)
> 
> Max
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-10 20:20   ` Max Reitz
                     ` (2 preceding siblings ...)
  (?)
@ 2019-05-17 14:50   ` Vladimir Sementsov-Ogievskiy
  2019-05-23 17:27     ` Max Reitz
  -1 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-17 14:50 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.04.2019 23:20, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node.  For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses.  For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file.  Usually.
> 
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression.  It is better to introduce
> some functions that actually guarantee a meaning:
> 
> - bdrv_filtered_cow_child() will return the child that receives requests
>    filtered through COW.  That is, reads may or may not be forwarded
>    (depending on the overlay's allocation status), but writes never go to
>    this child.
> 
> - bdrv_filtered_rw_child() will return the child that receives requests
>    filtered through some very plain process.  Reads and writes issued to
>    the parent will go to the child as well (although timing, etc. may be
>    modified).
> 
> - All drivers but quorum (but quorum is pretty opaque to the general
>    block layer anyway) always only have one of these children: All read
>    requests must be served from the filtered_rw_child (if it exists), so
>    if there was a filtered_cow_child in addition, it would not receive
>    any requests at all.
>    (The closest here is mirror, where all requests are passed on to the
>    source, but with write-blocking, write requests are "COWed" to the
>    target.  But that just means that the target is a special child that
>    cannot be introspected by the generic block layer functions, and that
>    source is a filtered_rw_child.)
>    Therefore, we can also add bdrv_filtered_child() which returns that
>    one child (or NULL, if there is no filtered child).
> 
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain.  They do not do that currently, but this
> patch makes them.
> 
> One example for this is qemu-img map, which should skip filters and only
> look at the COW elements in the graph.  The change to iotest 204's
> reference output shows how using blkdebug on top of a COW node used to
> make qemu-img map disregard the rest of the backing chain, but with this
> patch, the allocation in the base image is reported correctly.
> 
> Furthermore, a note should be made that sometimes we do want to access
> bs->backing directly.  This is whenever the operation in question is not
> about accessing the COW child, but the "backing" child, be it COW or
> not.  This is the case in functions such as bdrv_open_backing_file() or
> whenever we have to deal with the special behavior of @backing as a
> blockdev option, which is that it does not default to null like all
> other child references do.
> 
> Finally, the query functions (query-block and query-named-block-nodes)
> are modified to return any filtered child under "backing", not just
> bs->backing or COW children.  This is so that filters do not interrupt
> the reported backing chain.  This changes the output of iotest 184, as
> the throttled node now appears as a backing child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
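
The three accessors from the commit message can be sketched on a toy node type. The ToyNode type and toy_* names are hypothetical; the real functions take a BlockDriverState and return a BdrvChild. The sketch assumes that a filter uses exactly one of file/backing, and it ignores quorum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;          /* models bs->drv->is_filter */
    struct ToyNode *file;    /* models bs->file ? bs->file->bs : NULL */
    struct ToyNode *backing; /* models bs->backing ? bs->backing->bs : NULL */
} ToyNode;

/* COW child: may serve reads the overlay does not allocate, never writes. */
static ToyNode *toy_filtered_cow_bs(ToyNode *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

/* R/W child: receives all reads and writes issued to the filter. */
static ToyNode *toy_filtered_rw_bs(ToyNode *bs)
{
    if (!bs || !bs->is_filter) {
        return NULL;
    }
    assert(!(bs->file && bs->backing)); /* assumed: one filtered child only */
    return bs->backing ? bs->backing : bs->file;
}

/* The single filtered child of either kind, or NULL if there is none. */
static ToyNode *toy_filtered_bs(ToyNode *bs)
{
    ToyNode *cow = toy_filtered_cow_bs(bs);
    return cow ? cow : toy_filtered_rw_bs(bs);
}
```
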

[..]

> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
>                               &error_abort);
>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -        if (backing_bs(target_bs) != backing) {
> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
> +        if (bdrv_backing_chain_next(target_bs) != backing) {
> +            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,


here you support filters above target_bs ...

> +                                &local_err);
>               if (local_err) {
>                   error_report_err(local_err);
>                   ret = -EPERM;
> @@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
>       block_job_remove_all_bdrv(bjob);
>       bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>   
>       /* We just changed the BDS the job BB refers to (with either or both of the
>        * bdrv_replace_node() calls), so switch the BB back so the cleanup does
> @@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>       } else {
>           s->target_cluster_size = BDRV_SECTOR_SIZE;
>       }
> -    if (backing_filename[0] && !target_bs->backing &&
> +    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&

And here we need it too

[continuing from here]

>           s->granularity < s->target_cluster_size) {
>           s->buf_size = MAX(s->buf_size, s->target_cluster_size);
>           s->cow_bitmap = bitmap_new(length);
> @@ -1083,7 +1084,7 @@ static void mirror_complete(Job *job, Error **errp)
>       if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
>           int ret;
>   
> -        assert(!target->backing);
> +        assert(!bdrv_filtered_cow_child(target));

hmm and here...

Possibly, we should add a kind of s->filtered_target to use in all such cases.

>           ret = bdrv_open_backing_file(target, NULL, "backing", errp);
>           if (ret < 0) {
>               return;
> @@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>        * any jobs in them must be blocked */
>       if (target_is_backing) {
>           BlockDriverState *iter;
> -        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
> +        for (iter = bdrv_filtered_bs(bs); iter != target;

should it be filtered_target too?

> +             iter = bdrv_filtered_bs(iter))
> +        {
>               /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
>                * ourselves at s->base (if writes are blocked for a node, they are
>                * also blocked for its backing file). The other options would be a
> @@ -1691,7 +1694,7 @@ fail:
>   
>       bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>   
>       bdrv_unref(mirror_top_bs);
>   }
> @@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                     MirrorCopyMode copy_mode, Error **errp)
>   {
>       bool is_none_mode;
> -    BlockDriverState *base;
> +    BlockDriverState *base = NULL;

dead assignment

>   
>       if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>           error_setg(errp, "Sync mode 'incremental' not supported");
>           return;
>       }
>       is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
> -    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
> +    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
>       mirror_start_job(job_id, bs, creation_flags, target, replaces,
>                        speed, granularity, buf_size, backing_mode,
>                        on_source_error, on_target_error, unmap, NULL, NULL,
> diff --git a/block/qapi.c b/block/qapi.c
> index 110d05dc57..478c6f5e0d 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -149,9 +149,13 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>               return NULL;
>           }
>   
> -        if (bs0->drv && bs0->backing) {
> +        if (bs0->drv && bdrv_filtered_child(bs0)) {
> +            /*
> +             * Put any filtered child here (for backwards compatibility to when
> +             * we put bs0->backing here, which might be any filtered child).
> +             */
>               info->backing_file_depth++;
> -            bs0 = bs0->backing->bs;
> +            bs0 = bdrv_filtered_bs(bs0);
>               (*p_image_info)->has_backing_image = true;
>               p_image_info = &((*p_image_info)->backing_image);
>           } else {
> @@ -160,9 +164,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>   
>           /* Skip automatically inserted nodes that the user isn't aware of for
>            * query-block (blk != NULL), but not for query-named-block-nodes */
> -        while (blk && bs0->drv && bs0->implicit) {
> -            bs0 = backing_bs(bs0);
> -            assert(bs0);
> +        if (blk) {
> +            bs0 = bdrv_skip_implicit_filters(bs0);
>           }
>       }
>   
> @@ -347,9 +350,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
>       BlockDriverState *bs = blk_bs(blk);
>       char *qdev;
>   
> -    /* Skip automatically inserted nodes that the user isn't aware of */
> -    while (bs && bs->drv && bs->implicit) {
> -        bs = backing_bs(bs);
> +    if (bs) {
> +        /* Skip automatically inserted nodes that the user isn't aware of */
> +        bs = bdrv_skip_implicit_filters(bs);
>       }
>   
>       info->device = g_strdup(blk_name(blk));
> @@ -506,6 +509,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
>   static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>                                           bool blk_level)
>   {
> +    BlockDriverState *cow_bs;
>       BlockStats *s = NULL;
>   
>       s = g_malloc0(sizeof(*s));
> @@ -518,9 +522,8 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>       /* Skip automatically inserted nodes that the user isn't aware of in
>        * a BlockBackend-level command. Stay at the exact node for a node-level
>        * command. */
> -    while (blk_level && bs->drv && bs->implicit) {
> -        bs = backing_bs(bs);
> -        assert(bs);
> +    if (blk_level) {
> +        bs = bdrv_skip_implicit_filters(bs);
>       }
>   
>       if (bdrv_get_node_name(bs)[0]) {
> @@ -535,9 +538,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>           s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
>       }
>   
> -    if (blk_level && bs->backing) {
> +    cow_bs = bdrv_filtered_cow_bs(bs);

So, if we are at blk_level and the top bs is an explicit filter, you don't want
to show its child?

Hmm, at least we can't show it if it is a file child, as the QAPI field is
already called "backing". So if we can't show it for file-child-based filters,
it may be better not to show filter children here at all.
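
A toy sketch of this asymmetry follows. The names are hypothetical, and the loops only mimic which child each patched function follows; they are not the functions themselves. For a throttle filter on top of null-co, the query-block side counts one "backing" level, which matches the new 184.out. The blk_level query-blockstats side reports no backing child at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *file;
    struct ToyNode *backing;
} ToyNode;

static ToyNode *toy_filtered_cow_bs(ToyNode *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

/* Any filtered child: COW child for format nodes, R/W child for filters. */
static ToyNode *toy_filtered_bs(ToyNode *bs)
{
    if (!bs) {
        return NULL;
    }
    if (!bs->is_filter) {
        return bs->backing;
    }
    return bs->backing ? bs->backing : bs->file;
}

/* query-block side: every filtered child counts as a "backing" level. */
static int toy_query_block_depth(ToyNode *bs)
{
    int depth = 0;
    while ((bs = toy_filtered_bs(bs)) != NULL) {
        depth++;
    }
    return depth;
}

/* query-blockstats side (blk_level): only COW children nest under "backing". */
static int toy_stats_backing_depth(ToyNode *bs)
{
    int depth = 0;
    while ((bs = toy_filtered_cow_bs(bs)) != NULL) {
        depth++;
    }
    return depth;
}
```
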

> +    if (blk_level && cow_bs) {
>           s->has_backing = true;
> -        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
> +        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
>       }
>   
>       return s;
> diff --git a/block/stream.c b/block/stream.c
> index bfaebb861a..23d5c890e0 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>       BlockJob *bjob = &s->common;
>       BlockDriverState *bs = blk_bs(bjob->blk);
> +    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);

Aha, I'd call it filtered, but unfiltered is correct too; it's amazing.

>       BlockDriverState *base = s->base;
>       Error *local_err = NULL;
>       int ret = 0;
> @@ -72,7 +73,7 @@ static int stream_prepare(Job *job)
>       bdrv_unfreeze_backing_chain(bs, base);
>       s->chain_frozen = false;
>   
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(unfiltered)) {
>           const char *base_id = NULL, *base_fmt = NULL;
>           if (base) {
>               base_id = s->backing_file_str;
> @@ -80,7 +81,7 @@ static int stream_prepare(Job *job)
>                   base_fmt = base->drv->format_name;
>               }
>           }
> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
> +        ret = bdrv_change_backing_file(unfiltered, base_id, base_fmt);
>           bdrv_set_backing_hd(bs, base, &local_err);
>           if (local_err) {
>               error_report_err(local_err);
> @@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>       int64_t n = 0; /* bytes */
>       void *buf;
>   
> -    if (!bs->backing) {
> +    if (!bdrv_filtered_child(bs)) {
>           goto out;
>       }

This condition checks whether there is anything to stream at all, so I think
it's better to check
if (!bdrv_backing_chain_next(bs)) {
    goto out;
}

>   
> @@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>           } else if (ret >= 0) {
>               /* Copy if allocated in the intermediate images.  Limit to the
>                * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
> -            ret = bdrv_is_allocated_above(backing_bs(bs), base,
> +            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
>                                             offset, n, &n);

Hmm, if we are trying to support bs being a filter, and actually operate on the
first non-filter node, as you write in the QAPI spec, this is wrong. Again, it
should be bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)).

Or, maybe better, at stream start we should calculate the real top bs to operate
on and forget about all filters above it, i.e., do bs = bdrv_skip_rw_filters(bs)
at the very beginning, when creating the job.
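
Both stream.c remarks can be checked on a toy model. The ToyNode type and toy_* names are hypothetical. toy_backing_chain_next() assumes that bdrv_backing_chain_next() means "the first non-filter node below the COW child of the first non-filter node at or below bs".

The model places a COR-style filter, which uses its file child, on top of an image:
- If the image has no backing file, the old !bdrv_filtered_child(bs) test still finds a child. The backing-chain test correctly reports that there is nothing to stream.
- If the image does have a backing file, bdrv_filtered_bs(bs) yields the top image itself rather than its backing file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *file;
    struct ToyNode *backing;
} ToyNode;

static ToyNode *toy_filtered_cow_bs(ToyNode *bs)
{
    return (bs && !bs->is_filter) ? bs->backing : NULL;
}

static ToyNode *toy_filtered_rw_bs(ToyNode *bs)
{
    if (!bs || !bs->is_filter) {
        return NULL;
    }
    return bs->backing ? bs->backing : bs->file;
}

static ToyNode *toy_filtered_bs(ToyNode *bs)
{
    ToyNode *cow = toy_filtered_cow_bs(bs);
    return cow ? cow : toy_filtered_rw_bs(bs);
}

static ToyNode *toy_skip_rw_filters(ToyNode *bs)
{
    while (bs && bs->is_filter) {
        bs = toy_filtered_rw_bs(bs);
    }
    return bs;
}

/* Assumed semantics of bdrv_backing_chain_next(). */
static ToyNode *toy_backing_chain_next(ToyNode *bs)
{
    return toy_skip_rw_filters(toy_filtered_cow_bs(toy_skip_rw_filters(bs)));
}

/* Hypothetical helper: the node to pass as top to is_allocated_above(). */
static ToyNode *toy_stream_allocation_top(ToyNode *bs)
{
    return toy_filtered_cow_bs(toy_skip_rw_filters(bs));
}
```
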

>   
>               /* Finish early if end of backing file has been reached */
> @@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>        * disappear from the chain after this operation. The streaming job reads
>        * every block only once, assuming that it doesn't change, so block writes
>        * and resizes. */
> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
> +    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
> +         iter = bdrv_filtered_bs(iter))
> +    {
>           block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>                              BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
>                              &error_abort);
> diff --git a/blockdev.c b/blockdev.c
> index 4775a07d93..bb71b8368d 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>               return;
>           }
>   
> -        bs = blk_bs(blk);
> +        bs = bdrv_skip_implicit_filters(blk_bs(blk));
>           aio_context = bdrv_get_aio_context(bs);
>           aio_context_acquire(aio_context);
>   
> @@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState *common,
>           goto out;
>       }
>   
> -    if (state->new_bs->backing != NULL) {
> +    if (bdrv_filtered_cow_child(state->new_bs)) {

Do we allow creating a snapshot whose new node is a filter? We should either
forbid it explicitly or check bdrv_filtered_child() here. And we can't allow
file-based filters anyway.

[skipped up to the end of blockdev.c, I'm tired o_O]

[..]

> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index d1bb863cb6..f99f753fba 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
>           const char *drive_name = bdrv_get_device_or_node_name(bs);
>   
>           /* skip automatically inserted nodes */
> -        while (bs && bs->drv && bs->implicit) {
> -            bs = backing_bs(bs);
> -        }
> +        bs = bdrv_skip_implicit_filters(bs);

This intersects with John's patch
[PATCH v2] migration/dirty-bitmaps: change bitmap enumeration method
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03340.html

>   
>           for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
>                bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
> diff --git a/nbd/server.c b/nbd/server.c
> index e21bd501dc..e41ae89dbe 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>       if (bitmap) {
>           BdrvDirtyBitmap *bm = NULL;
>   
> -        while (true) {
> +        while (bs) {
>               bm = bdrv_find_dirty_bitmap(bs, bitmap);
> -            if (bm != NULL || bs->backing == NULL) {
> +            if (bm != NULL) {
>                   break;
>               }
>   
> -            bs = bs->backing->bs;
> +            bs = bdrv_filtered_bs(bs);
>           }

Checking the documentation: "@bitmap: Also export the dirty bitmap reachable from @device".

"Reachable" is not bad, but we may want to clarify that the extended backing chain is meant.

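Here, "extended backing chain" means every filtered child (COW or R/W), as in the patched loop. A toy sketch with hypothetical names, assuming each node carries at most one bitmap: with a throttle filter above a qcow2 overlay, a bitmap on the base image is still found, while the old loop (bs->backing only) would stop at the filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *file;
    struct ToyNode *backing;
    const char *bitmap;      /* name of this node's only dirty bitmap, or NULL */
} ToyNode;

/* Any filtered child: COW child for format nodes, R/W child for filters. */
static ToyNode *toy_filtered_bs(ToyNode *bs)
{
    if (!bs->is_filter) {
        return bs->backing;
    }
    return bs->backing ? bs->backing : bs->file;
}

/* Mirrors the patched nbd_export_new() loop: returns the node carrying the
 * bitmap, or NULL if no node in the extended chain has it. */
static ToyNode *toy_find_bitmap_node(ToyNode *bs, const char *name)
{
    while (bs) {
        if (bs->bitmap && strcmp(bs->bitmap, name) == 0) {
            break;
        }
        bs = toy_filtered_bs(bs);
    }
    return bs;
}
```
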
>   
>           if (bm == NULL) {
> diff --git a/qemu-img.c b/qemu-img.c
> index aa6f81f1ea..bcfbb743fc 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -982,7 +982,7 @@ static int img_commit(int argc, char **argv)
>       if (!blk) {
>           return 1;
>       }
> -    bs = blk_bs(blk);
> +    bs = bdrv_skip_implicit_filters(blk_bs(blk));

Hopefully there aren't any, but for consistency we may skip them.

>   
>       qemu_progress_init(progress, 1.f);
>       qemu_progress_print(0.f, 100);
> @@ -999,7 +999,7 @@ static int img_commit(int argc, char **argv)
>           /* This is different from QMP, which by default uses the deepest file in
>            * the backing chain (i.e., the very base); however, the traditional
>            * behavior of qemu-img commit is using the immediate backing file. */
> -        base_bs = backing_bs(bs);
> +        base_bs = bdrv_filtered_cow_bs(bs);
>           if (!base_bs) {
>               error_setg(&local_err, "Image does not have a backing file");
>               goto done;
> @@ -1616,19 +1616,18 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
>   
>       if (s->sector_next_status <= sector_num) {
>           int64_t count = n * BDRV_SECTOR_SIZE;
> +        BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
> +        BlockDriverState *base;
>   
>           if (s->target_has_backing) {
> -
> -            ret = bdrv_block_status(blk_bs(s->src[src_cur]),
> -                                    (sector_num - src_cur_offset) *
> -                                    BDRV_SECTOR_SIZE,
> -                                    count, &count, NULL, NULL);
> +            base = bdrv_backing_chain_next(src_bs);
>           } else {
> -            ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
> -                                          (sector_num - src_cur_offset) *
> -                                          BDRV_SECTOR_SIZE,
> -                                          count, &count, NULL, NULL);
> +            base = NULL;
>           }
> +        ret = bdrv_block_status_above(src_bs, base,
> +                                      (sector_num - src_cur_offset) *
> +                                      BDRV_SECTOR_SIZE,
> +                                      count, &count, NULL, NULL);
>           if (ret < 0) {
>               error_report("error while reading block status of sector %" PRId64
>                            ": %s", sector_num, strerror(-ret));
> @@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
>            * s.target_backing_sectors has to be negative, which it will
>            * be automatically).  The backing file length is used only
>            * for optimizations, so such a case is not fatal. */
> -        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
> +        s.target_backing_sectors =
> +            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));

Can't out_bs be a filter itself?

>       } else {
>           s.target_backing_sectors = -1;
>       }
> @@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>   
>       depth = 0;
>       for (;;) {
> +        bs = bdrv_skip_rw_filters(bs);

Why? Filters may have their own implementation of block_status; why skip it?

Or can't they? Really, if we disallow filters from having block_status, maybe
we can solve the inefficient block_status_above issue we talked about before.
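
This does not settle whether filters may have their own block_status. The toy below only shows what skipping buys in roughly the iotest 204 situation: blkdebug on top of an overlay whose backing file allocates the queried range. All names are hypothetical, and each node simply records whether it allocates the range itself. The old walk follows backing_bs(), which is NULL for the file-based blkdebug filter, so it gives up. The patched walk skips R/W filters and then follows COW children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *file;
    struct ToyNode *backing;
    bool allocated;          /* does this node itself allocate the range? */
} ToyNode;

static ToyNode *toy_skip_rw_filters(ToyNode *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->backing ? bs->backing : bs->file;
    }
    return bs;
}

/* Old walk: status on a filter reaches the first non-filter below it, but the
 * next step uses bs->backing. Returns the allocating depth, or -1. */
static int toy_alloc_depth_old(ToyNode *bs)
{
    for (int depth = 0; bs; depth++) {
        ToyNode *p = toy_skip_rw_filters(bs);
        if (p && p->allocated) {
            return depth;
        }
        bs = bs->backing;
    }
    return -1;
}

/* Patched walk: skip R/W filters first, then follow only COW children. */
static int toy_alloc_depth_new(ToyNode *bs)
{
    for (int depth = 0; bs; depth++) {
        bs = toy_skip_rw_filters(bs);
        if (!bs) {
            break;
        }
        if (bs->allocated) {
            return depth;
        }
        bs = bs->backing;    /* bs is a non-filter here: its COW child */
    }
    return -1;
}
```
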

>           ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
>           if (ret < 0) {
>               return ret;
> @@ -2805,7 +2806,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>           if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
>               break;
>           }
> -        bs = backing_bs(bs);
> +        bs = bdrv_filtered_cow_bs(bs);
>           if (bs == NULL) {
>               ret = 0;
>               break;
> @@ -2944,7 +2945,7 @@ static int img_map(int argc, char **argv)
>       if (!blk) {
>           return 1;
>       }
> -    bs = blk_bs(blk);
> +    bs = bdrv_skip_implicit_filters(blk_bs(blk));
>   
>       if (output_format == OFORMAT_HUMAN) {
>           printf("%-16s%-16s%-16s%s\n", "Offset", "Length", "Mapped to", "File");
> diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
> index 3deb3cfb94..1d61f7e224 100644
> --- a/tests/qemu-iotests/184.out
> +++ b/tests/qemu-iotests/184.out
> @@ -27,6 +27,11 @@ Testing:
>               "iops_rd": 0,
>               "detect_zeroes": "off",
>               "image": {
> +                "backing-image": {
> +                    "virtual-size": 1073741824,
> +                    "filename": "null-co://",
> +                    "format": "null-co"
> +                },
>                   "virtual-size": 1073741824,
>                   "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
>                   "format": "throttle"
> @@ -34,7 +39,7 @@ Testing:
>               "iops_wr": 0,
>               "ro": false,
>               "node-name": "throttle0",
> -            "backing_file_depth": 0,
> +            "backing_file_depth": 1,
>               "drv": "throttle",
>               "iops": 0,
>               "bps_wr": 0,
> diff --git a/tests/qemu-iotests/204.out b/tests/qemu-iotests/204.out
> index f3a10fbe90..684774d763 100644
> --- a/tests/qemu-iotests/204.out
> +++ b/tests/qemu-iotests/204.out
> @@ -59,5 +59,6 @@ Offset          Length          File
>   0x900000        0x2400000       TEST_DIR/t.IMGFMT
>   0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
>   0x6a00000       0x400000        TEST_DIR/t.IMGFMT
> +0x6e00000       0x1200000       TEST_DIR/t.IMGFMT.base
>   No errors were found on the image.
>   *** done
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 03/11] block: Storage child access function
  2019-04-10 20:20   ` Max Reitz
  (?)
@ 2019-05-20 10:41   ` Vladimir Sementsov-Ogievskiy
  2019-05-28 18:09     ` Max Reitz
  -1 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-20 10:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.04.2019 23:20, Max Reitz wrote:
> For completeness' sake, add a function for accessing a node's storage
> child, too.  For filters, this is their filtered child; for non-filters,
> this is bs->file.
> 
> Some places are deliberately left unconverted:
> - BDS opening/closing functions where bs->file is handled specially
>    (which is basically wrong, but at least simplifies probing)
> - bdrv_co_block_status_from_file(), because its name implies that it
>    points to ->file
> - bdrv_snapshot_goto() in one place unrefs bs->file.  Such a
>    modification is not covered by this patch and is therefore just
>    safeguarded by an additional assert(), but otherwise kept as-is.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

[..]

> --- a/block/io.c
> +++ b/block/io.c

[..]

> @@ -2559,7 +2554,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>       }
>   
>       /* Write back cached data to the OS even with cache=unsafe */
> -    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
> +    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_OS);

Hmm, preexisting, but it's strange that we call EVENT for bs->file before the action on bs...

>       if (bs->drv->bdrv_co_flush_to_os) {
>           ret = bs->drv->bdrv_co_flush_to_os(bs);
>           if (ret < 0) {
> @@ -2577,7 +2572,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>           goto flush_parent;
>       }
>   
> -    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
> +    BLKDBG_EVENT(bdrv_storage_child(bs), BLKDBG_FLUSH_TO_DISK);
>       if (!bs->drv) {
>           /* bs->drv->bdrv_co_flush() might have ejected the BDS
>            * (even in case of apparent success) */
> @@ -2622,7 +2617,8 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>        * in the case of cache=unsafe, so there are no useless flushes.
>        */
>   flush_parent:
> -    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> +    storage_bs = bdrv_storage_bs(bs);
> +    ret = storage_bs ? bdrv_co_flush(storage_bs) : 0;
>   out:
>       /* Notify any pending flushes that we have completed */
>       if (ret == 0) {

[..]

> --- a/block/snapshot.c
> +++ b/block/snapshot.c

[..]

> @@ -184,6 +186,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>                          Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *storage_bs;
>       int ret, open_ret;
>   
>       if (!drv) {
> @@ -204,39 +207,40 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>           return ret;
>       }
>   
> -    if (bs->file) {
> -        BlockDriverState *file;
> +    storage_bs = bdrv_storage_bs(bs);
> +    if (storage_bs) {
>           QDict *options = qdict_clone_shallow(bs->options);
>           QDict *file_options;
>           Error *local_err = NULL;
>   
> -        file = bs->file->bs;
>           /* Prevent it from getting deleted when detached from bs */
> -        bdrv_ref(file);
> +        bdrv_ref(storage_bs);
>   
>           qdict_extract_subqdict(options, &file_options, "file.");
>           qobject_unref(file_options);
> -        qdict_put_str(options, "file", bdrv_get_node_name(file));
> +        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
>   
>           if (drv->bdrv_close) {
>               drv->bdrv_close(bs);
>           }
> +
> +        assert(bs->file->bs == storage_bs);

Hmm, but what saves us from this assertion failing for backing-based filters? Before
your patch this path was unreachable for them. Or what am I missing?
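
A toy sketch of the concern, with hypothetical names. The storage-child rule is taken from the commit message: filtered child for filters, bs->file otherwise. For a filter whose filtered child is bs->backing, the storage node is non-NULL but bs->file is NULL. The real assert(bs->file->bs == storage_bs) would therefore dereference NULL instead of just failing. toy_file_is_storage() models the condition that assert wants to hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *file;
    struct ToyNode *backing;
} ToyNode;

/* Storage child per the commit message: filtered child for filters,
 * bs->file for everything else. */
static ToyNode *toy_storage_bs(ToyNode *bs)
{
    if (bs->is_filter) {
        return bs->backing ? bs->backing : bs->file;
    }
    return bs->file;
}

/* Does bs->file actually refer to the storage node (the asserted condition)? */
static bool toy_file_is_storage(ToyNode *bs)
{
    ToyNode *storage = toy_storage_bs(bs);
    return storage != NULL && bs->file == storage;
}
```
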

>           bdrv_unref_child(bs, bs->file);
>           bs->file = NULL;
>   
> -        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
> +        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
>           open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
>           qobject_unref(options);
>           if (open_ret < 0) {
> -            bdrv_unref(file);
> +            bdrv_unref(storage_bs);
>               bs->drv = NULL;
>               /* A bdrv_snapshot_goto() error takes precedence */
>               error_propagate(errp, local_err);
>               return ret < 0 ? ret : open_ret;
>           }
>   
> -        assert(bs->file->bs == file);
> -        bdrv_unref(file);
> +        assert(bs->file->bs == storage_bs);
> +        bdrv_unref(storage_bs);
>           return ret;
>       }
>   



-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*()
  2019-04-10 20:20   ` Max Reitz
  (?)
@ 2019-05-21  8:57   ` Vladimir Sementsov-Ogievskiy
  2019-05-28 17:58     ` Max Reitz
  -1 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-21  8:57 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.04.2019 23:20, Max Reitz wrote:
> With bdrv_filtered_rw_bs(), we can easily handle this default filter
> behavior in bdrv_co_block_status().
> 
> blkdebug wants to have an additional assertion, so it keeps its own
> implementation, except bdrv_co_block_status_from_file() needs to be
> inlined there.
> 
> Suggested-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h | 22 -----------------
>   block/blkdebug.c          |  7 ++++--
>   block/blklogwrites.c      |  1 -
>   block/commit.c            |  1 -
>   block/copy-on-read.c      |  2 --
>   block/io.c                | 51 +++++++++++++--------------------------
>   block/mirror.c            |  1 -
>   block/throttle.c          |  1 -
>   8 files changed, 22 insertions(+), 64 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index d0309e6307..76c7c0a111 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1187,28 +1187,6 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>                                  uint64_t perm, uint64_t shared,
>                                  uint64_t *nperm, uint64_t *nshared);
>   
> -/*
> - * Default implementation for drivers to pass bdrv_co_block_status() to
> - * their file.
> - */
> -int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
> -                                                bool want_zero,
> -                                                int64_t offset,
> -                                                int64_t bytes,
> -                                                int64_t *pnum,
> -                                                int64_t *map,
> -                                                BlockDriverState **file);
> -/*
> - * Default implementation for drivers to pass bdrv_co_block_status() to
> - * their backing file.
> - */
> -int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
> -                                                   bool want_zero,
> -                                                   int64_t offset,
> -                                                   int64_t bytes,
> -                                                   int64_t *pnum,
> -                                                   int64_t *map,
> -                                                   BlockDriverState **file);
>   const char *bdrv_get_parent_name(const BlockDriverState *bs);
>   void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
>   bool blk_dev_has_removable_media(BlockBackend *blk);
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index efd9441625..7950ae729c 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -637,8 +637,11 @@ static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs,
>                                                    BlockDriverState **file)
>   {
>       assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
> -    return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
> -                                          pnum, map, file);
> +    assert(bs->file && bs->file->bs);
> +    *pnum = bytes;
> +    *map = offset;
> +    *file = bs->file->bs;
> +    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;

directly inlined, OK

>   }
>   
>   static void blkdebug_close(BlockDriverState *bs)
> diff --git a/block/blklogwrites.c b/block/blklogwrites.c
> index eb2b4901a5..1eb4a5c613 100644
> --- a/block/blklogwrites.c
> +++ b/block/blklogwrites.c
> @@ -518,7 +518,6 @@ static BlockDriver bdrv_blk_log_writes = {
>       .bdrv_co_pwrite_zeroes  = blk_log_writes_co_pwrite_zeroes,
>       .bdrv_co_flush_to_disk  = blk_log_writes_co_flush_to_disk,
>       .bdrv_co_pdiscard       = blk_log_writes_co_pdiscard,
> -    .bdrv_co_block_status   = bdrv_co_block_status_from_file,
>   
>       .is_filter              = true,
>       .strong_runtime_opts    = blk_log_writes_strong_runtime_opts,
> diff --git a/block/commit.c b/block/commit.c
> index 252007fd57..c366ee9655 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -254,7 +254,6 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>   static BlockDriver bdrv_commit_top = {
>       .format_name                = "commit_top",
>       .bdrv_co_preadv             = bdrv_commit_top_preadv,
> -    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
>       .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
>       .bdrv_child_perm            = bdrv_commit_top_child_perm,
>   
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index 53972b1da3..fe9260163c 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -150,8 +150,6 @@ static BlockDriver bdrv_copy_on_read = {
>       .bdrv_eject                         = cor_eject,
>       .bdrv_lock_medium                   = cor_lock_medium,
>   
> -    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
> -
>       .bdrv_recurse_is_first_non_filter   = cor_recurse_is_first_non_filter,
>   
>       .has_variable_length                = true,
> diff --git a/block/io.c b/block/io.c
> index 5c33ecc080..8d124bae5c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1993,36 +1993,6 @@ typedef struct BdrvCoBlockStatusData {
>       bool done;
>   } BdrvCoBlockStatusData;
>   
> -int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
> -                                                bool want_zero,
> -                                                int64_t offset,
> -                                                int64_t bytes,
> -                                                int64_t *pnum,
> -                                                int64_t *map,
> -                                                BlockDriverState **file)
> -{
> -    assert(bs->file && bs->file->bs);
> -    *pnum = bytes;
> -    *map = offset;
> -    *file = bs->file->bs;
> -    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> -}
> -
> -int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
> -                                                   bool want_zero,
> -                                                   int64_t offset,
> -                                                   int64_t bytes,
> -                                                   int64_t *pnum,
> -                                                   int64_t *map,
> -                                                   BlockDriverState **file)
> -{
> -    assert(bs->backing && bs->backing->bs);
> -    *pnum = bytes;
> -    *map = offset;
> -    *file = bs->backing->bs;
> -    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> -}
> -
>   /*
>    * Returns the allocation status of the specified sectors.
>    * Drivers not implementing the functionality are assumed to not support
> @@ -2063,6 +2033,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       BlockDriverState *local_file = NULL;
>       int64_t aligned_offset, aligned_bytes;
>       uint32_t align;
> +    bool has_filtered_child;
>   
>       assert(pnum);
>       *pnum = 0;
> @@ -2088,7 +2059,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>   
>       /* Must be non-NULL or bdrv_getlength() would have failed */
>       assert(bs->drv);
> -    if (!bs->drv->bdrv_co_block_status) {
> +    has_filtered_child = bs->drv->is_filter && bdrv_filtered_rw_child(bs);
> +    if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
>           *pnum = bytes;
>           ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
>           if (offset + bytes == total_size) {
> @@ -2109,9 +2081,20 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>       aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>   
> -    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
> -                                        aligned_bytes, pnum, &local_map,
> -                                        &local_file);
> +    if (bs->drv->bdrv_co_block_status) {
> +        ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
> +                                            aligned_bytes, pnum, &local_map,
> +                                            &local_file);
> +    } else {
> +        /* Default code for filters */
> +
> +        local_file = bdrv_filtered_rw_bs(bs);
> +        assert(local_file);
> +
> +        *pnum = aligned_bytes;
> +        local_map = aligned_offset;
> +        ret = BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> +    }


Preexisting, but why is the default for filters aligned, while for other nodes it is not?
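A simplified model of the arithmetic may help with this question. It assumes that, after the driver callback (or the new filter default) returns, bdrv_co_block_status() subtracts offset - aligned_offset from *pnum, caps it at bytes and shifts local_map by the same amount, as the QEMU code of that time appears to do. The helper names and the FilterStatus type are hypothetical. Under that assumption, one plausible answer is that the filter default is aligned because it stands in for a driver callback, which is always invoked with the widened request. The non-filter fallback returns before any alignment is computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers mirroring QEMU_ALIGN_DOWN() and ROUND_UP() for
 * non-negative values and a positive alignment.
 */
static int64_t align_down(int64_t n, int64_t align)
{
    return n / align * align;
}

static int64_t align_up(int64_t n, int64_t align)
{
    return (n + align - 1) / align * align;
}

/* What the caller of bdrv_co_block_status() finally sees (simplified) */
typedef struct {
    int64_t pnum; /* bytes the answer covers, from the caller's offset on */
    int64_t map;  /* offset in the filtered child matching the caller's offset */
} FilterStatus;

/*
 * Models the "default code for filters" branch, which runs on the widened,
 * aligned request like a driver callback would, followed by the clamping
 * back to the original request that we assume follows the callback.
 */
FilterStatus filter_default_status(int64_t offset, int64_t bytes, int64_t align)
{
    int64_t aligned_offset = align_down(offset, align);
    int64_t aligned_bytes = align_up(offset + bytes, align) - aligned_offset;

    /* Filter default: the whole aligned range maps 1:1 onto the child */
    int64_t pnum = aligned_bytes;
    int64_t local_map = aligned_offset;

    /* Clamp back to the unaligned request the caller actually made */
    pnum -= offset - aligned_offset;
    if (pnum > bytes) {
        pnum = bytes;
    }
    local_map += offset - aligned_offset;

    FilterStatus st = { pnum, local_map };
    return st;
}
```

With offset 5, bytes 10 and an alignment of 8, the filter reports pnum = 10 and map = 5. That is exactly the caller's range at an identity mapping, so in this model aligning first is harmless for filters.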

>       if (ret < 0) {
>           *pnum = 0;
>           goto out;
> diff --git a/block/mirror.c b/block/mirror.c
> index 80cef587f0..2e521c726a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1487,7 +1487,6 @@ static BlockDriver bdrv_mirror_top = {
>       .bdrv_co_pwrite_zeroes      = bdrv_mirror_top_pwrite_zeroes,
>       .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
>       .bdrv_co_flush              = bdrv_mirror_top_flush,
> -    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
>       .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
>       .bdrv_child_perm            = bdrv_mirror_top_child_perm,
>   
> diff --git a/block/throttle.c b/block/throttle.c
> index f64dcc27b9..b6922e734f 100644
> --- a/block/throttle.c
> +++ b/block/throttle.c
> @@ -259,7 +259,6 @@ static BlockDriver bdrv_throttle = {
>       .bdrv_reopen_prepare                =   throttle_reopen_prepare,
>       .bdrv_reopen_commit                 =   throttle_reopen_commit,
>       .bdrv_reopen_abort                  =   throttle_reopen_abort,
> -    .bdrv_co_block_status               =   bdrv_co_block_status_from_file,
>   
>       .bdrv_co_drain_begin                =   throttle_co_drain_begin,
>       .bdrv_co_drain_end                  =   throttle_co_drain_end,
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
@ 2019-05-23 14:49         ` Max Reitz
  2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-05-23 14:49 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10832 bytes --]

On 17.05.19 13:50, Vladimir Sementsov-Ogievskiy wrote:
> 07.05.2019 18:13, Max Reitz wrote:
>> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.04.2019 23:20, Max Reitz wrote:
>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>> nodes, both signify a node that will eventually receive all R/W
>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>> bs->file.  Usually.
>>>>
>>>> In any case, it is not trivial to guess what a child means exactly with
>>>> our currently limited form of expression.  It is better to introduce
>>>> some functions that actually guarantee a meaning:
>>>>
>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>>     (depending on the overlay's allocation status), but writes never go to
>>>>     this child.
>>>>
>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>     filtered through some very plain process.  Reads and writes issued to
>>>>     the parent will go to the child as well (although timing, etc. may be
>>>>     modified).
>>>>
>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>     block layer anyway) always only have one of these children: All read
>>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>>     if there was a filtered_cow_child in addition, it would not receive
>>>>     any requests at all.
>>>>     (The closest here is mirror, where all requests are passed on to the
>>>>     source, but with write-blocking, write requests are "COWed" to the
>>>>     target.  But that just means that the target is a special child that
>>>>     cannot be introspected by the generic block layer functions, and that
>>>>     source is a filtered_rw_child.)
>>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>>     one child (or NULL, if there is no filtered child).
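The three accessors described above can be modelled on a toy node type. This is a sketch under simplifying assumptions, using hypothetical Node/Child types and model_* names rather than the QEMU structures or the patch's implementation: a format node with backing support has only a COW child, and a filter has exactly one R/W-filtered child, either file or backing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

typedef struct Child {
    Node *bs;
} Child;

struct Node {
    const char *name;
    bool is_filter;        /* like BlockDriver.is_filter */
    bool supports_backing; /* format driver that supports COW backing files */
    Child *file;
    Child *backing;
};

/* COW child: reads may fall through to it, writes never reach it */
Child *model_filtered_cow_child(const Node *n)
{
    return (!n->is_filter && n->supports_backing) ? n->backing : NULL;
}

/* R/W child: reads and writes both pass through (filters only) */
Child *model_filtered_rw_child(const Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    /* A filter has a single filtered child: either file or backing */
    assert(!(n->file && n->backing));
    return n->backing ? n->backing : n->file;
}

/* At most one of the two exists, so return whichever is present */
Child *model_filtered_child(const Node *n)
{
    Child *cow = model_filtered_cow_child(n);
    Child *rw = model_filtered_rw_child(n);

    assert(!(cow && rw));
    return cow ? cow : rw;
}
```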
>>>>
>>>> Also, many places in the current block layer should be skipping filters
>>>> (all filters or just the ones added implicitly, it depends) when going
>>>> through a block node chain.  They do not do that currently, but this
>>>> patch makes them.
>>>>
>>>> One example for this is qemu-img map, which should skip filters and only
>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>> reference output shows how using blkdebug on top of a COW node used to
>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>> patch, the allocation in the base image is reported correctly.
>>>>
>>>> Furthermore, a note should be made that sometimes we do want to access
>>>> bs->backing directly.  This is whenever the operation in question is not
>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>> whenever we have to deal with the special behavior of @backing as a
>>>> blockdev option, which is that it does not default to null like all
>>>> other child references do.
>>>>
>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>> are modified to return any filtered child under "backing", not just
>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>> the throttled node now appears as a backing child.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    qapi/block-core.json           |   4 +
>>>>    include/block/block.h          |   1 +
>>>>    include/block/block_int.h      |  40 +++++--
>>>>    block.c                        | 210 +++++++++++++++++++++++++++------
>>>>    block/backup.c                 |   8 +-
>>>>    block/block-backend.c          |  16 ++-
>>>>    block/commit.c                 |  33 +++---
>>>>    block/io.c                     |  45 ++++---
>>>>    block/mirror.c                 |  21 ++--
>>>>    block/qapi.c                   |  30 +++--
>>>>    block/stream.c                 |  13 +-
>>>>    blockdev.c                     |  88 +++++++++++---
>>>>    migration/block-dirty-bitmap.c |   4 +-
>>>>    nbd/server.c                   |   6 +-
>>>>    qemu-img.c                     |  29 ++---
>>>>    tests/qemu-iotests/184.out     |   7 +-
>>>>    tests/qemu-iotests/204.out     |   1 +
>>>>    17 files changed, 411 insertions(+), 145 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 7ccbfff9d0..dbd9286e4a 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2502,6 +2502,10 @@
>>>>    # On successful completion the image file is updated to drop the backing file
>>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>    #
>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>> +# not specified) instead of modifying @device itself.
>>>> +#
>>>
>>> Is this necessary?  Why can't we keep it as is, modifying exactly the device node?  Maybe
>>> the user wants to use a filter in the stream process, throttling for example.
>>
>> That wouldn't make any sense.  Say you have this configuration:
>>
>> throttle -> top -> base
>>
>> Now you stream from base to throttle.  The data goes from base through
>> throttle to top.  You propose to then make throttle point to base:
>>
>> throttle -> base
>>
>> This will discard all the data in top.
>>
>> Filters don’t store any data.  You need to keep the top data-storing
>> image, i.e. the first non-filter overlay.
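Max's argument can be checked with a toy model of a linear chain, here throttle -> top -> mid -> base (hypothetical Node type and helper names; real graphs have more than one child per node). After streaming mid into top, the relink has to modify the first non-filter overlay, top. Relinking throttle instead would cut top, and the data it holds, out of the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *child; /* filtered child for filters, backing child otherwise */
} Node;

/* Returns the first node at or below @n that actually stores data */
Node *model_skip_filters(Node *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/*
 * Hypothetical completion step of a stream job started on @device: the
 * first non-filter overlay, not @device itself, gets @new_backing as its
 * backing node.  Returns the node that was modified.
 */
Node *model_stream_complete(Node *device, Node *new_backing)
{
    Node *overlay = model_skip_filters(device);

    assert(overlay != NULL);
    overlay->child = new_backing;
    return overlay;
}
```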
> 
> Ah, yes, good reason.
> 
>>
>>>>    # @job-id: identifier for the newly-created block job. If
>>>>    #          omitted, the device name will be used. (Since 2.7)
>>>>    #
>>
>> [...]
>>
>>>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>>        bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>>>            bdrv_inherits_from_recursive(backing_hd, bs);
>>>>    
>>>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>>>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>>>
>>> If we support file-filters for frozen backing chains, could it go through the file child here?
>>> Hmm, only in the case where we are going to set the backing hd for a file-filter.  Hmm, can a filter have
>>> both file and backing children?
>>
>> No.  A filter passes through data from its children, so it can only have
>> a single child, or it is quorum.
>>
>> The file/backing combination is reserved for COW overlays.  file is
>> where the current layer’s data is, backing is the filtered child.
> 
> My backup-top has two children: backing and target.  So I think we can state that a
> filter should not have both file and backing children, but may have any other special
> children it wants, invisible to the generic backing-child/file-child logic.

Ah, yes, sorry, that’s what I meant.  A filter can have only a single
filtered child, but other than that, they’re free to have whatever.

[...]

>>> Here we don't want to check the chain; we want to check exactly the backing link, so it should be
>>> something like
>>>
>>> if (bs->backing && bs->backing->frozen) {
>>>      error_setg(errp, "backing child exists and is frozen");
>>>      return;
>>> }
>>>
>>>
>>> Hmm, on the other hand, if we have a frozen backing chain going through a file child, we must not add
>>> a backing child to the node with the file child, as that would change the backing chain (which by default goes
>>> through backing).
>>>
>>> Anyway, we don't need to check the whole backing chain, as we may find some other frozen backing subchain
>>> far away from bs.  So we possibly want to check
>>>
>>> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
>>>     error_setg(errp, "filtered child is frozen");
>>>     return;
>>> }
>>>
>>>
>>> ....
>>>
>>> Also, we'll need to check for a frozen file child when we want to replace it.
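Vladimir's proposal here (inspect only the immediate link, and never give a filter that already filters through file an additional backing child) could be sketched as follows. This uses hypothetical toy types, with an error string in place of Error **errp; it is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Child {
    bool frozen; /* like BdrvChild.frozen */
} Child;

typedef struct Node {
    bool is_filter;
    Child *file;
    Child *backing;
} Node;

/*
 * Returns true if a new backing child may be attached to @n.  Only the
 * immediate backing link is inspected; frozen links further down the
 * chain are irrelevant here.
 */
bool model_may_set_backing(const Node *n, const char **err)
{
    if (n->backing && n->backing->frozen) {
        *err = "backing child exists and is frozen";
        return false;
    }
    if (n->is_filter && n->file) {
        /* The filter's single filtered child is already its file child */
        *err = "filter already has a file child";
        return false;
    }
    return true;
}
```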
>>
>> I don’t quite understand.  It sounds to me like you’re saying we don’t
>> need to check the whole chain here but just the immediate child.  But
>> isn’t that true regardless of this series?
> 
> If we forbid adding a backing child to a filter that already has a file child, everything becomes simpler and seems to be correct.

OK. :-)

> Should we add a check for a frozen file child to bdrv_replace_child()?

Argh.  You mean move it from bdrv_set_backing_hd()?  That actually makes
a lot of sense to me.  The problem is that bdrv_replace_child()
currently cannot return an error, which may be a problem for
bdrv_detach_child().  Hm.  But that’s effectively only called from
functions where the child is unref’d, and you have to know that your own
child is not frozen before you unref it.  So I guess we should be good
to pass an &error_abort there.
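Moving the frozen check into bdrv_replace_child() while keeping bdrv_detach_child() infallible could be modelled like this. The types and model_* functions are hypothetical. The assert() in the detach path plays the role of passing &error_abort, relying on the assumption stated above: a parent must know its child is not frozen before dropping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
} Node;

typedef struct Child {
    Node *bs;
    bool frozen;
} Child;

/* Refuses to point a frozen child at a different node */
bool model_replace_child(Child *c, Node *new_bs, const char **err)
{
    if (c->frozen && c->bs != new_bs) {
        if (err) {
            *err = "cannot change a frozen child";
        }
        return false;
    }
    c->bs = new_bs;
    return true;
}

/* Detaching cannot report errors; failure here would be a caller bug */
void model_detach_child(Child *c)
{
    bool ok = model_replace_child(c, NULL, NULL);

    assert(ok); /* stands in for &error_abort */
    (void)ok;
}
```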

[...]

>>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>>>        bool first = true;
>>>>    
>>>>        assert(bs != base);
>>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>>            ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>>>                                       file);
>>>
>>> Interesting that for filters that use bdrv_co_block_status_from_backing and
>>> bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of
>>> the underlying real node two or more times.  It's not wrong, but obviously not optimal.
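The duplicated work can be reproduced with a toy model. Assumptions: a hypothetical Node type, a boolean standing in for BDRV_BLOCK_ALLOCATED, and a filter branch that mimics the generic code recursing into *file when a filter returns BDRV_BLOCK_RAW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    bool is_filter;
    bool allocated;     /* does this layer itself hold the queried range? */
    struct Node *child; /* filtered child (R/W for filters, COW for formats) */
    int driver_calls;   /* how often this node's own driver was asked */
} Node;

/* Models bdrv_co_block_status(): a filter's RAW answer recurses into its child */
bool model_block_status(Node *n)
{
    if (n->is_filter) {
        return model_block_status(n->child);
    }
    n->driver_calls++;
    return n->allocated;
}

/* Models the _above loop, stepping via bdrv_filtered_bs() until @base */
Node *model_status_above(Node *top, Node *base)
{
    for (Node *p = top; p != base; p = p->child) {
        if (model_block_status(p)) {
            return p; /* first layer reporting the range as allocated */
        }
    }
    return NULL;
}
```

For filter -> top -> base with only base allocated, top's driver is asked twice: once through the filter's recursion and once when the loop itself reaches top.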
>>
>> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?
> 
> No, as file may not be in the backing chain:
> 
> filter
>     |
>     v
> qcow2 -> file
>     |
>     v
> qcow2
> 
> So, we shouldn't redirect the whole loop to file.

But qcow2 is not a filter.  I meant skipping to *file only if the
current node is a filter.  And I don’t mean bs->file, I mean *file --
like, what bdrv_co_block_status() returns.

You say in your other mail that filters can have their own implementation
of .bdrv_co_block_status(), but I don’t think that makes sense,
actually.  They should always pass the status of their filtered child.

blkdebug is the only filter I know that has its own implementation, and
the only thing it does besides passing the data through is to add an
alignment assertion.  If it simplifies everything else, I’m very much
willing to break that.

Max

> Maybe the correct solution would be to introduce an additional handler,
> .bdrv_co_block_status_above, with different logic.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-23 14:49         ` Max Reitz
@ 2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy
  2019-05-23 15:56             ` Max Reitz
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-23 15:08 UTC
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

23.05.2019 17:49, Max Reitz wrote:
> On 17.05.19 13:50, Vladimir Sementsov-Ogievskiy wrote:
>> 07.05.2019 18:13, Max Reitz wrote:
>>> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>>>> 10.04.2019 23:20, Max Reitz wrote:
> [...]
> 
>>>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>>>>         bool first = true;
>>>>>     
>>>>>         assert(bs != base);
>>>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>>>             ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>>>>                                        file);
>>>>
>>>> Interesting that for filters that use bdrv_co_block_status_from_backing and
>>>> bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of
>>>> the underlying real node two or more times.  It's not wrong, but obviously not optimal.
>>>
>>> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?
>>
>> No, as file may not be in the backing chain:
>>
>> filter  [A]
>>      |
>>      v
>> qcow2 -> file  [B]
>>      |
>>      v
>> qcow2
>>
>> So, we shouldn't redirect the whole loop to file.
> 
> But qcow2 is not a filter.  I meant skipping to *file only if the
> current node is a filter.  And I don’t mean bs->file, I mean *file --
> like, what bdrv_co_block_status() returns.

Me too.  But as I understand it, if we call bdrv_block_status() on filter [A],
the resulting *file returned by bdrv_co_block_status() will point to file [B]
due to the recursion in bdrv_co_block_status().
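This point about the diagram can be checked with a toy model. It uses a hypothetical Node type and assumes two things: a format node reports allocated data as living in its file child, as qcow2 does with BDRV_BLOCK_OFFSET_VALID, and a filter's RAW answer is resolved by recursing into its filtered child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Returns the node *file would point to after querying @n */
Node *model_status_file(Node *n)
{
    if (n->is_filter) {
        /* RAW | OFFSET_VALID: generic code recurses into the filtered child */
        Node *child = n->backing ? n->backing : n->file;
        return model_status_file(child);
    }
    /* A format node such as qcow2 reports its data as living in bs->file */
    return n->file;
}
```

Jumping the above-loop to *file would therefore land on [B], which is not part of the backing chain, so the lower qcow2 would never be consulted. Continuing from the filtered child (the upper qcow2) keeps the walk on the chain.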

> 
> You say in your other mail that filters can have their own implementation
> of .bdrv_co_block_status(), but I don’t think that makes sense,
> actually.  They should always pass the status of their filtered child.
> 
> blkdebug is the only filter I know that has its own implementation, and
> the only thing it does besides passing the data through is to add an
> alignment assertion.  If it simplifies everything else, I’m very much
> willing to break that.

Agreed, an assertion is a bad reason not to implement some clean generic
logic.

> 
> Max
> 
>> Maybe the correct solution would be to introduce an additional handler,
>> .bdrv_co_block_status_above, with different logic.
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy
@ 2019-05-23 15:56             ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-05-23 15:56 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 11916 bytes --]

On 23.05.19 17:08, Vladimir Sementsov-Ogievskiy wrote:
> 23.05.2019 17:49, Max Reitz wrote:
>> On 17.05.19 13:50, Vladimir Sementsov-Ogievskiy wrote:
>>> 07.05.2019 18:13, Max Reitz wrote:
>>>> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>> bs->file.  Usually.
>>>>>>
>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>> our currently limited form of expression.  It is better to introduce
>>>>>> some functions that actually guarantee a meaning:
>>>>>>
>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>>      (depending on the overlay's allocation status), but writes never go to
>>>>>>      this child.
>>>>>>
>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>      filtered through some very plain process.  Reads and writes issued to
>>>>>>      the parent will go to the child as well (although timing, etc. may be
>>>>>>      modified).
>>>>>>
>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>      block layer anyway) always only have one of these children: All read
>>>>>>      requests must be served from the filtered_rw_child (if it exists), so
>>>>>>      if there was a filtered_cow_child in addition, it would not receive
>>>>>>      any requests at all.
>>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>>      target.  But that just means that the target is a special child that
>>>>>>      cannot be introspected by the generic block layer functions, and that
>>>>>>      source is a filtered_rw_child.)
>>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>>      one child (or NULL, if there is no filtered child).
>>>>>>
>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>> through a block node chain.  They do not do that currently, but this
>>>>>> patch makes them.
>>>>>>
>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>>
>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>> blockdev option, which is that it does not default to null like all
>>>>>> other child references do.
>>>>>>
>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>> are modified to return any filtered child under "backing", not just
>>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>>> the throttled node now appears as a backing child.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     qapi/block-core.json           |   4 +
>>>>>>     include/block/block.h          |   1 +
>>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>>     block/backup.c                 |   8 +-
>>>>>>     block/block-backend.c          |  16 ++-
>>>>>>     block/commit.c                 |  33 +++---
>>>>>>     block/io.c                     |  45 ++++---
>>>>>>     block/mirror.c                 |  21 ++--
>>>>>>     block/qapi.c                   |  30 +++--
>>>>>>     block/stream.c                 |  13 +-
>>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>>     nbd/server.c                   |   6 +-
>>>>>>     qemu-img.c                     |  29 ++---
>>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>>
>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>> index 7ccbfff9d0..dbd9286e4a 100644
>>>>>> --- a/qapi/block-core.json
>>>>>> +++ b/qapi/block-core.json
>>>>>> @@ -2502,6 +2502,10 @@
>>>>>>     # On successful completion the image file is updated to drop the backing file
>>>>>>     # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>>     #
>>>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>>>> +# not specified) instead of modifying @device itself.
>>>>>> +#
>>>>>
>>>>> Is it necessary?  Why can't we keep it as is, modifying exactly the device node?  Maybe
>>>>> the user wants to use a filter in the stream process, throttling for example.
>>>>
>>>> That wouldn't make any sense.  Say you have this configuration:
>>>>
>>>> throttle -> top -> base
>>>>
>>>> Now you stream from base to throttle.  The data goes from base through
>>>> throttle to top.  You propose to then make throttle point to base:
>>>>
>>>> throttle -> base
>>>>
>>>> This will discard all the data in top.
>>>>
>>>> Filters don’t store any data.  You need to keep the top data-storing
>>>> image, i.e. the first non-filter overlay.
>>>
>>> Ah, yes, good reason.
>>>
>>>>
>>>>>>     # @job-id: identifier for the newly-created block job. If
>>>>>>     #          omitted, the device name will be used. (Since 2.7)
>>>>>>     #
>>>>
>>>> [...]
>>>>
>>>>>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>>>>         bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>>>>>             bdrv_inherits_from_recursive(backing_hd, bs);
>>>>>>     
>>>>>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>>>>>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>>>>>
>>>>> If we support file-based filters in a frozen backing chain, could it go through the file child
>>>>> here?  Hmm, only in the case where we are going to set the backing hd of a file-based filter.
>>>>> Hmm, could a filter have both file and backing children?
>>>>
>>>> No.  A filter passes through data from its children, so it can only have
>>>> a single child, or it is quorum.
>>>>
>>>> The file/backing combination is reserved for COW overlays.  file is
>>>> where the current layer’s data is, backing is the filtered child.
>>>
>>> My backup-top has two children: backing and target.  So, I think we can state that a
>>> filter should not have both file and backing children, but may have any other special
>>> children it wants, invisible to the generic backing-child/file-child logic.
>>
>> Ah, yes, sorry, that’s what I meant.  A filter can have only a single
>> filtered child, but other than that, they’re free to have whatever.
>>
>> [...]
>>
>>>>> Here we don't want to check the whole chain; we want to check exactly the backing link, so it
>>>>> should be something like
>>>>>
>>>>> if (bs->backing && bs->backing->frozen) {
>>>>>     error_setg(errp, "backing link exists and is frozen");
>>>>>     return;
>>>>> }
>>>>>
>>>>> Hmm, on the other hand, if we have a frozen backing chain going through a file child, we must not
>>>>> add a backing child to the node with the file child, as that would change the backing chain (which
>>>>> by default goes through backing).
>>>>>
>>>>> Anyway, we don't need to check the whole backing chain, as we may find some other frozen backing
>>>>> subchain far away from bs.  So, we possibly want to check
>>>>>
>>>>> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
>>>>>     error_setg(errp, "filtered child is frozen");
>>>>>     return;
>>>>> }
>>>>>
>>>>> ....
>>>>>
>>>>> Also, we'll need to check for a frozen file child when we want to replace it.
>>>>
>>>> I don’t quite understand.  It sounds to me like you’re saying we don’t
>>>> need to check the whole chain here but just the immediate child.  But
>>>> isn’t that true regardless of this series?
>>>
>>> If we forbid adding a backing child to a filter that has a file child, everything becomes simpler and seems to be correct.
>>
>> OK. :-)
>>
>>> Should we add a check for a frozen file child to bdrv_replace_child()?
>>
>> Argh.  You mean move it from bdrv_set_backing_hd()?  That actually makes
>> a lot of sense to me.  The problem is that bdrv_replace_child()
>> currently cannot return an error, which may be a problem for
>> bdrv_detach_child().  Hm.  But that’s effectively only called from
>> functions where the child is unref’d, and you have to know that your own
>> child is not frozen before you unref it.  So I guess we should be good
>> to pass an &error_abort there.
>>
>> [...]
>>
>>>>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>>>>>         bool first = true;
>>>>>>     
>>>>>>         assert(bs != base);
>>>>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>>>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>>>>             ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>>>>>                                        file);
>>>>>
>>>>> Interestingly, for filters that use bdrv_co_block_status_from_backing or
>>>>> bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of
>>>>> the underlying real node two or more times.  It's not wrong, but obviously not optimal.
>>>>
>>>> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?
>>>
>>> No, as file may not be in the backing chain:
>>>
>>> filter  [A]
>>>      |
>>>      v
>>> qcow2 -> file  [B]
>>>      |
>>>      v
>>> qcow2
>>>
>>> So, we shouldn't redirect the whole loop to file.
>>
>> But qcow2 is not a filter.  I meant skipping to *file only if the
>> current node is a filter.  And I don’t mean bs->file, I mean *file --
>> like, what bdrv_co_block_status() returns.
> 
> Me too.  But as I understand it, if we call bdrv_block_status() on filter [A],
> the resulting *file returned by bdrv_co_block_status() will point to file [B]
> due to the recursion in bdrv_co_block_status().

Crap, you’re right.  Hm.  I guess I’ll just ignore this performance
problem for now, then.  The best thing I can think of would be to turn
@want_zero into a flag mask and then add a flag for “Please do not
recurse through filters, just give me BDRV_BLOCK_RAW”.
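A rough sketch of the flag-mask idea on a hypothetical toy model: `STATUS_WANT_ZERO`, `STATUS_NO_FILTER_RECURSE`, `RET_RAW` and the `Node` fields are invented for illustration, and `RET_RAW` only plays the role BDRV_BLOCK_RAW plays in the discussion. Each node counts how often its own driver is queried, which makes the duplicated query visible: without the flag, walking filter -> overlay -> base asks the overlay twice (once through the filter's recursion, once directly); with the flag, the filter just answers RAW and the loop moves on to its child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag mask replacing a plain "bool want_zero" parameter. */
enum {
    STATUS_WANT_ZERO         = 1 << 0, /* carried along only; unused by the toy */
    STATUS_NO_FILTER_RECURSE = 1 << 1, /* filters answer RAW instead of recursing */
};

/* Hypothetical toy results; RET_RAW means "ask my filtered child". */
enum { RET_UNALLOCATED, RET_ALLOCATED, RET_RAW };

typedef struct Node {
    bool is_filter;
    struct Node *child;  /* filtered child (filters) or backing child (COW nodes) */
    bool allocated;      /* toy: is the queried range allocated in this node? */
    int driver_queries;  /* how often this node's own driver was asked */
} Node;

static int node_status(Node *n, int flags)
{
    assert(n != NULL);
    if (n->is_filter) {
        if (flags & STATUS_NO_FILTER_RECURSE) {
            return RET_RAW; /* the caller's loop visits n->child next anyway */
        }
        return node_status(n->child, flags); /* default: recurse */
    }
    n->driver_queries++;
    return n->allocated ? RET_ALLOCATED : RET_UNALLOCATED;
}

/* Like a block-status-above walk: stop at the first node that has the data. */
static bool allocated_above(Node *top, Node *base, int flags)
{
    for (Node *p = top; p != NULL && p != base; p = p->child) {
        if (node_status(p, flags) == RET_ALLOCATED) {
            return true;
        }
    }
    return false;
}
```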

>> You say in your other mail that filters can have their own implementation
>> of .bdrv_co_block_status(), but I don’t think that makes sense,
>> actually.  They should always pass the status of their filtered child.
>>
>> blkdebug is the only filter I know of that has its own implementation,
>> and the only thing it does besides passing the data through is add an
>> alignment assertion.  If it simplifies everything else, I’m very much
>> willing to break that.
> 
> Agreed: that assertion is a bad reason not to implement some clean generic
> logic.

OK.

Max




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-17 14:50   ` Vladimir Sementsov-Ogievskiy
@ 2019-05-23 17:27     ` Max Reitz
  2019-05-24  8:12       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-05-23 17:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 14831 bytes --]

On 17.05.19 16:50, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
>> What bs->file and bs->backing mean depends on the node.  For filter
>> nodes, both signify a node that will eventually receive all R/W
>> accesses.  For format nodes, bs->file contains metadata and data, and
>> bs->backing will not receive writes -- instead, writes are COWed to
>> bs->file.  Usually.
>>
>> In any case, it is not trivial to guess what a child means exactly with
>> our currently limited form of expression.  It is better to introduce
>> some functions that actually guarantee a meaning:
>>
>> - bdrv_filtered_cow_child() will return the child that receives requests
>>    filtered through COW.  That is, reads may or may not be forwarded
>>    (depending on the overlay's allocation status), but writes never go to
>>    this child.
>>
>> - bdrv_filtered_rw_child() will return the child that receives requests
>>    filtered through some very plain process.  Reads and writes issued to
>>    the parent will go to the child as well (although timing, etc. may be
>>    modified).
>>
>> - All drivers but quorum (but quorum is pretty opaque to the general
>>    block layer anyway) always only have one of these children: All read
>>    requests must be served from the filtered_rw_child (if it exists), so
>>    if there was a filtered_cow_child in addition, it would not receive
>>    any requests at all.
>>    (The closest here is mirror, where all requests are passed on to the
>>    source, but with write-blocking, write requests are "COWed" to the
>>    target.  But that just means that the target is a special child that
>>    cannot be introspected by the generic block layer functions, and that
>>    source is a filtered_rw_child.)
>>    Therefore, we can also add bdrv_filtered_child() which returns that
>>    one child (or NULL, if there is no filtered child).
>>
>> Also, many places in the current block layer should be skipping filters
>> (all filters or just the ones added implicitly, it depends) when going
>> through a block node chain.  They do not do that currently, but this
>> patch makes them.
>>
>> One example for this is qemu-img map, which should skip filters and only
>> look at the COW elements in the graph.  The change to iotest 204's
>> reference output shows how using blkdebug on top of a COW node used to
>> make qemu-img map disregard the rest of the backing chain, but with this
>> patch, the allocation in the base image is reported correctly.
>>
>> Furthermore, a note should be made that sometimes we do want to access
>> bs->backing directly.  This is whenever the operation in question is not
>> about accessing the COW child, but the "backing" child, be it COW or
>> not.  This is the case in functions such as bdrv_open_backing_file() or
>> whenever we have to deal with the special behavior of @backing as a
>> blockdev option, which is that it does not default to null like all
>> other child references do.
>>
>> Finally, the query functions (query-block and query-named-block-nodes)
>> are modified to return any filtered child under "backing", not just
>> bs->backing or COW children.  This is so that filters do not interrupt
>> the reported backing chain.  This changes the output of iotest 184, as
>> the throttled node now appears as a backing child.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
> 
> [..]
> 
>> --- a/block/mirror.c
>> +++ b/block/mirror.c

[...]

>> @@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>>        * any jobs in them must be blocked */
>>       if (target_is_backing) {
>>           BlockDriverState *iter;
>> -        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
>> +        for (iter = bdrv_filtered_bs(bs); iter != target;
> 
> should it be filtered_target too?

Hmm...  The comment says that all nodes that disappear must be blocked.
I don’t even know by heart which nodes I let disappear. :-/

I suppose we should start at the first explicit node, filter or not...?

>> +             iter = bdrv_filtered_bs(iter))
>> +        {
>>               /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
>>                * ourselves at s->base (if writes are blocked for a node, they are
>>                * also blocked for its backing file). The other options would be a

[...]

>> @@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>                     MirrorCopyMode copy_mode, Error **errp)
>>   {
>>       bool is_none_mode;
>> -    BlockDriverState *base;
>> +    BlockDriverState *base = NULL;
> 
> dead assignment

Now I wonder why I even have that.  Probably an artifact from some
intermediate point.

>>   
>>       if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>           error_setg(errp, "Sync mode 'incremental' not supported");
>>           return;
>>       }
>>       is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>> -    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>> +    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
>>       mirror_start_job(job_id, bs, creation_flags, target, replaces,
>>                        speed, granularity, buf_size, backing_mode,
>>                        on_source_error, on_target_error, unmap, NULL, NULL,
>> diff --git a/block/qapi.c b/block/qapi.c
>> index 110d05dc57..478c6f5e0d 100644
>> --- a/block/qapi.c
>> +++ b/block/qapi.c

[...]

>> @@ -535,9 +538,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>>           s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
>>       }
>>   
>> -    if (blk_level && bs->backing) {
>> +    cow_bs = bdrv_filtered_cow_bs(bs);
> 
> So, if we are at blk_level and the top bs is an explicit filter, you don't want to show
> its child?

I do.  It’s in s->parent.  I thought it makes sense to change the
existing bs->file vs. bs->backing to storage vs. COW.

> Hmm, at least we can't show it if it is a file child, as the QAPI field is already called
> backing.  So, if we can't show it for file-child-based filters, it may be better not to
> show filter children here at all.
> 
>> +    if (blk_level && cow_bs) {
>>           s->has_backing = true;
>> -        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
>> +        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
>>       }
>>   
>>       return s;
>> diff --git a/block/stream.c b/block/stream.c
>> index bfaebb861a..23d5c890e0 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>       BlockJob *bjob = &s->common;
>>       BlockDriverState *bs = blk_bs(bjob->blk);
>> +    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
> 
> Aha, I'd call it filtered, but unfiltered is correct too, it's amazing

Haha :-)

I think it’s all more insane than amazing, but, well, insanity never
ceases to amaze, does it?

>>       BlockDriverState *base = s->base;
>>       Error *local_err = NULL;
>>       int ret = 0;
>> @@ -72,7 +73,7 @@ static int stream_prepare(Job *job)

[...]

>> @@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>       int64_t n = 0; /* bytes */
>>       void *buf;
>>   
>> -    if (!bs->backing) {
>> +    if (!bdrv_filtered_child(bs)) {
>>           goto out;
>>       }
> 
> This condition checks whether there is anything to stream, so I think it's better to check
> if (!bdrv_backing_chain_next(bs)) {
>    goto out;
> }

Ah, sure.

>> @@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>           } else if (ret >= 0) {
>>               /* Copy if allocated in the intermediate images.  Limit to the
>>                * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>> -            ret = bdrv_is_allocated_above(backing_bs(bs), base,
>> +            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
>>                                             offset, n, &n);
> 
> Hmm, if we are trying to support bs being a filter, and actually operate on the first
> non-filter, as you write in the QAPI spec, this is wrong.  Again, it should be
> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)).

Would bdrv_backing_chain_next() fulfill the same purpose?  It can’t be
allocated in a filter node, after all.

> Or, maybe better, we should calculate the real top bs to operate on at stream start, and
> forget about all filters above it, i.e., do bs = bdrv_skip_rw_filters(bs) at the very
> beginning, when creating the job.

Sounds reasonable.  We can ignore all the filters on top of the
(un)filtered top anyway.
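A minimal sketch of resolving the real top once, when the job is created, on a toy model. `Node`, `skip_rw_filters()`, `backing_chain_next()` and `stream_resolve_top()` are hypothetical and only mirror the intent of bdrv_skip_rw_filters() and bdrv_backing_chain_next() as discussed in this thread, not their actual implementations. For throttle -> top -> base, the job operates on top, and the "nothing to stream" check becomes "there is no next node in the backing chain".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *filtered; /* filters: the R/W filtered child */
    struct Node *backing;  /* COW nodes: the backing (COW) child */
} Node;

/* Descend through R/W filters until a data-storing node (or NULL). */
static Node *skip_rw_filters(Node *n)
{
    while (n != NULL && n->is_filter) {
        n = n->filtered;
    }
    return n;
}

/* Next data-storing node below @n, ignoring filters on both ends. */
static Node *backing_chain_next(Node *n)
{
    Node *top = skip_rw_filters(n);
    return top != NULL ? skip_rw_filters(top->backing) : NULL;
}

/* Done once at job creation: filters above the real top store no data. */
static Node *stream_resolve_top(Node *device)
{
    Node *top = skip_rw_filters(device);
    assert(top != NULL); /* a chain of filters must end in a data node */
    return top;
}
```

If backing_chain_next() returns NULL for the resolved top, there is nothing to stream and the job can finish immediately.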

>>               /* Finish early if end of backing file has been reached */
>> @@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>        * disappear from the chain after this operation. The streaming job reads
>>        * every block only once, assuming that it doesn't change, so block writes
>>        * and resizes. */
>> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
>> +    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
>> +         iter = bdrv_filtered_bs(iter))
>> +    {
>>           block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>                              BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
>>                              &error_abort);
>> diff --git a/blockdev.c b/blockdev.c
>> index 4775a07d93..bb71b8368d 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>>               return;
>>           }
>>   
>> -        bs = blk_bs(blk);
>> +        bs = bdrv_skip_implicit_filters(blk_bs(blk));
>>           aio_context = bdrv_get_aio_context(bs);
>>           aio_context_acquire(aio_context);
>>   
>> @@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState *common,
>>           goto out;
>>       }
>>   
>> -    if (state->new_bs->backing != NULL) {
>> +    if (bdrv_filtered_cow_child(state->new_bs)) {
> 
> Do we allow creating a snapshot overlay that is a filter?  We should either forbid it explicitly or
> check bdrv_filtered_child() here.  And we can't allow file-based filters anyway.

Hm, yes, we should probably check both (separately to give better error
messages).

In theory it might be possible to allow filters on top, but there isn’t
really any point.  If someone wants to add filters on top of the
snapshot, they should use reopen.
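A minimal sketch of checking the two cases separately so each gets its own error message, on a hypothetical toy model: `Node`, `check_snapshot_overlay()` and both message strings are illustrative, not the blockdev.c code. A filter overlay is rejected first, then an overlay that already has a COW child.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    struct Node *filtered_rw;  /* non-NULL only if the node is a filter */
    struct Node *filtered_cow; /* non-NULL if the node already has a COW child */
} Node;

/*
 * Returns NULL if @new_bs may become the snapshot overlay, otherwise an
 * illustrative error message specific to the reason it is rejected.
 */
static const char *check_snapshot_overlay(const Node *new_bs)
{
    assert(new_bs != NULL);
    if (new_bs->filtered_rw != NULL) {
        return "The overlay must not be a filter node";
    }
    if (new_bs->filtered_cow != NULL) {
        return "The overlay already has a backing image";
    }
    return NULL;
}
```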

> [skipped up to the end of blockdev.c, I'm tired o_O]

I can very much relate. :-)

Your review definitely is much appreciated.

>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>> index d1bb863cb6..f99f753fba 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
>>           const char *drive_name = bdrv_get_device_or_node_name(bs);
>>   
>>           /* skip automatically inserted nodes */
>> -        while (bs && bs->drv && bs->implicit) {
>> -            bs = backing_bs(bs);
>> -        }
>> +        bs = bdrv_skip_implicit_filters(bs);
> 
> This intersects with John's patch
> [PATCH v2] migration/dirty-bitmaps: change bitmap enumeration method
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03340.html

Well.  I’m not really taking other patches into account with this series.
Rebasing is always such a pain that I just write it for the current
master.  I won’t incorporate unmerged series because doing so may cause
me to have to rebase more than once.

And I can’t get this series merged soon enough, because it’s just wrong
that I (and you) have to be the one(s) thinking about how to treat filters
everywhere.  It should be the people who introduce the code.

>>           for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
>>                bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
>> diff --git a/nbd/server.c b/nbd/server.c
>> index e21bd501dc..e41ae89dbe 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>       if (bitmap) {
>>           BdrvDirtyBitmap *bm = NULL;
>>   
>> -        while (true) {
>> +        while (bs) {
>>               bm = bdrv_find_dirty_bitmap(bs, bitmap);
>> -            if (bm != NULL || bs->backing == NULL) {
>> +            if (bm != NULL) {
>>                   break;
>>               }
>>   
>> -            bs = bs->backing->bs;
>> +            bs = bdrv_filtered_bs(bs);
>>           }
> 
> Check in documentation: "@bitmap: Also export the dirty bitmap reachable from @device".
> 
> "Reachable" is not bad, but we may want to clarify that the extended backing chain is meant.

Hm...  Isn’t that just a problem with the current documentation?

I think this change in code better fits what I’d guess from “reachable”
than what it currently means.
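A toy model of what "reachable" means after this change: the lookup walks the extended backing chain, i.e. COW backing children and filtered children alike, and returns the first node that owns the bitmap. `Node`, `find_bitmap_owner()` and the fixed-size bitmap list are hypothetical; the loop shape follows the `while (bs)` loop in the nbd/server.c hunk quoted earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    const char *bitmaps[4];  /* toy: bitmap names, unused slots are NULL */
    struct Node *filtered;   /* COW backing child or R/W filtered child */
} Node;

static bool node_has_bitmap(const Node *n, const char *bitmap)
{
    size_t max = sizeof(n->bitmaps) / sizeof(n->bitmaps[0]);
    for (size_t i = 0; i < max && n->bitmaps[i] != NULL; i++) {
        if (strcmp(n->bitmaps[i], bitmap) == 0) {
            return true;
        }
    }
    return false;
}

/* Walk the extended backing chain for the first node owning @bitmap. */
static const Node *find_bitmap_owner(const Node *bs, const char *bitmap)
{
    assert(bitmap != NULL);
    while (bs) {
        if (node_has_bitmap(bs, bitmap)) {
            return bs;
        }
        bs = bs->filtered;
    }
    return NULL;
}
```

With throttle -> top -> base and the bitmap stored in base, a lookup starting at the throttle node finds it in base; a name that no node in the chain owns yields NULL, which corresponds to the `bm == NULL` error path.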

>>           if (bm == NULL) {
>> diff --git a/qemu-img.c b/qemu-img.c
>> index aa6f81f1ea..bcfbb743fc 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c

[...]

>> @@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
>>            * s.target_backing_sectors has to be negative, which it will
>>            * be automatically).  The backing file length is used only
>>            * for optimizations, so such a case is not fatal. */
>> -        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
>> +        s.target_backing_sectors =
>> +            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
> 
> can't out_bs be filter itself?

why would you do that

More seriously: well, perhaps, in theory.  In practice I really cannot
imagine why it would be.

> 
>>       } else {
>>           s.target_backing_sectors = -1;
>>       }
>> @@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>>   
>>       depth = 0;
>>       for (;;) {
>> +        bs = bdrv_skip_rw_filters(bs);
> 
> Why?  Filters may have their own implementation of block_status, so why skip it?
> 
> Or can they not?  Really, maybe if we disallow filters from having block_status, we can
> solve the inefficient block_status_above we talked about before.

As said in the other subthread, I think ignoring filters here is fine.

Max

>>           ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
>>           if (ret < 0) {
>>               return ret;




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-23 17:27     ` Max Reitz
@ 2019-05-24  8:12       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-24  8:12 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

23.05.2019 20:27, Max Reitz wrote:
> On 17.05.19 16:50, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
>>> What bs->file and bs->backing mean depends on the node.  For filter
>>> nodes, both signify a node that will eventually receive all R/W
>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>> bs->backing will not receive writes -- instead, writes are COWed to
>>> bs->file.  Usually.
>>>
>>> In any case, it is not trivial to guess what a child means exactly with
>>> our currently limited form of expression.  It is better to introduce
>>> some functions that actually guarantee a meaning:
>>>
>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>     filtered through COW.  That is, reads may or may not be forwarded
>>>     (depending on the overlay's allocation status), but writes never go to
>>>     this child.
>>>
>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>     filtered through some very plain process.  Reads and writes issued to
>>>     the parent will go to the child as well (although timing, etc. may be
>>>     modified).
>>>
>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>     block layer anyway) always only have one of these children: All read
>>>     requests must be served from the filtered_rw_child (if it exists), so
>>>     if there was a filtered_cow_child in addition, it would not receive
>>>     any requests at all.
>>>     (The closest here is mirror, where all requests are passed on to the
>>>     source, but with write-blocking, write requests are "COWed" to the
>>>     target.  But that just means that the target is a special child that
>>>     cannot be introspected by the generic block layer functions, and that
>>>     source is a filtered_rw_child.)
>>>     Therefore, we can also add bdrv_filtered_child() which returns that
>>>     one child (or NULL, if there is no filtered child).
>>>
>>> Also, many places in the current block layer should be skipping filters
>>> (all filters or just the ones added implicitly, it depends) when going
>>> through a block node chain.  They do not do that currently, but this
>>> patch makes them.
>>>
>>> One example for this is qemu-img map, which should skip filters and only
>>> look at the COW elements in the graph.  The change to iotest 204's
>>> reference output shows how using blkdebug on top of a COW node used to
>>> make qemu-img map disregard the rest of the backing chain, but with this
>>> patch, the allocation in the base image is reported correctly.
>>>
>>> Furthermore, a note should be made that sometimes we do want to access
>>> bs->backing directly.  This is whenever the operation in question is not
>>> about accessing the COW child, but the "backing" child, be it COW or
>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>> whenever we have to deal with the special behavior of @backing as a
>>> blockdev option, which is that it does not default to null like all
>>> other child references do.
>>>
>>> Finally, the query functions (query-block and query-named-block-nodes)
>>> are modified to return any filtered child under "backing", not just
>>> bs->backing or COW children.  This is so that filters do not interrupt
>>> the reported backing chain.  This changes the output of iotest 184, as
>>> the throttled node now appears as a backing child.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>
>> [..]
>>
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
> 
> [...]
> 
>>> @@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>>>         * any jobs in them must be blocked */
>>>        if (target_is_backing) {
>>>            BlockDriverState *iter;
>>> -        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
>>> +        for (iter = bdrv_filtered_bs(bs); iter != target;
>>
>> should it be filtered_target too?
> 
> Hmm...  The comment says that all nodes that disappear must be blocked.
> I don’t even know by heart which nodes I let disappear. :-/
> 
> I suppose we should start at the first explicit node, filter or not...?

Hm, I was thinking about where we should stop.  But I don't think we want to remove nodes
under the target, so it should be OK as is.

> 
>>> +             iter = bdrv_filtered_bs(iter))
>>> +        {
>>>                /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
>>>                 * ourselves at s->base (if writes are blocked for a node, they are
>>>                 * also blocked for its backing file). The other options would be a
> 
> [...]
> 
>>> @@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>>                      MirrorCopyMode copy_mode, Error **errp)
>>>    {
>>>        bool is_none_mode;
>>> -    BlockDriverState *base;
>>> +    BlockDriverState *base = NULL;
>>
>> dead assignment
> 
> Now I wonder why I even have that.  Probably an artifact from some
> intermediate point.
> 
>>>    
>>>        if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>>            error_setg(errp, "Sync mode 'incremental' not supported");
>>>            return;
>>>        }
>>>        is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>>> -    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>>> +    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
>>>        mirror_start_job(job_id, bs, creation_flags, target, replaces,
>>>                         speed, granularity, buf_size, backing_mode,
>>>                         on_source_error, on_target_error, unmap, NULL, NULL,
>>> diff --git a/block/qapi.c b/block/qapi.c
>>> index 110d05dc57..478c6f5e0d 100644
>>> --- a/block/qapi.c
>>> +++ b/block/qapi.c
> 
> [...]
> 
>>> @@ -535,9 +538,10 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>>>            s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
>>>        }
>>>    
>>> -    if (blk_level && bs->backing) {
>>> +    cow_bs = bdrv_filtered_cow_bs(bs);
>>
>> So, if we are at blk_level and the top bs is an explicit filter, you don't want to show
>> its child?
> 
> I do.  It’s in s->parent.  I thought it makes sense to change the
> existing bs->file vs. bs->backing to storage vs. COW.
Hmm. When I reviewed this, I didn't consider the following patch. Actually, this patch
breaks showing the backing child of a filter, and the following one fixes it. Not very
nice, but not a reason to merge these two patches. OK for me.
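
Max's storage-vs-COW split can be modelled roughly as follows. This assumes the parent entry is filled from the storage child, as in the follow-up patch. `Node`, `storage_bs()`, `cow_bs()` and `Stats` are hypothetical simplifications of `bdrv_storage_bs()`, `bdrv_filtered_cow_bs()` and `BlockStats`, not the real QEMU code. With this split, a backing-based filter's child is reported as the parent, and nothing is reported as backing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node model: not QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Models bdrv_storage_bs(): a filter's filtered child, else bs->file. */
static Node *storage_bs(Node *bs)
{
    if (bs->is_filter) {
        return bs->file ? bs->file : bs->backing;
    }
    return bs->file;
}

/* Models bdrv_filtered_cow_bs(): filters have no COW child. */
static Node *cow_bs(Node *bs)
{
    return bs->is_filter ? NULL : bs->backing;
}

typedef struct Stats {
    Node *parent;   /* reported as "parent" */
    Node *backing;  /* reported as "backing" */
} Stats;

static Stats query_stats(Node *bs)
{
    assert(bs != NULL);
    Stats s = { storage_bs(bs), cow_bs(bs) };
    return s;
}
```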

> 
>> Hmm, at least, we can't show it if it is a file child, as the qapi field is already called
>> backing. So, if we can't show it for file-child-based filters, it may be better not to
>> show filter children here at all.
>>
>>> +    if (blk_level && cow_bs) {
>>>            s->has_backing = true;
>>> -        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
>>> +        s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
>>>        }
>>>    
>>>        return s;
>>> diff --git a/block/stream.c b/block/stream.c
>>> index bfaebb861a..23d5c890e0 100644
>>> --- a/block/stream.c
>>> +++ b/block/stream.c
>>> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>        BlockJob *bjob = &s->common;
>>>        BlockDriverState *bs = blk_bs(bjob->blk);
>>> +    BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
>>
>> Aha, I'd call it filtered, but unfiltered is correct too, it's amazing
> 
> Haha :-)
> 
> I think it’s all more insane than amazing, but, well, insanity never
> ceases to amaze, does it.
> 
>>>        BlockDriverState *base = s->base;
>>>        Error *local_err = NULL;
>>>        int ret = 0;
>>> @@ -72,7 +73,7 @@ static int stream_prepare(Job *job)
> 
> [...]
> 
>>> @@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>        int64_t n = 0; /* bytes */
>>>        void *buf;
>>>    
>>> -    if (!bs->backing) {
>>> +    if (!bdrv_filtered_child(bs)) {
>>>            goto out;
>>>        }
>>
>> This condition checks whether there is anything to stream, so I think it's better to check
>> if (!bdrv_backing_chain_next(bs)) {
>>     goto out;
>> }
> 
> Ah, sure.
> 
>>> @@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>            } else if (ret >= 0) {
>>>                /* Copy if allocated in the intermediate images.  Limit to the
>>>                 * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), base,
>>> +            ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
>>>                                              offset, n, &n);
>>
>> Hmm, if we are trying to support bs being a filter, and actually operate on the first
>> non-filter, as you write in the qapi spec, this is wrong. Again, it should be
>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)).
> 
> Would bdrv_backing_chain_next() fulfill the same purpose?  It can’t be
> allocated in a filter node, after all.

It's OK too.
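
The two candidate checks differ when the top node is a filter over an image that has no backing file. In that case the filter has a filtered child, yet there is nothing to stream. The minimal model below assumes that `bdrv_backing_chain_next()` skips R/W filters, takes the COW child, and then skips filters again. The one-child `Node` type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: "child" is the R/W child for filters and the
 * COW (backing) child for format nodes. */
typedef struct Node {
    bool is_filter;
    struct Node *child;
} Node;

static Node *skip_rw_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        assert(bs->child != NULL); /* a filter always filters something */
        bs = bs->child;
    }
    return bs;
}

static Node *cow_bs(Node *bs)
{
    return (bs && !bs->is_filter) ? bs->child : NULL;
}

/* First non-filter node below the first COW link, or NULL. */
static Node *backing_chain_next(Node *bs)
{
    return skip_rw_filters(cow_bs(skip_rw_filters(bs)));
}
```

Since data cannot be allocated in a filter node, the same helper also gives a sensible starting point for the allocation query.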

> 
>> Or, maybe better, at stream start we should calculate the real top bs to operate on and
>> forget about all filters above it, i.e., do bs = bdrv_skip_rw_filters(bs) at the very
>> beginning, when creating the job.
> 
> Sounds reasonable.  We can ignore all the filters on top of the
> (un)filtered top anyway.
> 
>>>                /* Finish early if end of backing file has been reached */
>>> @@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>>         * disappear from the chain after this operation. The streaming job reads
>>>         * every block only once, assuming that it doesn't change, so block writes
>>>         * and resizes. */
>>> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
>>> +    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
>>> +         iter = bdrv_filtered_bs(iter))
>>> +    {
>>>            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>                               BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
>>>                               &error_abort);
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 4775a07d93..bb71b8368d 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>>>                return;
>>>            }
>>>    
>>> -        bs = blk_bs(blk);
>>> +        bs = bdrv_skip_implicit_filters(blk_bs(blk));
>>>            aio_context = bdrv_get_aio_context(bs);
>>>            aio_context_acquire(aio_context);
>>>    
>>> @@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState *common,
>>>            goto out;
>>>        }
>>>    
>>> -    if (state->new_bs->backing != NULL) {
>>> +    if (bdrv_filtered_cow_child(state->new_bs)) {
>>
>> Do we allow the new snapshot node to be a filter? We should either restrict it explicitly
>> or check bdrv_filtered_child() here. And we can't allow file-based filters anyway.
> 
> Hm, yes, we should probably check both (separately to give better error
> messages).
> 
> In theory it might be possible to allow filters on top, but there isn’t
> really any point.  If someone wants to add filters on top of the
> snapshot, they should use reopen.
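
Checking the two conditions separately, as suggested, could look like the sketch below. The `Node` type, the function and both error strings are hypothetical. The filter check comes first, so a backing-based filter is not misreported as an overlay that already has a backing image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node model, not QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *file;
    struct Node *backing;
} Node;

/* Hypothetical validation of a new snapshot overlay.  Returns NULL if
 * the node is acceptable, otherwise a condition-specific message. */
static const char *check_snapshot_overlay(const Node *new_bs)
{
    assert(new_bs != NULL);
    if (new_bs->is_filter) {
        return "The snapshot overlay must not be a filter node";
    }
    if (new_bs->backing) {
        return "The overlay already has a backing image";
    }
    return NULL;
}
```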
> 
>> [skipped up to the end of blockdev.c, I'm tired o_O]
> 
> I can very much relate. :-)
> 
> Your review definitely is much appreciated.
> 
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index d1bb863cb6..f99f753fba 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
>>>            const char *drive_name = bdrv_get_device_or_node_name(bs);
>>>    
>>>            /* skip automatically inserted nodes */
>>> -        while (bs && bs->drv && bs->implicit) {
>>> -            bs = backing_bs(bs);
>>> -        }
>>> +        bs = bdrv_skip_implicit_filters(bs);
>>
>> this intersects with John's patch
>> [PATCH v2] migration/dirty-bitmaps: change bitmap enumeration method
>> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03340.html
> 
> Well.  I’m not really considerate of other patches with this series.
> Rebasing is always such a pain that I just write it for the current
> master.  I won’t incorporate unmerged series because doing so may cause
> me to have to rebase more than once.
> 
> And I can’t get this series merged soon enough because it’s just wrong
> that I (and you) have to be the one(s) thinking about how to treat filters
> everywhere.  It should be the people that introduce the code.

I think it's just impossible to think through all filter use cases now. Better
to stop at some point and then fix whatever turns out to be wrong, while covering
real use cases with iotests.
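
The `bdrv_skip_implicit_filters()` call in the dirty-bitmap hunk above replaces an open-coded loop that only followed `backing_bs()`. Below is a minimal model of what such a helper has to do, using a hypothetical `Node` type. It walks down through automatically inserted nodes, whichever child they use, until it reaches one the user created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node model: "implicit" marks nodes inserted
 * automatically (e.g. by commit or mirror). */
typedef struct Node {
    const char *name;
    bool implicit;
    struct Node *child;   /* the filtered child, via file or backing */
} Node;

/* Return the first node below (or at) bs that was created explicitly. */
static Node *skip_implicit_filters(Node *bs)
{
    while (bs && bs->implicit) {
        assert(bs->child != NULL); /* implicit nodes are always filters */
        bs = bs->child;
    }
    return bs;
}
```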

> 
>>>            for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
>>>                 bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
>>> diff --git a/nbd/server.c b/nbd/server.c
>>> index e21bd501dc..e41ae89dbe 100644
>>> --- a/nbd/server.c
>>> +++ b/nbd/server.c
>>> @@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>        if (bitmap) {
>>>            BdrvDirtyBitmap *bm = NULL;
>>>    
>>> -        while (true) {
>>> +        while (bs) {
>>>                bm = bdrv_find_dirty_bitmap(bs, bitmap);
>>> -            if (bm != NULL || bs->backing == NULL) {
>>> +            if (bm != NULL) {
>>>                    break;
>>>                }
>>>    
>>> -            bs = bs->backing->bs;
>>> +            bs = bdrv_filtered_bs(bs);
>>>            }
>>
>> Check in documentation: "@bitmap: Also export the dirty bitmap reachable from @device".
>>
>> "Reachable" is not bad, but we may want to clarify that the extended backing chain is meant
> 
> Hm...  Isn’t that just a problem with the current documentation?
> 

Yes. Anyway, it's not necessary to fix it in this series.

> I think this change in code better fits what I’d guess from “reachable”
> than what it currently means.
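
With the change, "reachable" effectively means reachable through any filtered link, whether COW or filter. The following is a hypothetical model of the lookup. It uses a fixed-size list of bitmap names, with `filtered` standing in for `bdrv_filtered_bs()`. The real code calls `bdrv_find_dirty_bitmap()` on each node instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BITMAPS 4

/* Hypothetical node model: a few named bitmaps plus the filtered link. */
typedef struct Node {
    const char *bitmaps[MAX_BITMAPS]; /* unused slots are NULL */
    struct Node *filtered;
} Node;

/* Walk the extended backing chain (COW and filter links alike) and
 * return the first node carrying a bitmap named @name, or NULL. */
static Node *find_bitmap_node(Node *bs, const char *name)
{
    assert(name != NULL);
    while (bs) {
        for (int i = 0; i < MAX_BITMAPS && bs->bitmaps[i]; i++) {
            if (strcmp(bs->bitmaps[i], name) == 0) {
                return bs;
            }
        }
        bs = bs->filtered;
    }
    return NULL;
}
```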
> 
>>>            if (bm == NULL) {
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index aa6f81f1ea..bcfbb743fc 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
> 
> [...]
> 
>>> @@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
>>>             * s.target_backing_sectors has to be negative, which it will
>>>             * be automatically).  The backing file length is used only
>>>             * for optimizations, so such a case is not fatal. */
>>> -        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
>>> +        s.target_backing_sectors =
>>> +            bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
>>
>> can't out_bs be a filter itself?
> 
> why would you do that
> 
> More seriously, well, perhaps, in theory.  In practice I really cannot
> imagine why it would be.

Throttling? But it has a file child and will not work with backing anyway. And
we don't have public backing-based filters anyway. So I don't care, it's OK.

> 
>>
>>>        } else {
>>>            s.target_backing_sectors = -1;
>>>        }
>>> @@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>>>    
>>>        depth = 0;
>>>        for (;;) {
>>> +        bs = bdrv_skip_rw_filters(bs);
>>
>> Why? Filters may have their own implementation of block_status, so why skip it?
>>
>> Or can't they? Really, if we disallow filters from having block_status, we may solve
>> the inefficient block_status_above we talked about before.
> 
> As said in the other subthread, I think ignoring filters here is fine.
> 
> Max
> 
>>>            ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
>>>            if (ret < 0) {
>>>                return ret;
> 
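
A small model of the depth loop shows why skipping filters helps here. It follows the spirit of the iotest 204 case, with blkdebug on top of a COW node. Filters are skipped before every query, so they neither hide the backing chain nor add to the reported depth. The `Node` type and the `allocated` flag are hypothetical. The real loop calls `bdrv_block_status()` and inspects the returned flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: "child" is the R/W child for filters and the
 * COW child for format nodes; "allocated" says whether this layer
 * holds data for the queried range. */
typedef struct Node {
    bool is_filter;
    bool allocated;
    struct Node *child;
} Node;

static Node *skip_rw_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        assert(bs->child != NULL); /* a filter always has a child */
        bs = bs->child;
    }
    return bs;
}

/* Number of COW layers descended before finding the data, or -1 if
 * no layer has it.  Filters never add depth. */
static int allocation_depth(Node *bs)
{
    int depth = 0;

    for (;;) {
        bs = skip_rw_filters(bs);
        if (!bs) {
            return -1;
        }
        if (bs->allocated) {
            return depth;
        }
        bs = bs->child;   /* COW child of a format node */
        depth++;
    }
}
```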


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*()
  2019-05-21  8:57   ` Vladimir Sementsov-Ogievskiy
@ 2019-05-28 17:58     ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-05-28 17:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 21.05.19 10:57, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
>> With bdrv_filtered_rw_bs(), we can easily handle this default filter
>> behavior in bdrv_co_block_status().
>>
>> blkdebug wants to have an additional assertion, so it keeps its own
>> implementation, except bdrv_co_block_status_from_file() needs to be
>> inlined there.
>>
>> Suggested-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h | 22 -----------------
>>   block/blkdebug.c          |  7 ++++--
>>   block/blklogwrites.c      |  1 -
>>   block/commit.c            |  1 -
>>   block/copy-on-read.c      |  2 --
>>   block/io.c                | 51 +++++++++++++--------------------------
>>   block/mirror.c            |  1 -
>>   block/throttle.c          |  1 -
>>   8 files changed, 22 insertions(+), 64 deletions(-)

[...]

>> diff --git a/block/io.c b/block/io.c
>> index 5c33ecc080..8d124bae5c 100644
>> --- a/block/io.c
>> +++ b/block/io.c

[...]

>> @@ -2088,7 +2059,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>>   
>>       /* Must be non-NULL or bdrv_getlength() would have failed */
>>       assert(bs->drv);
>> -    if (!bs->drv->bdrv_co_block_status) {
>> +    has_filtered_child = bs->drv->is_filter && bdrv_filtered_rw_child(bs);
>> +    if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
>>           *pnum = bytes;
>>           ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
>>           if (offset + bytes == total_size) {
>> @@ -2109,9 +2081,20 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>>       aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>>       aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>>   
>> -    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
>> -                                        aligned_bytes, pnum, &local_map,
>> -                                        &local_file);
>> +    if (bs->drv->bdrv_co_block_status) {
>> +        ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
>> +                                            aligned_bytes, pnum, &local_map,
>> +                                            &local_file);
>> +    } else {
>> +        /* Default code for filters */
>> +
>> +        local_file = bdrv_filtered_rw_bs(bs);
>> +        assert(local_file);
>> +
>> +        *pnum = aligned_bytes;
>> +        local_map = aligned_offset;
>> +        ret = BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
>> +    }
> 
> 
> Preexisting, but why is the default for filters aligned while the one for other nodes is not?

I suppose because the default code for other nodes has been written
before the aligning code was introduced.

I guess there is no good reason to enforce alignment in either case.  It
is important to do so when issuing a request to the driver because the
driver is not required to be able to handle unaligned requests.  If we
completely forgo the driver and just go through to the next layer, it
doesn’t really matter, I think.

Well, I just kept it as it was before. O:-)
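
A sketch of the arithmetic supports the point that alignment does not matter in the pass-through case. It assumes a power-of-two alignment. It also assumes that the caller clamps the answer back to the original request after the aligned query, and this model performs that step explicitly. `align_down()`, `round_up()` and `Status` are local stand-ins, not QEMU's macros or types. For a pass-through filter, the clamped result is the same as if the request had never been aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for QEMU_ALIGN_DOWN()/ROUND_UP(); power-of-two only. */
static int64_t align_down(int64_t n, int64_t align) { return n & ~(align - 1); }
static int64_t round_up(int64_t n, int64_t align) { return (n + align - 1) & ~(align - 1); }

typedef struct Status {
    int64_t map;    /* offset in the filtered child (local_map) */
    int64_t pnum;   /* bytes covered by the answer */
} Status;

static Status default_filter_status(int64_t offset, int64_t bytes, int64_t align)
{
    int64_t aligned_offset, aligned_bytes;
    Status s;

    assert(align > 0 && (align & (align - 1)) == 0);
    aligned_offset = align_down(offset, align);
    aligned_bytes = round_up(offset + bytes, align) - aligned_offset;

    /* Default filter answer: the whole aligned range maps 1:1. */
    s.map = aligned_offset;
    s.pnum = aligned_bytes;

    /* Clamp back to the original, possibly unaligned request. */
    s.pnum -= offset - aligned_offset;
    if (s.pnum > bytes) {
        s.pnum = bytes;
    }
    s.map += offset - aligned_offset;
    return s;
}
```

For example, offset 1000 and 100 bytes at 512-byte alignment are widened to [512, 1536), then clamped back to map 1000 and pnum 100.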

Max




* Re: [Qemu-devel] [PATCH v4 03/11] block: Storage child access function
  2019-05-20 10:41   ` Vladimir Sementsov-Ogievskiy
@ 2019-05-28 18:09     ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-05-28 18:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 20.05.19 12:41, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
>> For completeness' sake, add a function for accessing a node's storage
>> child, too.  For filters, this is their filtered child; for non-filters,
>> this is bs->file.
>>
>> Some places are deliberately left unconverted:
>> - BDS opening/closing functions where bs->file is handled specially
>>    (which is basically wrong, but at least simplifies probing)
>> - bdrv_co_block_status_from_file(), because its name implies that it
>>    points to ->file
>> - bdrv_snapshot_goto() in one place unrefs bs->file.  Such a
>>    modification is not covered by this patch and is therefore just
>>    safeguarded by an additional assert(), but otherwise kept as-is.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>

[...]

>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
> 
> [..]
> 
>> @@ -184,6 +186,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>>                          Error **errp)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BlockDriverState *storage_bs;
>>       int ret, open_ret;
>>   
>>       if (!drv) {
>> @@ -204,39 +207,40 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>>           return ret;
>>       }
>>   
>> -    if (bs->file) {
>> -        BlockDriverState *file;
>> +    storage_bs = bdrv_storage_bs(bs);
>> +    if (storage_bs) {
>>           QDict *options = qdict_clone_shallow(bs->options);
>>           QDict *file_options;
>>           Error *local_err = NULL;
>>   
>> -        file = bs->file->bs;
>>           /* Prevent it from getting deleted when detached from bs */
>> -        bdrv_ref(file);
>> +        bdrv_ref(storage_bs);
>>   
>>           qdict_extract_subqdict(options, &file_options, "file.");
>>           qobject_unref(file_options);
>> -        qdict_put_str(options, "file", bdrv_get_node_name(file));
>> +        qdict_put_str(options, "file", bdrv_get_node_name(storage_bs));
>>   
>>           if (drv->bdrv_close) {
>>               drv->bdrv_close(bs);
>>           }
>> +
>> +        assert(bs->file->bs == storage_bs);
> 
> Hmm, but what saves us from this assertion failing for backing-based filters? Before your
> patch it was unreachable for them. Or what am I missing?

Ha, good point.  I simply missed this point.  Yes, I need to check
whether storage_bs is bs->file or bs->backing and then take the
corresponding sub-QDict from bs->options.
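
The fix described here boils down to finding out which option prefix describes the storage child before extracting it from `bs->options`. Below is a hypothetical helper over simplified types, not QEMU's `BdrvChild`/`QDict` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types. */
typedef struct Node Node;
typedef struct Child { Node *bs; } Child;

struct Node {
    Child *file;
    Child *backing;
};

/* Which prefix in bs->options describes storage_bs?  "file." for most
 * drivers, "backing." for a backing-based filter, NULL if storage_bs is
 * not a direct child (the caller must not proceed then). */
static const char *storage_child_prefix(const Node *bs, const Node *storage_bs)
{
    assert(bs != NULL && storage_bs != NULL);
    if (bs->file && bs->file->bs == storage_bs) {
        return "file.";
    }
    if (bs->backing && bs->backing->bs == storage_bs) {
        return "backing.";
    }
    return NULL;
}
```

The same answer also tells the code which child to unref and re-check after `bdrv_open()`, instead of assuming `bs->file`.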

Max

>>           bdrv_unref_child(bs, bs->file);
>>           bs->file = NULL;
>>   
>> -        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
>> +        ret = bdrv_snapshot_goto(storage_bs, snapshot_id, errp);
>>           open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
>>           qobject_unref(options);
>>           if (open_ret < 0) {
>> -            bdrv_unref(file);
>> +            bdrv_unref(storage_bs);
>>               bs->drv = NULL;
>>               /* A bdrv_snapshot_goto() error takes precedence */
>>               error_propagate(errp, local_err);
>>               return ret < 0 ? ret : open_ret;
>>           }
>>   
>> -        assert(bs->file->bs == file);
>> -        bdrv_unref(file);
>> +        assert(bs->file->bs == storage_bs);
>> +        bdrv_unref(storage_bs);
>>           return ret;
>>       }
>>   
> 
> 
> 





* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
  2019-04-17 16:22     ` Max Reitz
@ 2019-05-31 16:26     ` Max Reitz
  2019-05-31 17:02       ` Max Reitz
  1 sibling, 1 reply; 48+ messages in thread
From: Max Reitz @ 2019-05-31 16:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
> 10.04.2019 23:20, Max Reitz wrote:
>> What bs->file and bs->backing mean depends on the node.  For filter
>> nodes, both signify a node that will eventually receive all R/W
>> accesses.  For format nodes, bs->file contains metadata and data, and
>> bs->backing will not receive writes -- instead, writes are COWed to
>> bs->file.  Usually.
>>
>> In any case, it is not trivial to guess what a child means exactly with
>> our currently limited form of expression.  It is better to introduce
>> some functions that actually guarantee a meaning:
>>
>> - bdrv_filtered_cow_child() will return the child that receives requests
>>    filtered through COW.  That is, reads may or may not be forwarded
>>    (depending on the overlay's allocation status), but writes never go to
>>    this child.
>>
>> - bdrv_filtered_rw_child() will return the child that receives requests
>>    filtered through some very plain process.  Reads and writes issued to
>>    the parent will go to the child as well (although timing, etc. may be
>>    modified).
>>
>> - All drivers but quorum (but quorum is pretty opaque to the general
>>    block layer anyway) always only have one of these children: All read
>>    requests must be served from the filtered_rw_child (if it exists), so
>>    if there was a filtered_cow_child in addition, it would not receive
>>    any requests at all.
>>    (The closest here is mirror, where all requests are passed on to the
>>    source, but with write-blocking, write requests are "COWed" to the
>>    target.  But that just means that the target is a special child that
>>    cannot be introspected by the generic block layer functions, and that
>>    source is a filtered_rw_child.)
>>    Therefore, we can also add bdrv_filtered_child() which returns that
>>    one child (or NULL, if there is no filtered child).
>>
>> Also, many places in the current block layer should be skipping filters
>> (all filters or just the ones added implicitly, it depends) when going
>> through a block node chain.  They do not do that currently, but this
>> patch makes them.
>>
>> One example for this is qemu-img map, which should skip filters and only
>> look at the COW elements in the graph.  The change to iotest 204's
>> reference output shows how using blkdebug on top of a COW node used to
>> make qemu-img map disregard the rest of the backing chain, but with this
>> patch, the allocation in the base image is reported correctly.
>>
>> Furthermore, a note should be made that sometimes we do want to access
>> bs->backing directly.  This is whenever the operation in question is not
>> about accessing the COW child, but the "backing" child, be it COW or
>> not.  This is the case in functions such as bdrv_open_backing_file() or
>> whenever we have to deal with the special behavior of @backing as a
>> blockdev option, which is that it does not default to null like all
>> other child references do.
>>
>> Finally, the query functions (query-block and query-named-block-nodes)
>> are modified to return any filtered child under "backing", not just
>> bs->backing or COW children.  This is so that filters do not interrupt
>> the reported backing chain.  This changes the output of iotest 184, as
>> the throttled node now appears as a backing child.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json           |   4 +
>>   include/block/block.h          |   1 +
>>   include/block/block_int.h      |  40 +++++--
>>   block.c                        | 210 +++++++++++++++++++++++++++------
>>   block/backup.c                 |   8 +-
>>   block/block-backend.c          |  16 ++-
>>   block/commit.c                 |  33 +++---
>>   block/io.c                     |  45 ++++---
>>   block/mirror.c                 |  21 ++--
>>   block/qapi.c                   |  30 +++--
>>   block/stream.c                 |  13 +-
>>   blockdev.c                     |  88 +++++++++++---
>>   migration/block-dirty-bitmap.c |   4 +-
>>   nbd/server.c                   |   6 +-
>>   qemu-img.c                     |  29 ++---
>>   tests/qemu-iotests/184.out     |   7 +-
>>   tests/qemu-iotests/204.out     |   1 +
>>   17 files changed, 411 insertions(+), 145 deletions(-)
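
The three accessors described in the commit message can be modelled in a few lines. This is a simplified sketch with hypothetical `Node`/`ChildRef` types, not the code from the patch. It encodes three stated rules: only format nodes have a COW child, a filter has exactly one filtered child under either name, and no node has both kinds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified types (quorum is ignored). */
typedef struct Node Node;
typedef struct ChildRef { Node *bs; } ChildRef;

struct Node {
    bool is_filter;
    ChildRef *file;
    ChildRef *backing;
};

/* Reads may be forwarded here, writes never are. */
static ChildRef *filtered_cow_child(Node *bs)
{
    return bs->is_filter ? NULL : bs->backing;
}

/* Reads and writes are passed through to this child. */
static ChildRef *filtered_rw_child(Node *bs)
{
    if (!bs->is_filter) {
        return NULL;
    }
    assert(!(bs->file && bs->backing)); /* one filtered child only */
    return bs->file ? bs->file : bs->backing;
}

/* The one filtered child, whichever kind it is, or NULL. */
static ChildRef *filtered_child(Node *bs)
{
    ChildRef *cow = filtered_cow_child(bs);
    ChildRef *rw = filtered_rw_child(bs);

    assert(!(cow && rw)); /* never both, as the commit message argues */
    return cow ? cow : rw;
}
```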
> 
> really huge... didn't you consider conversion file-by-file?
> 
> [..]
> 
>> diff --git a/block.c b/block.c
>> index 16615bc876..e8f6febda0 100644
>> --- a/block.c
>> +++ b/block.c
> 
> [..]
> 
>>   
>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>       /*
>>        * Find the "actual" backing file by skipping all links that point
>>        * to an implicit node, if any (e.g. a commit filter node).
>> +     * We cannot use any of the bdrv_skip_*() functions here because
>> +     * those return the first explicit node, while we are looking for
>> +     * its overlay here.
>>        */
>>       overlay_bs = bs;
>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>> -        overlay_bs = backing_bs(overlay_bs);
>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
> 
> So, you don't want to skip implicit filters with a 'file' child? Then why not use
> child_bs(overlay_bs->backing), like in the following if condition?

On second thought, I actually think this version is wrong in the other way.

There needs to be a bs with bs->backing != NULL and !bs->implicit
somewhere in the chain.  We try to find that node.  It doesn’t matter
what’s on top of it, though.  If there are implicit nodes (which we try
to skip here), the user isn’t aware of them.  Consequently, it
doesn’t matter whether these implicit nodes use bs->backing or bs->file,
we just need to skip them.

What is wrong is the “while (overlay_bs->backing ...)”.  That needs to
be “while (bdrv_filtered_bs(overlay_bs) ...)”.
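
Below is a minimal model of the loop with the proposed condition, which tests the filtered child instead of `overlay_bs->backing`. The `Node` type and its `filtered` field, which stands in for `bdrv_filtered_bs()`, are hypothetical. The loop stops at the overlay of the first explicit node, regardless of whether the implicit nodes link through file or backing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node model, not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool implicit;
    struct Node *filtered;
} Node;

/* Return the node whose filtered child is the first explicit node
 * below bs (or bs itself if its child is already explicit). */
static Node *find_overlay(Node *bs)
{
    Node *overlay_bs = bs;

    assert(bs != NULL);
    while (overlay_bs->filtered && overlay_bs->filtered->implicit) {
        overlay_bs = overlay_bs->filtered;
    }
    return overlay_bs;
}
```

With top -> commit-top (implicit) -> base, the result is commit-top, the overlay of base.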

Max




* Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
  2019-05-31 16:26     ` Max Reitz
@ 2019-05-31 17:02       ` Max Reitz
  0 siblings, 0 replies; 48+ messages in thread
From: Max Reitz @ 2019-05-31 17:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


On 31.05.19 18:26, Max Reitz wrote:
> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
>>> [...]
>>
>> really huge... didn't you consider conversion file-by-file?
>>
>> [..]
>>
>>> diff --git a/block.c b/block.c
>>> index 16615bc876..e8f6febda0 100644
>>> --- a/block.c
>>> +++ b/block.c
>>
>> [..]
>>
>>>   
>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>       /*
>>>        * Find the "actual" backing file by skipping all links that point
>>>        * to an implicit node, if any (e.g. a commit filter node).
>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>> +     * those return the first explicit node, while we are looking for
>>> +     * its overlay here.
>>>        */
>>>       overlay_bs = bs;
>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>> -        overlay_bs = backing_bs(overlay_bs);
>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>
>> So, you don't want to skip implicit filters with a 'file' child? Then why not use
>> child_bs(overlay_bs->backing), like in the following if condition?
> 
> On second thought, I actually think this version is wrong in the other way.
> 
> There needs to be a bs with bs->backing != NULL and !bs->implicit
> somewhere in the chain.

(Actually, no, the bs->backing node is @bs)

> We try to find that node.  It doesn’t matter
> what’s on top of it, though.  If there are implicit nodes (which we try
> to skip here), the user isn’t aware of them.  Consequently, it
> doesn’t matter whether these implicit nodes use bs->backing or bs->file,
> we just need to skip them.
> 
> What is wrong is the “while (overlay_bs->backing ...)”.  That needs to
> be “while (bdrv_filtered_bs(overlay_bs) ...)”.

I just saw my reply where I noticed this before...  So nothing too new then.

Max




end of thread

Thread overview: 48+ messages
2019-04-10 20:20 [Qemu-devel] [PATCH v4 00/11] block: Deal with filters Max Reitz
2019-04-10 20:20 ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 01/11] block: Mark commit and mirror as filter drivers Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
2019-04-17 16:22     ` Max Reitz
2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
2019-04-24 15:23         ` Max Reitz
2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
2019-04-24 16:36         ` Max Reitz
2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
2019-05-07 13:15             ` Max Reitz
2019-05-07 13:33               ` Vladimir Sementsov-Ogievskiy
2019-05-31 16:26     ` Max Reitz
2019-05-31 17:02       ` Max Reitz
2019-05-07 13:30   ` Vladimir Sementsov-Ogievskiy
2019-05-07 15:13     ` Max Reitz
2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
2019-05-23 14:49         ` Max Reitz
2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy
2019-05-23 15:56             ` Max Reitz
2019-05-17 14:50   ` Vladimir Sementsov-Ogievskiy
2019-05-23 17:27     ` Max Reitz
2019-05-24  8:12       ` Vladimir Sementsov-Ogievskiy
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 03/11] block: Storage child access function Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-20 10:41   ` Vladimir Sementsov-Ogievskiy
2019-05-28 18:09     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-21  8:57   ` Vladimir Sementsov-Ogievskiy
2019-05-28 17:58     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 05/11] block: Fix check_to_replace_node() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 09/11] iotests: Add filter mirror " Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 10/11] iotests: Add test for commit in sub directory Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing Max Reitz
2019-04-10 20:20   ` Max Reitz
