* [PATCH v7 00/47] block: Deal with filters
@ 2020-06-25 15:21 Max Reitz
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
                   ` (48 more replies)
  0 siblings, 49 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

v6: https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html

Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
Branch: https://git.xanclic.moe/XanClic/qemu.git child-access-functions-v7


Hello!

This is v7.  Conceptually, not much has changed, so please follow the
above link to v6’s cover letter if you’re looking for an introduction
to this series.

I did say that conceptually, not much has changed, but from a diff
standpoint, a lot has changed all over this series.


Changes from v6:
- Patch 1:
  - More elaborate explanation of .is_filter
  - Changed function names
  - Dropped bdrv_storage_child() and bdrv_metadata_child()
  - Some checking of the BdrvChildRole

- Patch 2:
  - Mostly changes resulting from the different naming scheme

- Patch 3: New

- Patch 5:
  - Don’t rename those functions
  - Don’t drop a comment that shouldn’t be dropped

- Patch 7:
  - Use block_driver_can_compress()
  - Move setting @filtered down where it’s needed

- Patches 10 and 11: New (extension of 8 and 9)

- Patch 12:
  - Function name changes
  - More cases:
    - bdrv_recurse_can_replace()
    - init_dirty_bitmap_migration()
  - bdrv_co_truncate() has changed

- Patch 13:
  - Function name changes

- Patch 14:
  - Variables renamed to be more consistent with the rest of this series
  - Function name changes
  - The freeze backing chain functions haven’t been renamed
  - STREAM_BUFFER_SIZE is STREAM_CHUNK as of some point last year
  - Fix overlay finding (e.g. handle when @base is not in the device’s
    backing chain)

- Patch 15:
  - Added note to the commit message that bdrv_find_overlay()’s behavior
    changes a bit
  - Function name changes
  - Restructured bdrv_find_backing_image() loop a bit

- Patch 16: New (became necessary because of truncate having to look at
            the backing file as of 955c7d6687fefcd903900)

- Patch 17:
  - Function name changes
  - The freeze backing chain functions haven’t been renamed

- Patch 18:
  - Only flush children for which the parent has taken the WRITE
    permission
  - Mention that this is a bug fix for qcow2

- Patch 19: New

- Patch 20: New, replaces “block: Use CAFs in bdrv_refresh_limits()”

- Patch 23:
  - We can only really fall back to bs->file or bs->backing, so stop
    pretending otherwise

- Patch 24:
  - Rebase conflicts

- Patches 25, 26, 27, and 28: New, they replace “block: Fix
  bdrv_get_allocated_file_size's fallback”

- Patch 29:
  - Function name changes

- Patch 30: New, split out from the next patch

- Patch 31:
  - Function name changes
  - bdrv_skip_implicit_filters() can deal with NULL arguments, so don’t
    wrap it in an “if (bs) {}” block
  - For bdrv_query_bds_stats(), the bs->file part has been split into
    the preceding patch (patch 30)
  - Additional actual-size line in the iotest output thanks to patch 28

- Patch 32: New

- Patch 33:
  - Additional note in the QAPI documentation concerning @replaces (that
    by default, the first non-implicit node on @device is replaced)
    - Move that skippage of implicit nodes to blockdev_mirror_common()
      (from qmp_drive_mirror() and qmp_blockdev_mirror())
  - Function name changes
  - Rename s/source/target_backing_bs/ in qmp_drive_mirror(), because
    that’s better
  - Don’t disallow mirroring through filters with sync=top

- Patch 34:
  - Function name changes
  - There is backup-top.c to care about, too, now

- Patch 35:
  - Function name changes
  - Call bdrv_commit() even for nodes that do not have backing files so
    we get an error
  - s/above_base/base_overlay/

- Patch 36:
  - Function name changes

- Patch 37:
  - Function name changes
  - In img_convert(), when inquiring target_backing_sectors, use
    bdrv_backing_chain_next() instead of bdrv_cow_bs() (because @out_bs
    may be a filter)
  - Forgot to use the backing file of @unfiltered_bs for
    bdrv_is_allocated_above() in img_rebase() (instead of @unfiltered_bs
    itself), fixed

- Patch 39:
  - Function name changes

- Patch 40:
  - Function name changes
  - There are backup-top and filter-compress now

- Patch 41:
  - Make bdrv_backing_overridden() globally available, so block/qapi.c
    can use it to determine whether we can inquire bs->backing’s format
    to get the backing_format
  - Function name changes

- Patch 42: New

- Patch 43:
  - Rebase conflicts

- Patch 44:
  - Create a dedicated do_test_io() function
  - Don’t unnecessarily clear and pass has_quit
  - Drop assert_no_active_block_jobs() that does very little
  - Additional graph constraint check
  - Rebase conflict in the reference output

- Patch 45:
  - Rebase conflict in the reference output

- Patch 46:
  - Use _rm_test_img rather than rm -f
  - Skip one of the test cases when IMGOPTS asks for a data_file

- Patch 47:
  - Rebase conflict in the reference output


Patches removed:
- The whole check_to_replace_node() stuff for mirror (was its own
  series)

- Making bdrv_get_cumulative_perm() public, because it already was

- bdrv_storage_child() (was replaced by child roles)


git-backport-diff against v6:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/47:[0126] [FC] 'block: Add child access functions'
002/47:[0032] [FC] 'block: Add chain helper functions'
003/47:[down] 'block: bdrv_cow_child() for bdrv_has_zero_init()'
004/47:[----] [-C] 'block: bdrv_set_backing_hd() is about bs->backing'
005/47:[0067] [FC] 'block: Include filters when freezing backing chain'
006/47:[----] [--] 'block: Drop bdrv_is_encrypted()'
007/47:[0005] [FC] 'block: Add bdrv_supports_compressed_writes()'
008/47:[----] [-C] 'throttle: Support compressed writes'
009/47:[----] [--] 'copy-on-read: Support compressed writes'
010/47:[down] 'mirror-top: Support compressed writes'
011/47:[down] 'backup-top: Support compressed writes'
012/47:[0036] [FC] 'block: Use bdrv_filtered_rw* where obvious'
013/47:[0010] [FC] 'block: Use CAFs in block status functions'
014/47:[0079] [FC] 'stream: Deal with filters'
015/47:[0015] [FC] 'block: Use CAFs when working with backing chains'
016/47:[down] 'block: Use bdrv_cow_child() in bdrv_co_truncate()'
017/47:[0019] [FC] 'block: Re-evaluate backing file handling in reopen'
018/47:[0010] [FC] 'block: Flush all children in generic code'
019/47:[down] 'vmdk: Drop vmdk_co_flush()'
020/47:[down] 'block: Iterate over children in refresh_limits'
021/47:[----] [--] 'block: Use CAFs in bdrv_refresh_filename()'
022/47:[----] [--] 'block: Use CAF in bdrv_co_rw_vmstate()'
023/47:[0094] [FC] 'block/snapshot: Fix fallback'
024/47:[0014] [FC] 'block: Use CAFs for debug breakpoints'
025/47:[down] 'block: Def. impl.s for get_allocated_file_size'
026/47:[down] 'block: Improve get_allocated_file_size's default'
027/47:[down] 'blkverify: Use bdrv_sum_allocated_file_size()'
028/47:[down] 'block/null: Implement bdrv_get_allocated_file_size'
029/47:[0002] [FC] 'blockdev: Use CAF in external_snapshot_prepare()'
030/47:[down] 'block: Report data child for query-blockstats'
031/47:[0031] [FC] 'block: Use child access functions for QAPI queries'
032/47:[down] 'block-copy: Use CAF to find sync=top base'
033/47:[0086] [FC] 'mirror: Deal with filters'
034/47:[0006] [FC] 'backup: Deal with filters'
035/47:[0035] [FC] 'commit: Deal with filters'
036/47:[0002] [FC] 'nbd: Use CAF when looking for dirty bitmap'
037/47:[0017] [FC] 'qemu-img: Use child access functions'
038/47:[----] [--] 'block: Drop backing_bs()'
039/47:[0002] [FC] 'blockdev: Fix active commit choice'
040/47:[0009] [FC] 'block: Inline bdrv_co_block_status_from_*()'
041/47:[0019] [FC] 'block: Leave BDS.backing_file constant'
042/47:[down] 'iotests: Test that qcow2's data-file is flushed'
043/47:[0016] [FC] 'iotests: Let complete_and_wait() work with commit'
044/47:[0042] [FC] 'iotests: Add filter commit test cases'
045/47:[0008] [FC] 'iotests: Add filter mirror test cases'
046/47:[0018] [FC] 'iotests: Add test for commit in sub directory'
047/47:[0008] [FC] 'iotests: Test committing to overridden backing'


Max Reitz (47):
  block: Add child access functions
  block: Add chain helper functions
  block: bdrv_cow_child() for bdrv_has_zero_init()
  block: bdrv_set_backing_hd() is about bs->backing
  block: Include filters when freezing backing chain
  block: Drop bdrv_is_encrypted()
  block: Add bdrv_supports_compressed_writes()
  throttle: Support compressed writes
  copy-on-read: Support compressed writes
  mirror-top: Support compressed writes
  backup-top: Support compressed writes
  block: Use bdrv_filter_(bs|child) where obvious
  block: Use CAFs in block status functions
  stream: Deal with filters
  block: Use CAFs when working with backing chains
  block: Use bdrv_cow_child() in bdrv_co_truncate()
  block: Re-evaluate backing file handling in reopen
  block: Flush all children in generic code
  vmdk: Drop vmdk_co_flush()
  block: Iterate over children in refresh_limits
  block: Use CAFs in bdrv_refresh_filename()
  block: Use CAF in bdrv_co_rw_vmstate()
  block/snapshot: Fix fallback
  block: Use CAFs for debug breakpoints
  block: Def. impl.s for get_allocated_file_size
  block: Improve get_allocated_file_size's default
  blkverify: Use bdrv_sum_allocated_file_size()
  block/null: Implement bdrv_get_allocated_file_size
  blockdev: Use CAF in external_snapshot_prepare()
  block: Report data child for query-blockstats
  block: Use child access functions for QAPI queries
  block-copy: Use CAF to find sync=top base
  mirror: Deal with filters
  backup: Deal with filters
  commit: Deal with filters
  nbd: Use CAF when looking for dirty bitmap
  qemu-img: Use child access functions
  block: Drop backing_bs()
  blockdev: Fix active commit choice
  block: Inline bdrv_co_block_status_from_*()
  block: Leave BDS.backing_file constant
  iotests: Test that qcow2's data-file is flushed
  iotests: Let complete_and_wait() work with commit
  iotests: Add filter commit test cases
  iotests: Add filter mirror test cases
  iotests: Add test for commit in sub directory
  iotests: Test committing to overridden backing

 qapi/block-core.json           |  10 +-
 include/block/block.h          |   2 +-
 include/block/block_int.h      |  99 ++++---
 block.c                        | 500 +++++++++++++++++++++++++++------
 block/backup-top.c             |  14 +-
 block/backup.c                 |   9 +-
 block/blkdebug.c               |   7 +-
 block/blklogwrites.c           |   1 -
 block/blkverify.c              |   1 +
 block/block-backend.c          |   9 +-
 block/block-copy.c             |   4 +-
 block/commit.c                 |  97 +++++--
 block/copy-on-read.c           |  13 +-
 block/filter-compress.c        |   2 -
 block/io.c                     | 142 +++++-----
 block/mirror.c                 | 129 +++++++--
 block/monitor/block-hmp-cmds.c |   2 +-
 block/null.c                   |   7 +
 block/qapi.c                   |  83 ++++--
 block/snapshot.c               | 104 +++++--
 block/stream.c                 |  63 +++--
 block/throttle.c               |  11 +-
 block/vmdk.c                   |  16 --
 blockdev.c                     |  94 +++++--
 migration/block-dirty-bitmap.c |   8 +-
 nbd/server.c                   |   6 +-
 qemu-img.c                     |  36 ++-
 tests/qemu-iotests/020         |  44 +++
 tests/qemu-iotests/020.out     |  10 +
 tests/qemu-iotests/040         | 238 ++++++++++++++++
 tests/qemu-iotests/040.out     |   4 +-
 tests/qemu-iotests/041         | 146 +++++++++-
 tests/qemu-iotests/041.out     |   4 +-
 tests/qemu-iotests/153.out     |   2 +-
 tests/qemu-iotests/184.out     |  14 +-
 tests/qemu-iotests/204.out     |   1 +
 tests/qemu-iotests/228         |   6 +-
 tests/qemu-iotests/228.out     |   6 +-
 tests/qemu-iotests/244         |  49 ++++
 tests/qemu-iotests/244.out     |   7 +
 tests/qemu-iotests/245         |   4 +-
 tests/qemu-iotests/iotests.py  |  10 +-
 42 files changed, 1602 insertions(+), 412 deletions(-)

-- 
2.26.2




* [PATCH v7 01/47] block: Add child access functions
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:22   ` Andrey Shinkevich
                     ` (2 more replies)
  2020-06-25 15:21 ` [PATCH v7 02/47] block: Add chain helper functions Max Reitz
                   ` (47 subsequent siblings)
  48 siblings, 3 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

There are BDS children that the general block layer code can access,
namely bs->file and bs->backing.  Since the introduction of filters and
external data files, their meaning is not quite clear.  bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.

This overloading really is not helpful.  This patch adds functions that
retrieve the correct child for each exact purpose.  Later patches in
this series will make use of them.  Doing so will allow us to handle
filter nodes in a meaningful way.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h | 44 +++++++++++++++++--
 block.c                   | 90 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1b86b59af1..bb3457c5e8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -90,9 +90,17 @@ struct BlockDriver {
     int instance_size;
 
     /* set to true if the BlockDriver is a block filter. Block filters pass
-     * certain callbacks that refer to data (see block.c) to their bs->file if
-     * the driver doesn't implement them. Drivers that do not wish to forward
-     * must implement them and return -ENOTSUP.
+     * certain callbacks that refer to data (see block.c) to their bs->file
+     * or bs->backing (whichever one exists) if the driver doesn't implement
+     * them. Drivers that do not wish to forward must implement them and return
+     * -ENOTSUP.
+     * Note that filters are not allowed to modify data.
+     *
+     * Filters generally cannot have more than a single filtered child,
+     * because the data they present must at all times be the same as
+     * that on their filtered child.  That would be impossible to
+     * achieve for multiple filtered children.
+     * (And this filtered child must then be bs->file or bs->backing.)
      */
     bool is_filter;
     /*
@@ -1370,4 +1378,34 @@ BdrvDirtyBitmap *block_dirty_bitmap_remove(const char *node, const char *name,
                                            BlockDriverState **bitmap_bs,
                                            Error **errp);
 
+BdrvChild *bdrv_cow_child(BlockDriverState *bs);
+BdrvChild *bdrv_filter_child(BlockDriverState *bs);
+BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs);
+BdrvChild *bdrv_primary_child(BlockDriverState *bs);
+
+static inline BlockDriverState *child_bs(BdrvChild *child)
+{
+    return child ? child->bs : NULL;
+}
+
+static inline BlockDriverState *bdrv_cow_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_cow_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filter_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filter_child(bs));
+}
+
+static inline BlockDriverState *bdrv_filter_or_cow_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_filter_or_cow_child(bs));
+}
+
+static inline BlockDriverState *bdrv_primary_bs(BlockDriverState *bs)
+{
+    return child_bs(bdrv_primary_child(bs));
+}
+
 #endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index 144f52e413..5a42ef49fd 100644
--- a/block.c
+++ b/block.c
@@ -6918,3 +6918,93 @@ int bdrv_make_empty(BdrvChild *c, Error **errp)
 
     return 0;
 }
+
+/*
+ * Return the child that @bs acts as an overlay for, and from which data may be
+ * copied in COW or COR operations.  Usually this is the backing file.
+ */
+BdrvChild *bdrv_cow_child(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (bs->drv->is_filter) {
+        return NULL;
+    }
+
+    if (!bs->backing) {
+        return NULL;
+    }
+
+    assert(bs->backing->role & BDRV_CHILD_COW);
+    return bs->backing;
+}
+
+/*
+ * If @bs acts as a filter for exactly one of its children, return
+ * that child.
+ */
+BdrvChild *bdrv_filter_child(BlockDriverState *bs)
+{
+    BdrvChild *c;
+
+    if (!bs || !bs->drv) {
+        return NULL;
+    }
+
+    if (!bs->drv->is_filter) {
+        return NULL;
+    }
+
+    /* Only one of @backing or @file may be used */
+    assert(!(bs->backing && bs->file));
+
+    c = bs->backing ?: bs->file;
+    if (!c) {
+        return NULL;
+    }
+
+    assert(c->role & BDRV_CHILD_FILTERED);
+    return c;
+}
+
+/*
+ * Return either the result of bdrv_cow_child() or bdrv_filter_child(),
+ * whichever is non-NULL.
+ *
+ * Return NULL if both are NULL.
+ */
+BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
+{
+    BdrvChild *cow_child = bdrv_cow_child(bs);
+    BdrvChild *filter_child = bdrv_filter_child(bs);
+
+    /* Filter nodes cannot have COW backing files */
+    assert(!(cow_child && filter_child));
+
+    return cow_child ?: filter_child;
+}
+
+/*
+ * Return the primary child of this node: For filters, that is the
+ * filtered child.  For other nodes, that is usually the child storing
+ * metadata.
+ * (A generally more helpful description is that this is (usually) the
+ * child that has the same filename as @bs.)
+ *
+ * Drivers do not necessarily have a primary child; for example quorum
+ * does not.
+ */
+BdrvChild *bdrv_primary_child(BlockDriverState *bs)
+{
+    BdrvChild *c;
+
+    QLIST_FOREACH(c, &bs->children, next) {
+        if (c->role & BDRV_CHILD_PRIMARY) {
+            return c;
+        }
+    }
+
+    return NULL;
+}
-- 
2.26.2
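
For readers who want to try the child access functions outside of QEMU, here is a minimal stand-alone sketch. It is an illustration only, under stated assumptions: Node, Child, and the ROLE_* flags are hypothetical stand-ins for BlockDriverState, BdrvChild, and the BDRV_CHILD_* roles, and the bs->drv check is folded into a plain is_filter flag. The logic follows the patch above, but uses the standard ternary instead of the GNU "?:" shorthand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BDRV_CHILD_COW and BDRV_CHILD_FILTERED */
enum { ROLE_COW = 1 << 0, ROLE_FILTERED = 1 << 1 };

typedef struct Node Node;

/* Simplified model of BdrvChild: the child node plus its role */
typedef struct Child {
    Node *bs;
    unsigned role;
} Child;

/* Simplified model of BlockDriverState */
struct Node {
    bool is_filter;   /* models bs->drv->is_filter */
    Child *backing;   /* models bs->backing */
    Child *file;      /* models bs->file */
};

/* COW source of @bs; filters never have one */
static Child *cow_child(Node *bs)
{
    if (!bs || bs->is_filter || !bs->backing) {
        return NULL;
    }
    assert(bs->backing->role & ROLE_COW);
    return bs->backing;
}

/* Filtered child of @bs; non-filters never have one */
static Child *filter_child(Node *bs)
{
    Child *c;

    if (!bs || !bs->is_filter) {
        return NULL;
    }
    /* A filter uses either backing or file, never both */
    assert(!(bs->backing && bs->file));

    c = bs->backing ? bs->backing : bs->file;
    if (!c) {
        return NULL;
    }
    assert(c->role & ROLE_FILTERED);
    return c;
}

/* Whichever of the two exists; a node cannot have both */
static Child *filter_or_cow_child(Node *bs)
{
    Child *cow = cow_child(bs);
    Child *filter = filter_child(bs);

    assert(!(cow && filter));
    return cow ? cow : filter;
}
```

In this model, the same bs->backing pointer is reported as a COW child for a regular overlay and as a filtered child for a filter node, which is exactly the ambiguity the patch resolves.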




* [PATCH v7 02/47] block: Add chain helper functions
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:20   ` Andrey Shinkevich
  2020-07-13 10:18   ` Vladimir Sementsov-Ogievskiy
  2020-06-25 15:21 ` [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init() Max Reitz
                   ` (46 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Add some helper functions for skipping filters in a chain of block
nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  3 +++
 block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index bb3457c5e8..5da793bfc3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1382,6 +1382,9 @@ BdrvChild *bdrv_cow_child(BlockDriverState *bs);
 BdrvChild *bdrv_filter_child(BlockDriverState *bs);
 BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs);
 BdrvChild *bdrv_primary_child(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_skip_filters(BlockDriverState *bs);
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
 
 static inline BlockDriverState *child_bs(BdrvChild *child)
 {
diff --git a/block.c b/block.c
index 5a42ef49fd..0a0b855261 100644
--- a/block.c
+++ b/block.c
@@ -7008,3 +7008,58 @@ BdrvChild *bdrv_primary_child(BlockDriverState *bs)
 
     return NULL;
 }
+
+static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
+                                              bool stop_on_explicit_filter)
+{
+    BdrvChild *c;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    while (!(stop_on_explicit_filter && !bs->implicit)) {
+        c = bdrv_filter_child(bs);
+        if (!c) {
+            break;
+        }
+        bs = c->bs;
+    }
+    /*
+     * Note that this treats nodes with bs->drv == NULL as not being
+     * filters (bs->drv == NULL should be replaced by something else
+     * anyway).
+     * The advantage of this behavior is that this function will thus
+     * always return a non-NULL value (given a non-NULL @bs).
+     */
+
+    return bs;
+}
+
+/*
+ * Return the first BDS that has not been added implicitly or that
+ * does not have a filtered child down the chain starting from @bs
+ * (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
+{
+    return bdrv_do_skip_filters(bs, true);
+}
+
+/*
+ * Return the first BDS that does not have a filtered child down the
+ * chain starting from @bs (including @bs itself).
+ */
+BlockDriverState *bdrv_skip_filters(BlockDriverState *bs)
+{
+    return bdrv_do_skip_filters(bs, false);
+}
+
+/*
+ * For a backing chain, return the first non-filter backing image of
+ * the first non-filter image.
+ */
+BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
+{
+    return bdrv_skip_filters(bdrv_cow_bs(bdrv_skip_filters(bs)));
+}
-- 
2.26.2
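
The chain helpers can be modeled in the same stand-alone fashion. The sketch below is an assumption-laden simplification, not QEMU code: Node is a hypothetical type in which the filtered child and the COW child are stored directly as node pointers, so bdrv_filter_child() and bdrv_cow_bs() collapse into field accesses. The loop condition is the one from bdrv_do_skip_filters() above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical, simplified model of a block node */
struct Node {
    const char *name;
    bool is_filter;   /* models bs->drv->is_filter */
    bool implicit;    /* models bs->implicit */
    Node *filtered;   /* filtered child; only meaningful for filters */
    Node *cow;        /* COW backing node; only meaningful for non-filters */
};

static Node *do_skip_filters(Node *bs, bool stop_on_explicit_filter)
{
    if (!bs) {
        return NULL;
    }

    /* Walk down as long as the current node has a filtered child */
    while (!(stop_on_explicit_filter && !bs->implicit)) {
        Node *next = bs->is_filter ? bs->filtered : NULL;
        if (!next) {
            break;
        }
        bs = next;
    }

    return bs;
}

static Node *skip_implicit_filters(Node *bs)
{
    return do_skip_filters(bs, true);
}

static Node *skip_filters(Node *bs)
{
    return do_skip_filters(bs, false);
}

/* First non-filter backing image of the first non-filter image */
static Node *backing_chain_next(Node *bs)
{
    Node *unfiltered = skip_filters(bs);
    Node *cow = (unfiltered && !unfiltered->is_filter) ? unfiltered->cow
                                                       : NULL;
    return skip_filters(cow);
}
```

For a chain such as implicit filter -> explicit filter -> top -> throttle filter -> base, skip_implicit_filters() stops at the explicit filter, skip_filters() stops at top, and backing_chain_next() returns base.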




* [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
  2020-06-25 15:21 ` [PATCH v7 02/47] block: Add chain helper functions Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:23   ` Andrey Shinkevich
  2020-08-07  9:37   ` Vladimir Sementsov-Ogievskiy
  2020-06-25 15:21 ` [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
                   ` (45 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

bdrv_has_zero_init() and the related bdrv_unallocated_blocks_are_zero()
should use bdrv_cow_child() if they want to check whether the given BDS
has a COW backing file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 0a0b855261..f3e2aae49c 100644
--- a/block.c
+++ b/block.c
@@ -5394,7 +5394,7 @@ int bdrv_has_zero_init(BlockDriverState *bs)
 
     /* If BS is a copy on write image, it is initialized to
        the contents of the base image, which may not be zeroes.  */
-    if (bs->backing) {
+    if (bdrv_cow_child(bs)) {
         return 0;
     }
     if (bs->drv->bdrv_has_zero_init) {
@@ -5412,7 +5412,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
 
-    if (bs->backing) {
+    if (bdrv_cow_child(bs)) {
         return false;
     }
 
-- 
2.26.2
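
To illustrate the difference between the two checks, here is a small stand-alone sketch. It assumes a hypothetical, simplified Node type and is not QEMU code. The case of interest is a filter that attaches its filtered child as bs->backing (which the is_filter comment in patch 01 allows): the old check treats it as a COW overlay, whereas the bdrv_cow_child()-style check does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical, simplified model of a block node */
struct Node {
    bool is_filter;   /* models bs->drv->is_filter */
    Node *backing;    /* models bs->backing */
};

/* Old check: any bs->backing link counts as a COW backing file */
static bool has_cow_backing_old(const Node *bs)
{
    return bs->backing != NULL;
}

/* New check, modeled on bdrv_cow_child(): filters never have a COW child */
static bool has_cow_backing_new(const Node *bs)
{
    return !bs->is_filter && bs->backing != NULL;
}
```

With the old check, bdrv_has_zero_init() returns 0 early for such a filter; with the new check, it proceeds to the checks that follow in the function.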




* [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (2 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:24   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 05/47] block: Include filters when freezing backing chain Max Reitz
                   ` (44 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

bdrv_set_backing_hd() is a function that explicitly cares about the
bs->backing child.  Highlight that in its description and use
child_bs(bs->backing) instead of backing_bs(bs) to make it more obvious.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index f3e2aae49c..d139ffb57d 100644
--- a/block.c
+++ b/block.c
@@ -2846,7 +2846,7 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
 }
 
 /*
- * Sets the backing file link of a BDS. A new reference is created; callers
+ * Sets the bs->backing link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
@@ -2855,7 +2855,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
         bdrv_inherits_from_recursive(backing_hd, bs);
 
-    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
+    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
         return;
     }
 
-- 
2.26.2




* [PATCH v7 05/47] block: Include filters when freezing backing chain
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (3 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:25   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 06/47] block: Drop bdrv_is_encrypted() Max Reitz
                   ` (43 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

In order to make filters work in backing chains, the associated
functions must be able to deal with them and freeze both COW and filter
child links.

While at it, add some comments that note which functions require their
caller to ensure that a given child link is not frozen, and how the
callers do so.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 60 +++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index d139ffb57d..b59bd776cd 100644
--- a/block.c
+++ b/block.c
@@ -2595,12 +2595,15 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
  * If @new_bs is not NULL, bdrv_check_perm() must be called beforehand, as this
  * function uses bdrv_set_perm() to update the permissions according to the new
  * reference that @new_bs gets.
+ *
+ * Callers must ensure that child->frozen is false.
  */
 static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
 {
     BlockDriverState *old_bs = child->bs;
     uint64_t perm, shared_perm;
 
+    /* Asserts that child->frozen == false */
     bdrv_replace_child_noperm(child, new_bs);
 
     /*
@@ -2761,6 +2764,7 @@ static void bdrv_detach_child(BdrvChild *child)
     g_free(child);
 }
 
+/* Callers must ensure that child->frozen is false. */
 void bdrv_root_unref_child(BdrvChild *child)
 {
     BlockDriverState *child_bs;
@@ -2798,6 +2802,7 @@ static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child)
     }
 }
 
+/* Callers must ensure that child->frozen is false. */
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
 {
     if (child == NULL) {
@@ -2864,6 +2869,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     }
 
     if (bs->backing) {
+        /* Cannot be frozen, we checked that above */
         bdrv_unref_child(bs, bs->backing);
         bs->backing = NULL;
     }
@@ -4372,6 +4378,7 @@ static void bdrv_close(BlockDriverState *bs)
 
     if (bs->drv) {
         if (bs->drv->bdrv_close) {
+            /* Must unfreeze all children, so bdrv_unref_child() works */
             bs->drv->bdrv_close(bs);
         }
         bs->drv = NULL;
@@ -4741,20 +4748,22 @@ BlockDriverState *bdrv_find_base(BlockDriverState *bs)
 }
 
 /*
- * Return true if at least one of the backing links between @bs and
- * @base is frozen. @errp is set if that's the case.
+ * Return true if at least one of the COW (backing) and filter links
+ * between @bs and @base is frozen. @errp is set if that's the case.
  * @base must be reachable from @bs, or NULL.
  */
 bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
                                   Error **errp)
 {
     BlockDriverState *i;
+    BdrvChild *child;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
-        if (i->backing && i->backing->frozen) {
+    for (i = bs; i != base; i = child_bs(child)) {
+        child = bdrv_filter_or_cow_child(i);
+
+        if (child && child->frozen) {
             error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
-                       i->backing->name, i->node_name,
-                       backing_bs(i)->node_name);
+                       child->name, i->node_name, child->bs->node_name);
             return true;
         }
     }
@@ -4763,7 +4772,7 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
 }
 
 /*
- * Freeze all backing links between @bs and @base.
+ * Freeze all COW (backing) and filter links between @bs and @base.
  * If any of the links is already frozen the operation is aborted and
  * none of the links are modified.
  * @base must be reachable from @bs, or NULL.
@@ -4773,22 +4782,25 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
                               Error **errp)
 {
     BlockDriverState *i;
+    BdrvChild *child;
 
     if (bdrv_is_backing_chain_frozen(bs, base, errp)) {
         return -EPERM;
     }
 
-    for (i = bs; i != base; i = backing_bs(i)) {
-        if (i->backing && backing_bs(i)->never_freeze) {
+    for (i = bs; i != base; i = child_bs(child)) {
+        child = bdrv_filter_or_cow_child(i);
+        if (child && child->bs->never_freeze) {
             error_setg(errp, "Cannot freeze '%s' link to '%s'",
-                       i->backing->name, backing_bs(i)->node_name);
+                       child->name, child->bs->node_name);
             return -EPERM;
         }
     }
 
-    for (i = bs; i != base; i = backing_bs(i)) {
-        if (i->backing) {
-            i->backing->frozen = true;
+    for (i = bs; i != base; i = child_bs(child)) {
+        child = bdrv_filter_or_cow_child(i);
+        if (child) {
+            child->frozen = true;
         }
     }
 
@@ -4796,18 +4808,21 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
 }
 
 /*
- * Unfreeze all backing links between @bs and @base. The caller must
- * ensure that all links are frozen before using this function.
+ * Unfreeze all COW (backing) and filter links between @bs and @base.
+ * The caller must ensure that all links are frozen before using this
+ * function.
  * @base must be reachable from @bs, or NULL.
  */
 void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
 {
     BlockDriverState *i;
+    BdrvChild *child;
 
-    for (i = bs; i != base; i = backing_bs(i)) {
-        if (i->backing) {
-            assert(i->backing->frozen);
-            i->backing->frozen = false;
+    for (i = bs; i != base; i = child_bs(child)) {
+        child = bdrv_filter_or_cow_child(i);
+        if (child) {
+            assert(child->frozen);
+            child->frozen = false;
         }
     }
 }
@@ -4910,8 +4925,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
             }
         }
 
-        /* Do the actual switch in the in-memory graph.
-         * Completes bdrv_check_update_perm() transaction internally. */
+        /*
+         * Do the actual switch in the in-memory graph.
+         * Completes bdrv_check_update_perm() transaction internally.
+         * c->frozen is false, we have checked that above.
+         */
         bdrv_ref(base);
         bdrv_replace_child(c, base);
         bdrv_unref(top);
-- 
2.26.2




* [PATCH v7 06/47] block: Drop bdrv_is_encrypted()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (4 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 05/47] block: Include filters when freezing backing chain Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:41   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes() Max Reitz
                   ` (42 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not.  It has not
been used for that purpose for quite some time.

Actually, it is not even fit for that purpose, because to answer that
question, it would have to recursively query all of the given node's
children.

So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?

Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not.  For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
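
Reader's note (below the cut line, so not part of the commit): a minimal
standalone sketch of the semantic change described above.  `Node`,
`old_is_encrypted()` and `node_is_encrypted()` are hypothetical stand-ins
invented for illustration, not QEMU types or functions.  The model keeps
only an `encrypted` flag and a backing pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not the real BlockDriverState. */
typedef struct Node {
    bool encrypted;        /* this node's own format is encrypted */
    struct Node *backing;  /* COW backing node, or NULL */
} Node;

/* Models the removed helper: own flag OR the direct backing node's flag. */
static bool old_is_encrypted(const Node *n)
{
    assert(n != NULL);
    if (n->backing && n->backing->encrypted) {
        return true;
    }
    return n->encrypted;
}

/* Models the replacement: only the node's own flag is reported. */
static bool node_is_encrypted(const Node *n)
{
    assert(n != NULL);
    return n->encrypted;
}
```

With an unencrypted overlay on top of an encrypted backing node, the old
helper reports the overlay as encrypted while the per-node check does not,
which is the distinction a management application would presumably care
about.
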
 include/block/block.h | 1 -
 block.c               | 8 --------
 block/qapi.c          | 2 +-
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 86f9728f00..0080fe1311 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -538,7 +538,6 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it);
 void bdrv_next_cleanup(BdrvNextIterator *it);
 
 BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs);
-bool bdrv_is_encrypted(BlockDriverState *bs);
 void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
                          void *opaque, bool read_only);
 const char *bdrv_get_node_name(const BlockDriverState *bs);
diff --git a/block.c b/block.c
index b59bd776cd..76277ea4e0 100644
--- a/block.c
+++ b/block.c
@@ -5044,14 +5044,6 @@ bool bdrv_is_sg(BlockDriverState *bs)
     return bs->sg;
 }
 
-bool bdrv_is_encrypted(BlockDriverState *bs)
-{
-    if (bs->backing && bs->backing->bs->encrypted) {
-        return true;
-    }
-    return bs->encrypted;
-}
-
 const char *bdrv_get_format_name(BlockDriverState *bs)
 {
     return bs->drv ? bs->drv->format_name : NULL;
diff --git a/block/qapi.c b/block/qapi.c
index afd9f3b4a7..4807a2b344 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -288,7 +288,7 @@ void bdrv_query_image_info(BlockDriverState *bs,
     info->virtual_size    = size;
     info->actual_size     = bdrv_get_allocated_file_size(bs);
     info->has_actual_size = info->actual_size >= 0;
-    if (bdrv_is_encrypted(bs)) {
+    if (bs->encrypted) {
         info->encrypted = true;
         info->has_encrypted = true;
     }
-- 
2.26.2




* [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (5 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 06/47] block: Drop bdrv_is_encrypted() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:48   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 08/47] throttle: Support compressed writes Max Reitz
                   ` (41 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Filters cannot compress data themselves, but they still have to implement
.bdrv_co_pwritev_compressed() (otherwise they cannot forward compressed
writes).  Therefore, checking whether
bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
know whether the node can actually handle compressed writes.  This
function looks down the filter chain to see whether there is a
non-filter that can actually convert the compressed writes into
compressed data (and thus normal writes).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
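
Reader's note (not part of the commit): a standalone sketch of the
recursion described above, under a simplified model.  `Node` and its
fields are hypothetical; `has_compressed_write_cb` stands in for "the
driver provides a compressed-write callback", and `filtered` for the
filtered child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not the real BlockDriverState. */
typedef struct Node {
    bool has_compressed_write_cb;  /* driver offers a compressed-write hook */
    bool is_filter;                /* node only forwards requests */
    struct Node *filtered;         /* filtered child of a filter, else NULL */
} Node;

/*
 * A node supports compressed writes if its driver offers the hook and,
 * for filters, the node below it does too (checked recursively).
 */
static bool supports_compressed_writes(const Node *n)
{
    if (n == NULL || !n->has_compressed_write_cb) {
        return false;
    }
    if (n->is_filter && n->filtered) {
        /* Filters can only forward, so the child decides. */
        return supports_compressed_writes(n->filtered);
    }
    return true;
}
```

The point of the sketch is that a filter with the callback on top of a
format without it yields false, even though looking at the filter alone
would suggest otherwise.
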
 include/block/block.h |  1 +
 block.c               | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 0080fe1311..a905a5ec05 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -538,6 +538,7 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it);
 void bdrv_next_cleanup(BdrvNextIterator *it);
 
 BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs);
+bool bdrv_supports_compressed_writes(BlockDriverState *bs);
 void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
                          void *opaque, bool read_only);
 const char *bdrv_get_node_name(const BlockDriverState *bs);
diff --git a/block.c b/block.c
index 76277ea4e0..6449f3a11d 100644
--- a/block.c
+++ b/block.c
@@ -5044,6 +5044,29 @@ bool bdrv_is_sg(BlockDriverState *bs)
     return bs->sg;
 }
 
+/**
+ * Return whether the given node supports compressed writes.
+ */
+bool bdrv_supports_compressed_writes(BlockDriverState *bs)
+{
+    BlockDriverState *filtered;
+
+    if (!bs->drv || !block_driver_can_compress(bs->drv)) {
+        return false;
+    }
+
+    filtered = bdrv_filter_bs(bs);
+    if (filtered) {
+        /*
+         * Filters can only forward compressed writes, so we have to
+         * check the child.
+         */
+        return bdrv_supports_compressed_writes(filtered);
+    }
+
+    return true;
+}
+
 const char *bdrv_get_format_name(BlockDriverState *bs)
 {
     return bs->drv ? bs->drv->format_name : NULL;
-- 
2.26.2




* [PATCH v7 08/47] throttle: Support compressed writes
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (6 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:52   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 09/47] copy-on-read: " Max Reitz
                   ` (40 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
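
Reader's note (not part of the commit): the patch gives the filter a
compressed-write entry point that calls its normal write path with an
extra request flag.  The sketch below models only that forwarding idea.
`Node`, `node_pwritev()`, `filter_pwritev_compressed()` and
`REQ_WRITE_COMPRESSED` are hypothetical names, and throttling itself is
not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request flag, standing in for a "write compressed" bit. */
enum { REQ_WRITE_COMPRESSED = 1u << 0 };

/* Hypothetical node: a filter has a child, a format node does not. */
typedef struct Node {
    struct Node *child;   /* node that requests are forwarded to, or NULL */
    unsigned last_flags;  /* flags of the most recent write seen here */
    int writes_seen;      /* number of writes that reached this node */
} Node;

/* Normal write path: record the request, then forward it down the chain. */
static int node_pwritev(Node *n, uint64_t offset, uint64_t bytes,
                        unsigned flags)
{
    assert(n != NULL);
    n->last_flags = flags;
    n->writes_seen++;
    if (n->child) {
        return node_pwritev(n->child, offset, bytes, flags);
    }
    return 0;
}

/* Compressed entry point of a filter: reuse the normal path, add the flag. */
static int filter_pwritev_compressed(Node *n, uint64_t offset, uint64_t bytes)
{
    return node_pwritev(n, offset, bytes, REQ_WRITE_COMPRESSED);
}
```

Routing through the normal write path means any per-request work the
filter does for plain writes also applies to compressed ones; only the
flag differs by the time the request reaches the child.
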
 block/throttle.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/throttle.c b/block/throttle.c
index 0ebbad0743..f6e619aca2 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -154,6 +154,15 @@ static int coroutine_fn throttle_co_pdiscard(BlockDriverState *bs,
     return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn throttle_co_pwritev_compressed(BlockDriverState *bs,
+                                                       uint64_t offset,
+                                                       uint64_t bytes,
+                                                       QEMUIOVector *qiov)
+{
+    return throttle_co_pwritev(bs, offset, bytes, qiov,
+                               BDRV_REQ_WRITE_COMPRESSED);
+}
+
 static int throttle_co_flush(BlockDriverState *bs)
 {
     return bdrv_co_flush(bs->file->bs);
@@ -246,6 +255,7 @@ static BlockDriver bdrv_throttle = {
 
     .bdrv_co_pwrite_zeroes              =   throttle_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   =   throttle_co_pdiscard,
+    .bdrv_co_pwritev_compressed         =   throttle_co_pwritev_compressed,
 
     .bdrv_attach_aio_context            =   throttle_attach_aio_context,
     .bdrv_detach_aio_context            =   throttle_detach_aio_context,
-- 
2.26.2




* [PATCH v7 09/47] copy-on-read: Support compressed writes
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (7 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 08/47] throttle: Support compressed writes Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:54   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 10/47] mirror-top: " Max Reitz
                   ` (39 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/copy-on-read.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index a6e3c74a68..a6a864f147 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -107,6 +107,16 @@ static int coroutine_fn cor_co_pdiscard(BlockDriverState *bs,
 }
 
 
+static int coroutine_fn cor_co_pwritev_compressed(BlockDriverState *bs,
+                                                  uint64_t offset,
+                                                  uint64_t bytes,
+                                                  QEMUIOVector *qiov)
+{
+    return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
+                           BDRV_REQ_WRITE_COMPRESSED);
+}
+
+
 static void cor_eject(BlockDriverState *bs, bool eject_flag)
 {
     bdrv_eject(bs->file->bs, eject_flag);
@@ -131,6 +141,7 @@ static BlockDriver bdrv_copy_on_read = {
     .bdrv_co_pwritev                    = cor_co_pwritev,
     .bdrv_co_pwrite_zeroes              = cor_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   = cor_co_pdiscard,
+    .bdrv_co_pwritev_compressed         = cor_co_pwritev_compressed,
 
     .bdrv_eject                         = cor_eject,
     .bdrv_lock_medium                   = cor_lock_medium,
-- 
2.26.2




* [PATCH v7 10/47] mirror-top: Support compressed writes
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (8 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 09/47] copy-on-read: " Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:58   ` Andrey Shinkevich
  2020-08-18 10:27   ` Kevin Wolf
  2020-06-25 15:21 ` [PATCH v7 11/47] backup-top: " Max Reitz
                   ` (38 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index e8e8844afc..469acf4600 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1480,6 +1480,15 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
                                     NULL, 0);
 }
 
+static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState *bs,
+                                                           uint64_t offset,
+                                                           uint64_t bytes,
+                                                           QEMUIOVector *qiov)
+{
+    return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
+                                   BDRV_REQ_WRITE_COMPRESSED);
+}
+
 static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs)
 {
     if (bs->backing == NULL) {
@@ -1526,6 +1535,7 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_co_pwritev            = bdrv_mirror_top_pwritev,
     .bdrv_co_pwrite_zeroes      = bdrv_mirror_top_pwrite_zeroes,
     .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
+    .bdrv_co_pwritev_compressed = bdrv_mirror_top_pwritev_compressed,
     .bdrv_co_flush              = bdrv_mirror_top_flush,
     .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
-- 
2.26.2




* [PATCH v7 11/47] backup-top: Support compressed writes
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (9 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 10/47] mirror-top: " Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 17:59   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious Max Reitz
                   ` (37 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/backup-top.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/backup-top.c b/block/backup-top.c
index af2f20f346..f304df8f26 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -99,6 +99,15 @@ static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
     return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
 }
 
+static coroutine_fn int backup_top_co_pwritev_compressed(BlockDriverState *bs,
+                                                         uint64_t offset,
+                                                         uint64_t bytes,
+                                                         QEMUIOVector *qiov)
+{
+    return backup_top_co_pwritev(bs, offset, bytes, qiov,
+                                 BDRV_REQ_WRITE_COMPRESSED);
+}
+
 static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
 {
     if (!bs->backing) {
@@ -173,6 +182,7 @@ BlockDriver bdrv_backup_top_filter = {
     .bdrv_co_pwritev            = backup_top_co_pwritev,
     .bdrv_co_pwrite_zeroes      = backup_top_co_pwrite_zeroes,
     .bdrv_co_pdiscard           = backup_top_co_pdiscard,
+    .bdrv_co_pwritev_compressed = backup_top_co_pwritev_compressed,
     .bdrv_co_flush              = backup_top_co_flush,
 
     .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
-- 
2.26.2




* [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (10 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 11/47] backup-top: " Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 18:24   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 13/47] block: Use CAFs in block status functions Max Reitz
                   ` (36 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Places that use patterns like

    if (bs->drv->is_filter && bs->file) {
        ... something about bs->file->bs ...
    }

should be

    BlockDriverState *filtered = bdrv_filter_bs(bs);
    if (filtered) {
        ... something about @filtered ...
    }

instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
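
Reader's note (not part of the commit): a standalone sketch of the
pattern from the commit message, applied to a probe-style fallback.
Everything here is hypothetical: `Node`, `filter_bs()` and
`probe_block_size()` are illustrative, and the sketch assumes that the
real helper returns whichever of the two links (file or backing) a
filter uses, which is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ERR_UNSUPPORTED = -1 };  /* hypothetical error code */

/* Hypothetical stand-in for a block node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *file;     /* link used by some filters */
    struct Node *backing;  /* link used by other filters and by COW */
    int block_size;        /* > 0 if this node can answer the probe itself */
} Node;

/* Filtered child of a filter node, or NULL for anything else. */
static Node *filter_bs(const Node *n)
{
    if (n == NULL || !n->is_filter) {
        return NULL;
    }
    /* Assumption: a filter uses exactly one of the two links. */
    assert(!(n->file && n->backing));
    return n->backing ? n->backing : n->file;
}

/* Answer the probe locally if possible, else defer to the filtered child. */
static int probe_block_size(const Node *n)
{
    Node *filtered = filter_bs(n);

    if (n->block_size > 0) {
        return n->block_size;
    } else if (filtered) {
        return probe_block_size(filtered);
    }
    return ERR_UNSUPPORTED;
}
```

Callers written this way no longer need to know which link a particular
filter driver uses; under the stated assumption, a filter attached via
its backing link is handled the same as one attached via its file link.
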
 block.c                        | 31 ++++++++++++++++++++-----------
 block/io.c                     |  7 +++++--
 migration/block-dirty-bitmap.c |  8 +-------
 3 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index 6449f3a11d..a44af9c3c1 100644
--- a/block.c
+++ b/block.c
@@ -710,11 +710,12 @@ int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp)
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filter_bs(bs);
 
     if (drv && drv->bdrv_probe_blocksizes) {
         return drv->bdrv_probe_blocksizes(bs, bsz);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_blocksizes(bs->file->bs, bsz);
+    } else if (filtered) {
+        return bdrv_probe_blocksizes(filtered, bsz);
     }
 
     return -ENOTSUP;
@@ -729,11 +730,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *filtered = bdrv_filter_bs(bs);
 
     if (drv && drv->bdrv_probe_geometry) {
         return drv->bdrv_probe_geometry(bs, geo);
-    } else if (drv && drv->is_filter && bs->file) {
-        return bdrv_probe_geometry(bs->file->bs, geo);
+    } else if (filtered) {
+        return bdrv_probe_geometry(filtered, geo);
     }
 
     return -ENOTSUP;
@@ -5421,6 +5423,8 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
 
 int bdrv_has_zero_init(BlockDriverState *bs)
 {
+    BlockDriverState *filtered;
+
     if (!bs->drv) {
         return 0;
     }
@@ -5433,8 +5437,10 @@ int bdrv_has_zero_init(BlockDriverState *bs)
     if (bs->drv->bdrv_has_zero_init) {
         return bs->drv->bdrv_has_zero_init(bs);
     }
-    if (bs->file && bs->drv->is_filter) {
-        return bdrv_has_zero_init(bs->file->bs);
+
+    filtered = bdrv_filter_bs(bs);
+    if (filtered) {
+        return bdrv_has_zero_init(filtered);
     }
 
     /* safe default */
@@ -5479,8 +5485,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
         return -ENOMEDIUM;
     }
     if (!drv->bdrv_get_info) {
-        if (bs->file && drv->is_filter) {
-            return bdrv_get_info(bs->file->bs, bdi);
+        BlockDriverState *filtered = bdrv_filter_bs(bs);
+        if (filtered) {
+            return bdrv_get_info(filtered, bdi);
         }
         return -ENOTSUP;
     }
@@ -6546,6 +6553,8 @@ int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
 bool bdrv_recurse_can_replace(BlockDriverState *bs,
                               BlockDriverState *to_replace)
 {
+    BlockDriverState *filtered;
+
     if (!bs || !bs->drv) {
         return false;
     }
@@ -6560,9 +6569,9 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
     }
 
     /* For filters without an own implementation, we can recurse on our own */
-    if (bs->drv->is_filter) {
-        BdrvChild *child = bs->file ?: bs->backing;
-        return bdrv_recurse_can_replace(child->bs, to_replace);
+    filtered = bdrv_filter_bs(bs);
+    if (filtered) {
+        return bdrv_recurse_can_replace(filtered, to_replace);
     }
 
     /* Safe default */
diff --git a/block/io.c b/block/io.c
index df8f2a98d4..385176b331 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3307,6 +3307,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
                                   Error **errp)
 {
     BlockDriverState *bs = child->bs;
+    BdrvChild *filtered;
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
     int64_t old_size, new_bytes;
@@ -3358,6 +3359,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
         goto out;
     }
 
+    filtered = bdrv_filter_child(bs);
+
     /*
      * If the image has a backing file that is large enough that it would
      * provide data for the new area, we cannot leave it unallocated because
@@ -3390,8 +3393,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
             goto out;
         }
         ret = drv->bdrv_co_truncate(bs, offset, exact, prealloc, flags, errp);
-    } else if (bs->file && drv->is_filter) {
-        ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, flags, errp);
+    } else if (filtered) {
+        ret = bdrv_co_truncate(filtered, offset, exact, prealloc, flags, errp);
     } else {
         error_setg(errp, "Image format driver does not support resize");
         ret = -ENOTSUP;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 47bc0f650c..dec656c074 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -356,13 +356,7 @@ static int init_dirty_bitmap_migration(void)
         while (bs && bs->drv && bs->drv->is_filter &&
                !bdrv_has_named_bitmaps(bs))
         {
-            if (bs->backing) {
-                bs = bs->backing->bs;
-            } else if (bs->file) {
-                bs = bs->file->bs;
-            } else {
-                bs = NULL;
-            }
+            bs = bdrv_filter_bs(bs);
         }
 
         if (bs && bs->drv && !bs->drv->is_filter) {
-- 
2.26.2




* [PATCH v7 13/47] block: Use CAFs in block status functions
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (11 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-08 19:13   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
                   ` (35 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Use the child access functions in the block status inquiry functions as
appropriate.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
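
Reader's note (not part of the commit): the loops touched here walk from
a node down to @base.  The sketch below contrasts a backing-only step
with a filter-or-COW step on a toy chain.  `Node`, `backing_only()`,
`filter_or_cow()` and `links_to_base()` are hypothetical names, and the
model assumes a filter may hang its child off either link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *file;     /* link used by some filters */
    struct Node *backing;  /* COW backing link, also used by some filters */
} Node;

/* Old-style step: follow the backing link only. */
static const Node *backing_only(const Node *n)
{
    return n->backing;
}

/* New-style step: filtered child for filters, COW child otherwise. */
static const Node *filter_or_cow(const Node *n)
{
    if (n->is_filter) {
        return n->backing ? n->backing : n->file;
    }
    return n->backing;
}

/*
 * Count the links walked from @top until @base is reached.
 * Returns -1 if the walk falls off the chain before reaching @base.
 */
static int links_to_base(const Node *top, const Node *base,
                         const Node *(*step)(const Node *))
{
    int links = 0;

    for (const Node *p = top; p != base; p = step(p)) {
        if (p == NULL) {
            return -1;
        }
        links++;
    }
    return links;
}
```

On a chain where a filter attached via its file link sits between the
top node and @base, the backing-only walk never reaches @base, while the
filter-or-COW walk does.
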
 block/io.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index 385176b331..dc9891d6ce 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2407,11 +2407,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
     } else if (want_zero) {
+        BlockDriverState *cow_bs = bdrv_cow_bs(bs);
+
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (bs->backing) {
-            BlockDriverState *bs2 = bs->backing->bs;
-            int64_t size2 = bdrv_getlength(bs2);
+        } else if (cow_bs) {
+            int64_t size2 = bdrv_getlength(cow_bs);
 
             if (size2 >= 0 && offset >= size2) {
                 ret |= BDRV_BLOCK_ZERO;
@@ -2477,7 +2478,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
     bool first = true;
 
     assert(bs != base);
-    for (p = bs; p != base; p = backing_bs(p)) {
+    for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
@@ -2551,7 +2552,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
-    return bdrv_block_status_above(bs, backing_bs(bs),
+    return bdrv_block_status_above(bs, bdrv_filter_or_cow_bs(bs),
                                    offset, bytes, pnum, map, file);
 }
 
@@ -2561,9 +2562,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
     int ret;
     int64_t dummy;
 
-    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
-                                         bytes, pnum ? pnum : &dummy, NULL,
-                                         NULL);
+    ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
+                                         offset, bytes, pnum ? pnum : &dummy,
+                                         NULL, NULL);
     if (ret < 0) {
         return ret;
     }
@@ -2626,7 +2627,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
             break;
         }
 
-        intermediate = backing_bs(intermediate);
+        intermediate = bdrv_filter_or_cow_bs(intermediate);
     }
 
     *pnum = n;
-- 
2.26.2




* [PATCH v7 14/47] stream: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (12 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 13/47] block: Use CAFs in block status functions Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-09 14:52   ` Andrey Shinkevich
                     ` (2 more replies)
  2020-06-25 15:21 ` [PATCH v7 15/47] block: Use CAFs when working with backing chains Max Reitz
                   ` (34 subsequent siblings)
  48 siblings, 3 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
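
Reader's note (not part of the commit): a standalone sketch of how the
two "bottom" nodes relate.  Given the COW overlay of @base, the node
directly above @base is found by stepping over any filters in between.
`Node`, `cow_bs()`, `filter_bs()` and `find_above_base()` are
hypothetical stand-ins, and each toy node has a single child link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *child;  /* filtered child (filters) or COW backing node */
} Node;

/* COW child of a non-filter node, else NULL. */
static Node *cow_bs(const Node *n)
{
    return (n && !n->is_filter) ? n->child : NULL;
}

/* Filtered child of a filter node, else NULL. */
static Node *filter_bs(const Node *n)
{
    return (n && n->is_filter) ? n->child : NULL;
}

/*
 * @base_overlay is the COW overlay of @base, so only filters may sit
 * between the two.  Return the node whose child is @base.
 */
static Node *find_above_base(Node *base_overlay, const Node *base)
{
    Node *above_base = base_overlay;

    if (cow_bs(above_base) != base) {
        above_base = cow_bs(above_base);
        while (filter_bs(above_base) != base) {
            above_base = filter_bs(above_base);
            /* Guard for the sketch: the precondition was violated. */
            assert(above_base != NULL);
        }
    }
    return above_base;
}
```

When no filter sits between the overlay and @base, both nodes coincide;
otherwise the overlay is where data is streamed from and the returned
node marks the end of the chain that is frozen.
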
 qapi/block-core.json |  4 +++
 block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
 blockdev.c           |  4 ++-
 3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b20332e592..df87855429 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2486,6 +2486,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
diff --git a/block/stream.c b/block/stream.c
index aa2e7af98e..b9c1141656 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,8 @@ enum {
 
 typedef struct StreamBlockJob {
     BlockJob common;
-    BlockDriverState *bottom;
+    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
+    BlockDriverState *above_base;   /* Node directly above the base */
     BlockdevOnError on_error;
     char *backing_file_str;
     bool bs_read_only;
@@ -53,7 +54,7 @@ static void stream_abort(Job *job)
 
     if (s->chain_frozen) {
         BlockJob *bjob = &s->common;
-        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
+        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
     }
 }
 
@@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
     BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *base = backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
     Error *local_err = NULL;
     int ret = 0;
 
-    bdrv_unfreeze_backing_chain(bs, s->bottom);
+    bdrv_unfreeze_backing_chain(bs, s->above_base);
     s->chain_frozen = false;
 
-    if (bs->backing) {
+    if (bdrv_cow_child(unfiltered_bs)) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
                 base_fmt = base->drv->format_name;
             }
         }
-        bdrv_set_backing_hd(bs, base, &local_err);
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
+        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
         if (local_err) {
             error_report_err(local_err);
             return -EPERM;
@@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
-    bool enable_cor = !backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    bool enable_cor = !bdrv_cow_child(s->base_overlay);
     int64_t len;
     int64_t offset = 0;
     uint64_t delay_ns = 0;
     int error = 0;
     int64_t n = 0; /* bytes */
 
-    if (bs == s->bottom) {
+    if (unfiltered_bs == s->base_overlay) {
         /* Nothing to stream */
         return 0;
     }
@@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 
         copy = false;
 
-        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
+        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
+            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
+                                          s->base_overlay, true,
                                           offset, n, &n);
             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
@@ -223,9 +227,29 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     BlockDriverState *iter;
     bool bs_read_only;
     int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
-    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
+    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
+    BlockDriverState *above_base;
 
-    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
+    if (!base_overlay) {
+        error_setg(errp, "'%s' is not in the backing chain of '%s'",
+                   base->node_name, bs->node_name);
+        return;
+    }
+
+    /*
+     * Find the node directly above @base.  @base_overlay is a COW overlay, so
+     * it must have a bdrv_cow_child(), but it is the immediate overlay of
+     * @base, so between the two there can only be filters.
+     */
+    above_base = base_overlay;
+    if (bdrv_cow_bs(above_base) != base) {
+        above_base = bdrv_cow_bs(above_base);
+        while (bdrv_filter_bs(above_base) != base) {
+            above_base = bdrv_filter_bs(above_base);
+        }
+    }
+
+    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
         return;
     }
 
@@ -255,14 +279,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * and resizes. Reassign the base node pointer because the backing BS of the
      * bottom node might change after the call to bdrv_reopen_set_read_only()
      * due to parallel block jobs running.
+     * above_base node might change after the call to
+     * bdrv_reopen_set_read_only() due to parallel block jobs running.
      */
-    base = backing_bs(bottom);
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    base = bdrv_filter_or_cow_bs(above_base);
+    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
+         iter = bdrv_filter_or_cow_bs(iter))
+    {
         block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                            basic_flags, &error_abort);
     }
 
-    s->bottom = bottom;
+    s->base_overlay = base_overlay;
+    s->above_base = above_base;
     s->backing_file_str = g_strdup(backing_file_str);
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
@@ -276,5 +305,5 @@ fail:
     if (bs_read_only) {
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-    bdrv_unfreeze_backing_chain(bs, bottom);
+    bdrv_unfreeze_backing_chain(bs, above_base);
 }
diff --git a/blockdev.c b/blockdev.c
index 72df193ca7..1eb0fcdea2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2515,7 +2515,9 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
+    for (iter = bs; iter && iter != base_bs;
+         iter = bdrv_filter_or_cow_bs(iter))
+    {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
             goto out;
         }
-- 
2.26.2




* [PATCH v7 15/47] block: Use CAFs when working with backing chains
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (13 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-10 15:28   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate() Max Reitz
                   ` (33 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Use child access functions when iterating through backing chains so
filters do not break the chain.

In addition, bdrv_find_overlay() will now always return the actual
overlay; that is, it will never return a filter node but only one with a
COW backing file (there may be filter nodes between that node and @bs).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
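For illustration only, the new bdrv_find_overlay() behaviour can be
modelled outside of QEMU roughly as below.  This is a toy sketch, not
the real code: ToyNode and all toy_*() names are hypothetical, and each
node is assumed to have at most one child that is either its filtered
child or its COW backing node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState, not QEMU's real struct */
typedef struct ToyNode {
    const char *name;
    bool is_filter;         /* true: @child is the filtered child */
    struct ToyNode *child;  /* filtered child or COW backing node */
} ToyNode;

/* Walk down until we reach a node that is not a filter (or NULL) */
static ToyNode *toy_skip_filters(ToyNode *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/* Next non-filter node below @n, skipping filters in between */
static ToyNode *toy_backing_chain_next(ToyNode *n)
{
    assert(n != NULL);
    return toy_skip_filters(n->child);
}

/*
 * Return the non-filter node whose backing chain successor is @bs
 * (after skipping filters on both sides), or NULL if @bs is not
 * found below @active.
 */
ToyNode *toy_find_overlay(ToyNode *active, ToyNode *bs)
{
    bs = toy_skip_filters(bs);
    active = toy_skip_filters(active);

    while (active) {
        ToyNode *next = toy_backing_chain_next(active);
        if (bs == next) {
            return active;
        }
        active = next;
    }

    return NULL;
}
```

With a chain top -> (filter) -> mid -> base, asking for the overlay of
either mid or the filter on top of it yields top in this model, never
the filter itself.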
 block.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index a44af9c3c1..712230ef5c 100644
--- a/block.c
+++ b/block.c
@@ -4724,7 +4724,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
 }
 
 /*
- * Finds the image layer in the chain that has 'bs' as its backing file.
+ * Finds the image layer in the chain that has 'bs' (or a filter on
+ * top of it) as its backing file.
  *
  * active is the current topmost image.
  *
@@ -4736,11 +4737,18 @@ int bdrv_change_backing_file(BlockDriverState *bs,
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs)
 {
-    while (active && bs != backing_bs(active)) {
-        active = backing_bs(active);
+    bs = bdrv_skip_filters(bs);
+    active = bdrv_skip_filters(active);
+
+    while (active) {
+        BlockDriverState *next = bdrv_backing_chain_next(active);
+        if (bs == next) {
+            return active;
+        }
+        active = next;
     }
 
-    return active;
+    return NULL;
 }
 
 /* Given a BDS, searches for the base layer. */
@@ -4892,9 +4900,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
      * other intermediate nodes have been dropped.
      * If 'top' is an implicit node (e.g. "commit_top") we should skip
      * it because no one inherits from it. We use explicit_top for that. */
-    while (explicit_top && explicit_top->implicit) {
-        explicit_top = backing_bs(explicit_top);
-    }
+    explicit_top = bdrv_skip_implicit_filters(explicit_top);
     update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
 
     /* success - we can delete the intermediate states, and link top->base */
@@ -5351,7 +5357,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
 bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
 {
     while (top && top != base) {
-        top = backing_bs(top);
+        top = bdrv_filter_or_cow_bs(top);
     }
 
     return top != NULL;
@@ -5607,6 +5613,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     int is_protocol = 0;
     BlockDriverState *curr_bs = NULL;
     BlockDriverState *retval = NULL;
+    BlockDriverState *bs_below;
 
     if (!bs || !bs->drv || !backing_file) {
         return NULL;
@@ -5617,7 +5624,17 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
 
     is_protocol = path_has_protocol(backing_file);
 
-    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
+    /*
+     * Being largely a legacy function, skip any filters here
+     * (because filters do not have normal filenames, so they cannot
+     * match anyway; and allowing json:{} filenames is a bit out of
+     * scope).
+     */
+    for (curr_bs = bdrv_skip_filters(bs);
+         bdrv_cow_child(curr_bs) != NULL;
+         curr_bs = bs_below)
+    {
+        bs_below = bdrv_backing_chain_next(curr_bs);
 
         /* If either of the filename paths is actually a protocol, then
          * compare unmodified paths; otherwise make paths relative */
@@ -5625,7 +5642,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
             /* Also check against the full backing filename for the image */
@@ -5635,7 +5652,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
                 bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
                 g_free(backing_file_full_ret);
                 if (equal) {
-                    retval = curr_bs->backing->bs;
+                    retval = bs_below;
                     break;
                 }
             }
@@ -5661,7 +5678,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
             g_free(filename_tmp);
 
             if (strcmp(backing_file_full, filename_full) == 0) {
-                retval = curr_bs->backing->bs;
+                retval = bs_below;
                 break;
             }
         }
-- 
2.26.2




* [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (14 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 15/47] block: Use CAFs when working with backing chains Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-10 15:54   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen Max Reitz
                   ` (32 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index dc9891d6ce..097a3861d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3308,7 +3308,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
                                   Error **errp)
 {
     BlockDriverState *bs = child->bs;
-    BdrvChild *filtered;
+    BdrvChild *filtered, *backing;
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
     int64_t old_size, new_bytes;
@@ -3361,6 +3361,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
     }
 
     filtered = bdrv_filter_child(bs);
+    backing = bdrv_cow_child(bs);
 
     /*
      * If the image has a backing file that is large enough that it would
@@ -3372,10 +3373,10 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
      * backing file, taking care of keeping things consistent with that backing
      * file is the user's responsibility.
      */
-    if (new_bytes && bs->backing) {
+    if (new_bytes && backing) {
         int64_t backing_len;
 
-        backing_len = bdrv_getlength(backing_bs(bs));
+        backing_len = bdrv_getlength(backing->bs);
         if (backing_len < 0) {
             ret = backing_len;
             error_setg_errno(errp, -ret, "Could not get backing file size");
-- 
2.26.2




* [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (15 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-10 19:42   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 18/47] block: Flush all children in generic code Max Reitz
                   ` (31 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Reopening a node's backing child needs a bit of special handling because
the "backing" child has different defaults than all other children
(among other things).  Adding filter support here is a bit more
difficult than just using the child access functions.  In fact, we often
have to directly use bs->backing because these functions are about the
"backing" child (which may or may not be the COW backing file).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 712230ef5c..8131d0b5eb 100644
--- a/block.c
+++ b/block.c
@@ -4026,26 +4026,56 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
         }
     }
 
+    /*
+     * Ensure that @bs can really handle backing files, because we are
+     * about to give it one (or swap the existing one)
+     */
+    if (bs->drv->is_filter) {
+        /* Filters always have a file or a backing child */
+        if (!bs->backing) {
+            error_setg(errp, "'%s' is a %s filter node that does not support a "
+                       "backing child", bs->node_name, bs->drv->format_name);
+            return -EINVAL;
+        }
+    } else if (!bs->drv->supports_backing) {
+        error_setg(errp, "Driver '%s' of node '%s' does not support backing "
+                   "files", bs->drv->format_name, bs->node_name);
+        return -EINVAL;
+    }
+
     /*
      * Find the "actual" backing file by skipping all links that point
      * to an implicit node, if any (e.g. a commit filter node).
+     * We cannot use any of the bdrv_skip_*() functions here because
+     * those return the first explicit node, while we are looking for
+     * its overlay here.
      */
     overlay_bs = bs;
-    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
-        overlay_bs = backing_bs(overlay_bs);
+    while (bdrv_filter_or_cow_bs(overlay_bs) &&
+           bdrv_filter_or_cow_bs(overlay_bs)->implicit)
+    {
+        overlay_bs = bdrv_filter_or_cow_bs(overlay_bs);
     }
 
     /* If we want to replace the backing file we need some extra checks */
-    if (new_backing_bs != backing_bs(overlay_bs)) {
+    if (new_backing_bs != bdrv_filter_or_cow_bs(overlay_bs)) {
         /* Check for implicit nodes between bs and its backing file */
         if (bs != overlay_bs) {
             error_setg(errp, "Cannot change backing link if '%s' has "
                        "an implicit backing file", bs->node_name);
             return -EPERM;
         }
-        /* Check if the backing link that we want to replace is frozen */
-        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
-                                         errp)) {
+        /*
+         * Check if the backing link that we want to replace is frozen.
+         * Note that
+         * bdrv_filter_or_cow_child(overlay_bs) == overlay_bs->backing,
+         * because we know that overlay_bs == bs, and that @bs
+         * either is a filter that uses ->backing or a COW format BDS
+         * with bs->drv->supports_backing == true.
+         */
+        if (bdrv_is_backing_chain_frozen(overlay_bs,
+                                         child_bs(overlay_bs->backing), errp))
+        {
             return -EPERM;
         }
         reopen_state->replace_backing_bs = true;
@@ -4196,7 +4226,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
      * its metadata. Otherwise the 'backing' option can be omitted.
      */
     if (drv->supports_backing && reopen_state->backing_missing &&
-        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
+        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
         error_setg(errp, "backing is missing for '%s'",
                    reopen_state->bs->node_name);
         ret = -EINVAL;
@@ -4337,7 +4367,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
      * from bdrv_set_backing_hd()) has the new values.
      */
     if (reopen_state->replace_backing_bs) {
-        BlockDriverState *old_backing_bs = backing_bs(bs);
+        BlockDriverState *old_backing_bs = child_bs(bs->backing);
         assert(!old_backing_bs || !old_backing_bs->implicit);
         /* Abort the permission update on the backing bs we're detaching */
         if (old_backing_bs) {
-- 
2.26.2




* [PATCH v7 18/47] block: Flush all children in generic code
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (16 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-14 12:52   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 19/47] vmdk: Drop vmdk_co_flush() Max Reitz
                   ` (30 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

If the driver does not implement .bdrv_co_flush(), bdrv_co_flush()
itself has to flush the children of the given node.  In that case, it
should not flush just bs->file->bs, but all children that might have
been written to (judging from the permissions taken on them).

This is a bug fix for qcow2 images with an external data file, as they
so far did not flush that data_file node.

In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is any.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
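As a rough illustration of the flush_children logic, here is a
standalone toy model.  It is a sketch under assumptions, not QEMU code:
FlushChild, the TOY_PERM_* flags and toy_flush_children() are all
hypothetical, and "flushing" a child just returns a canned result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical permission bits, loosely modelled on BLK_PERM_* */
enum {
    TOY_PERM_CONSISTENT_READ = 1 << 0,
    TOY_PERM_WRITE           = 1 << 1,
    TOY_PERM_WRITE_UNCHANGED = 1 << 2,
};

/* Hypothetical stand-in for BdrvChild */
typedef struct FlushChild {
    unsigned perm;            /* permissions the parent has taken */
    int flush_result;         /* canned return value of the flush */
    int flushed;              /* how often this child was flushed */
    struct FlushChild *next;  /* next sibling in the parent's list */
} FlushChild;

/*
 * Flush every child we may have written to.  All such children are
 * flushed even if an earlier one failed; the first error is returned.
 */
int toy_flush_children(FlushChild *children)
{
    int ret = 0;

    for (FlushChild *c = children; c; c = c->next) {
        if (c->perm & (TOY_PERM_WRITE | TOY_PERM_WRITE_UNCHANGED)) {
            int this_child_ret = c->flush_result;
            c->flushed++;
            if (!ret) {
                ret = this_child_ret;
            }
        }
    }

    return ret;
}
```

In this model, a read-only backing child is never flushed, while both
the metadata file and an external data file are, which is the case the
commit message describes for qcow2.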
 block/io.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index 097a3861d8..c2af7711d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2769,6 +2769,8 @@ static int coroutine_fn bdrv_flush_co_entry(void *opaque)
 
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
+    BdrvChild *primary_child = bdrv_primary_child(bs);
+    BdrvChild *child;
     int current_gen;
     int ret = 0;
 
@@ -2798,7 +2800,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
     }
 
     /* Write back cached data to the OS even with cache=unsafe */
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
+    BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_OS);
     if (bs->drv->bdrv_co_flush_to_os) {
         ret = bs->drv->bdrv_co_flush_to_os(bs);
         if (ret < 0) {
@@ -2808,15 +2810,15 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
     /* But don't actually force it to the disk with cache=unsafe */
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
-        goto flush_parent;
+        goto flush_children;
     }
 
     /* Check if we really need to flush anything */
     if (bs->flushed_gen == current_gen) {
-        goto flush_parent;
+        goto flush_children;
     }
 
-    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
+    BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
     if (!bs->drv) {
         /* bs->drv->bdrv_co_flush() might have ejected the BDS
          * (even in case of apparent success) */
@@ -2860,8 +2862,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
     /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
      * in the case of cache=unsafe, so there are no useless flushes.
      */
-flush_parent:
-    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+flush_children:
+    ret = 0;
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
+            int this_child_ret = bdrv_co_flush(child->bs);
+            if (!ret) {
+                ret = this_child_ret;
+            }
+        }
+    }
+
 out:
     /* Notify any pending flushes that we have completed */
     if (ret == 0) {
-- 
2.26.2




* [PATCH v7 19/47] vmdk: Drop vmdk_co_flush()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (17 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 18/47] block: Flush all children in generic code Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-14 14:52   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 20/47] block: Iterate over children in refresh_limits Max Reitz
                   ` (29 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Before HEAD^, we needed this because bdrv_co_flush() by itself would
only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
children on which a WRITE or WRITE_UNCHANGED permission has been taken.
Thus, vmdk no longer needs to do it itself.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 62da465126..a23890e6ec 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2802,21 +2802,6 @@ static void vmdk_close(BlockDriverState *bs)
     error_free(s->migration_blocker);
 }
 
-static coroutine_fn int vmdk_co_flush(BlockDriverState *bs)
-{
-    BDRVVmdkState *s = bs->opaque;
-    int i, err;
-    int ret = 0;
-
-    for (i = 0; i < s->num_extents; i++) {
-        err = bdrv_co_flush(s->extents[i].file->bs);
-        if (err < 0) {
-            ret = err;
-        }
-    }
-    return ret;
-}
-
 static int64_t vmdk_get_allocated_file_size(BlockDriverState *bs)
 {
     int i;
@@ -3075,7 +3060,6 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_close                   = vmdk_close,
     .bdrv_co_create_opts          = vmdk_co_create_opts,
     .bdrv_co_create               = vmdk_co_create,
-    .bdrv_co_flush_to_disk        = vmdk_co_flush,
     .bdrv_co_block_status         = vmdk_co_block_status,
     .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
     .bdrv_has_zero_init           = vmdk_has_zero_init,
-- 
2.26.2




* [PATCH v7 20/47] block: Iterate over children in refresh_limits
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (18 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 19/47] vmdk: Drop vmdk_co_flush() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-14 18:37   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename() Max Reitz
                   ` (28 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Instead of looking at just bs->file and bs->backing, we should look at
all children that could end up receiving forwarded requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
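To make the child selection concrete, here is a small standalone
sketch.  It is not QEMU code: ToyLimits, LimitChild and the TOY_CHILD_*
role bits are hypothetical, only two limits are modelled, and the merge
rules (smallest non-zero max_transfer, largest min_mem_alignment) are
an assumption about how such limits would reasonably be combined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits, loosely modelled on BDRV_CHILD_* */
enum {
    TOY_CHILD_DATA     = 1 << 0,
    TOY_CHILD_METADATA = 1 << 1,
    TOY_CHILD_FILTERED = 1 << 2,
    TOY_CHILD_COW      = 1 << 3,
};

typedef struct ToyLimits {
    unsigned max_transfer;       /* 0 means "no limit" */
    unsigned min_mem_alignment;
} ToyLimits;

typedef struct LimitChild {
    unsigned role;
    ToyLimits bl;
    struct LimitChild *next;
} LimitChild;

/* Smaller of two values, treating 0 as "unset" */
static unsigned toy_min_non_zero(unsigned a, unsigned b)
{
    if (!a) {
        return b;
    }
    if (!b) {
        return a;
    }
    return a < b ? a : b;
}

/* Merge limits of all children that may receive forwarded requests */
ToyLimits toy_refresh_limits(const LimitChild *children)
{
    ToyLimits bl = { 0, 0 };
    bool have_limits = false;

    for (const LimitChild *c = children; c; c = c->next) {
        if (c->role & (TOY_CHILD_DATA | TOY_CHILD_FILTERED | TOY_CHILD_COW)) {
            bl.max_transfer = toy_min_non_zero(bl.max_transfer,
                                               c->bl.max_transfer);
            if (c->bl.min_mem_alignment > bl.min_mem_alignment) {
                bl.min_mem_alignment = c->bl.min_mem_alignment;
            }
            have_limits = true;
        }
    }

    if (!have_limits) {
        /* No suitable child: fall back to a default */
        bl.min_mem_alignment = 512;
    }

    return bl;
}
```

A metadata-only child does not contribute here because guest requests
are not forwarded to it; a node without any matching child gets the
defaults instead.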
 block/io.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index c2af7711d6..37057f13e0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -135,6 +135,8 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BdrvChild *c;
+    bool have_limits;
     Error *local_err = NULL;
 
     memset(&bs->bl, 0, sizeof(bs->bl));
@@ -149,14 +151,21 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
                                 drv->bdrv_co_preadv_part) ? 1 : 512;
 
     /* Take some limits from the children as a default */
-    if (bs->file) {
-        bdrv_refresh_limits(bs->file->bs, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
+    have_limits = false;
+    QLIST_FOREACH(c, &bs->children, next) {
+        if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
+        {
+            bdrv_refresh_limits(c->bs, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                return;
+            }
+            bdrv_merge_limits(&bs->bl, &c->bs->bl);
+            have_limits = true;
         }
-        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
-    } else {
+    }
+
+    if (!have_limits) {
         bs->bl.min_mem_alignment = 512;
         bs->bl.opt_mem_alignment = qemu_real_host_page_size;
 
@@ -164,15 +173,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_iov = IOV_MAX;
     }
 
-    if (bs->backing) {
-        bdrv_refresh_limits(bs->backing->bs, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
-        }
-        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
-    }
-
     /* Then let the driver override it */
     if (drv->bdrv_refresh_limits) {
         drv->bdrv_refresh_limits(bs, errp);
-- 
2.26.2




* [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (19 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 20/47] block: Iterate over children in refresh_limits Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-15 12:52   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
                   ` (27 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

bdrv_refresh_filename() and the somewhat related bdrv_dirname() should
look to the primary child when they wish to copy the underlying file's
filename.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 8131d0b5eb..7c827fefa0 100644
--- a/block.c
+++ b/block.c
@@ -6797,6 +6797,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
     BdrvChild *child;
+    BlockDriverState *primary_child_bs;
     QDict *opts;
     bool backing_overridden;
     bool generate_json_filename; /* Whether our default implementation should
@@ -6866,20 +6867,30 @@ void bdrv_refresh_filename(BlockDriverState *bs)
     qobject_unref(bs->full_open_options);
     bs->full_open_options = opts;
 
+    primary_child_bs = bdrv_primary_bs(bs);
+
     if (drv->bdrv_refresh_filename) {
         /* Obsolete information is of no use here, so drop the old file name
          * information before refreshing it */
         bs->exact_filename[0] = '\0';
 
         drv->bdrv_refresh_filename(bs);
-    } else if (bs->file) {
-        /* Try to reconstruct valid information from the underlying file */
+    } else if (primary_child_bs) {
+        /*
+         * Try to reconstruct valid information from the underlying
+         * file -- this only works for format nodes (filter nodes
+         * cannot be probed and as such must be selected by the user
+         * either through an options dict, or through a special
+         * filename which the filter driver must construct in its
+         * .bdrv_refresh_filename() implementation).
+         */
 
         bs->exact_filename[0] = '\0';
 
         /*
          * We can use the underlying file's filename if:
          * - it has a filename,
+         * - the current BDS is not a filter,
          * - the file is a protocol BDS, and
          * - opening that file (as this BDS's format) will automatically create
          *   the BDS tree we have right now, that is:
@@ -6888,11 +6899,11 @@ void bdrv_refresh_filename(BlockDriverState *bs)
          *   - no non-file child of this BDS has been overridden by the user
          *   Both of these conditions are represented by generate_json_filename.
          */
-        if (bs->file->bs->exact_filename[0] &&
-            bs->file->bs->drv->bdrv_file_open &&
-            !generate_json_filename)
+        if (primary_child_bs->exact_filename[0] &&
+            primary_child_bs->drv->bdrv_file_open &&
+            !drv->is_filter && !generate_json_filename)
         {
-            strcpy(bs->exact_filename, bs->file->bs->exact_filename);
+            strcpy(bs->exact_filename, primary_child_bs->exact_filename);
         }
     }
 
@@ -6912,6 +6923,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *child_bs;
 
     if (!drv) {
         error_setg(errp, "Node '%s' is ejected", bs->node_name);
@@ -6922,8 +6934,9 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
         return drv->bdrv_dirname(bs, errp);
     }
 
-    if (bs->file) {
-        return bdrv_dirname(bs->file->bs, errp);
+    child_bs = bdrv_primary_bs(bs);
+    if (child_bs) {
+        return bdrv_dirname(child_bs, errp);
     }
 
     bdrv_refresh_filename(bs);
-- 
2.26.2




* [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (20 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-15 13:39   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 23/47] block/snapshot: Fix fallback Max Reitz
                   ` (26 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

If a node whose driver does not provide VM state functions has a
metadata child, the VM state should probably go there; if it is a
filter, the VM state should probably go to its filtered child.  It
follows that we should generally go down to the primary child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
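The recursion can be pictured with the following toy model.  This is
only a sketch: VmNode and toy_vmstate_target() are hypothetical names,
and the primary child is simply stored as a pointer instead of being
looked up as bdrv_primary_bs() would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState */
typedef struct VmNode {
    const char *name;
    bool has_vmstate_fn;     /* driver provides VM state functions */
    struct VmNode *primary;  /* primary child, or NULL */
} VmNode;

/*
 * Return the first node, going down through primary children, whose
 * driver can handle VM state itself; NULL if there is none (the real
 * function would return -ENOTSUP in that case).
 */
const VmNode *toy_vmstate_target(const VmNode *bs)
{
    while (bs && !bs->has_vmstate_fn) {
        bs = bs->primary;
    }
    return bs;
}
```

For a filter on top of a format node that supports VM state, the
request ends up at the format node; for a chain where no driver
supports it, there is no target.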
 block/io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 37057f13e0..9e802804bb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2646,6 +2646,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *child_bs = bdrv_primary_bs(bs);
     int ret = -ENOTSUP;
 
     bdrv_inc_in_flight(bs);
@@ -2658,8 +2659,8 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         } else {
             ret = drv->bdrv_save_vmstate(bs, qiov, pos);
         }
-    } else if (bs->file) {
-        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+    } else if (child_bs) {
+        ret = bdrv_co_rw_vmstate(child_bs, qiov, pos, is_read);
     }
 
     bdrv_dec_in_flight(bs);
-- 
2.26.2




* [PATCH v7 23/47] block/snapshot: Fix fallback
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (21 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-15 21:22   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 24/47] block: Use CAFs for debug breakpoints Max Reitz
                   ` (25 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

If the top node's driver does not provide snapshot functionality and we
want to fall back to a node down the chain, we need to snapshot all
non-COW children.  For simplicity's sake, just do not fall back if there
is more than one such child.  Furthermore, we can really only fall back
to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
the child link (notably, set it to NULL).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
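The fallback rule can be tried out in isolation with the toy model
below.  It is a sketch under assumptions, not the QEMU implementation:
SnapNode, SnapChild and the TOY_CHILD_* role bits are hypothetical
stand-ins for BlockDriverState, BdrvChild and BDRV_CHILD_*.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits, loosely modelled on BDRV_CHILD_* */
enum {
    TOY_CHILD_DATA     = 1 << 0,
    TOY_CHILD_METADATA = 1 << 1,
    TOY_CHILD_FILTERED = 1 << 2,
    TOY_CHILD_COW      = 1 << 3,
};

typedef struct SnapChild {
    unsigned role;
    struct SnapChild *next;  /* next sibling in the parent's list */
} SnapChild;

typedef struct SnapNode {
    bool is_filter;
    SnapChild *file;         /* may be NULL */
    SnapChild *backing;      /* may be NULL */
    SnapChild *children;     /* list of all children */
} SnapNode;

/*
 * Return a pointer to the child link we may fall back to, or NULL if
 * falling back is not safe.
 */
SnapChild **toy_snapshot_fallback_ptr(SnapNode *bs)
{
    SnapChild **fallback = &bs->file;

    /* Filters may use ->backing instead of ->file */
    if (!*fallback && bs->is_filter) {
        fallback = &bs->backing;
    }
    if (!*fallback) {
        return NULL;
    }

    /* Any other non-COW child would have to be snapshotted as well */
    for (SnapChild *c = bs->children; c; c = c->next) {
        if ((c->role & (TOY_CHILD_DATA | TOY_CHILD_METADATA |
                        TOY_CHILD_FILTERED)) &&
            c != *fallback)
        {
            return NULL;
        }
    }

    return fallback;
}
```

In this model, a format node with a single file child plus a COW
backing child can fall back to its file child, whereas a node with
separate metadata and data children cannot fall back at all.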
 block/snapshot.c | 104 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 83 insertions(+), 21 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index bd9fb01817..a2bf3a54eb 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -147,6 +147,56 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
     return ret;
 }
 
+/**
+ * Return a pointer to the child BDS pointer to which we can fall
+ * back if the given BDS does not support snapshots.
+ * Return NULL if there is no BDS to (safely) fall back to.
+ *
+ * We need to return an indirect pointer because bdrv_snapshot_goto()
+ * has to modify the BdrvChild pointer.
+ */
+static BdrvChild **bdrv_snapshot_fallback_ptr(BlockDriverState *bs)
+{
+    BdrvChild **fallback;
+    BdrvChild *child;
+
+    /*
+     * The only BdrvChild pointers that are safe to modify (and which
+     * we can thus return a reference to) are bs->file and
+     * bs->backing.
+     */
+    fallback = &bs->file;
+    if (!*fallback && bs->drv && bs->drv->is_filter) {
+        fallback = &bs->backing;
+    }
+
+    if (!*fallback) {
+        return NULL;
+    }
+
+    /*
+     * Check that there are no other children that would need to be
+     * snapshotted.  If there are, it is not safe to fall back to
+     * *fallback.
+     */
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
+                           BDRV_CHILD_FILTERED) &&
+            child != *fallback)
+        {
+            return NULL;
+        }
+    }
+
+    return fallback;
+}
+
+static BlockDriverState *bdrv_snapshot_fallback(BlockDriverState *bs)
+{
+    BdrvChild **child_ptr = bdrv_snapshot_fallback_ptr(bs);
+    return child_ptr ? (*child_ptr)->bs : NULL;
+}
+
 int bdrv_can_snapshot(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
@@ -155,8 +205,9 @@ int bdrv_can_snapshot(BlockDriverState *bs)
     }
 
     if (!drv->bdrv_snapshot_create) {
-        if (bs->file != NULL) {
-            return bdrv_can_snapshot(bs->file->bs);
+        BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
+        if (fallback_bs) {
+            return bdrv_can_snapshot(fallback_bs);
         }
         return 0;
     }
@@ -168,14 +219,15 @@ int bdrv_snapshot_create(BlockDriverState *bs,
                          QEMUSnapshotInfo *sn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_create) {
         return drv->bdrv_snapshot_create(bs, sn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_create(bs->file->bs, sn_info);
+    if (fallback_bs) {
+        return bdrv_snapshot_create(fallback_bs, sn_info);
     }
     return -ENOTSUP;
 }
@@ -185,6 +237,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
                        Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BdrvChild **fallback_ptr;
     int ret, open_ret;
 
     if (!drv) {
@@ -205,39 +258,46 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         return ret;
     }
 
-    if (bs->file) {
-        BlockDriverState *file;
-        QDict *options = qdict_clone_shallow(bs->options);
+    fallback_ptr = bdrv_snapshot_fallback_ptr(bs);
+    if (fallback_ptr) {
+        QDict *options;
         QDict *file_options;
         Error *local_err = NULL;
+        BlockDriverState *fallback_bs = (*fallback_ptr)->bs;
+        char *subqdict_prefix = g_strdup_printf("%s.", (*fallback_ptr)->name);
+
+        options = qdict_clone_shallow(bs->options);
 
-        file = bs->file->bs;
         /* Prevent it from getting deleted when detached from bs */
-        bdrv_ref(file);
+        bdrv_ref(fallback_bs);
 
-        qdict_extract_subqdict(options, &file_options, "file.");
+        qdict_extract_subqdict(options, &file_options, subqdict_prefix);
         qobject_unref(file_options);
-        qdict_put_str(options, "file", bdrv_get_node_name(file));
+        g_free(subqdict_prefix);
+
+        qdict_put_str(options, (*fallback_ptr)->name,
+                      bdrv_get_node_name(fallback_bs));
 
         if (drv->bdrv_close) {
             drv->bdrv_close(bs);
         }
-        bdrv_unref_child(bs, bs->file);
-        bs->file = NULL;
 
-        ret = bdrv_snapshot_goto(file, snapshot_id, errp);
+        bdrv_unref_child(bs, *fallback_ptr);
+        *fallback_ptr = NULL;
+
+        ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
         open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
         qobject_unref(options);
         if (open_ret < 0) {
-            bdrv_unref(file);
+            bdrv_unref(fallback_bs);
             bs->drv = NULL;
             /* A bdrv_snapshot_goto() error takes precedence */
             error_propagate(errp, local_err);
             return ret < 0 ? ret : open_ret;
         }
 
-        assert(bs->file->bs == file);
-        bdrv_unref(file);
+        assert(fallback_bs == (*fallback_ptr)->bs);
+        bdrv_unref(fallback_bs);
         return ret;
     }
 
@@ -273,6 +333,7 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
                          Error **errp)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
     int ret;
 
     if (!drv) {
@@ -289,8 +350,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 
     if (drv->bdrv_snapshot_delete) {
         ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
-    } else if (bs->file) {
-        ret = bdrv_snapshot_delete(bs->file->bs, snapshot_id, name, errp);
+    } else if (fallback_bs) {
+        ret = bdrv_snapshot_delete(fallback_bs, snapshot_id, name, errp);
     } else {
         error_setg(errp, "Block format '%s' used by device '%s' "
                    "does not support internal snapshot deletion",
@@ -306,14 +367,15 @@ int bdrv_snapshot_list(BlockDriverState *bs,
                        QEMUSnapshotInfo **psn_info)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
     if (!drv) {
         return -ENOMEDIUM;
     }
     if (drv->bdrv_snapshot_list) {
         return drv->bdrv_snapshot_list(bs, psn_info);
     }
-    if (bs->file) {
-        return bdrv_snapshot_list(bs->file->bs, psn_info);
+    if (fallback_bs) {
+        return bdrv_snapshot_list(fallback_bs, psn_info);
     }
     return -ENOTSUP;
 }
-- 
2.26.2




* [PATCH v7 24/47] block: Use CAFs for debug breakpoints
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (22 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 23/47] block/snapshot: Fix fallback Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-15 21:43   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size Max Reitz
                   ` (24 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

When looking for a blkdebug node (which implements debug breakpoints),
use bdrv_primary_bs() to iterate through the graph, because that is
where a blkdebug node would be.
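
As a rough illustration of that walk, here is a self-contained sketch.
struct Node, its fields and find_debug_node() are hypothetical; the
primary field stands in for whatever bdrv_primary_bs() would return, and
the step limit is only a guard for this toy model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHAIN_LENGTH 1024   /* toy guard against cyclic links */

struct Node {
    bool supports_breakpoints;  /* models drv->bdrv_debug_breakpoint != NULL */
    struct Node *primary;       /* primary child, or NULL at the end */
};

/* Follow primary children until a node with breakpoint support is found. */
static struct Node *find_debug_node(struct Node *n)
{
    int steps = 0;

    while (n && !n->supports_breakpoints) {
        n = n->primary;
        steps++;
        assert(steps < MAX_CHAIN_LENGTH);
    }

    return n;   /* NULL if no node in the chain supports breakpoints */
}
```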

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 7c827fefa0..1c71ecab7c 100644
--- a/block.c
+++ b/block.c
@@ -5562,17 +5562,7 @@ void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 static BlockDriverState *bdrv_find_debug_node(BlockDriverState *bs)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
-        if (bs->file) {
-            bs = bs->file->bs;
-            continue;
-        }
-
-        if (bs->drv->is_filter && bs->backing) {
-            bs = bs->backing->bs;
-            continue;
-        }
-
-        break;
+        bs = bdrv_primary_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
@@ -5607,7 +5597,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 {
     while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_primary_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
@@ -5620,7 +5610,7 @@ int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
 {
     while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
-        bs = bs->file ? bs->file->bs : NULL;
+        bs = bdrv_primary_bs(bs);
     }
 
     if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {
-- 
2.26.2




* [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (23 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 24/47] block: Use CAFs for debug breakpoints Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-15 22:56   ` Andrey Shinkevich
  2020-08-19 10:57   ` Kevin Wolf
  2020-06-25 15:21 ` [PATCH v7 26/47] block: Improve get_allocated_file_size's default Max Reitz
                   ` (23 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

If every BlockDriver were to implement bdrv_get_allocated_file_size(),
there are basically three ways it would be handled:
(1) For protocol drivers: Figure out the actual allocated file size in
    some protocol-specific way
(2) For protocol drivers: If that is not possible (or we just have not
    bothered to implement it yet), return -ENOTSUP
(3) For drivers with children: Return the sum of some or all their
    children's sizes

For the drivers we have, case (3) boils down to either:
(a) The sum of all children's sizes
(b) The size of the primary child

(2), (3a) and (3b) can be implemented generically, so this patch adds
such generic implementations for drivers to use.
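
To make the difference between (3a) and (3b) concrete, here is a small
model.  All names are hypothetical (struct Node, struct Child, ROLE_*,
leaf_size); in particular, allocated_size() uses a toy policy in which
leaves report a fixed size and all other nodes sum their children.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BDRV_CHILD_* role bits. */
enum {
    ROLE_DATA     = 1 << 0,
    ROLE_METADATA = 1 << 1,
    ROLE_FILTERED = 1 << 2,
    ROLE_COW      = 1 << 3,
};

struct Node;

struct Child {
    const struct Node *node;
    unsigned role;
    const struct Child *next;
};

struct Node {
    int64_t leaf_size;            /* size or negative errno; leaves only */
    const struct Child *children; /* head of the list of all children */
    const struct Child *primary;  /* primary child, or NULL */
};

static int64_t sum_allocated_size(const struct Node *n);

/* Toy policy: leaves report leaf_size, other nodes sum their children. */
static int64_t allocated_size(const struct Node *n)
{
    assert(n != NULL);
    return n->children ? sum_allocated_size(n) : n->leaf_size;
}

/* Case (3a): sum all children that store data; COW children are skipped. */
static int64_t sum_allocated_size(const struct Node *n)
{
    int64_t sum = 0;

    for (const struct Child *c = n->children; c; c = c->next) {
        if (c->role & (ROLE_DATA | ROLE_METADATA | ROLE_FILTERED)) {
            int64_t size = allocated_size(c->node);
            if (size < 0) {
                return size;      /* the first error wins */
            }
            sum += size;
        }
    }

    return sum;
}

/* Case (3b): only the primary child counts. */
static int64_t primary_allocated_size(const struct Node *n)
{
    if (!n->primary) {
        return -ENOTSUP;
    }
    return allocated_size(n->primary->node);
}
```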

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  5 ++++
 block.c                   | 51 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5da793bfc3..c963ee9f28 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1318,6 +1318,11 @@ int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
                                                    int64_t *pnum,
                                                    int64_t *map,
                                                    BlockDriverState **file);
+
+int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
+int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
+int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
+
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block.c b/block.c
index 1c71ecab7c..fc01ce90b3 100644
--- a/block.c
+++ b/block.c
@@ -5003,6 +5003,57 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
     return -ENOTSUP;
 }
 
+/**
+ * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
+ * block drivers that want it to sum all children they store data on.
+ * (This excludes backing children.)
+ */
+int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs)
+{
+    BdrvChild *child;
+    int64_t child_size, sum = 0;
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
+                           BDRV_CHILD_FILTERED))
+        {
+            child_size = bdrv_get_allocated_file_size(child->bs);
+            if (child_size < 0) {
+                return child_size;
+            }
+            sum += child_size;
+        }
+    }
+
+    return sum;
+}
+
+/**
+ * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
+ * block drivers that want it to return only the size of a node's
+ * primary child.
+ */
+int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs)
+{
+    BlockDriverState *primary_bs;
+
+    primary_bs = bdrv_primary_bs(bs);
+    if (!primary_bs) {
+        return -ENOTSUP;
+    }
+
+    return bdrv_get_allocated_file_size(primary_bs);
+}
+
+/**
+ * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
+ * protocol block drivers that just do not support it.
+ */
+int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs)
+{
+    return -ENOTSUP;
+}
+
 /*
  * bdrv_measure:
  * @drv: Format driver
-- 
2.26.2




* [PATCH v7 26/47] block: Improve get_allocated_file_size's default
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (24 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-20 15:12   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size() Max Reitz
                   ` (22 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

There are two practical problems with bdrv_get_allocated_file_size()'s
default right now:
(1) For drivers with children, we should generally sum all their sizes
    instead of just passing the request through to bs->file.  The latter
    is good for filters, but not so much for format drivers.

(2) Filters need not have bs->file, so we should actually go to the
    filtered child instead of hard-coding bs->file.

And we can make the whole default implementation more idiomatic by using
the three generic functions added by the previous patch.
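
The resulting order of decisions can be summarized in a small sketch.
The names below are hypothetical: struct Driver reduces a BlockDriver to
three flags, and is_protocol stands in for the patch's check of
drv->bdrv_file_open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum SizeStrategy {
    SIZE_DRIVER_CALLBACK,   /* the driver's own implementation */
    SIZE_NOTSUP,            /* protocol driver without one: -ENOTSUP */
    SIZE_PRIMARY_CHILD,     /* filter: size of the primary child */
    SIZE_SUM_CHILDREN,      /* anything else: sum of the children */
};

struct Driver {
    bool has_size_callback;
    bool is_protocol;
    bool is_filter;
};

/* Decide how the allocated file size of a node would be determined. */
static enum SizeStrategy pick_size_strategy(const struct Driver *drv)
{
    assert(drv != NULL);

    if (drv->has_size_callback) {
        return SIZE_DRIVER_CALLBACK;
    }
    if (drv->is_protocol) {
        return SIZE_NOTSUP;
    }
    if (drv->is_filter) {
        return SIZE_PRIMARY_CHILD;
    }
    return SIZE_SUM_CHILDREN;
}
```

A driver-provided callback always takes precedence, so a filter can still
opt into summing its children by setting the callback explicitly.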

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index fc01ce90b3..a19f243997 100644
--- a/block.c
+++ b/block.c
@@ -4997,10 +4997,21 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
     if (drv->bdrv_get_allocated_file_size) {
         return drv->bdrv_get_allocated_file_size(bs);
     }
-    if (bs->file) {
-        return bdrv_get_allocated_file_size(bs->file->bs);
+
+    if (drv->bdrv_file_open) {
+        /*
+         * Protocol drivers default to -ENOTSUP (most of their data is
+         * not stored in any of their children (if they even have any),
+         * so there is no generic way to figure it out).
+         */
+        return bdrv_notsup_allocated_file_size(bs);
+    } else if (drv->is_filter) {
+        /* Filter drivers default to the size of their primary child */
+        return bdrv_primary_allocated_file_size(bs);
+    } else {
+        /* Other drivers default to summing their children's sizes */
+        return bdrv_sum_allocated_file_size(bs);
     }
-    return -ENOTSUP;
 }
 
 /**
-- 
2.26.2




* [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (25 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 26/47] block: Improve get_allocated_file_size's default Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-20 15:10   ` Andrey Shinkevich
  2020-08-19 10:46   ` Kevin Wolf
  2020-06-25 15:21 ` [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size Max Reitz
                   ` (21 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

blkverify is a filter, so bdrv_get_allocated_file_size()'s default
implementation will return only the size of its filtered child.
However, because both of its children are disk images, it makes more
sense to sum both of their allocated sizes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkverify.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 2f261de24b..64858c8df0 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -323,6 +323,7 @@ static BlockDriver bdrv_blkverify = {
     .bdrv_getlength                   = blkverify_getlength,
     .bdrv_refresh_filename            = blkverify_refresh_filename,
     .bdrv_dirname                     = blkverify_dirname,
+    .bdrv_get_allocated_file_size     = bdrv_sum_allocated_file_size,
 
     .bdrv_co_preadv                   = blkverify_co_preadv,
     .bdrv_co_pwritev                  = blkverify_co_pwritev,
-- 
2.26.2




* [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (26 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-20 15:10   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
                   ` (20 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

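The null drivers never allocate any storage, so their allocated file
size is trivially 0.  Implement the callback so that the size is
reported instead of being unavailable.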

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/null.c               | 7 +++++++
 tests/qemu-iotests/153.out | 2 +-
 tests/qemu-iotests/184.out | 6 ++++--
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/null.c b/block/null.c
index 15e1d56746..cc9b1d4ea7 100644
--- a/block/null.c
+++ b/block/null.c
@@ -262,6 +262,11 @@ static void null_refresh_filename(BlockDriverState *bs)
              bs->drv->format_name);
 }
 
+static int64_t null_allocated_file_size(BlockDriverState *bs)
+{
+    return 0;
+}
+
 static const char *const null_strong_runtime_opts[] = {
     BLOCK_OPT_SIZE,
     NULL_OPT_ZEROES,
@@ -277,6 +282,7 @@ static BlockDriver bdrv_null_co = {
     .bdrv_file_open         = null_file_open,
     .bdrv_parse_filename    = null_co_parse_filename,
     .bdrv_getlength         = null_getlength,
+    .bdrv_get_allocated_file_size = null_allocated_file_size,
 
     .bdrv_co_preadv         = null_co_preadv,
     .bdrv_co_pwritev        = null_co_pwritev,
@@ -297,6 +303,7 @@ static BlockDriver bdrv_null_aio = {
     .bdrv_file_open         = null_file_open,
     .bdrv_parse_filename    = null_aio_parse_filename,
     .bdrv_getlength         = null_getlength,
+    .bdrv_get_allocated_file_size = null_allocated_file_size,
 
     .bdrv_aio_preadv        = null_aio_preadv,
     .bdrv_aio_pwritev       = null_aio_pwritev,
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index b2a90caa6b..8659e6463b 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -461,7 +461,7 @@ No conflict:
 image: null-co://
 file format: null-co
 virtual size: 1 GiB (1073741824 bytes)
-disk size: unavailable
+disk size: 0 B
 
 Conflict:
 qemu-img: --force-share/-U conflicts with image options
diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
index 3deb3cfb94..28b104da89 100644
--- a/tests/qemu-iotests/184.out
+++ b/tests/qemu-iotests/184.out
@@ -29,7 +29,8 @@ Testing:
             "image": {
                 "virtual-size": 1073741824,
                 "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
-                "format": "throttle"
+                "format": "throttle",
+                "actual-size": SIZE
             },
             "iops_wr": 0,
             "ro": false,
@@ -56,7 +57,8 @@ Testing:
             "image": {
                 "virtual-size": 1073741824,
                 "filename": "null-co://",
-                "format": "null-co"
+                "format": "null-co",
+                "actual-size": SIZE
             },
             "iops_wr": 0,
             "ro": false,
-- 
2.26.2




* [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (27 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-20 16:08   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 30/47] block: Report data child for query-blockstats Max Reitz
                   ` (19 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This allows us to differentiate between filters and nodes with COW
backing files: Filters cannot be used as overlays at all (for this
function).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 1eb0fcdea2..aabe51036d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1549,7 +1549,12 @@ static void external_snapshot_prepare(BlkActionState *common,
         goto out;
     }
 
-    if (state->new_bs->backing != NULL) {
+    if (state->new_bs->drv->is_filter) {
+        error_setg(errp, "Filters cannot be used as overlays");
+        goto out;
+    }
+
+    if (bdrv_cow_child(state->new_bs)) {
         error_setg(errp, "The overlay already has a backing image");
         goto out;
     }
-- 
2.26.2




* [PATCH v7 30/47] block: Report data child for query-blockstats
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (28 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-21 11:48   ` Andrey Shinkevich
  2020-06-25 15:21 ` [PATCH v7 31/47] block: Use child access functions for QAPI queries Max Reitz
                   ` (18 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

It makes no sense to report the block stats of a purely metadata-storing
child in query-blockstats.  So if the primary child does not have any
data, try to find a unique data-storing child.
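
The lookup can be sketched as follows.  This is a simplified model with
hypothetical names (struct Node, struct Child, ROLE_*,
stats_parent_child()), not the code in block/qapi.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BDRV_CHILD_* role bits. */
enum {
    ROLE_DATA     = 1 << 0,
    ROLE_METADATA = 1 << 1,
    ROLE_FILTERED = 1 << 2,
};

struct Node;

struct Child {
    const struct Node *node;
    unsigned role;
    const struct Child *next;
};

struct Node {
    const struct Child *children; /* head of the list of all children */
    const struct Child *primary;  /* primary child, or NULL */
};

/*
 * Pick the child whose stats are reported as the "parent" stats:
 * the primary child if it carries data, else a unique data-storing
 * child, else nothing.
 */
static const struct Child *stats_parent_child(const struct Node *n)
{
    const struct Child *found;

    assert(n != NULL);

    found = n->primary;
    if (found && (found->role & (ROLE_DATA | ROLE_FILTERED))) {
        return found;
    }

    found = NULL;
    for (const struct Child *c = n->children; c; c = c->next) {
        if (c->role & ROLE_DATA) {
            if (found) {
                return NULL;      /* ambiguous: several data children */
            }
            found = c;
        }
    }

    return found;
}
```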

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qapi.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 4807a2b344..c57b42d86d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -526,6 +526,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
+    BdrvChild *parent_child;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -555,9 +556,35 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
         s->has_driver_specific = true;
     }
 
-    if (bs->file) {
+    parent_child = bdrv_primary_child(bs);
+    if (!parent_child ||
+        !(parent_child->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED)))
+    {
+        BdrvChild *c;
+
+        /*
+         * Look for a unique data-storing child.  We do not need to look for
+         * filtered children, as there would be only one and it would have been
+         * the primary child.
+         */
+        parent_child = NULL;
+        QLIST_FOREACH(c, &bs->children, next) {
+            if (c->role & BDRV_CHILD_DATA) {
+                if (parent_child) {
+                    /*
+                     * There are multiple data-storing children and we cannot
+                     * choose between them.
+                     */
+                    parent_child = NULL;
+                    break;
+                }
+                parent_child = c;
+            }
+        }
+    }
+    if (parent_child) {
         s->has_parent = true;
-        s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
+        s->parent = bdrv_query_bds_stats(parent_child->bs, blk_level);
     }
 
     if (blk_level && bs->backing) {
-- 
2.26.2




* [PATCH v7 31/47] block: Use child access functions for QAPI queries
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (29 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 30/47] block: Report data child for query-blockstats Max Reitz
@ 2020-06-25 15:21 ` Max Reitz
  2020-07-21 12:30   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 32/47] block-copy: Use CAF to find sync=top base Max Reitz
                   ` (17 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:21 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

query-block, query-named-block-nodes, and query-blockstats now return
any filtered child under "backing", not just bs->backing or COW
children.  This is so that filters do not interrupt the reported backing
chain.  This changes the output for iotest 184, as the throttled node
now appears as a backing child.
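
The effect on the reported depth can be modelled with a short sketch.
struct Node, filter_or_cow() and backing_depth() are hypothetical names;
the model assumes a node has at most one of the two links, which is also
what the assertion checks.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    struct Node *filtered;  /* filtered child (filters only), or NULL */
    struct Node *cow;       /* COW backing child, or NULL */
};

/* Next node in the reported chain: the filtered child or the COW child. */
static struct Node *filter_or_cow(const struct Node *n)
{
    assert(!(n->filtered && n->cow));
    return n->filtered ? n->filtered : n->cow;
}

/* Count how many nodes are reported below @n in the "backing" chain. */
static int backing_depth(const struct Node *n)
{
    int depth = 0;

    while ((n = filter_or_cow(n)) != NULL) {
        depth++;
    }

    return depth;
}
```

For a throttle node on top of a null-co node, as in iotest 184, this
model yields a depth of 1, which matches the updated reference output.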

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qapi.c               | 33 ++++++++++++++++++++-------------
 tests/qemu-iotests/184.out |  8 +++++++-
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index c57b42d86d..2628323b63 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -163,9 +163,13 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
             break;
         }
 
-        if (bs0->drv && bs0->backing) {
+        if (bs0->drv && bdrv_filter_or_cow_child(bs0)) {
+            /*
+             * Put any filtered child here (for backwards compatibility to when
+             * we put bs0->backing here, which might be any filtered child).
+             */
             info->backing_file_depth++;
-            bs0 = bs0->backing->bs;
+            bs0 = bdrv_filter_or_cow_bs(bs0);
             (*p_image_info)->has_backing_image = true;
             p_image_info = &((*p_image_info)->backing_image);
         } else {
@@ -174,9 +178,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 
         /* Skip automatically inserted nodes that the user isn't aware of for
          * query-block (blk != NULL), but not for query-named-block-nodes */
-        while (blk && bs0->drv && bs0->implicit) {
-            bs0 = backing_bs(bs0);
-            assert(bs0);
+        if (blk) {
+            bs0 = bdrv_skip_implicit_filters(bs0);
         }
     }
 
@@ -362,9 +365,7 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
     char *qdev;
 
     /* Skip automatically inserted nodes that the user isn't aware of */
-    while (bs && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
-    }
+    bs = bdrv_skip_implicit_filters(bs);
 
     info->device = g_strdup(blk_name(blk));
     info->type = g_strdup("unknown");
@@ -527,6 +528,7 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
                                         bool blk_level)
 {
     BdrvChild *parent_child;
+    BlockDriverState *filter_or_cow_bs;
     BlockStats *s = NULL;
 
     s = g_malloc0(sizeof(*s));
@@ -539,9 +541,8 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
     /* Skip automatically inserted nodes that the user isn't aware of in
      * a BlockBackend-level command. Stay at the exact node for a node-level
      * command. */
-    while (blk_level && bs->drv && bs->implicit) {
-        bs = backing_bs(bs);
-        assert(bs);
+    if (blk_level) {
+        bs = bdrv_skip_implicit_filters(bs);
     }
 
     if (bdrv_get_node_name(bs)[0]) {
@@ -587,9 +588,15 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
         s->parent = bdrv_query_bds_stats(parent_child->bs, blk_level);
     }
 
-    if (blk_level && bs->backing) {
+    filter_or_cow_bs = bdrv_filter_or_cow_bs(bs);
+    if (blk_level && filter_or_cow_bs) {
+        /*
+         * Put any filtered or COW child here (for backwards
+         * compatibility to when we put bs->backing here, which might
+         * be either)
+         */
         s->has_backing = true;
-        s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
+        s->backing = bdrv_query_bds_stats(filter_or_cow_bs, blk_level);
     }
 
     return s;
diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
index 28b104da89..4e92fcfb51 100644
--- a/tests/qemu-iotests/184.out
+++ b/tests/qemu-iotests/184.out
@@ -27,6 +27,12 @@ Testing:
             "iops_rd": 0,
             "detect_zeroes": "off",
             "image": {
+                "backing-image": {
+                    "virtual-size": 1073741824,
+                    "filename": "null-co://",
+                    "format": "null-co",
+                    "actual-size": SIZE
+                },
                 "virtual-size": 1073741824,
                 "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
                 "format": "throttle",
@@ -35,7 +41,7 @@ Testing:
             "iops_wr": 0,
             "ro": false,
             "node-name": "throttle0",
-            "backing_file_depth": 0,
+            "backing_file_depth": 1,
             "drv": "throttle",
             "iops": 0,
             "bps_wr": 0,
-- 
2.26.2




* [PATCH v7 32/47] block-copy: Use CAF to find sync=top base
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (30 preceding siblings ...)
  2020-06-25 15:21 ` [PATCH v7 31/47] block: Use child access functions for QAPI queries Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-21 12:42   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 33/47] mirror: Deal with filters Max Reitz
                   ` (16 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index f7428a7c08..5e80569bb8 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -437,8 +437,8 @@ static int block_copy_block_status(BlockCopyState *s, int64_t offset,
     BlockDriverState *base;
     int ret;
 
-    if (s->skip_unallocated && s->source->bs->backing) {
-        base = s->source->bs->backing->bs;
+    if (s->skip_unallocated) {
+        base = bdrv_backing_chain_next(s->source->bs);
     } else {
         base = NULL;
     }
-- 
2.26.2




* [PATCH v7 33/47] mirror: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (31 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 32/47] block-copy: Use CAF to find sync=top base Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-22 18:31   ` Andrey Shinkevich
  2020-08-19 16:50   ` Kevin Wolf
  2020-06-25 15:22 ` [PATCH v7 34/47] backup: " Max Reitz
                   ` (15 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json |   6 ++-
 block/mirror.c       | 118 +++++++++++++++++++++++++++++++++----------
 blockdev.c           |  36 +++++++++----
 3 files changed, 121 insertions(+), 39 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index df87855429..0b8ccd30aa 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1943,7 +1943,8 @@
 #
 # @replaces: with sync=full graph node name to be replaced by the new
 #            image when a whole image copy is done. This can be used to repair
-#            broken Quorum files. (Since 2.1)
+#            broken Quorum files.  By default, @device is replaced, although
+#            implicitly created filters on it are kept. (Since 2.1)
 #
 # @mode: whether and how QEMU should create a new image, default is
 #        'absolute-paths'.
@@ -2254,7 +2255,8 @@
 #
 # @replaces: with sync=full graph node name to be replaced by the new
 #            image when a whole image copy is done. This can be used to repair
-#            broken Quorum files.
+#            broken Quorum files.  By default, @device is replaced, although
+#            implicitly created filters on it are kept.
 #
 # @speed:  the maximum speed, in bytes per second
 #
diff --git a/block/mirror.c b/block/mirror.c
index 469acf4600..770de3b34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
     BlockBackend *target;
     BlockDriverState *mirror_top_bs;
     BlockDriverState *base;
+    BlockDriverState *base_overlay;
 
     /* The name of the graph node to replace */
     char *replaces;
@@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
                              &error_abort);
     if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
         BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
+        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
+
+        if (bdrv_cow_bs(unfiltered_target) != backing) {
+            bdrv_set_backing_hd(unfiltered_target, backing, &local_err);
             if (local_err) {
                 error_report_err(local_err);
                 local_err = NULL;
@@ -740,7 +743,7 @@ static int mirror_exit_common(Job *job)
      * valid.
      */
     block_job_remove_all_bdrv(bjob);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     /* We just changed the BDS the job BB refers to (with either or both of the
      * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -786,7 +789,6 @@ static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
 static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
     int64_t offset;
-    BlockDriverState *base = s->base;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret;
@@ -837,7 +839,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }
 
-        ret = bdrv_is_allocated_above(bs, base, false, offset, bytes, &count);
+        ret = bdrv_is_allocated_above(bs, s->base_overlay, true, offset, bytes,
+                                      &count);
         if (ret < 0) {
             return ret;
         }
@@ -936,7 +939,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     } else {
         s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
-    if (backing_filename[0] && !target_bs->backing &&
+    if (backing_filename[0] && !bdrv_backing_chain_next(target_bs) &&
         s->granularity < s->target_cluster_size) {
         s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
@@ -1116,8 +1119,9 @@ static void mirror_complete(Job *job, Error **errp)
     if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
         int ret;
 
-        assert(!target->backing);
-        ret = bdrv_open_backing_file(target, NULL, "backing", errp);
+        assert(!bdrv_backing_chain_next(target));
+        ret = bdrv_open_backing_file(bdrv_skip_filters(target), NULL,
+                                     "backing", errp);
         if (ret < 0) {
             return;
         }
@@ -1565,8 +1569,8 @@ static BlockJob *mirror_start_job(
     MirrorBlockJob *s;
     MirrorBDSOpaque *bs_opaque;
     BlockDriverState *mirror_top_bs;
-    bool target_graph_mod;
     bool target_is_backing;
+    uint64_t target_perms, target_shared_perms;
     Error *local_err = NULL;
     int ret;
 
@@ -1585,7 +1589,7 @@ static BlockJob *mirror_start_job(
         buf_size = DEFAULT_MIRROR_BUF_SIZE;
     }
 
-    if (bs == target) {
+    if (bdrv_skip_filters(bs) == bdrv_skip_filters(target)) {
         error_setg(errp, "Can't mirror node into itself");
         return NULL;
     }
@@ -1649,15 +1653,50 @@ static BlockJob *mirror_start_job(
      * In the case of active commit, things look a bit different, though,
      * because the target is an already populated backing file in active use.
      * We can allow anything except resize there.*/
+
+    target_perms = BLK_PERM_WRITE;
+    target_shared_perms = BLK_PERM_WRITE_UNCHANGED;
+
     target_is_backing = bdrv_chain_contains(bs, target);
-    target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
+    if (target_is_backing) {
+        int64_t bs_size, target_size;
+        bs_size = bdrv_getlength(bs);
+        if (bs_size < 0) {
+            error_setg_errno(errp, -bs_size,
+                             "Could not inquire top image size");
+            goto fail;
+        }
+
+        target_size = bdrv_getlength(target);
+        if (target_size < 0) {
+            error_setg_errno(errp, -target_size,
+                             "Could not inquire base image size");
+            goto fail;
+        }
+
+        if (target_size < bs_size) {
+            target_perms |= BLK_PERM_RESIZE;
+        }
+
+        target_shared_perms |= BLK_PERM_CONSISTENT_READ
+                            |  BLK_PERM_WRITE
+                            |  BLK_PERM_GRAPH_MOD;
+    } else if (bdrv_chain_contains(bs, bdrv_skip_filters(target))) {
+        /*
+         * We may want to allow this in the future, but it would
+         * require taking some extra care.
+         */
+        error_setg(errp, "Cannot mirror to a filter on top of a node in the "
+                   "source's backing chain");
+        goto fail;
+    }
+
+    if (backing_mode != MIRROR_LEAVE_BACKING_CHAIN) {
+        target_perms |= BLK_PERM_GRAPH_MOD;
+    }
+
     s->target = blk_new(s->common.job.aio_context,
-                        BLK_PERM_WRITE | BLK_PERM_RESIZE |
-                        (target_graph_mod ? BLK_PERM_GRAPH_MOD : 0),
-                        BLK_PERM_WRITE_UNCHANGED |
-                        (target_is_backing ? BLK_PERM_CONSISTENT_READ |
-                                             BLK_PERM_WRITE |
-                                             BLK_PERM_GRAPH_MOD : 0));
+                        target_perms, target_shared_perms);
     ret = blk_insert_bs(s->target, target, errp);
     if (ret < 0) {
         goto fail;
@@ -1682,6 +1721,7 @@ static BlockJob *mirror_start_job(
     s->zero_target = zero_target;
     s->copy_mode = copy_mode;
     s->base = base;
+    s->base_overlay = bdrv_find_overlay(bs, base);
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
     s->unmap = unmap;
@@ -1712,15 +1752,39 @@ static BlockJob *mirror_start_job(
     /* In commit_active_start() all intermediate nodes disappear, so
      * any jobs in them must be blocked */
     if (target_is_backing) {
-        BlockDriverState *iter;
-        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
-            /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
-             * ourselves at s->base (if writes are blocked for a node, they are
-             * also blocked for its backing file). The other options would be a
-             * second filter driver above s->base (== target). */
+        BlockDriverState *iter, *filtered_target;
+        uint64_t iter_shared_perms;
+
+        /*
+         * The topmost node with
+         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
+         */
+        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
+
+        assert(bdrv_skip_filters(filtered_target) ==
+               bdrv_skip_filters(target));
+
+        /*
+         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
+         * ourselves at s->base (if writes are blocked for a node, they are
+         * also blocked for its backing file). The other options would be a
+         * second filter driver above s->base (== target).
+         */
+        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
+
+        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
+             iter = bdrv_filter_or_cow_bs(iter))
+        {
+            if (iter == filtered_target) {
+                /*
+                 * From here on, all nodes are filters on the base.
+                 * This allows us to share BLK_PERM_CONSISTENT_READ.
+                 */
+                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
+            }
+
             ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
-                                     errp);
+                                     iter_shared_perms, errp);
             if (ret < 0) {
                 goto fail;
             }
@@ -1756,7 +1820,7 @@ fail:
     bs_opaque->stop = true;
     bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
                              &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 
     bdrv_unref(mirror_top_bs);
 
@@ -1784,7 +1848,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
         return;
     }
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
-    base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
+    base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
     mirror_start_job(job_id, bs, creation_flags, target, replaces,
                      speed, granularity, buf_size, backing_mode, zero_target,
                      on_source_error, on_target_error, unmap, NULL, NULL,
diff --git a/blockdev.c b/blockdev.c
index aabe51036d..6ca9b19779 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2886,6 +2886,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                                    bool has_auto_dismiss, bool auto_dismiss,
                                    Error **errp)
 {
+    BlockDriverState *unfiltered_bs;
     int job_flags = JOB_DEFAULT;
 
     if (!has_speed) {
@@ -2937,10 +2938,19 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
+    if (!bdrv_backing_chain_next(bs) && sync == MIRROR_SYNC_MODE_TOP) {
         sync = MIRROR_SYNC_MODE_FULL;
     }
 
+    if (!has_replaces) {
+        /* We want to mirror from @bs, but keep implicit filters on top */
+        unfiltered_bs = bdrv_skip_implicit_filters(bs);
+        if (unfiltered_bs != bs) {
+            replaces = unfiltered_bs->node_name;
+            has_replaces = true;
+        }
+    }
+
     if (has_replaces) {
         BlockDriverState *to_replace_bs;
         AioContext *replace_aio_context;
@@ -2987,7 +2997,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
 {
     BlockDriverState *bs;
-    BlockDriverState *source, *target_bs;
+    BlockDriverState *target_backing_bs, *target_bs;
     AioContext *aio_context;
     AioContext *old_context;
     BlockMirrorBackingMode backing_mode;
@@ -2996,6 +3006,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int flags;
     int64_t size;
     const char *format = arg->format;
+    const char *replaces_node_name = NULL;
     bool zero_target;
     int ret;
 
@@ -3022,12 +3033,12 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
-    source = backing_bs(bs);
-    if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
+    target_backing_bs = bdrv_cow_bs(bdrv_skip_filters(bs));
+    if (!target_backing_bs && arg->sync == MIRROR_SYNC_MODE_TOP) {
         arg->sync = MIRROR_SYNC_MODE_FULL;
     }
     if (arg->sync == MIRROR_SYNC_MODE_NONE) {
-        source = bs;
+        target_backing_bs = bs;
     }
 
     size = bdrv_getlength(bs);
@@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
                              " named node of the graph");
             goto out;
         }
+        replaces_node_name = arg->replaces;
     }
 
     if (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS) {
@@ -3053,7 +3065,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     /* Don't open backing image in create() */
     flags |= BDRV_O_NO_BACKING;
 
-    if ((arg->sync == MIRROR_SYNC_MODE_FULL || !source)
+    if ((arg->sync == MIRROR_SYNC_MODE_FULL || !target_backing_bs)
         && arg->mode != NEW_IMAGE_MODE_EXISTING)
     {
         /* create new image w/o backing file */
@@ -3061,15 +3073,19 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
         bdrv_img_create(arg->target, format,
                         NULL, NULL, NULL, size, flags, false, &local_err);
     } else {
+        /* Implicit filters should not appear in the filename */
+        BlockDriverState *explicit_backing =
+            bdrv_skip_implicit_filters(target_backing_bs);
+
         switch (arg->mode) {
         case NEW_IMAGE_MODE_EXISTING:
             break;
         case NEW_IMAGE_MODE_ABSOLUTE_PATHS:
             /* create new image with backing file */
-            bdrv_refresh_filename(source);
+            bdrv_refresh_filename(explicit_backing);
             bdrv_img_create(arg->target, format,
-                            source->filename,
-                            source->drv->format_name,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name,
                             NULL, size, flags, false, &local_err);
             break;
         default:
@@ -3119,7 +3135,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     aio_context_acquire(aio_context);
 
     blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
-                           arg->has_replaces, arg->replaces, arg->sync,
+                           !!replaces_node_name, replaces_node_name, arg->sync,
                            backing_mode, zero_target,
                            arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
-- 
2.26.2




* [PATCH v7 34/47] backup: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (32 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 33/47] mirror: Deal with filters Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-23 15:51   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 35/47] commit: " Max Reitz
                   ` (14 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/backup-top.c |  2 +-
 block/backup.c     |  9 +++++----
 blockdev.c         | 19 +++++++++++++++----
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block/backup-top.c b/block/backup-top.c
index f304df8f26..89bd3937d0 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -291,7 +291,7 @@ void bdrv_backup_top_drop(BlockDriverState *bs)
 
     s->active = false;
     bdrv_child_refresh_perms(bs, bs->backing, &error_abort);
-    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
+    bdrv_replace_node(bs, bs->backing->bs, &error_abort);
     bdrv_set_backing_hd(bs, NULL, &error_abort);
 
     bdrv_drained_end(bs);
diff --git a/block/backup.c b/block/backup.c
index 4f13bb20a5..9afa0bf3b4 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -297,6 +297,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
 {
     int ret;
     BlockDriverInfo bdi;
+    bool target_does_cow = bdrv_backing_chain_next(target);
 
     /*
      * If there is no backing file on the target, we cannot rely on COW if our
@@ -304,7 +305,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
      * targets with a backing file, try to avoid COW if possible.
      */
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
+    if (ret == -ENOTSUP && !target_does_cow) {
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
@@ -313,14 +314,14 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
                     "this default, the backup may be unusable",
                     BACKUP_CLUSTER_SIZE_DEFAULT);
         return BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
+    } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
             "which has no backing file");
         error_append_hint(errp,
             "Aborting, since this may create an unusable destination image\n");
         return ret;
-    } else if (ret < 0 && target->backing) {
+    } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
         return BACKUP_CLUSTER_SIZE_DEFAULT;
     }
@@ -371,7 +372,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         return NULL;
     }
 
-    if (compress && !block_driver_can_compress(target->drv)) {
+    if (compress && !bdrv_supports_compressed_writes(target)) {
         error_setg(errp, "Compression is not supported for this drive %s",
                    bdrv_get_device_name(target));
         return NULL;
diff --git a/blockdev.c b/blockdev.c
index 6ca9b19779..9ce99b9cbc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1728,7 +1728,13 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
      * on top of.
      */
     if (backup->sync == MIRROR_SYNC_MODE_TOP) {
-        source = backing_bs(bs);
+        /*
+         * Backup will not replace the source by the target, so none
+         * of the filters skipped here will be removed (in contrast to
+         * mirror).  Therefore, we can skip all of them when looking
+         * for the first COW relationship.
+         */
+        source = bdrv_cow_bs(bdrv_skip_filters(bs));
         if (!source) {
             backup->sync = MIRROR_SYNC_MODE_FULL;
         }
@@ -1748,9 +1754,14 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     if (backup->mode != NEW_IMAGE_MODE_EXISTING) {
         assert(backup->format);
         if (source) {
-            bdrv_refresh_filename(source);
-            bdrv_img_create(backup->target, backup->format, source->filename,
-                            source->drv->format_name, NULL,
+            /* Implicit filters should not appear in the filename */
+            BlockDriverState *explicit_backing =
+                bdrv_skip_implicit_filters(source);
+
+            bdrv_refresh_filename(explicit_backing);
+            bdrv_img_create(backup->target, backup->format,
+                            explicit_backing->filename,
+                            explicit_backing->drv->format_name, NULL,
                             size, flags, false, &local_err);
         } else {
             bdrv_img_create(backup->target, backup->format, NULL, NULL, NULL,
-- 
2.26.2




* [PATCH v7 35/47] commit: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (33 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 34/47] backup: " Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-23 17:15   ` Andrey Shinkevich
  2020-08-19 17:58   ` Kevin Wolf
  2020-06-25 15:22 ` [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap Max Reitz
                   ` (13 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This includes some permission limiting (for example, we only need to
take the RESIZE permission if the base is smaller than the top).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
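Note for readers: a toy model (not QEMU code) of how the intermediate
nodes are walked in commit_start() after this patch. Node, skip_filters(),
find_overlay() and share_intermediate_nodes() are hypothetical
simplifications: every node has a single child, which is the filtered
child for a filter and the COW backing node otherwise. Once the walk
reaches the topmost filter on the base, CONSISTENT_READ is shared as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's BLK_PERM_* flags (values illustrative) */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
};

typedef struct Node {
    bool is_filter;
    struct Node *child;  /* filtered child or COW backing node */
    uint64_t shared;     /* filled in by share_intermediate_nodes() */
} Node;

static Node *skip_filters(Node *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/* First non-filter node whose COW child is base, ignoring filters */
static Node *find_overlay(Node *top, Node *base)
{
    Node *target = skip_filters(base);

    assert(target);
    for (Node *n = skip_filters(top); n; n = skip_filters(n->child)) {
        if (skip_filters(n->child) == target) {
            return n;
        }
    }
    return NULL;
}

/* Record the shared permissions for every node from top down to base */
static void share_intermediate_nodes(Node *top, Node *base)
{
    Node *overlay = find_overlay(top, base);
    Node *filtered_base;
    uint64_t shared = PERM_WRITE_UNCHANGED | PERM_WRITE;

    assert(overlay);
    filtered_base = overlay->child;

    for (Node *iter = top; iter != base; iter = iter->child) {
        assert(iter);  /* base must be reachable from top */
        if (iter == filtered_base) {
            /* From here on, all nodes are filters on the base */
            shared |= PERM_CONSISTENT_READ;
        }
        iter->shared = shared;
    }
}
```

With a chain top -> filter -> mid -> filter -> base, only the filter
directly above base ends up sharing CONSISTENT_READ in this model.
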
 block/block-backend.c          |  9 +++-
 block/commit.c                 | 96 +++++++++++++++++++++++++---------
 block/monitor/block-hmp-cmds.c |  2 +-
 blockdev.c                     |  4 +-
 4 files changed, 81 insertions(+), 30 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6936b25c83..7f2c7dbccc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2271,8 +2271,13 @@ int blk_commit_all(void)
         AioContext *aio_context = blk_get_aio_context(blk);
 
         aio_context_acquire(aio_context);
-        if (blk_is_inserted(blk) && blk->root->bs->backing) {
-            int ret = bdrv_commit(blk->root->bs);
+        if (blk_is_inserted(blk)) {
+            BlockDriverState *non_filter;
+            int ret;
+
+            /* Legacy function, so skip implicit filters */
+            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
+            ret = bdrv_commit(non_filter);
             if (ret < 0) {
                 aio_context_release(aio_context);
                 return ret;
diff --git a/block/commit.c b/block/commit.c
index 7732d02dfe..4122b6736d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -37,6 +37,7 @@ typedef struct CommitBlockJob {
     BlockBackend *top;
     BlockBackend *base;
     BlockDriverState *base_bs;
+    BlockDriverState *base_overlay;
     BlockdevOnError on_error;
     bool base_read_only;
     bool chain_frozen;
@@ -89,7 +90,7 @@ static void commit_abort(Job *job)
      * XXX Can (or should) we somehow keep 'consistent read' blocked even
      * after the failed/cancelled commit job is gone? If we already wrote
      * something to base, the intermediate images aren't valid any more. */
-    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
+    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
                       &error_abort);
 
     bdrv_unref(s->commit_top_bs);
@@ -153,7 +154,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
             break;
         }
         /* Copy if allocated above the base */
-        ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base), false,
+        ret = bdrv_is_allocated_above(blk_bs(s->top), s->base_overlay, true,
                                       offset, COMMIT_BUFFER_SIZE, &n);
         copy = (ret == 1);
         trace_commit_one_iteration(s, offset, n, ret);
@@ -253,15 +254,35 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     CommitBlockJob *s;
     BlockDriverState *iter;
     BlockDriverState *commit_top_bs = NULL;
+    BlockDriverState *filtered_base;
     Error *local_err = NULL;
+    int64_t base_size, top_size;
+    uint64_t perms, iter_shared_perms;
     int ret;
 
     assert(top != bs);
-    if (top == base) {
+    if (bdrv_skip_filters(top) == bdrv_skip_filters(base)) {
         error_setg(errp, "Invalid files for merge: top and base are the same");
         return;
     }
 
+    base_size = bdrv_getlength(base);
+    if (base_size < 0) {
+        error_setg_errno(errp, -base_size, "Could not inquire base image size");
+        return;
+    }
+
+    top_size = bdrv_getlength(top);
+    if (top_size < 0) {
+        error_setg_errno(errp, -top_size, "Could not inquire top image size");
+        return;
+    }
+
+    perms = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
+    if (base_size < top_size) {
+        perms |= BLK_PERM_RESIZE;
+    }
+
     s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, BLK_PERM_ALL,
                          speed, creation_flags, NULL, NULL, errp);
     if (!s) {
@@ -301,17 +322,43 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 
     s->commit_top_bs = commit_top_bs;
 
-    /* Block all nodes between top and base, because they will
-     * disappear from the chain after this operation. */
-    assert(bdrv_chain_contains(top, base));
-    for (iter = top; iter != base; iter = backing_bs(iter)) {
-        /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
-         * at s->base (if writes are blocked for a node, they are also blocked
-         * for its backing file). The other options would be a second filter
-         * driver above s->base. */
+    /*
+     * Block all nodes between top and base, because they will
+     * disappear from the chain after this operation.
+     * Note that this assumes that the user is fine with removing all
+     * nodes (including R/W filters) between top and base.  Assuring
+     * this is the responsibility of the interface (i.e. whoever calls
+     * commit_start()).
+     */
+    s->base_overlay = bdrv_find_overlay(top, base);
+    assert(s->base_overlay);
+
+    /*
+     * The topmost node with
+     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
+     */
+    filtered_base = bdrv_cow_bs(s->base_overlay);
+    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
+
+    /*
+     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
+     * at s->base (if writes are blocked for a node, they are also blocked
+     * for its backing file). The other options would be a second filter
+     * driver above s->base.
+     */
+    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
+
+    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
+        if (iter == filtered_base) {
+            /*
+             * From here on, all nodes are filters on the base.  This
+             * allows us to share BLK_PERM_CONSISTENT_READ.
+             */
+            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
+        }
+
         ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
-                                 errp);
+                                 iter_shared_perms, errp);
         if (ret < 0) {
             goto fail;
         }
@@ -328,9 +375,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     }
 
     s->base = blk_new(s->common.job.aio_context,
-                      BLK_PERM_CONSISTENT_READ
-                      | BLK_PERM_WRITE
-                      | BLK_PERM_RESIZE,
+                      perms,
                       BLK_PERM_CONSISTENT_READ
                       | BLK_PERM_GRAPH_MOD
                       | BLK_PERM_WRITE_UNCHANGED);
@@ -398,19 +443,22 @@ int bdrv_commit(BlockDriverState *bs)
     if (!drv)
         return -ENOMEDIUM;
 
-    if (!bs->backing) {
+    backing_file_bs = bdrv_cow_bs(bs);
+
+    if (!backing_file_bs) {
         return -ENOTSUP;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
-        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
+        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
+    {
         return -EBUSY;
     }
 
-    ro = bs->backing->bs->read_only;
+    ro = backing_file_bs->read_only;
 
     if (ro) {
-        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
+        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
             return -EACCES;
         }
     }
@@ -428,8 +476,6 @@ int bdrv_commit(BlockDriverState *bs)
     }
 
     /* Insert commit_top block node above backing, so we can write to it */
-    backing_file_bs = backing_bs(bs);
-
     commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
                                          &local_err);
     if (commit_top_bs == NULL) {
@@ -515,15 +561,13 @@ ro_cleanup:
     qemu_vfree(buf);
 
     blk_unref(backing);
-    if (backing_file_bs) {
-        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
-    }
+    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
     bdrv_unref(commit_top_bs);
     blk_unref(src);
 
     if (ro) {
         /* ignoring error return here */
-        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
+        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
     }
 
     return ret;
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4c8c375172..4d3db5ed3c 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -217,7 +217,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
             return;
         }
 
-        bs = blk_bs(blk);
+        bs = bdrv_skip_implicit_filters(blk_bs(blk));
         aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
 
diff --git a/blockdev.c b/blockdev.c
index 9ce99b9cbc..402f1d1df1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2690,7 +2690,9 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
 
     assert(bdrv_get_aio_context(base_bs) == aio_context);
 
-    for (iter = top_bs; iter != backing_bs(base_bs); iter = backing_bs(iter)) {
+    for (iter = top_bs; iter != bdrv_filter_or_cow_bs(base_bs);
+         iter = bdrv_filter_or_cow_bs(iter))
+    {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
             goto out;
         }
-- 
2.26.2




* [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (34 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 35/47] commit: " Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-23 17:21   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 37/47] qemu-img: Use child access functions Max Reitz
                   ` (12 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

When looking for a dirty bitmap to share, we should handle filters by
just including them in the search (so they do not break backing chains).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
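Note for readers: a toy model (not QEMU code) of the search described
above. Node and find_bitmap_owner() are hypothetical; the single child
pointer plays the role of bdrv_filter_or_cow_bs(), so filters are simply
stepped through like any other node in the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Node {
    const char *bitmap_name;  /* NULL if this node has no dirty bitmap */
    struct Node *child;       /* filtered child or COW backing node */
} Node;

/* Walk down the chain until a node owning the named bitmap is found */
static Node *find_bitmap_owner(Node *bs, const char *name)
{
    assert(name != NULL);

    while (bs) {
        if (bs->bitmap_name && strcmp(bs->bitmap_name, name) == 0) {
            return bs;
        }
        bs = bs->child;
    }

    return NULL;
}
```

The loop ends either at the owning node or with NULL once the chain is
exhausted, which corresponds to the "while (bs)" form used in the patch.
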
 nbd/server.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 20754e9ebc..b504a79435 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1561,13 +1561,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
     if (bitmap) {
         BdrvDirtyBitmap *bm = NULL;
 
-        while (true) {
+        while (bs) {
             bm = bdrv_find_dirty_bitmap(bs, bitmap);
-            if (bm != NULL || bs->backing == NULL) {
+            if (bm != NULL) {
                 break;
             }
 
-            bs = bs->backing->bs;
+            bs = bdrv_filter_or_cow_bs(bs);
         }
 
         if (bm == NULL) {
-- 
2.26.2




* [PATCH v7 37/47] qemu-img: Use child access functions
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (35 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-24 15:51   ` Andrey Shinkevich
  2020-08-21 15:29   ` Kevin Wolf
  2020-06-25 15:22 ` [PATCH v7 38/47] block: Drop backing_bs() Max Reitz
                   ` (11 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This changes iotest 204's output, because blkdebug on top of a COW node
used to make qemu-img map disregard the rest of the backing chain (the
backing chain was broken by the filter).  With this patch, the
allocation in the base image is reported correctly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qemu-img.c                 | 36 ++++++++++++++++++++++--------------
 tests/qemu-iotests/204.out |  1 +
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 381271a74e..947be6ffac 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1089,7 +1089,7 @@ static int img_commit(int argc, char **argv)
         /* This is different from QMP, which by default uses the deepest file in
          * the backing chain (i.e., the very base); however, the traditional
          * behavior of qemu-img commit is using the immediate backing file. */
-        base_bs = backing_bs(bs);
+        base_bs = bdrv_backing_chain_next(bs);
         if (!base_bs) {
             error_setg(&local_err, "Image does not have a backing file");
             goto done;
@@ -1737,18 +1737,20 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
     if (s->sector_next_status <= sector_num) {
         uint64_t offset = (sector_num - src_cur_offset) * BDRV_SECTOR_SIZE;
         int64_t count;
+        BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
+        BlockDriverState *base;
+
+        if (s->target_has_backing) {
+            base = bdrv_cow_bs(bdrv_skip_filters(src_bs));
+        } else {
+            base = NULL;
+        }
 
         do {
             count = n * BDRV_SECTOR_SIZE;
 
-            if (s->target_has_backing) {
-                ret = bdrv_block_status(blk_bs(s->src[src_cur]), offset,
-                                        count, &count, NULL, NULL);
-            } else {
-                ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
-                                              offset, count, &count, NULL,
-                                              NULL);
-            }
+            ret = bdrv_block_status_above(src_bs, base, offset, count, &count,
+                                          NULL, NULL);
 
             if (ret < 0) {
                 if (s->salvage) {
@@ -2673,7 +2675,8 @@ static int img_convert(int argc, char **argv)
          * s.target_backing_sectors has to be negative, which it will
          * be automatically).  The backing file length is used only
          * for optimizations, so such a case is not fatal. */
-        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
+        s.target_backing_sectors =
+            bdrv_nb_sectors(bdrv_backing_chain_next(out_bs));
     } else {
         s.target_backing_sectors = -1;
     }
@@ -3044,6 +3047,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 
     depth = 0;
     for (;;) {
+        bs = bdrv_skip_filters(bs);
         ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
         if (ret < 0) {
             return ret;
@@ -3052,7 +3056,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
         if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
             break;
         }
-        bs = backing_bs(bs);
+        bs = bdrv_cow_bs(bs);
         if (bs == NULL) {
             ret = 0;
             break;
@@ -3437,6 +3441,7 @@ static int img_rebase(int argc, char **argv)
     uint8_t *buf_old = NULL;
     uint8_t *buf_new = NULL;
     BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
+    BlockDriverState *unfiltered_bs;
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
@@ -3571,6 +3576,8 @@ static int img_rebase(int argc, char **argv)
     }
     bs = blk_bs(blk);
 
+    unfiltered_bs = bdrv_skip_filters(bs);
+
     if (out_basefmt != NULL) {
         if (bdrv_find_format(out_basefmt) == NULL) {
             error_report("Invalid format name: '%s'", out_basefmt);
@@ -3582,7 +3589,7 @@ static int img_rebase(int argc, char **argv)
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
         QDict *options = NULL;
-        BlockDriverState *base_bs = backing_bs(bs);
+        BlockDriverState *base_bs = bdrv_cow_bs(unfiltered_bs);
 
         if (base_bs) {
             blk_old_backing = blk_new(qemu_get_aio_context(),
@@ -3738,8 +3745,9 @@ static int img_rebase(int argc, char **argv)
                  * If cluster wasn't changed since prefix_chain, we don't need
                  * to take action
                  */
-                ret = bdrv_is_allocated_above(backing_bs(bs), prefix_chain_bs,
-                                              false, offset, n, &n);
+                ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
+                                              prefix_chain_bs, false,
+                                              offset, n, &n);
                 if (ret < 0) {
                     error_report("error while reading image metadata: %s",
                                  strerror(-ret));
diff --git a/tests/qemu-iotests/204.out b/tests/qemu-iotests/204.out
index f3a10fbe90..684774d763 100644
--- a/tests/qemu-iotests/204.out
+++ b/tests/qemu-iotests/204.out
@@ -59,5 +59,6 @@ Offset          Length          File
 0x900000        0x2400000       TEST_DIR/t.IMGFMT
 0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
 0x6a00000       0x400000        TEST_DIR/t.IMGFMT
+0x6e00000       0x1200000       TEST_DIR/t.IMGFMT.base
 No errors were found on the image.
 *** done
-- 
2.26.2




* [PATCH v7 38/47] block: Drop backing_bs()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (36 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 37/47] qemu-img: Use child access functions Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-24 15:55   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 39/47] blockdev: Fix active commit choice Max Reitz
                   ` (10 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

We want to make it explicit where bs->backing is used, and we have done
so.  The old role of backing_bs() is now effectively taken by
bdrv_cow_bs().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block_int.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c963ee9f28..6e09e15ed4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -991,11 +991,6 @@ typedef enum BlockMirrorBackingMode {
     MIRROR_LEAVE_BACKING_CHAIN,
 } BlockMirrorBackingMode;
 
-static inline BlockDriverState *backing_bs(BlockDriverState *bs)
-{
-    return bs->backing ? bs->backing->bs : NULL;
-}
-
 
 /* Essential block drivers which must always be statically linked into qemu, and
  * which therefore can be accessed without using bdrv_find_format() */
-- 
2.26.2




* [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (37 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 38/47] block: Drop backing_bs() Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-08-21 15:50   ` Kevin Wolf
  2020-06-25 15:22 ` [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*() Max Reitz
                   ` (9 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

We have to perform an active commit whenever the top node has a parent
that has taken the WRITE permission on it.
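
The resulting decision can be summarized in a small sketch.  This is only a
model of the condition, not the code in qmp_block_commit(): PERM_WRITE and
needs_active_commit() are hypothetical names, and the second argument stands
for "top and the device's active node are the same once filters are
skipped".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_PERM_WRITE. */
#define PERM_WRITE (UINT64_C(1) << 1)

/*
 * Active commit is chosen if some parent holds WRITE on the top node,
 * or (for historical reasons) if the top node is the active layer.
 */
static bool needs_active_commit(uint64_t top_cumulative_perm,
                                bool top_is_active_layer)
{
    return (top_cumulative_perm & PERM_WRITE) != 0 || top_is_active_layer;
}
```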

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 402f1d1df1..237fffbe53 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
     AioContext *aio_context;
     Error *local_err = NULL;
     int job_flags = JOB_DEFAULT;
+    uint64_t top_perm, top_shared;
 
     if (!has_speed) {
         speed = 0;
@@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
         goto out;
     }
 
-    if (top_bs == bs) {
+    /*
+     * Active commit is required if and only if someone has taken a
+     * WRITE permission on the top node.  Historically, we have always
+     * used active commit for top nodes, so continue that practice.
+     * (Active commit is never really wrong.)
+     */
+    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
+    if (top_perm & BLK_PERM_WRITE ||
+        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
+    {
         if (has_backing_file) {
             error_setg(errp, "'backing-file' specified,"
                              " but 'top' is the active layer");
             goto out;
         }
-        commit_active_start(has_job_id ? job_id : NULL, bs, base_bs,
-                            job_flags, speed, on_error,
+        if (!has_job_id) {
+            /*
+             * Emulate here what block_job_create() does, because it
+             * is possible that @bs != @top_bs (the block job should
+             * be named after @bs, even if @top_bs is the actual
+             * source)
+             */
+            job_id = bdrv_get_device_name(bs);
+        }
+        commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
                             filter_node_name, NULL, NULL, false, &local_err);
     } else {
         BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
-- 
2.26.2




* [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*()
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (38 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 39/47] blockdev: Fix active commit choice Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-24 18:00   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 41/47] block: Leave BDS.backing_file constant Max Reitz
                   ` (8 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, but the body of bdrv_co_block_status_from_file() needs
to be inlined there.
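
To illustrate the idea outside of QEMU, here is a minimal model of such a
default path.  All names (Node, BlockStatusFn, the STATUS_* flags,
node_block_status()) are hypothetical, and alignment and end-of-file
handling are left out: a node without its own callback simply forwards the
query to its filtered child, and a node with neither is reported as
allocated data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; they only need to be distinct bits. */
#define STATUS_DATA         0x1
#define STATUS_ALLOCATED    0x2
#define STATUS_RAW          0x4
#define STATUS_OFFSET_VALID 0x8

typedef struct Node Node;
typedef int (*BlockStatusFn)(Node *n, int64_t offset, int64_t bytes,
                             int64_t *pnum, int64_t *map, Node **file);

struct Node {
    BlockStatusFn block_status;  /* driver callback, may be NULL */
    Node *filtered;              /* filtered child, NULL for non-filters */
};

static int node_block_status(Node *n, int64_t offset, int64_t bytes,
                             int64_t *pnum, int64_t *map, Node **file)
{
    assert(n && pnum && map && file);

    if (n->block_status) {
        /* The driver knows best. */
        return n->block_status(n, offset, bytes, pnum, map, file);
    }
    if (!n->filtered) {
        /* No callback and not a filter: treat everything as data. */
        *pnum = bytes;
        return STATUS_DATA | STATUS_ALLOCATED;
    }

    /* Default for filters: same offset, look at the filtered child. */
    *pnum = bytes;
    *map = offset;
    *file = n->filtered;
    return STATUS_RAW | STATUS_OFFSET_VALID;
}
```

In this model a filter driver only needs a callback of its own if it wants
to do something beyond forwarding, as blkdebug does with its assertion.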

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h | 23 ------------------
 block/backup-top.c        |  2 --
 block/blkdebug.c          |  7 ++++--
 block/blklogwrites.c      |  1 -
 block/commit.c            |  1 -
 block/copy-on-read.c      |  2 --
 block/filter-compress.c   |  2 --
 block/io.c                | 51 +++++++++++++--------------------------
 block/mirror.c            |  1 -
 block/throttle.c          |  1 -
 10 files changed, 22 insertions(+), 69 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6e09e15ed4..e5a328c389 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1291,29 +1291,6 @@ void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
                         uint64_t perm, uint64_t shared,
                         uint64_t *nperm, uint64_t *nshared);
 
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their file.
- */
-int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
-                                                bool want_zero,
-                                                int64_t offset,
-                                                int64_t bytes,
-                                                int64_t *pnum,
-                                                int64_t *map,
-                                                BlockDriverState **file);
-/*
- * Default implementation for drivers to pass bdrv_co_block_status() to
- * their backing file.
- */
-int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
-                                                   bool want_zero,
-                                                   int64_t offset,
-                                                   int64_t bytes,
-                                                   int64_t *pnum,
-                                                   int64_t *map,
-                                                   BlockDriverState **file);
-
 int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
 int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
 int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
diff --git a/block/backup-top.c b/block/backup-top.c
index 89bd3937d0..bf5fc22fc7 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -185,8 +185,6 @@ BlockDriver bdrv_backup_top_filter = {
     .bdrv_co_pwritev_compressed = backup_top_co_pwritev_compressed,
     .bdrv_co_flush              = backup_top_co_flush,
 
-    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
-
     .bdrv_refresh_filename      = backup_top_refresh_filename,
 
     .bdrv_child_perm            = backup_top_child_perm,
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 7194bc7f06..cf78d8809e 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -757,8 +757,11 @@ static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs,
         return err;
     }
 
-    return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
-                                          pnum, map, file);
+    assert(bs->file && bs->file->bs);
+    *pnum = bytes;
+    *map = offset;
+    *file = bs->file->bs;
+    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static void blkdebug_close(BlockDriverState *bs)
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 6753bd9a3e..c6b2711fe5 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -517,7 +517,6 @@ static BlockDriver bdrv_blk_log_writes = {
     .bdrv_co_pwrite_zeroes  = blk_log_writes_co_pwrite_zeroes,
     .bdrv_co_flush_to_disk  = blk_log_writes_co_flush_to_disk,
     .bdrv_co_pdiscard       = blk_log_writes_co_pdiscard,
-    .bdrv_co_block_status   = bdrv_co_block_status_from_file,
 
     .is_filter              = true,
     .strong_runtime_opts    = blk_log_writes_strong_runtime_opts,
diff --git a/block/commit.c b/block/commit.c
index 4122b6736d..ea9282daea 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -238,7 +238,6 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 static BlockDriver bdrv_commit_top = {
     .format_name                = "commit_top",
     .bdrv_co_preadv             = bdrv_commit_top_preadv,
-    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
     .bdrv_child_perm            = bdrv_commit_top_child_perm,
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index a6a864f147..2816e61afe 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -146,8 +146,6 @@ static BlockDriver bdrv_copy_on_read = {
     .bdrv_eject                         = cor_eject,
     .bdrv_lock_medium                   = cor_lock_medium,
 
-    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/filter-compress.c b/block/filter-compress.c
index 8ec1991c1f..5136371bf8 100644
--- a/block/filter-compress.c
+++ b/block/filter-compress.c
@@ -146,8 +146,6 @@ static BlockDriver bdrv_compress = {
     .bdrv_eject                         = compress_eject,
     .bdrv_lock_medium                   = compress_lock_medium,
 
-    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
-
     .has_variable_length                = true,
     .is_filter                          = true,
 };
diff --git a/block/io.c b/block/io.c
index 9e802804bb..e2196d438c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2253,36 +2253,6 @@ typedef struct BdrvCoBlockStatusData {
     BlockDriverState **file;
 } BdrvCoBlockStatusData;
 
-int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
-                                                bool want_zero,
-                                                int64_t offset,
-                                                int64_t bytes,
-                                                int64_t *pnum,
-                                                int64_t *map,
-                                                BlockDriverState **file)
-{
-    assert(bs->file && bs->file->bs);
-    *pnum = bytes;
-    *map = offset;
-    *file = bs->file->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
-}
-
-int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
-                                                   bool want_zero,
-                                                   int64_t offset,
-                                                   int64_t bytes,
-                                                   int64_t *pnum,
-                                                   int64_t *map,
-                                                   BlockDriverState **file)
-{
-    assert(bs->backing && bs->backing->bs);
-    *pnum = bytes;
-    *map = offset;
-    *file = bs->backing->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
-}
-
 /*
  * Returns the allocation status of the specified sectors.
  * Drivers not implementing the functionality are assumed to not support
@@ -2323,6 +2293,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     BlockDriverState *local_file = NULL;
     int64_t aligned_offset, aligned_bytes;
     uint32_t align;
+    bool has_filtered_child;
 
     assert(pnum);
     *pnum = 0;
@@ -2348,7 +2319,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 
     /* Must be non-NULL or bdrv_getlength() would have failed */
     assert(bs->drv);
-    if (!bs->drv->bdrv_co_block_status) {
+    has_filtered_child = bdrv_filter_child(bs);
+    if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
         *pnum = bytes;
         ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
         if (offset + bytes == total_size) {
@@ -2369,9 +2341,20 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
     aligned_offset = QEMU_ALIGN_DOWN(offset, align);
     aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
 
-    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
-                                        aligned_bytes, pnum, &local_map,
-                                        &local_file);
+    if (bs->drv->bdrv_co_block_status) {
+        ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+                                            aligned_bytes, pnum, &local_map,
+                                            &local_file);
+    } else {
+        /* Default code for filters */
+
+        local_file = bdrv_filter_bs(bs);
+        assert(local_file);
+
+        *pnum = aligned_bytes;
+        local_map = aligned_offset;
+        ret = BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+    }
     if (ret < 0) {
         *pnum = 0;
         goto out;
diff --git a/block/mirror.c b/block/mirror.c
index 770de3b34e..5a9e42e488 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1541,7 +1541,6 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
     .bdrv_co_pwritev_compressed = bdrv_mirror_top_pwritev_compressed,
     .bdrv_co_flush              = bdrv_mirror_top_flush,
-    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
     .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
 
diff --git a/block/throttle.c b/block/throttle.c
index f6e619aca2..473ea758df 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -263,7 +263,6 @@ static BlockDriver bdrv_throttle = {
     .bdrv_reopen_prepare                =   throttle_reopen_prepare,
     .bdrv_reopen_commit                 =   throttle_reopen_commit,
     .bdrv_reopen_abort                  =   throttle_reopen_abort,
-    .bdrv_co_block_status               =   bdrv_co_block_status_from_file,
 
     .bdrv_co_drain_begin                =   throttle_co_drain_begin,
     .bdrv_co_drain_end                  =   throttle_co_drain_end,
-- 
2.26.2




* [PATCH v7 41/47] block: Leave BDS.backing_file constant
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (39 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*() Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-27 12:27   ` Andrey Shinkevich
  2020-08-24 13:14   ` Kevin Wolf
  2020-06-25 15:22 ` [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed Max Reitz
                   ` (7 subsequent siblings)
  48 siblings, 2 replies; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.
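
For illustration, "relative to the overlay" can be modelled as follows.
This is a simplified sketch and not QEMU's path handling:
resolve_relative_to_overlay() is a hypothetical helper that only deals with
plain POSIX-style paths and ignores protocol prefixes such as "file:".

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve @backing (as found in an image header) against the directory
 * of @overlay.  Absolute backing paths are copied unchanged.
 * Returns 0 on success, -1 if @dest is too small.
 */
static int resolve_relative_to_overlay(const char *overlay,
                                       const char *backing,
                                       char *dest, size_t dest_size)
{
    const char *slash = strrchr(overlay, '/');
    /* Length of the overlay's directory part, including the slash. */
    size_t dir_len = slash ? (size_t)(slash - overlay) + 1 : 0;
    int n;

    if (backing[0] == '/') {
        dir_len = 0;
    }

    n = snprintf(dest, dest_size, "%.*s%s", (int)dir_len, overlay, backing);
    return (n < 0 || (size_t)n >= dest_size) ? -1 : 0;
}
```

Under this model, a header string "mid.qcow2" stored in foo/top.qcow2
refers to foo/mid.qcow2 regardless of the directory qemu-img is run from.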

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
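
A reduced model of that lookup is sketched below.  The Node type and
find_backing_image() are hypothetical; the backing_overridden flag stands in
for bdrv_backing_overridden(), and the additional comparison of fully
resolved paths that the real function performs in the non-overridden case
is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for BlockDriverState. */
typedef struct Node {
    const char *filename;      /* this node's own filename */
    const char *backing_file;  /* header string, relative to this overlay */
    bool backing_overridden;   /* user replaced the header's backing file */
    struct Node *backing;      /* next node in the backing chain */
} Node;

/* Find the node in @top's backing chain that @wanted refers to. */
static Node *find_backing_image(Node *top, const char *wanted)
{
    for (Node *cur = top; cur && cur->backing; cur = cur->backing) {
        /*
         * If the backing file was overridden, the header string says
         * nothing about the actual backing node, so compare against
         * that node's filename instead.
         */
        const char *candidate = cur->backing_overridden
                                ? cur->backing->filename
                                : cur->backing_file;

        if (candidate && strcmp(wanted, candidate) == 0) {
            return cur->backing;
        }
    }
    return NULL;
}
```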

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  In order not to
change our externally visible behavior (incompatibly), we have to let
bdrv_query_image_info() try to get the image format from bs->backing if
bs->backing_format is unset.  (The QAPI schema describes
backing-filename-format as "the format of the backing file", so it is
not necessarily what the image header says, but just the format of the
file referenced by backing-filename (if known).)

iotest 245's behavior changes: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h  | 21 ++++++++++++++++-----
 block.c                    | 35 +++++++++++++++++++++++++++--------
 block/qapi.c               | 17 +++++++++++++----
 tests/qemu-iotests/228     |  6 +++---
 tests/qemu-iotests/228.out |  6 +++---
 tests/qemu-iotests/245     |  4 +++-
 6 files changed, 65 insertions(+), 24 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index e5a328c389..465a601955 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -835,11 +835,20 @@ struct BlockDriverState {
     bool walking_aio_notifiers; /* to make removal during iteration safe */
 
     char filename[PATH_MAX];
-    char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
-                                    this file image */
-    /* The backing filename indicated by the image header; if we ever
-     * open this file, then this is replaced by the resulting BDS's
-     * filename (i.e. after a bdrv_refresh_filename() run). */
+    /*
+     * If not empty, this image is a diff in relation to backing_file.
+     * Note that this is the name given in the image header and
+     * therefore may or may not be equal to .backing->bs->filename.
+     * If this field contains a relative path, it is to be resolved
+     * relatively to the overlay's location.
+     */
+    char backing_file[PATH_MAX];
+    /*
+     * The backing filename indicated by the image header.  Contrary
+     * to backing_file, if we ever open this file, auto_backing_file
+     * is replaced by the resulting BDS's filename (i.e. after a
+     * bdrv_refresh_filename() run).
+     */
     char auto_backing_file[PATH_MAX];
     char backing_format[16]; /* if non-zero and backing_file exists */
 
@@ -1041,6 +1050,8 @@ BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
 void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
                                       QDict *options);
 
+bool bdrv_backing_overridden(BlockDriverState *bs);
+
 
 /**
  * bdrv_add_before_write_notifier:
diff --git a/block.c b/block.c
index a19f243997..af8d85bcf2 100644
--- a/block.c
+++ b/block.c
@@ -1153,10 +1153,6 @@ static void bdrv_backing_attach(BdrvChild *c)
     bdrv_refresh_filename(backing_hd);
 
     parent->open_flags &= ~BDRV_O_NO_BACKING;
-    pstrcpy(parent->backing_file, sizeof(parent->backing_file),
-            backing_hd->filename);
-    pstrcpy(parent->backing_format, sizeof(parent->backing_format),
-            backing_hd->drv ? backing_hd->drv->format_name : "");
 
     bdrv_op_block_all(backing_hd, parent->backing_blocker);
     /* Otherwise we won't be able to commit or stream */
@@ -5693,6 +5689,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     char *backing_file_full = NULL;
     char *filename_tmp = NULL;
     int is_protocol = 0;
+    bool filenames_refreshed = false;
     BlockDriverState *curr_bs = NULL;
     BlockDriverState *retval = NULL;
     BlockDriverState *bs_below;
@@ -5718,9 +5715,31 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     {
         bs_below = bdrv_backing_chain_next(curr_bs);
 
-        /* If either of the filename paths is actually a protocol, then
-         * compare unmodified paths; otherwise make paths relative */
-        if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+        if (bdrv_backing_overridden(curr_bs)) {
+            /*
+             * If the backing file was overridden, we can only compare
+             * directly against the backing node's filename.
+             */
+
+            if (!filenames_refreshed) {
+                /*
+                 * This will automatically refresh all of the
+                 * filenames in the rest of the backing chain, so we
+                 * only need to do this once.
+                 */
+                bdrv_refresh_filename(bs_below);
+                filenames_refreshed = true;
+            }
+
+            if (strcmp(backing_file, bs_below->filename) == 0) {
+                retval = bs_below;
+                break;
+            }
+        } else if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
+            /*
+             * If either of the filename paths is actually a protocol, then
+             * compare unmodified paths; otherwise make paths relative.
+             */
             char *backing_file_full_ret;
 
             if (strcmp(backing_file, curr_bs->backing_file) == 0) {
@@ -6821,7 +6840,7 @@ static bool append_strong_runtime_options(QDict *d, BlockDriverState *bs)
 /* Note: This function may return false positives; it may return true
  * even if opening the backing file specified by bs's image header
  * would result in exactly bs->backing. */
-static bool bdrv_backing_overridden(BlockDriverState *bs)
+bool bdrv_backing_overridden(BlockDriverState *bs)
 {
     if (bs->backing) {
         return strcmp(bs->auto_backing_file,
diff --git a/block/qapi.c b/block/qapi.c
index 2628323b63..5da6d7e6e0 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -47,7 +47,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
                                         Error **errp)
 {
     ImageInfo **p_image_info;
-    BlockDriverState *bs0;
+    BlockDriverState *bs0, *backing;
     BlockDeviceInfo *info;
 
     if (!bs->drv) {
@@ -76,9 +76,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
         info->node_name = g_strdup(bs->node_name);
     }
 
-    if (bs->backing_file[0]) {
+    backing = bdrv_cow_bs(bs);
+    if (backing) {
         info->has_backing_file = true;
-        info->backing_file = g_strdup(bs->backing_file);
+        info->backing_file = g_strdup(backing->filename);
     }
 
     if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
@@ -314,6 +315,8 @@ void bdrv_query_image_info(BlockDriverState *bs,
     backing_filename = bs->backing_file;
     if (backing_filename[0] != '\0') {
         char *backing_filename2;
+        const char *backing_format = NULL;
+
         info->backing_filename = g_strdup(backing_filename);
         info->has_backing_filename = true;
         backing_filename2 = bdrv_get_full_backing_filename(bs, NULL);
@@ -326,7 +329,13 @@ void bdrv_query_image_info(BlockDriverState *bs,
         }
 
         if (bs->backing_format[0]) {
-            info->backing_filename_format = g_strdup(bs->backing_format);
+            backing_format = bs->backing_format;
+        } else if (bs->backing && bs->backing->bs->drv &&
+                   !bdrv_backing_overridden(bs)) {
+            backing_format = bs->backing->bs->drv->format_name;
+        }
+        if (backing_format) {
+            info->backing_filename_format = g_strdup(backing_format);
             info->has_backing_filename_format = true;
         }
         g_free(backing_filename2);
diff --git a/tests/qemu-iotests/228 b/tests/qemu-iotests/228
index da0900fb82..90800ecc6a 100755
--- a/tests/qemu-iotests/228
+++ b/tests/qemu-iotests/228
@@ -36,7 +36,7 @@ def log_node_info(node):
 
     log('bs->filename: ' + node['image']['filename'],
         filters=[filter_testfiles, filter_imgfmt])
-    log('bs->backing_file: ' + node['backing_file'],
+    log('bs->backing_file: ' + node['image']['full-backing-filename'],
         filters=[filter_testfiles, filter_imgfmt])
 
     if 'backing-image' in node['image']:
@@ -72,8 +72,8 @@ with iotests.FilePath('base.img') as base_img_path, \
                 },
                 filters=[filter_qmp_testfiles, filter_qmp_imgfmt])
 
-    # Filename should be plain, and the backing filename should not
-    # contain the "file:" prefix
+    # Filename should be plain, and the backing node filename should
+    # not contain the "file:" prefix
     log_node_info(vm.node_info('node0'))
 
     vm.qmp_log('blockdev-del', node_name='node0')
diff --git a/tests/qemu-iotests/228.out b/tests/qemu-iotests/228.out
index 4217df24fe..8c82009abe 100644
--- a/tests/qemu-iotests/228.out
+++ b/tests/qemu-iotests/228.out
@@ -4,7 +4,7 @@
 {"return": {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
@@ -41,7 +41,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {"return": {}}
 
 bs->filename: TEST_DIR/PID-top.img
-bs->backing_file: TEST_DIR/PID-base.img
+bs->backing_file: file:TEST_DIR/PID-base.img
 bs->backing->bs->filename: TEST_DIR/PID-base.img
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
@@ -55,7 +55,7 @@ bs->backing->bs->filename: TEST_DIR/PID-base.img
 {"return": {}}
 
 bs->filename: json:{"backing": {"driver": "null-co"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-top.img"}}
-bs->backing_file: null-co://
+bs->backing_file: TEST_DIR/PID-base.img
 bs->backing->bs->filename: null-co://
 
 {"execute": "blockdev-del", "arguments": {"node-name": "node0"}}
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 4f5f0bb901..5035763765 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -724,7 +724,9 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 
         # Detach hd2 from hd0.
         self.reopen(opts, {'backing': None})
-        self.reopen(opts, {}, "backing is missing for 'hd0'")
+
+        # Without a backing file, we can omit 'backing' again
+        self.reopen(opts)
 
         # Remove both hd0 and hd2
         result = self.vm.qmp('blockdev-del', conv_keys = True, node_name = 'hd0')
-- 
2.26.2
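
The backing-format reporting changed in the block/qapi.c hunk above can be restated as a small decision function. This is only a sketch in Python rather than C; the function and parameter names are hypothetical stand-ins for the BlockDriverState fields named in the docstring, not anything that exists in QEMU.

```python
def reported_backing_format(header_format, backing_driver_format,
                            backing_overridden):
    """Return the format name to report, or None to report nothing.

    header_format:         stands in for bs->backing_format ('' if unset)
    backing_driver_format: stands in for bs->backing->bs->drv->format_name
                           (None if there is no backing node or no driver)
    backing_overridden:    stands in for bdrv_backing_overridden(bs)
    """
    # An explicit format from the image header always wins
    if header_format:
        return header_format
    # Otherwise fall back to the driver of the attached backing node,
    # but only if that node is the one the header would have opened
    if backing_driver_format is not None and not backing_overridden:
        return backing_driver_format
    return None


print(reported_backing_format('', 'qcow2', False))
```

With an overridden backing node and no format in the header, the function returns None, which corresponds to leaving has_backing_filename_format unset.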




* [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (40 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 41/47] block: Leave BDS.backing_file constant Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-27 13:28   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit Max Reitz
                   ` (6 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Flushing a qcow2 node must lead to the data-file node being flushed as
well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
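To reproduce this setup outside the shell test, the node options can be assembled programmatically. The following is a minimal sketch in Python; the helper names and the image paths are made up for illustration, while the option keys are the ones used in the json: string of the test below.

```python
import json


def flush_error_options(image, data_file):
    """Build qcow2 options with a blkdebug node above the data file.

    The helper name and its parameters are illustrative only; the
    option keys mirror the json: string used by the test.
    """
    return {
        'driver': 'qcow2',
        'file': {'driver': 'file', 'filename': image},
        'data-file': {
            'driver': 'blkdebug',
            # Inject an error into flush requests reaching this node
            'inject-error': [{'event': 'none', 'iotype': 'flush'}],
            'image': {'driver': 'file', 'filename': data_file},
        },
    }


def as_json_filename(options):
    """Encode an options dict as a json: pseudo-filename."""
    return 'json:' + json.dumps(options)


opts = flush_error_options('t.qcow2', 't.qcow2.data')
print(as_json_filename(opts))
```

The resulting string can be passed wherever qemu-io accepts an image filename, as the test does with its hand-written json: argument.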
 tests/qemu-iotests/244     | 49 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/244.out |  7 ++++++
 2 files changed, 56 insertions(+)

diff --git a/tests/qemu-iotests/244 b/tests/qemu-iotests/244
index efe3c0428b..f2b2dddf1c 100755
--- a/tests/qemu-iotests/244
+++ b/tests/qemu-iotests/244
@@ -217,6 +217,55 @@ $QEMU_IMG amend -f $IMGFMT -o "data_file=blkdebug::$TEST_IMG.data" "$TEST_IMG"
 $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
 $QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
 
+echo
+echo "=== Flushing should flush the data file ==="
+echo
+
+# We are going to flush a qcow2 file with a blkdebug node inserted
+# between the qcow2 node and its data file node.  The blkdebug node
+# will return an error for all flushes and so, if the data file is
+# flushed, we will see qemu-io return an error.
+
+# We need to write something or the flush will not do anything; we
+# also need -t writeback so the write is not done as a FUA write
+# (which would then fail thanks to the implicit flush)
+$QEMU_IO -c 'write 0 512' -c flush \
+    -t writeback \
+    "json:{
+         'driver': 'qcow2',
+         'file': {
+             'driver': 'file',
+             'filename': '$TEST_IMG'
+         },
+         'data-file': {
+             'driver': 'blkdebug',
+             'inject-error': [{
+                 'event': 'none',
+                 'iotype': 'flush'
+             }],
+             'image': {
+                 'driver': 'file',
+                 'filename': '$TEST_IMG.data'
+             }
+         }
+     }" \
+    | _filter_qemu_io
+
+result=${PIPESTATUS[0]}
+echo
+
+case $result in
+    0)
+        echo "ERROR: qemu-io succeeded, so the data file was not flushed"
+        ;;
+    1)
+        echo "Success: qemu-io failed, so the data file was flushed"
+        ;;
+    *)
+        echo "ERROR: qemu-io returned unknown exit code $result"
+        ;;
+esac
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/244.out b/tests/qemu-iotests/244.out
index dbab7359a9..7269b4295a 100644
--- a/tests/qemu-iotests/244.out
+++ b/tests/qemu-iotests/244.out
@@ -131,4 +131,11 @@ Offset          Length          Mapped to       File
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 data_file=TEST_DIR/t.IMGFMT.data
 Images are identical.
 Images are identical.
+
+=== Flushing should flush the data file ===
+
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Success: qemu-io failed, so the data file was flushed
 *** done
-- 
2.26.2




* [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (41 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-27 13:35   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 44/47] iotests: Add filter commit test cases Max Reitz
                   ` (5 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

complete_and_wait() and wait_ready() currently only work for mirror
jobs.  Let them work for active commit jobs, too.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
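The idea of accepting either job type can be illustrated with a simplified model of pattern matching over events. This is a sketch only: event_matches(), first_match() and ready_patterns() are hypothetical helpers written for this note, they are not the implementation behind events_wait(), and the sample events are hand-written rather than captured from QEMU.

```python
def event_matches(event, match):
    """Return True if every key in `match` is present in `event` with
    an equal value; nested dicts are compared recursively."""
    for key, expected in match.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not event_matches(event[key], expected):
                return False
        elif event[key] != expected:
            return False
    return True


def first_match(events, patterns):
    """Return the first event satisfying any (name, match) pattern."""
    for event in events:
        for name, match in patterns:
            if event.get('event') == name and event_matches(event, match):
                return event
    return None


def ready_patterns(drive):
    """The two patterns wait_ready() uses after this patch."""
    return [
        ('BLOCK_JOB_READY', {'data': {'type': 'mirror', 'device': drive}}),
        ('BLOCK_JOB_READY', {'data': {'type': 'commit', 'device': drive}}),
    ]


# Hand-written sample events, not real QEMU output
events = [
    {'event': 'JOB_STATUS_CHANGE',
     'data': {'id': 'commit', 'status': 'ready'}},
    {'event': 'BLOCK_JOB_READY',
     'data': {'type': 'commit', 'device': 'commit', 'len': 0}},
]
found = first_match(events, ready_patterns('commit'))
print(found['data']['type'])
```

With only the mirror pattern, the commit job's BLOCK_JOB_READY event would never match and the wait would time out, which is the limitation this patch lifts.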
 tests/qemu-iotests/iotests.py | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5ea4c4df8b..57b32d8ad3 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -932,8 +932,12 @@ class QMPTestCase(unittest.TestCase):
 
     def wait_ready(self, drive='drive0'):
         """Wait until a BLOCK_JOB_READY event, and return the event."""
-        f = {'data': {'type': 'mirror', 'device': drive}}
-        return self.vm.event_wait(name='BLOCK_JOB_READY', match=f)
+        return self.vm.events_wait([
+            ('BLOCK_JOB_READY',
+             {'data': {'type': 'mirror', 'device': drive}}),
+            ('BLOCK_JOB_READY',
+             {'data': {'type': 'commit', 'device': drive}})
+        ])
 
     def wait_ready_and_cancel(self, drive='drive0'):
         self.wait_ready(drive=drive)
@@ -952,7 +956,7 @@ class QMPTestCase(unittest.TestCase):
         self.assert_qmp(result, 'return', {})
 
         event = self.wait_until_completed(drive=drive, error=completion_error)
-        self.assert_qmp(event, 'data/type', 'mirror')
+        self.assertTrue(event['data']['type'] in ['mirror', 'commit'])
 
     def pause_wait(self, job_id='job0'):
         with Timeout(3, "Timeout waiting for job to pause"):
-- 
2.26.2




* [PATCH v7 44/47] iotests: Add filter commit test cases
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (42 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-07-27 17:45   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 45/47] iotests: Add filter mirror " Max Reitz
                   ` (4 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This patch adds some tests on how commit copes with filter nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
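As an aid to reading the tests, the node graph that setUp() builds below can be written with two small constructors and flattened into a chain. This is a sketch under assumptions: cow(), throttle(), child_key() and chain() are hypothetical helpers, 'qcow2' stands in for iotests.imgfmt, and the filenames are placeholders.

```python
def child_key(node):
    """Filters (here: throttle) pass data through 'file'; COW format
    nodes reach the next image in the chain through 'backing'."""
    return 'file' if node['driver'] == 'throttle' else 'backing'


def chain(node):
    """List node names from the top of the graph to the bottom image."""
    names = []
    while node is not None and 'node-name' in node:
        names.append(node['node-name'])
        node = node.get(child_key(node))
    return names


def cow(name, backing=None):
    """Options for a COW format node, optionally with a backing node."""
    node = {'node-name': name, 'driver': 'qcow2',
            'file': {'driver': 'file', 'filename': name + '.img'}}
    if backing is not None:
        node['backing'] = backing
    return node


def throttle(name, child):
    """Options for a throttle filter node on top of `child`."""
    return {'node-name': name, 'driver': 'throttle',
            'throttle-group': 'tg', 'file': child}


graph = throttle('top-filter',
                 cow('cow-3',
                     cow('cow-2',
                         cow('cow-1',
                             throttle('bottom-filter', cow('cow-0'))))))
print(' -> '.join(chain(graph)))
```

Each test case then commits across a different part of this chain: between two COW nodes, through bottom-filter, or from the active layer below top-filter.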
 tests/qemu-iotests/040     | 177 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 32c82b4ec6..e7fa244738 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -713,6 +713,183 @@ class TestErrorHandling(iotests.QMPTestCase):
         self.assertTrue(iotests.compare_images(mid_img, backing_img, fmt2='raw'),
                         'target image does not match source after commit')
 
+class TestCommitWithFilters(iotests.QMPTestCase):
+    img0 = os.path.join(iotests.test_dir, '0.img')
+    img1 = os.path.join(iotests.test_dir, '1.img')
+    img2 = os.path.join(iotests.test_dir, '2.img')
+    img3 = os.path.join(iotests.test_dir, '3.img')
+
+    def do_test_io(self, read_or_write):
+        for index, pattern_file in enumerate(self.pattern_files):
+            result = qemu_io('-f', iotests.imgfmt,
+                             '-c', '{} -P {} {}M 1M'.format(read_or_write,
+                                                            index + 1, index),
+                             pattern_file)
+            self.assertFalse('Pattern verification failed' in result)
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img0, '64M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img1, '64M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img2, '64M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img3, '64M')
+
+        # Distribution of the patterns across the files; this is
+        # checked by tearDown() and should be updated by the test
+        # cases as necessary
+        self.pattern_files = [self.img0, self.img1, self.img2, self.img3]
+
+        self.do_test_io('write')
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        result = self.vm.qmp('object-add', qom_type='throttle-group', id='tg')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', **{
+                'node-name': 'top-filter',
+                'driver': 'throttle',
+                'throttle-group': 'tg',
+                'file': {
+                    'node-name': 'cow-3',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': self.img3
+                    },
+                    'backing': {
+                        'node-name': 'cow-2',
+                        'driver': iotests.imgfmt,
+                        'file': {
+                            'driver': 'file',
+                            'filename': self.img2
+                        },
+                        'backing': {
+                            'node-name': 'cow-1',
+                            'driver': iotests.imgfmt,
+                            'file': {
+                                'driver': 'file',
+                                'filename': self.img1
+                            },
+                            'backing': {
+                                'node-name': 'bottom-filter',
+                                'driver': 'throttle',
+                                'throttle-group': 'tg',
+                                'file': {
+                                    'node-name': 'cow-0',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img0
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
+            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        self.do_test_io('read')
+
+        os.remove(self.img3)
+        os.remove(self.img2)
+        os.remove(self.img1)
+        os.remove(self.img0)
+
+    # Filters make for funny filenames, so we cannot just use
+    # self.imgX to get them
+    def get_filename(self, node):
+        return self.vm.node_info(node)['image']['filename']
+
+    def test_filterless_commit(self):
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top_node='cow-2',
+                             base_node='cow-1')
+        self.assert_qmp(result, 'return', {})
+        self.wait_until_completed(drive='commit')
+
+        self.assertIsNotNone(self.vm.node_info('cow-3'))
+        self.assertIsNone(self.vm.node_info('cow-2'))
+        self.assertIsNotNone(self.vm.node_info('cow-1'))
+
+        # 2 has been committed into 1
+        self.pattern_files[2] = self.img1
+
+    def test_commit_through_filter(self):
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top_node='cow-1',
+                             base_node='cow-0')
+        self.assert_qmp(result, 'return', {})
+        self.wait_until_completed(drive='commit')
+
+        self.assertIsNotNone(self.vm.node_info('cow-2'))
+        self.assertIsNone(self.vm.node_info('cow-1'))
+        self.assertIsNone(self.vm.node_info('bottom-filter'))
+        self.assertIsNotNone(self.vm.node_info('cow-0'))
+
+        # 1 has been committed into 0
+        self.pattern_files[1] = self.img0
+
+    def test_filtered_active_commit_with_filter(self):
+        # Add a device, so the commit job finds a parent it can change
+        # to point to the base node (so we can test that top-filter is
+        # dropped from the graph)
+        result = self.vm.qmp('device_add', id='drv0', driver='virtio-blk',
+                             drive='top-filter')
+        self.assert_qmp(result, 'return', {})
+
+        # Try to release our reference to top-filter; that should not
+        # work because drv0 uses it
+        result = self.vm.qmp('blockdev-del', node_name='top-filter')
+        self.assert_qmp(result, 'error/class', 'GenericError')
+        self.assert_qmp(result, 'error/desc', 'Node top-filter is in use')
+
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             base_node='cow-2')
+        self.assert_qmp(result, 'return', {})
+        self.complete_and_wait(drive='commit')
+
+        # Try to release our reference to top-filter again
+        result = self.vm.qmp('blockdev-del', node_name='top-filter')
+        self.assert_qmp(result, 'return', {})
+
+        self.assertIsNone(self.vm.node_info('top-filter'))
+        self.assertIsNone(self.vm.node_info('cow-3'))
+        self.assertIsNotNone(self.vm.node_info('cow-2'))
+
+        # Check that drv0 is now connected to cow-2
+        blockdevs = self.vm.qmp('query-block')['return']
+        drv0 = next(dev for dev in blockdevs if '/drv0' in dev['qdev'])
+        self.assertEqual(drv0['inserted']['node-name'], 'cow-2')
+
+        # 3 has been committed into 2
+        self.pattern_files[3] = self.img2
+
+    def test_filtered_active_commit_without_filter(self):
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top-filter',
+                             top_node='cow-3',
+                             base_node='cow-2')
+        self.assert_qmp(result, 'return', {})
+        self.complete_and_wait(drive='commit')
+
+        self.assertIsNotNone(self.vm.node_info('top-filter'))
+        self.assertIsNone(self.vm.node_info('cow-3'))
+        self.assertIsNotNone(self.vm.node_info('cow-2'))
+
+        # 3 has been comitted into 2
+        self.pattern_files[3] = self.img2
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 6a917130b6..4823c113d5 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...........................................................
+...............................................................
 ----------------------------------------------------------------------
-Ran 59 tests
+Ran 63 tests
 
 OK
-- 
2.26.2




* [PATCH v7 45/47] iotests: Add filter mirror test cases
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (43 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 44/47] iotests: Add filter commit test cases Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-08-02 11:05   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 46/47] iotests: Add test for commit in sub directory Max Reitz
                   ` (3 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

This patch adds some test cases for how mirroring relates to filters.
One of them tests what happens when you mirror off a filtered COW node;
the two others use the mirror filter node, which is basically our only
example of an implicitly created filter node so far (besides the commit
filter).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
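The layout that test_cor expects from a sync=top mirror through a copy-on-read filter can be expressed as a small check over the JSON map. This is a sketch, not part of the patch: check_top_sync_map() is a hypothetical helper, and the sample data is hand-written in the shape of `qemu-img map --output=json`, reduced to the three fields the test inspects.

```python
import json

HALF = 512 * 1024


def check_top_sync_map(entries):
    """Check a JSON map against the layout test_cor expects: the first
    half comes from the backing file (depth 1), the second half is
    allocated in the target itself (depth 0)."""
    expected = [(0, HALF, 1), (HALF, HALF, 0)]
    actual = [(e['start'], e['length'], e['depth']) for e in entries[:2]]
    return actual == expected


# Hand-written sample, not real qemu-img output
sample = json.loads('''[
    {"start": 0, "length": 524288, "depth": 1},
    {"start": 524288, "length": 524288, "depth": 0}
]''')
print(check_top_sync_map(sample))
```

If the filter were mistaken for the top COW layer, the first half would presumably be copied to the target as well and show up at depth 0, so the check would fail.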
 tests/qemu-iotests/041     | 146 ++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/041.out |   4 +-
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index b843f88a66..588bb76626 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -21,8 +21,9 @@
 import time
 import os
 import re
+import json
 import iotests
-from iotests import qemu_img, qemu_io
+from iotests import qemu_img, qemu_img_pipe, qemu_io
 
 backing_img = os.path.join(iotests.test_dir, 'backing.img')
 target_backing_img = os.path.join(iotests.test_dir, 'target-backing.img')
@@ -1275,6 +1276,149 @@ class TestReplaces(iotests.QMPTestCase):
 
         self.vm.assert_block_path('filter0', '/file', 'target')
 
+# Tests for mirror with filters (and how the mirror filter behaves, as
+# an example of an implicit filter)
+class TestFilters(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, backing_img, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, test_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-b', backing_img, target_img)
+
+        qemu_io('-c', 'write -P 1 0 512k', backing_img)
+        qemu_io('-c', 'write -P 2 512k 512k', test_img)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'target',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': target_img
+                                },
+                                'backing': None
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        self.filterless_chain = {
+                'node-name': 'source',
+                'driver': iotests.imgfmt,
+                'file': {
+                    'driver': 'file',
+                    'filename': test_img
+                },
+                'backing': {
+                    'node-name': 'backing',
+                    'driver': iotests.imgfmt,
+                    'file': {
+                        'driver': 'file',
+                        'filename': backing_img
+                    }
+                }
+            }
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+        os.remove(test_img)
+        os.remove(target_img)
+        os.remove(backing_img)
+
+    def test_cor(self):
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'filter',
+                                'driver': 'copy-on-read',
+                                'file': self.filterless_chain
+                            })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='filter',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait('mirror')
+
+        self.vm.qmp('blockdev-del', node_name='target')
+
+        target_map = qemu_img_pipe('map', '--output=json', target_img)
+        target_map = json.loads(target_map)
+
+        assert target_map[0]['start'] == 0
+        assert target_map[0]['length'] == 512 * 1024
+        assert target_map[0]['depth'] == 1
+
+        assert target_map[1]['start'] == 512 * 1024
+        assert target_map[1]['length'] == 512 * 1024
+        assert target_map[1]['depth'] == 0
+
+    def test_implicit_mirror_filter(self):
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top')
+        self.assert_qmp(result, 'return', {})
+
+        # The mirror filter is now an implicit node, so it should be
+        # invisible when querying the backing chain
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'source'
+
+        image_info = device_info['inserted']['image']
+        assert image_info['filename'] == test_img
+        assert image_info['backing-image']['filename'] == backing_img
+
+        self.complete_and_wait('mirror')
+
+    def test_explicit_mirror_filter(self):
+        # Same test as above, but this time we give the mirror filter
+        # a node-name so it will not be invisible
+        result = self.vm.qmp('blockdev-add', **self.filterless_chain)
+        self.assert_qmp(result, 'return', {})
+
+        # We need this so we can query from above the mirror node
+        result = self.vm.qmp('device_add',
+                             driver='virtio-blk',
+                             id='virtio',
+                             bus='pci.0',
+                             drive='source')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             device='source',
+                             target='target',
+                             sync='top',
+                             filter_node_name='mirror-filter')
+        self.assert_qmp(result, 'return', {})
+
+        # With a node-name given to it, the mirror filter should now
+        # be visible
+        device_info = self.vm.qmp('query-block')['return'][0]
+        assert device_info['qdev'] == '/machine/peripheral/virtio/virtio-backend'
+
+        assert device_info['inserted']['node-name'] == 'mirror-filter'
+
+        self.complete_and_wait('mirror')
+
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'],
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index 53abe11d73..46651953e8 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-........................................................................................................
+...........................................................................................................
 ----------------------------------------------------------------------
-Ran 104 tests
+Ran 107 tests
 
 OK
-- 
2.26.2




* [PATCH v7 46/47] iotests: Add test for commit in sub directory
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (44 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 45/47] iotests: Add filter mirror " Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-08-02 12:13   ` Andrey Shinkevich
  2020-06-25 15:22 ` [PATCH v7 47/47] iotests: Test committing to overridden backing Max Reitz
                   ` (2 subsequent siblings)
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Add a test for committing an overlay in a sub directory to one of the
images in its backing chain, using both relative and absolute filenames.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
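Why one relative name is found and the other is not can be seen by resolving each name against the overlay's directory. This is a simplified model written for illustration: resolve_backing() is a hypothetical helper, 'qcow2' stands in for $IMGFMT, and the code does not reproduce how qemu-img actually searches the backing chain.

```python
import posixpath


def resolve_backing(overlay, backing):
    """Resolve a backing file name relative to the overlay's directory.
    Absolute names are returned unchanged."""
    if posixpath.isabs(backing):
        return backing
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(overlay), backing))


overlay = 'subdir/t.qcow2'

# Name relative to the overlay: points at the real mid image
print(resolve_backing(overlay, 't.qcow2.mid'))

# Name relative to the CWD: ends up one directory too deep
print(resolve_backing(overlay, 'subdir/t.qcow2.mid'))
```

Since backing file names are always represented relative to the overlay, only the first form identifies the mid image; the second names a file that does not exist in the chain.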
 tests/qemu-iotests/020     | 44 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/020.out | 10 +++++++++
 2 files changed, 54 insertions(+)

diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 20f8f185d0..d5b5d34058 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -31,6 +31,11 @@ _cleanup()
     _cleanup_test_img
     _rm_test_img "$TEST_IMG.base"
     _rm_test_img "$TEST_IMG.orig"
+
+    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.base"
+    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.mid"
+    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT"
+    rmdir "$TEST_DIR/subdir" &> /dev/null
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -134,6 +139,45 @@ $QEMU_IO -c 'writev 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG commit "$TEST_IMG"
 _cleanup
 
+
+echo
+echo 'Testing commit in sub-directory with relative filenames'
+echo
+
+pushd "$TEST_DIR" > /dev/null
+
+mkdir subdir
+
+TEST_IMG="subdir/t.$IMGFMT.base" _make_test_img 1M
+TEST_IMG="subdir/t.$IMGFMT.mid" _make_test_img -b "t.$IMGFMT.base"
+TEST_IMG="subdir/t.$IMGFMT" _make_test_img -b "t.$IMGFMT.mid"
+
+# Should work
+$QEMU_IMG commit -b "t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+# Might theoretically work, but does not in practice (we have to
+# decide between this and the above; and since we always represent
+# backing file names as relative to the overlay, we go for the above)
+$QEMU_IMG commit -b "subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT" 2>&1 | \
+    _filter_imgfmt
+
+# This should work as well
+$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT"
+
+popd > /dev/null
+
+# Now let's try with just absolute filenames
+# (This will not work with external data files, though, because when
+# using relative paths for those, qemu will always resolve them
+# relative to its CWD.  Therefore, it cannot find those data files now
+# that we left $TEST_DIR.)
+if _get_data_file '' > /dev/null; then
+    echo 'Image committed.' # Skip test
+else
+    $QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" \
+        "$TEST_DIR/subdir/t.$IMGFMT"
+fi
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/020.out b/tests/qemu-iotests/020.out
index 4b722b2dd0..228c37dded 100644
--- a/tests/qemu-iotests/020.out
+++ b/tests/qemu-iotests/020.out
@@ -1094,4 +1094,14 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=json:{'driv
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 qemu-img: Block job failed: No space left on device
+
+Testing commit in sub-directory with relative filenames
+
+Formatting 'subdir/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'subdir/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.base
+Formatting 'subdir/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.mid
+Image committed.
+qemu-img: Did not find 'subdir/t.IMGFMT.mid' in the backing chain of 'subdir/t.IMGFMT'
+Image committed.
+Image committed.
 *** done
-- 
2.26.2




* [PATCH v7 47/47] iotests: Test committing to overridden backing
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (45 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 46/47] iotests: Add test for commit in sub directory Max Reitz
@ 2020-06-25 15:22 ` Max Reitz
  2020-08-02 11:43   ` Andrey Shinkevich
  2020-07-08 17:20 ` [PATCH v7 00/47] block: Deal with filters Andrey Shinkevich
  2020-08-24 15:15 ` Kevin Wolf
  48 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-06-25 15:22 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040     | 61 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |  4 +--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index e7fa244738..dfd46ddcbe 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -890,6 +890,67 @@ class TestCommitWithFilters(iotests.QMPTestCase):
         # 3 has been comitted into 2
         self.pattern_files[3] = self.img2
 
+class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
+    img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
+    img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
+    img_top = os.path.join(iotests.test_dir, 'top.img')
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_a, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, self.img_base_b, '1M')
+        qemu_img('create', '-f', iotests.imgfmt, '-b', self.img_base_a, \
+                 self.img_top)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        # Use base_b instead of base_a as the backing of top
+        result = self.vm.qmp('blockdev-add', **{
+                                'node-name': 'top',
+                                'driver': iotests.imgfmt,
+                                'file': {
+                                    'driver': 'file',
+                                    'filename': self.img_top
+                                },
+                                'backing': {
+                                    'node-name': 'base',
+                                    'driver': iotests.imgfmt,
+                                    'file': {
+                                        'driver': 'file',
+                                        'filename': self.img_base_b
+                                    }
+                                }
+                            })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(self.img_top)
+        os.remove(self.img_base_a)
+        os.remove(self.img_base_b)
+
+    def test_commit_to_a(self):
+        # Try committing to base_a (which should fail, as top's
+        # backing image is base_b instead)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_a)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_commit_to_b(self):
+        # Try committing to base_b (which should work, since that is
+        # actually top's backing image)
+        result = self.vm.qmp('block-commit',
+                             job_id='commit',
+                             device='top',
+                             base=self.img_base_b)
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.event_wait('BLOCK_JOB_READY')
+        self.vm.qmp('block-job-complete', device='commit')
+        self.vm.event_wait('BLOCK_JOB_COMPLETED')
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'],
                  supported_protocols=['file'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 4823c113d5..1bb1dc5f0e 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-...............................................................
+.................................................................
 ----------------------------------------------------------------------
-Ran 63 tests
+Ran 65 tests
 
 OK
-- 
2.26.2
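
The blockdev-add call in setUp() above passes a nested options dictionary whose 'backing' entry is what makes top use base_b instead of base_a. A minimal sketch of building such a dictionary follows; build_overridden_backing_opts() is a hypothetical helper that is not part of iotests, and the format name and paths are placeholders:

```python
def build_overridden_backing_opts(fmt, top_path, backing_path):
    """Return blockdev-add arguments that open top_path with an explicit
    backing node.

    Hypothetical helper for illustration only; the test builds this
    dictionary inline.
    """
    return {
        'node-name': 'top',
        'driver': fmt,
        'file': {'driver': 'file', 'filename': top_path},
        # Explicit backing node, used instead of the backing file that
        # was given to "qemu-img create -b" (base_a in the test).
        'backing': {
            'node-name': 'base',
            'driver': fmt,
            'file': {'driver': 'file', 'filename': backing_path},
        },
    }


opts = build_overridden_backing_opts('qcow2', '/tmp/top.img', '/tmp/base_b.img')
print(opts['backing']['file']['filename'])
```

With the override in place, a block-commit naming base_a as its base is expected to fail, while one naming base_b should succeed, which is what the two test methods check.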




* Re: [PATCH v7 00/47] block: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (46 preceding siblings ...)
  2020-06-25 15:22 ` [PATCH v7 47/47] iotests: Test committing to overridden backing Max Reitz
@ 2020-07-08 17:20 ` Andrey Shinkevich
  2020-07-08 17:32   ` Eric Blake
  2020-07-08 20:47   ` Eric Blake
  2020-08-24 15:15 ` Kevin Wolf
  48 siblings, 2 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:20 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> v6: https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>
> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
> Branch: https://git.xanclic.moe/XanClic/qemu.git child-access-functions-v7
>
>
I cloned the branch from GitHub and it built successfully.

Running the iotests reports multiple errors of this kind:

128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')

"./common.filter: line 128: readarray: -d: invalid option"

introduced with the commit

a7399eb iotests: Make _filter_img_create more active


Andrey




* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-06-25 15:21 ` [PATCH v7 02/47] block: Add chain helper functions Max Reitz
@ 2020-07-08 17:20   ` Andrey Shinkevich
  2020-07-09  8:24     ` Max Reitz
  2020-07-13 10:18   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:20 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Add some helper functions for skipping filters in a chain of block
> nodes.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h |  3 +++
>   block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 58 insertions(+)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index bb3457c5e8..5da793bfc3 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1382,6 +1382,9 @@ BdrvChild *bdrv_cow_child(BlockDriverState *bs);
>   BdrvChild *bdrv_filter_child(BlockDriverState *bs);
>   BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs);
>   BdrvChild *bdrv_primary_child(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
>   
>   static inline BlockDriverState *child_bs(BdrvChild *child)
>   {
> diff --git a/block.c b/block.c
> index 5a42ef49fd..0a0b855261 100644
> --- a/block.c
> +++ b/block.c
> @@ -7008,3 +7008,58 @@ BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>   
>       return NULL;
>   }
> +
> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
> +                                              bool stop_on_explicit_filter)
> +{
> +    BdrvChild *c;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> +        c = bdrv_filter_child(bs);
> +        if (!c) {
> +            break;
> +        }
> +        bs = c->bs;

Could it be child_bs(bs) ?

Andrey

> +    }
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
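
For readers who want to trace the loop condition quoted above, here is a toy Python model of it. This is not QEMU code: Node is a stand-in for BlockDriverState, filter_child replaces bdrv_filter_child(), and the two wrappers assume that bdrv_skip_implicit_filters() and bdrv_skip_filters() pass true and false respectively for stop_on_explicit_filter.

```python
class Node:
    """Toy stand-in for a BlockDriverState; not QEMU code."""

    def __init__(self, name, filter_child=None, implicit=False):
        self.name = name
        self.filter_child = filter_child  # next node if this is a filter
        self.implicit = implicit


def do_skip_filters(bs, stop_on_explicit_filter):
    if bs is None:
        return None
    # Mirrors: while (!(stop_on_explicit_filter && !bs->implicit))
    while not (stop_on_explicit_filter and not bs.implicit):
        child = bs.filter_child
        if child is None:
            break
        bs = child
    return bs


def skip_implicit_filters(bs):
    # Assumed wrapper: stop at the first node that is not implicit.
    return do_skip_filters(bs, True)


def skip_filters(bs):
    # Assumed wrapper: skip every filter, implicit or not.
    return do_skip_filters(bs, False)


fmt = Node('fmt')
explicit = Node('explicit-filter', filter_child=fmt)
implicit = Node('implicit-filter', filter_child=explicit, implicit=True)
print(skip_implicit_filters(implicit).name, skip_filters(implicit).name)
```

In this model the first call stops at the explicit filter and the second walks all the way down to the format node.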




* Re: [PATCH v7 01/47] block: Add child access functions
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
@ 2020-07-08 17:22   ` Andrey Shinkevich
  2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
  2020-07-13  9:57   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:22 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 25.06.2020 18:21, Max Reitz wrote:
> There are BDS children that the general block layer code can access,
> namely bs->file and bs->backing.  Since the introduction of filters and
> external data files, their meaning is not quite clear.  bs->backing can
> be a COW source, or it can be a filtered child; bs->file can be a
> filtered child, it can be data and metadata storage, or it can be just
> metadata storage.
>
> This overloading really is not helpful.  This patch adds functions that
> retrieve the correct child for each exact purpose.  Later patches in
> this series will make use of them.  Doing so will allow us to handle
> filter nodes in a meaningful way.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h | 44 +++++++++++++++++--
>   block.c                   | 90 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 131 insertions(+), 3 deletions(-)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 1b86b59af1..bb3457c5e8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -90,9 +90,17 @@ struct BlockDriver {
>       int instance_size;
>   
> ...


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
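
As a rough illustration of the overloading described in the commit message, here is a toy Python model. It is not QEMU code. The rules inside cow_child() and filter_child() are assumptions about how the new helpers behave, since the patch body is elided above, and the model returns nodes directly rather than BdrvChild links.

```python
class Node:
    """Toy stand-in for a BlockDriverState; not QEMU code."""

    def __init__(self, name, is_filter=False, file=None, backing=None):
        self.name = name
        self.is_filter = is_filter
        self.file = file
        self.backing = backing


def cow_child(bs):
    # Assumption: only non-filter nodes have a COW source, in bs->backing.
    if bs is None or bs.is_filter:
        return None
    return bs.backing


def filter_child(bs):
    # Assumption: a filter keeps its single filtered child in either
    # bs->backing or bs->file, never in both.
    if bs is None or not bs.is_filter:
        return None
    assert not (bs.backing and bs.file)
    return bs.backing or bs.file


def filter_or_cow_child(bs):
    return cow_child(bs) or filter_child(bs)


base = Node('base')
top = Node('top', backing=base)
throttle = Node('throttle', is_filter=True, file=top)
print(filter_or_cow_child(throttle).name, filter_or_cow_child(top).name)
```

The point of the model is that callers ask for a child by purpose (COW source or filtered child) instead of guessing what bs->file or bs->backing means for a particular driver.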




* Re: [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init()
  2020-06-25 15:21 ` [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init() Max Reitz
@ 2020-07-08 17:23   ` Andrey Shinkevich
  2020-08-07  9:37   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:23 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 25.06.2020 18:21, Max Reitz wrote:
> bdrv_has_zero_init() and the related bdrv_unallocated_blocks_are_zero()
> should use bdrv_cow_child() if they want to check whether the given BDS
> has a COW backing file.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 0a0b855261..f3e2aae49c 100644
> --- a/block.c
> +++ b/block.c
> @@ -5394,7 +5394,7 @@ int bdrv_has_zero_init(BlockDriverState *bs)
>   
>       /* If BS is a copy on write image, it is initialized to
>          the contents of the base image, which may not be zeroes.  */
> -    if (bs->backing) {
> +    if (bdrv_cow_child(bs)) {
>           return 0;
>       }
>       if (bs->drv->bdrv_has_zero_init) {
> @@ -5412,7 +5412,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
>   {
>       BlockDriverInfo bdi;
>   
> -    if (bs->backing) {
> +    if (bdrv_cow_child(bs)) {
>           return false;
>       }
>   
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>



* Re: [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing
  2020-06-25 15:21 ` [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
@ 2020-07-08 17:24   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:24 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> bdrv_set_backing_hd() is a function that explicitly cares about the
> bs->backing child.  Highlight that in its description and use
> child_bs(bs->backing) instead of backing_bs(bs) to make it more obvious.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index f3e2aae49c..d139ffb57d 100644
> --- a/block.c
> +++ b/block.c
> @@ -2846,7 +2846,7 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
>   }
>   
>   /*
> - * Sets the backing file link of a BDS. A new reference is created; callers
> + * Sets the bs->backing link of a BDS. A new reference is created; callers
>    * which don't need their own reference any more must call bdrv_unref().
>    */
>   void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> @@ -2855,7 +2855,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>       bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>           bdrv_inherits_from_recursive(backing_hd, bs);
>   
> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>           return;
>       }
>   
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>



* Re: [PATCH v7 05/47] block: Include filters when freezing backing chain
  2020-06-25 15:21 ` [PATCH v7 05/47] block: Include filters when freezing backing chain Max Reitz
@ 2020-07-08 17:25   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:25 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> In order to make filters work in backing chains, the associated
> functions must be able to deal with them and freeze both COW and filter
> child links.
>
> While at it, add some comments that note which functions require their
> caller to ensure that a given child link is not frozen, and how the
> callers do so.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 60 +++++++++++++++++++++++++++++++++++++--------------------
>   1 file changed, 39 insertions(+), 21 deletions(-)
>
> diff --git a/block.c b/block.c
> index d139ffb57d..b59bd776cd 100644
> --- a/block.c
> +++ b/block.c


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 17:20 ` [PATCH v7 00/47] block: Deal with filters Andrey Shinkevich
@ 2020-07-08 17:32   ` Eric Blake
  2020-07-08 19:46     ` Andrey Shinkevich
  2020-07-08 20:47   ` Eric Blake
  1 sibling, 1 reply; 173+ messages in thread
From: Eric Blake @ 2020-07-08 17:32 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> v6: 
>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>
>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>> Branch: https://git.xanclic.moe/XanClic/qemu.git 
>> child-access-functions-v7
>>
>>
> I cloned the branch from GitHub and it built successfully.
> 
> Running the iotests reports multiple errors of this kind:
> 
> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
> 
> "./common.filter: line 128: readarray: -d: invalid option"
> 

Arrgh. If I'm reading bash's changelog correctly, readarray -d was 
introduced in bash 4.4, so I'm guessing you're still on 4.3 or earlier? 
What bash version and platform are you using?
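
A quick way to check this on an affected machine is to compare the version reported by `bash --version` against 4.4. The sketch below is only an illustration: the helper names are made up, the 4.4 threshold is taken from the changelog reading above, and the two version strings are sample inputs rather than output from anyone's system.

```python
import re


def parse_bash_version(version_output):
    """Extract (major, minor) from the first line of `bash --version`."""
    match = re.search(r'version (\d+)\.(\d+)', version_output)
    if match is None:
        raise ValueError('unrecognized bash version string')
    return int(match.group(1)), int(match.group(2))


def supports_readarray_d(version_output):
    # Assumes readarray -d needs bash 4.4 or newer, as guessed above.
    return parse_bash_version(version_output) >= (4, 4)


# Sample strings for illustration only.
old = 'GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)'
new = 'GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)'
print(supports_readarray_d(old), supports_readarray_d(new))
```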

> introduced with the commit
> 
> a7399eb iotests: Make _filter_img_create more active
> 
> 
> Andrey
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v7 06/47] block: Drop bdrv_is_encrypted()
  2020-06-25 15:21 ` [PATCH v7 06/47] block: Drop bdrv_is_encrypted() Max Reitz
@ 2020-07-08 17:41   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
> can be used without the user entering a password or not.  It has not
> been used for that purpose for quite some time.
>
> Actually, it is not even fit for that purpose, because to answer that
> question, it would have to recursively query all of the given node's
> children.
>
> So now we have to decide in which direction we want to fix
> bdrv_is_encrypted(): Recursively query all children, or drop it and just
> use bs->encrypted to get the current node's status?
>
> Nowadays, its only purpose is to report through bdrv_query_image_info()
> whether the given image is encrypted or not.  For this purpose, it is
> probably more interesting to see whether a given node itself is
> encrypted or not (otherwise, a management application cannot discern for
> certain which nodes are really encrypted and which just have encrypted
> children).
>
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block.h | 1 -
>   block.c               | 8 --------
>   block/qapi.c          | 2 +-
>   3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/include/block/block.h b/include/block/block.h
> index 86f9728f00..0080fe1311 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -538,7 +538,6 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it);
>   void bdrv_next_cleanup(BdrvNextIterator *it);
>   
>   BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs);
> -bool bdrv_is_encrypted(BlockDriverState *bs);
>   void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
>                            void *opaque, bool read_only);
>   const char *bdrv_get_node_name(const BlockDriverState *bs);
> diff --git a/block.c b/block.c
> index b59bd776cd..76277ea4e0 100644
> --- a/block.c
> +++ b/block.c
> @@ -5044,14 +5044,6 @@ bool bdrv_is_sg(BlockDriverState *bs)
>       return bs->sg;
>   }
>   
> -bool bdrv_is_encrypted(BlockDriverState *bs)
> -{
> -    if (bs->backing && bs->backing->bs->encrypted) {
> -        return true;
> -    }
> -    return bs->encrypted;
> -}
> -
>   const char *bdrv_get_format_name(BlockDriverState *bs)
>   {
>       return bs->drv ? bs->drv->format_name : NULL;
> diff --git a/block/qapi.c b/block/qapi.c
> index afd9f3b4a7..4807a2b344 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -288,7 +288,7 @@ void bdrv_query_image_info(BlockDriverState *bs,
>       info->virtual_size    = size;
>       info->actual_size     = bdrv_get_allocated_file_size(bs);
>       info->has_actual_size = info->actual_size >= 0;
> -    if (bdrv_is_encrypted(bs)) {
> +    if (bs->encrypted) {
>           info->encrypted = true;
>           info->has_encrypted = true;
>       }
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
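
To make the behaviour change concrete, here is a toy Python model (not QEMU code) contrasting the removed function with the plain per-node flag. Node and both function names are stand-ins chosen for this sketch.

```python
class Node:
    """Toy stand-in for a BlockDriverState; not QEMU code."""

    def __init__(self, encrypted=False, backing=None):
        self.encrypted = encrypted
        self.backing = backing


def old_is_encrypted(bs):
    # Models the removed bdrv_is_encrypted(): it looks exactly one level
    # down bs->backing, so it is neither per-node nor fully recursive.
    if bs.backing is not None and bs.backing.encrypted:
        return True
    return bs.encrypted


def reported_encrypted(bs):
    # Models the new check in bdrv_query_image_info(): bs->encrypted only.
    return bs.encrypted


secret_base = Node(encrypted=True)
mid = Node(backing=secret_base)
top = Node(backing=mid)
print(old_is_encrypted(mid), reported_encrypted(mid), old_is_encrypted(top))
```

In the model, mid is reported as encrypted by the old function only because its direct backing node is, while top is not, even though the same encrypted node sits further down its chain.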



* Re: [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes()
  2020-06-25 15:21 ` [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes() Max Reitz
@ 2020-07-08 17:48   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:48 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Filters cannot compress data themselves but they have to implement
> .bdrv_co_pwritev_compressed() still (or they cannot forward compressed
> writes).  Therefore, checking whether
> bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
> know whether the node can actually handle compressed writes.  This
> function looks down the filter chain to see whether there is a
> non-filter that can actually convert the compressed writes into
> compressed data (and thus normal writes).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block.h |  1 +
>   block.c               | 23 +++++++++++++++++++++++
>   2 files changed, 24 insertions(+)
>
> diff --git a/include/block/block.h b/include/block/block.h
> index 0080fe1311..a905a5ec05 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -538,6 +538,7 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it);
>   void bdrv_next_cleanup(BdrvNextIterator *it);
>   
>   BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs);
> +bool bdrv_supports_compressed_writes(BlockDriverState *bs);
>   void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
>                            void *opaque, bool read_only);
>   const char *bdrv_get_node_name(const BlockDriverState *bs);
> diff --git a/block.c b/block.c
> index 76277ea4e0..6449f3a11d 100644
> --- a/block.c
> +++ b/block.c
> @@ -5044,6 +5044,29 @@ bool bdrv_is_sg(BlockDriverState *bs)
>       return bs->sg;
>   }
>   
> +/**
> + * Return whether the given node supports compressed writes.
> + */
> +bool bdrv_supports_compressed_writes(BlockDriverState *bs)
> +{
> +    BlockDriverState *filtered;
> +
> +    if (!bs->drv || !block_driver_can_compress(bs->drv)) {
> +        return false;
> +    }
> +
> +    filtered = bdrv_filter_bs(bs);
> +    if (filtered) {
> +        /*
> +         * Filters can only forward compressed writes, so we have to
> +         * check the child.
> +         */
> +        return bdrv_supports_compressed_writes(filtered);
> +    }
> +
> +    return true;
> +}
> +
>   const char *bdrv_get_format_name(BlockDriverState *bs)
>   {
>       return bs->drv ? bs->drv->format_name : NULL;


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
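
The recursion in the new function can be traced with a small Python model (not QEMU code). Here can_compress stands in for block_driver_can_compress(bs->drv) and filtered for the result of bdrv_filter_bs(bs); both attribute names are made up for this sketch.

```python
class Node:
    """Toy stand-in for a BlockDriverState; not QEMU code."""

    def __init__(self, can_compress, filtered=None):
        self.can_compress = can_compress  # driver has a compressed-write hook
        self.filtered = filtered          # filtered child, None for non-filters


def supports_compressed_writes(bs):
    if not bs.can_compress:
        return False
    if bs.filtered is not None:
        # Filters can only forward compressed writes, so ask the child.
        return supports_compressed_writes(bs.filtered)
    return True


compressing_format = Node(can_compress=True)
plain_format = Node(can_compress=False)
filter_over_compressing = Node(can_compress=True, filtered=compressing_format)
filter_over_plain = Node(can_compress=True, filtered=plain_format)
print(supports_compressed_writes(filter_over_compressing),
      supports_compressed_writes(filter_over_plain))
```

A filter that implements the forwarding hook still reports no support when the node below it cannot compress, which is the case the commit message says a bare non-NULL check would get wrong.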




* Re: [PATCH v7 08/47] throttle: Support compressed writes
  2020-06-25 15:21 ` [PATCH v7 08/47] throttle: Support compressed writes Max Reitz
@ 2020-07-08 17:52   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:52 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/throttle.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/block/throttle.c b/block/throttle.c
> index 0ebbad0743..f6e619aca2 100644
> --- a/block/throttle.c
> +++ b/block/throttle.c
> @@ -154,6 +154,15 @@ static int coroutine_fn throttle_co_pdiscard(BlockDriverState *bs,
>       return bdrv_co_pdiscard(bs->file, offset, bytes);
>   }
>   
> +static int coroutine_fn throttle_co_pwritev_compressed(BlockDriverState *bs,
> +                                                       uint64_t offset,
> +                                                       uint64_t bytes,
> +                                                       QEMUIOVector *qiov)
> +{
> +    return throttle_co_pwritev(bs, offset, bytes, qiov,
> +                               BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
>   static int throttle_co_flush(BlockDriverState *bs)
>   {
>       return bdrv_co_flush(bs->file->bs);
> @@ -246,6 +255,7 @@ static BlockDriver bdrv_throttle = {
>   
>       .bdrv_co_pwrite_zeroes              =   throttle_co_pwrite_zeroes,
>       .bdrv_co_pdiscard                   =   throttle_co_pdiscard,
> +    .bdrv_co_pwritev_compressed         =   throttle_co_pwritev_compressed,
>   
>       .bdrv_attach_aio_context            =   throttle_attach_aio_context,
>       .bdrv_detach_aio_context            =   throttle_detach_aio_context,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
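
The forwarding pattern in this patch (the compressed-write hook reuses the filter's normal write path and only adds a request flag) can be sketched in Python as follows. This is a toy model, not QEMU code: the class names are invented and the flag value is illustrative, not the real BDRV_REQ_WRITE_COMPRESSED constant.

```python
WRITE_COMPRESSED = 1 << 0  # illustrative value only


class Recorder:
    """Stands in for the filtered child; records the writes it receives."""

    def __init__(self):
        self.calls = []

    def pwritev(self, offset, data, flags=0):
        self.calls.append((offset, data, flags))
        return 0


class ThrottleLike:
    """Toy filter that forwards writes to its child."""

    def __init__(self, child):
        self.child = child

    def pwritev(self, offset, data, flags=0):
        # A real throttle filter would apply its limits here.
        return self.child.pwritev(offset, data, flags)

    def pwritev_compressed(self, offset, data):
        # Same path as a normal write, plus the compressed flag.
        return self.pwritev(offset, data, WRITE_COMPRESSED)


demo_child = Recorder()
ThrottleLike(demo_child).pwritev_compressed(0, b'payload')
print(demo_child.calls)
```

Routing through the filter's own write function keeps any per-request work the filter does (throttling, in this case) on the compressed path too.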




* Re: [PATCH v7 09/47] copy-on-read: Support compressed writes
  2020-06-25 15:21 ` [PATCH v7 09/47] copy-on-read: " Max Reitz
@ 2020-07-08 17:54   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:54 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/copy-on-read.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index a6e3c74a68..a6a864f147 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -107,6 +107,16 @@ static int coroutine_fn cor_co_pdiscard(BlockDriverState *bs,
>   }
>   
>   
> +static int coroutine_fn cor_co_pwritev_compressed(BlockDriverState *bs,
> +                                                  uint64_t offset,
> +                                                  uint64_t bytes,
> +                                                  QEMUIOVector *qiov)
> +{
> +    return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
> +                           BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
> +
>   static void cor_eject(BlockDriverState *bs, bool eject_flag)
>   {
>       bdrv_eject(bs->file->bs, eject_flag);
> @@ -131,6 +141,7 @@ static BlockDriver bdrv_copy_on_read = {
>       .bdrv_co_pwritev                    = cor_co_pwritev,
>       .bdrv_co_pwrite_zeroes              = cor_co_pwrite_zeroes,
>       .bdrv_co_pdiscard                   = cor_co_pdiscard,
> +    .bdrv_co_pwritev_compressed         = cor_co_pwritev_compressed,
>   
>       .bdrv_eject                         = cor_eject,
>       .bdrv_lock_medium                   = cor_lock_medium,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 10/47] mirror-top: Support compressed writes
  2020-06-25 15:21 ` [PATCH v7 10/47] mirror-top: " Max Reitz
@ 2020-07-08 17:58   ` Andrey Shinkevich
  2020-08-18 10:27   ` Kevin Wolf
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:58 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/mirror.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index e8e8844afc..469acf4600 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1480,6 +1480,15 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
>                                       NULL, 0);
>   }
>   
> +static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState *bs,
> +                                                           uint64_t offset,
> +                                                           uint64_t bytes,
> +                                                           QEMUIOVector *qiov)
> +{
> +    return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
> +                                   BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
>   static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs)
>   {
>       if (bs->backing == NULL) {
> @@ -1526,6 +1535,7 @@ static BlockDriver bdrv_mirror_top = {
>       .bdrv_co_pwritev            = bdrv_mirror_top_pwritev,
>       .bdrv_co_pwrite_zeroes      = bdrv_mirror_top_pwrite_zeroes,
>       .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
> +    .bdrv_co_pwritev_compressed = bdrv_mirror_top_pwritev_compressed,
>       .bdrv_co_flush              = bdrv_mirror_top_flush,
>       .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
>       .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 11/47] backup-top: Support compressed writes
  2020-06-25 15:21 ` [PATCH v7 11/47] backup-top: " Max Reitz
@ 2020-07-08 17:59   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 17:59 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/backup-top.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/block/backup-top.c b/block/backup-top.c
> index af2f20f346..f304df8f26 100644
> --- a/block/backup-top.c
> +++ b/block/backup-top.c
> @@ -99,6 +99,15 @@ static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
>       return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
>   }
>   
> +static coroutine_fn int backup_top_co_pwritev_compressed(BlockDriverState *bs,
> +                                                         uint64_t offset,
> +                                                         uint64_t bytes,
> +                                                         QEMUIOVector *qiov)
> +{
> +    return backup_top_co_pwritev(bs, offset, bytes, qiov,
> +                                 BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
>   static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>   {
>       if (!bs->backing) {
> @@ -173,6 +182,7 @@ BlockDriver bdrv_backup_top_filter = {
>       .bdrv_co_pwritev            = backup_top_co_pwritev,
>       .bdrv_co_pwrite_zeroes      = backup_top_co_pwrite_zeroes,
>       .bdrv_co_pdiscard           = backup_top_co_pdiscard,
> +    .bdrv_co_pwritev_compressed = backup_top_co_pwritev_compressed,
>       .bdrv_co_flush              = backup_top_co_flush,
>   
>       .bdrv_co_block_status       = bdrv_co_block_status_from_backing,

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious
  2020-06-25 15:21 ` [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious Max Reitz
@ 2020-07-08 18:24   ` Andrey Shinkevich
  2020-07-09  8:59     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 18:24 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Places that use patterns like
>
>      if (bs->drv->is_filter && bs->file) {
>          ... something about bs->file->bs ...
>      }
>
> should be
>
>      BlockDriverState *filtered = bdrv_filter_bs(bs);
>      if (filtered) {
>          ... something about @filtered ...
>      }
>
> instead.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c                        | 31 ++++++++++++++++++++-----------
>   block/io.c                     |  7 +++++--
>   migration/block-dirty-bitmap.c |  8 +-------
>   3 files changed, 26 insertions(+), 20 deletions(-)
>
...
> diff --git a/block/io.c b/block/io.c
> index df8f2a98d4..385176b331 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3307,6 +3307,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>                                     Error **errp)
>   {
>       BlockDriverState *bs = child->bs;
> +    BdrvChild *filtered;
>       BlockDriver *drv = bs->drv;
>       BdrvTrackedRequest req;
>       int64_t old_size, new_bytes;
> @@ -3358,6 +3359,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>           goto out;
>       }
>   
> +    filtered = bdrv_filter_child(bs);
> +

Isn't it better to have this initialization right before the relevant 
if/else block?

Andrey

>       /*
>        * If the image has a backing file that is large enough that it would
>        * provide data for the new area, we cannot leave it unallocated because
> @@ -3390,8 +3393,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>               goto out;
>           }
>           ret = drv->bdrv_co_truncate(bs, offset, exact, prealloc, flags, errp);
> -    } else if (bs->file && drv->is_filter) {
> -        ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, flags, errp);
> +    } else if (filtered) {
> +        ret = bdrv_co_truncate(filtered, offset, exact, prealloc, flags, errp);
>       } else {
>           error_setg(errp, "Image format driver does not support resize");
>           ret = -ENOTSUP;

...

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
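
The shape of the bdrv_co_truncate() change can be modelled in a few lines of Python. This is a toy sketch, not QEMU code: Node, can_truncate and filtered are invented names, with filtered standing in for what bdrv_filter_child(bs) would return, and the real function does considerably more than this.

```python
class Node:
    """Toy stand-in for a BlockDriverState; not QEMU code."""

    def __init__(self, name, can_truncate=False, filtered=None):
        self.name = name
        self.can_truncate = can_truncate  # driver implements truncate
        self.filtered = filtered          # filtered child, None for non-filters
        self.size = 0


def truncate(bs, offset):
    # Looked up once, like "filtered = bdrv_filter_child(bs)" in the patch.
    filtered = bs.filtered
    if bs.can_truncate:
        bs.size = offset
    elif filtered is not None:
        # Filters without their own handler pass the request down.
        truncate(filtered, offset)
    else:
        raise RuntimeError('Image format driver does not support resize')


image = Node('image', can_truncate=True)
flt = Node('filter', filtered=image)
truncate(flt, 4096)
print(image.size)
```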




* Re: [PATCH v7 13/47] block: Use CAFs in block status functions
  2020-06-25 15:21 ` [PATCH v7 13/47] block: Use CAFs in block status functions Max Reitz
@ 2020-07-08 19:13   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 19:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Use the child access functions in the block status inquiry functions as
> appropriate.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/io.c | 19 ++++++++++---------
>   1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 385176b331..dc9891d6ce 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2407,11 +2407,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>           ret |= BDRV_BLOCK_ALLOCATED;
>       } else if (want_zero) {
> +        BlockDriverState *cow_bs = bdrv_cow_bs(bs);
> +
>           if (bdrv_unallocated_blocks_are_zero(bs)) {
>               ret |= BDRV_BLOCK_ZERO;
> -        } else if (bs->backing) {
> -            BlockDriverState *bs2 = bs->backing->bs;
> -            int64_t size2 = bdrv_getlength(bs2);
> +        } else if (cow_bs) {
> +            int64_t size2 = bdrv_getlength(cow_bs);
>   
>               if (size2 >= 0 && offset >= size2) {
>                   ret |= BDRV_BLOCK_ZERO;
> @@ -2477,7 +2478,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>       bool first = true;
>   
>       assert(bs != base);
> -    for (p = bs; p != base; p = backing_bs(p)) {
> +    for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>                                      file);
>           if (ret < 0) {
> @@ -2551,7 +2552,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
>   int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                         int64_t *pnum, int64_t *map, BlockDriverState **file)
>   {
> -    return bdrv_block_status_above(bs, backing_bs(bs),
> +    return bdrv_block_status_above(bs, bdrv_filter_or_cow_bs(bs),
>                                      offset, bytes, pnum, map, file);
>   }
>   
> @@ -2561,9 +2562,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>       int ret;
>       int64_t dummy;
>   
> -    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
> -                                         bytes, pnum ? pnum : &dummy, NULL,
> -                                         NULL);
> +    ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
> +                                         offset, bytes, pnum ? pnum : &dummy,
> +                                         NULL, NULL);
>       if (ret < 0) {
>           return ret;
>       }
> @@ -2626,7 +2627,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>               break;
>           }
>   
> -        intermediate = backing_bs(intermediate);
> +        intermediate = bdrv_filter_or_cow_bs(intermediate);
>       }
>   
>       *pnum = n;


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
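
For illustration only, the decision made in the first hunk above can be
modelled outside QEMU roughly as follows. This is a minimal sketch under
stated assumptions, not the code from block/io.c: refine_status() and the
ST_* flags are hypothetical names (the real BDRV_BLOCK_* constants have
different values), and the COW child is reduced to a has_cow flag plus its
length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags standing in for BDRV_BLOCK_DATA/ZERO/ALLOCATED. */
enum {
    ST_DATA      = 1 << 0,
    ST_ZERO      = 1 << 1,
    ST_ALLOCATED = 1 << 2,
};

/*
 * Model of the quoted hunk.
 * has_cow: the node has a COW child (bdrv_cow_bs() != NULL in the patch).
 * cow_len: length of that child, negative if it could not be determined.
 */
static int refine_status(int ret, bool want_zero, bool unallocated_is_zero,
                         bool has_cow, int64_t cow_len, int64_t offset)
{
    assert(offset >= 0);

    if (ret & (ST_DATA | ST_ZERO)) {
        ret |= ST_ALLOCATED;
    } else if (want_zero) {
        if (unallocated_is_zero) {
            ret |= ST_ZERO;
        } else if (has_cow && cow_len >= 0 && offset >= cow_len) {
            /* Past the end of the COW child: nothing can provide data. */
            ret |= ST_ZERO;
        }
    }
    return ret;
}
```

The point of the change is visible in the has_cow parameter: a filter's
filtered child no longer counts here, only a real COW child does.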




* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 17:32   ` Eric Blake
@ 2020-07-08 19:46     ` Andrey Shinkevich
  2020-07-08 20:37       ` Eric Blake
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-08 19:46 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 08.07.2020 20:32, Eric Blake wrote:
> On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> v6: 
>>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>>
>>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>>> Branch: https://git.xanclic.moe/XanClic/qemu.git 
>>> child-access-functions-v7
>>>
>>>
>> I cloned the branch from the github and built successfully.
>>
>> Running the iotests reports multiple errors of such a kind:
>>
>> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
>>
>> "./common.filter: line 128: readarray: -d: invalid option"
>>
>
> Arrgh. If I'm reading bash's changelog correctly, readarray -d was 
> introduced in bash 4.4, so I'm guessing you're still on 4.3 or 
> earlier? What bash version and platform are you using?
>
My bash version is 4.2.46.

It is the latest in the virtuozzolinux-base repository. I should install 
the 4.4 package manually.

Thank you Eric for your hint!


Andrey

>> introduced with the commit
>>
>> a7399eb iotests: Make _filter_img_create more active
>>
>>
>> Andrey
>>
>>
>



* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 19:46     ` Andrey Shinkevich
@ 2020-07-08 20:37       ` Eric Blake
  2020-07-09  8:19         ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Eric Blake @ 2020-07-08 20:37 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 7/8/20 2:46 PM, Andrey Shinkevich wrote:
> 
> On 08.07.2020 20:32, Eric Blake wrote:
>> On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>> v6: 
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>>>
>>>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>>>> Branch: https://git.xanclic.moe/XanClic/qemu.git 
>>>> child-access-functions-v7
>>>>
>>>>
>>> I cloned the branch from the github and built successfully.
>>>
>>> Running the iotests reports multiple errors of such a kind:
>>>
>>> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
>>>
>>> "./common.filter: line 128: readarray: -d: invalid option"
>>>
>>
>> Arrgh. If I'm reading bash's changelog correctly, readarray -d was 
>> introduced in bash 4.4, so I'm guessing you're still on 4.3 or 
>> earlier? What bash version and platform are you using?
>>
> My bash version is 4.2.46.
> 
> It is the latest in the virtuozzolinux-base repository. I should install 
> the 4.4 package manually.

Well, if bash 4.2 is the default installed version on any of our 
platforms that meet our supported criteria, then we should instead fix 
the patch in question to avoid non-portable use of readarray.

Per https://repology.org/project/bash/versions (hinted from 
docs/system/build-platforms.rst), at least CentOS 7 still ships bash 
4.2, and per 'make docker', centos7 is still a viable build target.  So 
we do indeed need to fix our regression.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 17:20 ` [PATCH v7 00/47] block: Deal with filters Andrey Shinkevich
  2020-07-08 17:32   ` Eric Blake
@ 2020-07-08 20:47   ` Eric Blake
  2020-07-09  8:20     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Eric Blake @ 2020-07-08 20:47 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> v6: 
>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>
>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>> Branch: https://git.xanclic.moe/XanClic/qemu.git 
>> child-access-functions-v7
>>
>>
> I cloned the branch from the github and built successfully.
> 
> Running the iotests reports multiple errors of such a kind:
> 
> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
> 
> "./common.filter: line 128: readarray: -d: invalid option"
> 
> introduced with the commit
> 
> a7399eb iotests: Make _filter_img_create more active

You appear to be staging off an unreleased preliminary tree.  a7399eb is 
not upstream; the upstream commit 'iotests: Make _filter_img_create more 
active' is commit 57ee95ed, and while it uses readarray, it does not use 
the problematic -d.  In other words, it looks like the problem was 
caught and fixed in between the original patch creation and the pull 
request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 20:37       ` Eric Blake
@ 2020-07-09  8:19         ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-09  8:19 UTC (permalink / raw)
  To: Eric Blake, Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 08.07.20 22:37, Eric Blake wrote:
> On 7/8/20 2:46 PM, Andrey Shinkevich wrote:
>>
>> On 08.07.2020 20:32, Eric Blake wrote:
>>> On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>> v6:
>>>>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>>>>
>>>>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>>>>> Branch: https://git.xanclic.moe/XanClic/qemu.git
>>>>> child-access-functions-v7
>>>>>
>>>>>
>>>> I cloned the branch from the github and built successfully.
>>>>
>>>> Running the iotests reports multiple errors of such a kind:
>>>>
>>>> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
>>>>
>>>> "./common.filter: line 128: readarray: -d: invalid option"
>>>>
>>>
>>> Arrgh. If I'm reading bash's changelog correctly, readarray -d was
>>> introduced in bash 4.4, so I'm guessing you're still on 4.3 or
>>> earlier? What bash version and platform are you using?
>>>
>> My bash version is 4.2.46.
>>
>> It is the latest in the virtuozzolinux-base repository. I should
>> install the 4.4 package manually.
> 
> Well, if bash 4.2 is the default installed version on any of our
> platforms that meet our supported criteria, then we should instead fix
> the patch in question to avoid non-portable use of readarray.
> 
> Per https://repology.org/project/bash/versions (hinted from
> docs/system/build-platforms.rst), at least CentOS 7 still ships bash
> 4.2, and per 'make docker', centos7 is still a viable build target.  So
> we do indeed need to fix our regression.

There is no regression.  It’s just that I based this series on an
earlier version of “Make _filter_img_create more active” – when I sent a
pull request for that version, Peter already reported to me that it
failed on some test environments, so I revised it.

You’ll find there is no a7399eb in master; it’s 57ee95ed4ee there and
doesn’t use -d.

(My branch on github/gitea is still based on that older version, though,
because that’s what I wrote it on.)

Max



* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-08 20:47   ` Eric Blake
@ 2020-07-09  8:20     ` Max Reitz
  2020-07-09  9:04       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-09  8:20 UTC (permalink / raw)
  To: Eric Blake, Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 08.07.20 22:47, Eric Blake wrote:
> On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> v6:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>>
>>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>>> Branch: https://git.xanclic.moe/XanClic/qemu.git
>>> child-access-functions-v7
>>>
>>>
>> I cloned the branch from the github and built successfully.
>>
>> Running the iotests reports multiple errors of such a kind:
>>
>> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
>>
>> "./common.filter: line 128: readarray: -d: invalid option"
>>
>> introduced with the commit
>>
>> a7399eb iotests: Make _filter_img_create more active
> 
> You appear to be staging off an unreleased preliminary tree.  a7399eb is
> not upstream; the upstream commit 'iotests: Make _filter_img_create more
> active' is commit 57ee95ed, and while it uses readarray, it does not use
> the problematic -d.  In other words, it looks like the problem was
> caught and fixed in between the original patch creation and the pull
> request.

Ah, sorry, my mail client’s threading layout hid this mail from me a bit.

Yes.  Well, no, it wasn’t fixed before the pull request, but it was
fixed in the second pull request.  But yes.

Max



* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-07-08 17:20   ` Andrey Shinkevich
@ 2020-07-09  8:24     ` Max Reitz
  2020-07-09  9:07       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-09  8:24 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 08.07.20 19:20, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Add some helper functions for skipping filters in a chain of block
>> nodes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h |  3 +++
>>   block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index bb3457c5e8..5da793bfc3 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1382,6 +1382,9 @@ BdrvChild *bdrv_cow_child(BlockDriverState *bs);
>>   BdrvChild *bdrv_filter_child(BlockDriverState *bs);
>>   BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs);
>>   BdrvChild *bdrv_primary_child(BlockDriverState *bs);
>> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
>> +BlockDriverState *bdrv_skip_filters(BlockDriverState *bs);
>> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
>>     static inline BlockDriverState *child_bs(BdrvChild *child)
>>   {
>> diff --git a/block.c b/block.c
>> index 5a42ef49fd..0a0b855261 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -7008,3 +7008,58 @@ BdrvChild *bdrv_primary_child(BlockDriverState
>> *bs)
>>         return NULL;
>>   }
>> +
>> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
>> +                                              bool
>> stop_on_explicit_filter)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (!bs) {
>> +        return NULL;
>> +    }
>> +
>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>> +        c = bdrv_filter_child(bs);
>> +        if (!c) {
>> +            break;
>> +        }
>> +        bs = c->bs;
> 
> Could it be child_bs(bs) ?

Well, in a sense, but not really.  We need to check whether there is a
child before overwriting @bs (because @bs must stay a non-NULL pointer),
so we wouldn’t have fewer lines of code if we replaced “BdrvChild *c” by
“BlockDriverState *child_bs”, and then used bdrv_child() to set child_bs.

(And because we have to check whether @c is NULL anyway, there is no
real reason to use child_bs(c) instead of c->bs afterwards.)
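
For illustration, the loop under discussion can be modelled outside QEMU
roughly as follows. This is a minimal sketch, not the patch itself: Node,
Child, filter_child() and skip_filters() are hypothetical stand-ins for
BlockDriverState, BdrvChild, bdrv_filter_child() and
bdrv_do_skip_filters(), and the is_filter field is an assumed
simplification of how a filter node is recognised. It shows why @c is
checked before @bs is overwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BdrvChild and BlockDriverState. */
typedef struct Node Node;

typedef struct Child {
    Node *bs;
} Child;

struct Node {
    bool is_filter;   /* assumed: the node's driver is a filter driver */
    bool implicit;    /* the filter was inserted implicitly */
    Child *file;      /* the filtered child, if any */
};

/* Simplified model of bdrv_filter_child(): NULL for non-filters. */
static Child *filter_child(Node *bs)
{
    return bs->is_filter ? bs->file : NULL;
}

/* Model of the loop from the patch. */
static Node *skip_filters(Node *bs, bool stop_on_explicit_filter)
{
    Child *c;

    if (!bs) {
        return NULL;
    }

    while (!(stop_on_explicit_filter && !bs->implicit)) {
        c = filter_child(bs);
        if (!c) {
            /* Not a filter: bs still holds the last valid node. */
            break;
        }
        /* In this model a child always points at a node. */
        assert(c->bs);
        bs = c->bs;
    }

    return bs;
}
```

Because the NULL check on @c is needed in any case, dereferencing c->bs
directly afterwards costs nothing compared to a NULL-safe helper.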

>> +    }
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

Thanks a lot for reviewing!



* Re: [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious
  2020-07-08 18:24   ` Andrey Shinkevich
@ 2020-07-09  8:59     ` Max Reitz
  2020-07-09  9:11       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-09  8:59 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 08.07.20 20:24, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Places that use patterns like
>>
>>      if (bs->drv->is_filter && bs->file) {
>>          ... something about bs->file->bs ...
>>      }
>>
>> should be
>>
>>      BlockDriverState *filtered = bdrv_filter_bs(bs);
>>      if (filtered) {
>>          ... something about @filtered ...
>>      }
>>
>> instead.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c                        | 31 ++++++++++++++++++++-----------
>>   block/io.c                     |  7 +++++--
>>   migration/block-dirty-bitmap.c |  8 +-------
>>   3 files changed, 26 insertions(+), 20 deletions(-)
>>
> ...
>> diff --git a/block/io.c b/block/io.c
>> index df8f2a98d4..385176b331 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -3307,6 +3307,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild
>> *child, int64_t offset, bool exact,
>>                                     Error **errp)
>>   {
>>       BlockDriverState *bs = child->bs;
>> +    BdrvChild *filtered;
>>       BlockDriver *drv = bs->drv;
>>       BdrvTrackedRequest req;
>>       int64_t old_size, new_bytes;
>> @@ -3358,6 +3359,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild
>> *child, int64_t offset, bool exact,
>>           goto out;
>>       }
>>   +    filtered = bdrv_filter_child(bs);
>> +
> 
> Isn't it better to have this initialization right before the relevant
> if/else block?

Hm, well, yes.  In this case, though, maybe not.  Patch 16 will add
another BdrvChild to be initialized here (@backing), and we need to
initialize that one here.  So I felt it made sense to group them together.

They got split up when I decided to put @filtered into this patch and
@backing into its own.  So now it may look a bit weird, but I feel like
after patch 16 it makes sense.

(I’m indifferent, basically.)

Max



* Re: [PATCH v7 00/47] block: Deal with filters
  2020-07-09  8:20     ` Max Reitz
@ 2020-07-09  9:04       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09  9:04 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 09.07.2020 11:20, Max Reitz wrote:
> On 08.07.20 22:47, Eric Blake wrote:
>> On 7/8/20 12:20 PM, Andrey Shinkevich wrote:
>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>> v6:
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
>>>>
>>>> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
>>>> Branch: https://git.xanclic.moe/XanClic/qemu.git
>>>> child-access-functions-v7
>>>>
>>>>
>>> I cloned the branch from the github and built successfully.
>>>
>>> Running the iotests reports multiple errors of such a kind:
>>>
>>> 128: readarray -td '' formatting_line < <(sed -e 's/, fmt=/\x0/')
>>>
>>> "./common.filter: line 128: readarray: -d: invalid option"
>>>
>>> introduced with the commit
>>>
>>> a7399eb iotests: Make _filter_img_create more active
>> You appear to be staging off an unreleased preliminary tree.  a7399eb is
>> not upstream; the upstream commit 'iotests: Make _filter_img_create more
>> active' is commit 57ee95ed, and while it uses readarray, it does not use
>> the problematic -d.  In other words, it looks like the problem was
>> caught and fixed in between the original patch creation and the pull
>> request.
> Ah, sorry, my mail client’s threading layout hid this mail from me a bit.
>
> Yes.  Well, no, it wasn’t fixed before the pull request, but it was
> fixed in the second pull request.  But yes.
>
> Max
>
It's clear to me now. Thank you all for your explanations and time!

Andrey




* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-07-09  8:24     ` Max Reitz
@ 2020-07-09  9:07       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09  9:07 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 09.07.2020 11:24, Max Reitz wrote:
> On 08.07.20 19:20, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> Add some helper functions for skipping filters in a chain of block
>>> nodes.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    include/block/block_int.h |  3 +++
>>>    block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>>>    2 files changed, 58 insertions(+)
>>>
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index bb3457c5e8..5da793bfc3 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -1382,6 +1382,9 @@ BdrvChild *bdrv_cow_child(BlockDriverState *bs);
>>>    BdrvChild *bdrv_filter_child(BlockDriverState *bs);
>>>    BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs);
>>>    BdrvChild *bdrv_primary_child(BlockDriverState *bs);
>>> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
>>> +BlockDriverState *bdrv_skip_filters(BlockDriverState *bs);
>>> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
>>>      static inline BlockDriverState *child_bs(BdrvChild *child)
>>>    {
>>> diff --git a/block.c b/block.c
>>> index 5a42ef49fd..0a0b855261 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -7008,3 +7008,58 @@ BdrvChild *bdrv_primary_child(BlockDriverState
>>> *bs)
>>>          return NULL;
>>>    }
>>> +
>>> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
>>> +                                              bool
>>> stop_on_explicit_filter)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (!bs) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>>> +        c = bdrv_filter_child(bs);
>>> +        if (!c) {
>>> +            break;
>>> +        }
>>> +        bs = c->bs;
>> Could it be child_bs(bs) ?
> Well, in a sense, but not really.  We need to check whether there is a
> child before overwriting @bs (because @bs must stay a non-NULL pointer),
> so we wouldn’t have fewer lines of code if we replaced “BdrvChild *c” by
> “BlockDriverState *child_bs”, and then used bdrv_child() to set child_bs.
>
> (And because we have to check whether @c is NULL anyway, there is no
> real reason to use child_bs(c) instead of c->bs afterwards.)

Got it, thanks.

Andrey

>>> +    }
>> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> Thanks a lot for reviewing!

Pleasure!

Andrey




* Re: [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious
  2020-07-09  8:59     ` Max Reitz
@ 2020-07-09  9:11       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09  9:11 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 09.07.2020 11:59, Max Reitz wrote:
> On 08.07.20 20:24, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> Places that use patterns like
>>>
>>>       if (bs->drv->is_filter && bs->file) {
>>>           ... something about bs->file->bs ...
>>>       }
>>>
>>> should be
>>>
>>>       BlockDriverState *filtered = bdrv_filter_bs(bs);
>>>       if (filtered) {
>>>           ... something about @filtered ...
>>>       }
>>>
>>> instead.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block.c                        | 31 ++++++++++++++++++++-----------
>>>    block/io.c                     |  7 +++++--
>>>    migration/block-dirty-bitmap.c |  8 +-------
>>>    3 files changed, 26 insertions(+), 20 deletions(-)
>>>
>> ...
>>> diff --git a/block/io.c b/block/io.c
>>> index df8f2a98d4..385176b331 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -3307,6 +3307,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild
>>> *child, int64_t offset, bool exact,
>>>                                      Error **errp)
>>>    {
>>>        BlockDriverState *bs = child->bs;
>>> +    BdrvChild *filtered;
>>>        BlockDriver *drv = bs->drv;
>>>        BdrvTrackedRequest req;
>>>        int64_t old_size, new_bytes;
>>> @@ -3358,6 +3359,8 @@ int coroutine_fn bdrv_co_truncate(BdrvChild
>>> *child, int64_t offset, bool exact,
>>>            goto out;
>>>        }
>>>    +    filtered = bdrv_filter_child(bs);
>>> +
>> Isn't it better to have this initialization right before the relevant
>> if/else block?
> Hm, well, yes.  In this case, though, maybe not.  Patch 16 will add
> another BdrvChild to be initialized here (@backing), and we need to
> initialize that one here.  So I felt it made sense to group them together.
>
> They got split up when I decided to put @filtered into this patch and
> @backing into its own.  So now it may look a bit weird, but I feel like
> after patch 16 it makes sense.
>
> (I’m indifferent, basically.)
>
> Max

Yes, that makes sense. I am still reviewing the series and have not
reached patch 16 yet.

It is a minor thing anyway )) Thank you for your response.

Andrey




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
@ 2020-07-09 14:52   ` Andrey Shinkevich
  2020-07-09 15:27     ` Andrey Shinkevich
  2020-07-10 15:24     ` Max Reitz
  2020-07-09 15:13   ` Andrey Shinkevich
  2020-08-18 14:28   ` Kevin Wolf
  2 siblings, 2 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09 14:52 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Because of the (not so recent anymore) changes that make the stream job
> independent of the base node and instead track the node above it, we
> have to split that "bottom" node into two cases: The bottom COW node,
> and the node directly above the base node (which may be an R/W filter
> or the bottom COW node).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json |  4 +++
>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>   blockdev.c           |  4 ++-
>   3 files changed, 53 insertions(+), 18 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b20332e592..df87855429 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2486,6 +2486,10 @@
>   # On successful completion the image file is updated to drop the backing file
>   # and the BLOCK_JOB_COMPLETED event is emitted.
>   #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.
> +#
>   # @job-id: identifier for the newly-created block job. If
>   #          omitted, the device name will be used. (Since 2.7)
>   #
> diff --git a/block/stream.c b/block/stream.c
> index aa2e7af98e..b9c1141656 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -31,7 +31,8 @@ enum {
>   
>   typedef struct StreamBlockJob {
>       BlockJob common;
> -    BlockDriverState *bottom;
> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
> +    BlockDriverState *above_base;   /* Node directly above the base */

Keeping the base_overlay is enough to complete the stream job.

The above_base may disappear during the job and we can't rely on it.

>       BlockdevOnError on_error;
>       char *backing_file_str;
>       bool bs_read_only;
> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>   
>       if (s->chain_frozen) {
>           BlockJob *bjob = &s->common;
> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>       }
>   }
>   
> @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>       BlockJob *bjob = &s->common;
>       BlockDriverState *bs = blk_bs(bjob->blk);
> -    BlockDriverState *base = backing_bs(s->bottom);
> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);

The initial base node may be the top node of a concurrent commit job and
may disappear. The same is true for above_base.

base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.

>       Error *local_err = NULL;
>       int ret = 0;
>   
> -    bdrv_unfreeze_backing_chain(bs, s->bottom);
> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>       s->chain_frozen = false;
>   
> -    if (bs->backing) {
> +    if (bdrv_cow_child(unfiltered_bs)) {
>           const char *base_id = NULL, *base_fmt = NULL;
>           if (base) {
>               base_id = s->backing_file_str;
> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>                   base_fmt = base->drv->format_name;
>               }
>           }
> -        bdrv_set_backing_hd(bs, base, &local_err);
> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
>           if (local_err) {
>               error_report_err(local_err);
>               return -EPERM;
> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>       BlockBackend *blk = s->common.blk;
>       BlockDriverState *bs = blk_bs(blk);
> -    bool enable_cor = !backing_bs(s->bottom);
> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>       int64_t len;
>       int64_t offset = 0;
>       uint64_t delay_ns = 0;
>       int error = 0;
>       int64_t n = 0; /* bytes */
>   
> -    if (bs == s->bottom) {
> +    if (unfiltered_bs == s->base_overlay) {
>           /* Nothing to stream */
>           return 0;
>       }
> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>   
>           copy = false;
>   
> -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n);
>           if (ret == 1) {
>               /* Allocated in the top, no need to copy.  */
>           } else if (ret >= 0) {
>               /* Copy if allocated in the intermediate images.  Limit to the
>                * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
> +                                          s->base_overlay, true,
>                                             offset, n, &n);
>               /* Finish early if end of backing file has been reached */
>               if (ret == 0 && n == 0) {
> @@ -223,9 +227,29 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>       BlockDriverState *iter;
>       bool bs_read_only;
>       int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
> +    BlockDriverState *above_base;
>   
> -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
> +    if (!base_overlay) {
> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
> +                   base->node_name, bs->node_name);

Sorry, the error message is not clear to me.

In this case, there is no intermediate COW node, but the base, if not
NULL, is in the backing chain of bs, isn't it?

> +        return;
> +    }
> +
> +    /*
> +     * Find the node directly above @base.  @base_overlay is a COW overlay, so
> +     * it must have a bdrv_cow_child(), but it is the immediate overlay of
> +     * @base, so between the two there can only be filters.
> +     */
> +    above_base = base_overlay;
> +    if (bdrv_cow_bs(above_base) != base) {
> +        above_base = bdrv_cow_bs(above_base);
> +        while (bdrv_filter_bs(above_base) != base) {
> +            above_base = bdrv_filter_bs(above_base);
> +        }
> +    }
> +
> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {

When a concurrent stream job tries to freeze or remove the above_base node,
we will encounter the frozen node error. The above_base node is part of the
concurrent job's frozen chain.

>           return;
>       }
>   
> @@ -255,14 +279,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>        * and resizes. Reassign the base node pointer because the backing BS of the
>        * bottom node might change after the call to bdrv_reopen_set_read_only()
>        * due to parallel block jobs running.
> +     * above_base node might change after the call to
Yes, if not frozen.
> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>        */
> -    base = backing_bs(bottom);
> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
> +    base = bdrv_filter_or_cow_bs(above_base);
> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
> +         iter = bdrv_filter_or_cow_bs(iter))
> +    {
>           block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>                              basic_flags, &error_abort);
>       }
>   
> -    s->bottom = bottom;
> +    s->base_overlay = base_overlay;
> +    s->above_base = above_base;

Generally, being the filter for a concurrent job, the above_base node may be
deleted at any time, and we will be left with a dangling pointer. It may
happen even earlier if above_base is not frozen.

If it is frozen, as it is here, we may get the frozen link error in that case.


Andrey

>       s->backing_file_str = g_strdup(backing_file_str);
>       s->bs_read_only = bs_read_only;
>       s->chain_frozen = true;
> @@ -276,5 +305,5 @@ fail:
>       if (bs_read_only) {
>           bdrv_reopen_set_read_only(bs, true, NULL);
>       }
> -    bdrv_unfreeze_backing_chain(bs, bottom);
> +    bdrv_unfreeze_backing_chain(bs, above_base);
>   }
> diff --git a/blockdev.c b/blockdev.c
> index 72df193ca7..1eb0fcdea2 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2515,7 +2515,9 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>       }
>   
>       /* Check for op blockers in the whole chain between bs and base */
> -    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
> +    for (iter = bs; iter && iter != base_bs;
> +         iter = bdrv_filter_or_cow_bs(iter))
> +    {
>           if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
>               goto out;
>           }



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
  2020-07-09 14:52   ` Andrey Shinkevich
@ 2020-07-09 15:13   ` Andrey Shinkevich
  2020-07-10 15:27     ` Max Reitz
  2020-08-18 14:28   ` Kevin Wolf
  2 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09 15:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Because of the (not so recent anymore) changes that make the stream job
> independent of the base node and instead track the node above it, we
> have to split that "bottom" node into two cases: The bottom COW node,
> and the node directly above the base node (which may be an R/W filter
> or the bottom COW node).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json |  4 +++
>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>   blockdev.c           |  4 ++-
>   3 files changed, 53 insertions(+), 18 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b20332e592..df87855429 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2486,6 +2486,10 @@
>   # On successful completion the image file is updated to drop the backing file
>   # and the BLOCK_JOB_COMPLETED event is emitted.
>   #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was

I forgot one thing. To me, it would be clearer to read
"...to point to the base as backing node...", because the current wording
may be read as referring to the backing node of the base.

Andrey

> +# not specified) instead of modifying @device itself.
> +#
>   # @job-id: identifier for the newly-created block job. If
>   #          omitted, the device name will be used. (Since 2.7)
>   #



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-09 14:52   ` Andrey Shinkevich
@ 2020-07-09 15:27     ` Andrey Shinkevich
  2020-07-10 15:24     ` Max Reitz
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-09 15:27 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 09.07.2020 17:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>   blockdev.c           |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
...
>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *above_base;
>>   -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>> +    if (!base_overlay) {
>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>> +                   base->node_name, bs->node_name);
>
> Sorry, I am not clear with the error message.
>
> In this case, there is no an intermediate COW node but the base, if 
> not NULL, is
>
> in the backing chain of bs, isn't it?
>
I am discarding this question. No need to answer.

Andrey


>> +        return;
>> +    }
>> +




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-09 14:52   ` Andrey Shinkevich
  2020-07-09 15:27     ` Andrey Shinkevich
@ 2020-07-10 15:24     ` Max Reitz
  2020-07-10 17:41       ` Andrey Shinkevich
  1 sibling, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-10 15:24 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 09.07.20 16:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>   blockdev.c           |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
>> +# not specified) instead of modifying @device itself.
>> +#
>>   # @job-id: identifier for the newly-created block job. If
>>   #          omitted, the device name will be used. (Since 2.7)
>>   #
>> diff --git a/block/stream.c b/block/stream.c
>> index aa2e7af98e..b9c1141656 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -31,7 +31,8 @@ enum {
>>     typedef struct StreamBlockJob {
>>       BlockJob common;
>> -    BlockDriverState *bottom;
>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>> +    BlockDriverState *above_base;   /* Node directly above the base */
> 
> Keeping the base_overlay is enough to complete the stream job.

Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.

> The above_base may disappear during the job and we can't rely on it.

In this version of this series, it may not, because the chain is frozen.
So the above_base cannot disappear.

We can discuss whether we should allow it to disappear, but I think not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.

If we don’t keep above_base, then we’re basically left guessing as to
what should be the backing file after the stream job.
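
As a minimal sketch of that proposal, assuming a toy single-child node model
(ToyNode and every toy_* helper below are hypothetical stand-ins, not QEMU's
BlockDriverState or its bdrv_* functions), the walk from base_overlay down to
above_base and the later lookup of its immediate child could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; this is NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    const char *name;
    bool is_filter;        /* true: child is filtered; false: child is a COW backing child */
    struct ToyNode *child; /* the single node below this one, or NULL */
} ToyNode;

/* Hypothetical analogue of bdrv_cow_bs(): the child, but only for COW nodes. */
static ToyNode *toy_cow_bs(ToyNode *n)
{
    return (n && !n->is_filter) ? n->child : NULL;
}

/* Hypothetical analogue of bdrv_filter_bs(): the child, but only for filters. */
static ToyNode *toy_filter_bs(ToyNode *n)
{
    return (n && n->is_filter) ? n->child : NULL;
}

/* Hypothetical analogue of bdrv_filter_or_cow_bs(): the child of either kind. */
static ToyNode *toy_filter_or_cow_bs(ToyNode *n)
{
    return n ? n->child : NULL;
}

/*
 * Find the node directly above @base.
 * Precondition (assumed, not checked): @base_overlay is the COW overlay of
 * @base, so only filters can sit between the two.
 */
static ToyNode *toy_find_above_base(ToyNode *base_overlay, ToyNode *base)
{
    ToyNode *above_base = base_overlay;

    assert(base_overlay != NULL);
    if (toy_cow_bs(above_base) != base) {
        /* Step over the COW link once, then follow filter links only. */
        above_base = toy_cow_bs(above_base);
        while (toy_filter_bs(above_base) != base) {
            above_base = toy_filter_bs(above_base);
        }
    }
    return above_base;
}
```

In this model, the new backing node is read at completion time as
toy_filter_or_cow_bs(above_base). If something replaces the node below
above_base while the job runs, that lookup returns the replacement rather
than the node originally passed as base.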

>>       BlockdevOnError on_error;
>>       char *backing_file_str;
>>       bool bs_read_only;
>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>         if (s->chain_frozen) {
>>           BlockJob *bjob = &s->common;
>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>       }
>>   }
>>   @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>       BlockJob *bjob = &s->common;
>>       BlockDriverState *bs = blk_bs(bjob->blk);
>> -    BlockDriverState *base = backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> 
> The initial base node may be a top node for a concurrent commit job and
> 
> may disappear.
Then it would just be replaced by another node, though, so above_base
keeps a child.  The @base here is not necessarily the initial @base, and
that’s intentional.

> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.

But also wrong.  The point of keeping above_base around is to get its
child here to use that child as the new backing child of the top node.

>>       Error *local_err = NULL;
>>       int ret = 0;
>>   -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>       s->chain_frozen = false;
>>   -    if (bs->backing) {
>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>           const char *base_id = NULL, *base_fmt = NULL;
>>           if (base) {
>>               base_id = s->backing_file_str;
>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>                   base_fmt = base->drv->format_name;
>>               }
>>           }
>> -        bdrv_set_backing_hd(bs, base, &local_err);
>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>> base_fmt);
>>           if (local_err) {
>>               error_report_err(local_err);
>>               return -EPERM;
>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>> Error **errp)
>>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>       BlockBackend *blk = s->common.blk;
>>       BlockDriverState *bs = blk_bs(blk);
>> -    bool enable_cor = !backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>       int64_t len;
>>       int64_t offset = 0;
>>       uint64_t delay_ns = 0;
>>       int error = 0;
>>       int64_t n = 0; /* bytes */
>>   -    if (bs == s->bottom) {
>> +    if (unfiltered_bs == s->base_overlay) {
>>           /* Nothing to stream */
>>           return 0;
>>       }
>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job,
>> Error **errp)
>>             copy = false;
>>   -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK,
>> &n);
>>           if (ret == 1) {
>>               /* Allocated in the top, no need to copy.  */
>>           } else if (ret >= 0) {
>>               /* Copy if allocated in the intermediate images.  Limit
>> to the
>>                * known-unallocated area [offset,
>> offset+n*BDRV_SECTOR_SIZE).  */
>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom,
>> true,
>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>> +                                          s->base_overlay, true,
>>                                             offset, n, &n);
>>               /* Finish early if end of backing file has been reached */
>>               if (ret == 0 && n == 0) {
>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id,
>> BlockDriverState *bs,
>>       BlockDriverState *iter;
>>       bool bs_read_only;
>>       int basic_flags = BLK_PERM_CONSISTENT_READ |
>> BLK_PERM_WRITE_UNCHANGED;
>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *above_base;
>>   -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>> +    if (!base_overlay) {
>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>> +                   base->node_name, bs->node_name);
> 
> Sorry, I am not clear with the error message.
> 
> In this case, there is no an intermediate COW node but the base, if not
> NULL, is
> 
> in the backing chain of bs, isn't it?
> 
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Find the node directly above @base.  @base_overlay is a COW
>> overlay, so
>> +     * it must have a bdrv_cow_child(), but it is the immediate
>> overlay of
>> +     * @base, so between the two there can only be filters.
>> +     */
>> +    above_base = base_overlay;
>> +    if (bdrv_cow_bs(above_base) != base) {
>> +        above_base = bdrv_cow_bs(above_base);
>> +        while (bdrv_filter_bs(above_base) != base) {
>> +            above_base = bdrv_filter_bs(above_base);
>> +        }
>> +    }
>> +
>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
> 
> When a concurrent stream job tries to freeze or remove the above_base node,
> 
> we will encounter the frozen node error. The above_base node is a part
> of the
> 
> concurrent job frozen chain.

Correct.

>>           return;
>>       }
>>   @@ -255,14 +279,19 @@ void stream_start(const char *job_id,
>> BlockDriverState *bs,
>>        * and resizes. Reassign the base node pointer because the
>> backing BS of the
>>        * bottom node might change after the call to
>> bdrv_reopen_set_read_only()
>>        * due to parallel block jobs running.
>> +     * above_base node might change after the call to
> Yes, if not frozen.
>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>        */
>> -    base = backing_bs(bottom);
>> -    for (iter = backing_bs(bs); iter && iter != base; iter =
>> backing_bs(iter)) {
>> +    base = bdrv_filter_or_cow_bs(above_base);
>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>> +         iter = bdrv_filter_or_cow_bs(iter))
>> +    {
>>           block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>                              basic_flags, &error_abort);
>>       }
>>   -    s->bottom = bottom;
>> +    s->base_overlay = base_overlay;
>> +    s->above_base = above_base;
> 
> Generally, being the filter for a concurrent job, the above_base node
> may be deleted any time
> 
> and we will keep the dangling pointer. It may happen even earlier if
> above_base is not frozen.
> 
> If it is, as it here, we may get the frozen link error then.

I’m not sure what you mean here.  Freezing it was absolutely
intentional.  A dangling pointer would be a problem, but that’s why it’s
frozen, so it stays around and can’t be deleted any time.
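
The two positions in this exchange can be put side by side in a small toy
model. It is only a sketch under stated assumptions: ToyNode, toy_freeze_chain()
and toy_replace_child() are hypothetical names, and the model assumes that
freezing marks every link from the top node down to the link that points at
the bottom node, while the bottom node's own child link stays unfrozen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy node with one child link; NOT QEMU's BlockDriverState/BdrvChild. */
typedef struct ToyNode {
    struct ToyNode *child;
    bool link_frozen; /* true while the link to @child must not change */
} ToyNode;

/*
 * Freeze all links on the path from @top down to @bottom (the link leaving
 * @bottom itself is left alone). Returns 0 on success, or -1 if @bottom is
 * not reachable or a link on the path is already frozen by someone else.
 */
static int toy_freeze_chain(ToyNode *top, ToyNode *bottom)
{
    ToyNode *i;

    for (i = top; i != bottom; i = i->child) {
        if (i == NULL || i->link_frozen) {
            return -1; /* models the "frozen link" error */
        }
    }
    for (i = top; i != bottom; i = i->child) {
        i->link_frozen = true;
    }
    return 0;
}

/* Replace @parent's child; refused with -1 while the link is frozen. */
static int toy_replace_child(ToyNode *parent, ToyNode *new_child)
{
    if (parent->link_frozen) {
        return -1;
    }
    parent->child = new_child;
    return 0;
}
```

With a chain top -> above_base -> base, freezing from top to above_base makes
any attempt to detach above_base fail, so the stored pointer cannot dangle,
while the node below above_base can still be swapped. The cost is the one
raised in the review: a second job that tries to freeze or change an
overlapping part of the chain gets the error instead.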

Max



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-09 15:13   ` Andrey Shinkevich
@ 2020-07-10 15:27     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-10 15:27 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 09.07.20 17:13, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>   blockdev.c           |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
> 
> Forgot one thing. To me, it would be more understandable to read
> 
> "...to point to the base as backing node..." because it may be thought
> as a backing
> 
> node of the base.

This doesn’t sound like it’s about understandability; “point to the base
as backing node” and “point to base’s backing node [as backing node]”
are semantically different.

Was my phrasing just wrong?  @base should be the backing node, so yours
seems correct.

Max



* Re: [PATCH v7 15/47] block: Use CAFs when working with backing chains
  2020-06-25 15:21 ` [PATCH v7 15/47] block: Use CAFs when working with backing chains Max Reitz
@ 2020-07-10 15:28   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-10 15:28 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Use child access functions when iterating through backing chains so
> filters do not break the chain.
>
> In addition, bdrv_find_overlay() will now always return the actual
> overlay; that is, it will never return a filter node but only one with a
> COW backing file (there may be filter nodes between that node and @bs).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 41 +++++++++++++++++++++++++++++------------
>   1 file changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index a44af9c3c1..712230ef5c 100644
> --- a/block.c
> +++ b/block.c
> @@ -4724,7 +4724,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>   }
>   
>   /*
> - * Finds the image layer in the chain that has 'bs' as its backing file.
> + * Finds the image layer in the chain that has 'bs' (or a filter on
> + * top of it) as its backing file.
One can optionally say "Finds the first non-filter parent of bs in the chain".
>    

...

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
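
To illustrate the behaviour described in the commit message (the overlay
returned is never a filter), here is a minimal sketch on a toy single-child
node model. ToyNode, toy_skip_filters() and toy_find_overlay() are
hypothetical names for illustration only and should not be read as the code
in block.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    bool is_filter;        /* true: child is filtered; false: child is a COW backing child */
    struct ToyNode *child; /* the single node below this one, or NULL */
} ToyNode;

/* Return the first non-filter node at or below @n (NULL if there is none). */
static ToyNode *toy_skip_filters(ToyNode *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/*
 * Starting at @active, find the non-filter node whose backing file is @bs,
 * possibly with filters in between. Filters on top of @bs or @active are
 * skipped first. Returns NULL if @bs is not below @active.
 */
static ToyNode *toy_find_overlay(ToyNode *active, ToyNode *bs)
{
    bs = toy_skip_filters(bs);
    active = toy_skip_filters(active);

    while (active) {
        /* @active is not a filter here, so its child is a COW child. */
        ToyNode *next = toy_skip_filters(active->child);
        if (next == bs) {
            return active;
        }
        active = next;
    }
    return NULL;
}
```

For a chain top -> filter -> mid -> filter -> base, looking up base (or the
filter directly on top of it) yields mid in this model, never one of the
filters.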




* Re: [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate()
  2020-06-25 15:21 ` [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate() Max Reitz
@ 2020-07-10 15:54   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-10 15:54 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> The condition modified here is not about potentially filtered children,
> but only about COW sources (i.e. traditional backing files).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/io.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index dc9891d6ce..097a3861d8 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3308,7 +3308,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>                                     Error **errp)
>   {
>       BlockDriverState *bs = child->bs;
> -    BdrvChild *filtered;
> +    BdrvChild *filtered, *backing;
>       BlockDriver *drv = bs->drv;
>       BdrvTrackedRequest req;
>       int64_t old_size, new_bytes;
> @@ -3361,6 +3361,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>       }
>   
>       filtered = bdrv_filter_child(bs);
> +    backing = bdrv_cow_child(bs);
>   
>       /*
>        * If the image has a backing file that is large enough that it would
> @@ -3372,10 +3373,10 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>        * backing file, taking care of keeping things consistent with that backing
>        * file is the user's responsibility.
>        */
> -    if (new_bytes && bs->backing) {
> +    if (new_bytes && backing) {
>           int64_t backing_len;
>   
> -        backing_len = bdrv_getlength(backing_bs(bs));
> +        backing_len = bdrv_getlength(backing->bs);
>           if (backing_len < 0) {
>               ret = backing_len;
>               error_setg_errno(errp, -ret, "Could not get backing file size");
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
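
The distinction the commit message draws (COW sources only, not filtered
children) can be sketched on a toy model. All names below are hypothetical,
and the comparison against the old size is an assumption about what "large
enough that it would provide data for the new area" means; it is not taken
from the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* How a toy node relates to its single child. */
typedef enum {
    TOY_CHILD_NONE,
    TOY_CHILD_COW,      /* traditional backing file */
    TOY_CHILD_FILTERED  /* child of a filter node */
} ToyChildKind;

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    ToyChildKind kind;
    int64_t length;        /* size of this node in bytes */
    struct ToyNode *child;
} ToyNode;

/* Hypothetical analogue of bdrv_cow_child(): non-NULL only for COW links. */
static const ToyNode *toy_cow_child(const ToyNode *n)
{
    return n->kind == TOY_CHILD_COW ? n->child : NULL;
}

/*
 * Would growing @n from @old_size to @new_size expose backing file content
 * in the new area? Only a COW child can do that; a filtered child is just
 * passed through and is not a backing file in this sense.
 */
static bool toy_grow_exposes_backing(const ToyNode *n, int64_t old_size,
                                     int64_t new_size)
{
    const ToyNode *backing = toy_cow_child(n);

    if (new_size <= old_size || backing == NULL) {
        return false;
    }
    /* Assumption: the backing file matters only if it extends past old_size. */
    return backing->length > old_size;
}
```

In this model a filter node never triggers the check, even though it has a
child, which is the case the old bs->backing test could not tell apart.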



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-10 15:24     ` Max Reitz
@ 2020-07-10 17:41       ` Andrey Shinkevich
  2020-07-16 14:59         ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-10 17:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 10.07.2020 18:24, Max Reitz wrote:
> On 09.07.20 16:52, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> Because of the (not so recent anymore) changes that make the stream job
>>> independent of the base node and instead track the node above it, we
>>> have to split that "bottom" node into two cases: The bottom COW node,
>>> and the node directly above the base node (which may be an R/W filter
>>> or the bottom COW node).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    qapi/block-core.json |  4 +++
>>>    block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>    blockdev.c           |  4 ++-
>>>    3 files changed, 53 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index b20332e592..df87855429 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -2486,6 +2486,10 @@
>>>    # On successful completion the image file is updated to drop the
>>> backing file
>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>    #
>>> +# In case @device is a filter node, block-stream modifies the first
>>> non-filter
>>> +# overlay node below it to point to base's backing node (or NULL if
>>> @base was
>>> +# not specified) instead of modifying @device itself.
>>> +#
>>>    # @job-id: identifier for the newly-created block job. If
>>>    #          omitted, the device name will be used. (Since 2.7)
>>>    #
>>> diff --git a/block/stream.c b/block/stream.c
>>> index aa2e7af98e..b9c1141656 100644
>>> --- a/block/stream.c
>>> +++ b/block/stream.c
>>> @@ -31,7 +31,8 @@ enum {
>>>      typedef struct StreamBlockJob {
>>>        BlockJob common;
>>> -    BlockDriverState *bottom;
>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>> Keeping the base_overlay is enough to complete the stream job.
> Depends on the definition.  If we decide it isn’t enough, then it isn’t
> enough.
>
>> The above_base may disappear during the job and we can't rely on it.
> In this version of this series, it may not, because the chain is frozen.
>   So the above_base cannot disappear.

Once we insert a filter above the top bs of the stream job, the parallel jobs
in iotest 030 will fail with the 'frozen link' error. That is because the
independent parallel stream or commit jobs insert and remove their filters
asynchronously.

>
> We can discuss whether we should allow it to disappear, but I think not.
>
> The problem is, we need something to set as the backing file after
> streaming.  How do we figure out what that should be?  My proposal is we
> keep above_base and use its immediate child.

We can do the same with the base_overlay.

If the backing node turns out to be a filter, the proper backing child will
be set after the filter is removed. So we shouldn't care.

>
> If we don’t keep above_base, then we’re basically left guessing as to
> what should be the backing file after the stream job.
>
>>>        BlockdevOnError on_error;
>>>        char *backing_file_str;
>>>        bool bs_read_only;
>>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>>          if (s->chain_frozen) {
>>>            BlockJob *bjob = &s->common;
>>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>>        }
>>>    }
>>>    @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>        BlockJob *bjob = &s->common;
>>>        BlockDriverState *bs = blk_bs(bjob->blk);
>>> -    BlockDriverState *base = backing_bs(s->bottom);
>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>> The initial base node may be a top node for a concurrent commit job and
>>
>> may disappear.
> Then it would just be replaced by another node, though, so above_base
> keeps a child.  The @base here is not necessarily the initial @base, and
> that’s intentional.

Not really. In my example, above_base becomes a dangling pointer because,
after the commit job finishes, its filter, which should belong to the commit
job's frozen chain, will be deleted. If we freeze the link to the above_base
for this job, iotest 030 will not pass.

>> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
> But also wrong.  The point of keeping above_base around is to get its
> child here to use that child as the new backing child of the top node.
>
>>>        Error *local_err = NULL;
>>>        int ret = 0;
>>>    -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>>        s->chain_frozen = false;
>>>    -    if (bs->backing) {
>>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>>            const char *base_id = NULL, *base_fmt = NULL;
>>>            if (base) {
>>>                base_id = s->backing_file_str;
>>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>>                    base_fmt = base->drv->format_name;
>>>                }
>>>            }
>>> -        bdrv_set_backing_hd(bs, base, &local_err);
>>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>>> base_fmt);
>>>            if (local_err) {
>>>                error_report_err(local_err);
>>>                return -EPERM;
>>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>>> Error **errp)
>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>        BlockBackend *blk = s->common.blk;
>>>        BlockDriverState *bs = blk_bs(blk);
>>> -    bool enable_cor = !backing_bs(s->bottom);
>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>>        int64_t len;
>>>        int64_t offset = 0;
>>>        uint64_t delay_ns = 0;
>>>        int error = 0;
>>>        int64_t n = 0; /* bytes */
>>>    -    if (bs == s->bottom) {
>>> +    if (unfiltered_bs == s->base_overlay) {
>>>            /* Nothing to stream */
>>>            return 0;
>>>        }
>>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job,
>>> Error **errp)
>>>              copy = false;
>>>    -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK,
>>> &n);
>>>            if (ret == 1) {
>>>                /* Allocated in the top, no need to copy.  */
>>>            } else if (ret >= 0) {
>>>                /* Copy if allocated in the intermediate images.  Limit
>>> to the
>>>                 * known-unallocated area [offset,
>>> offset+n*BDRV_SECTOR_SIZE).  */
>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom,
>>> true,
>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>>> +                                          s->base_overlay, true,
>>>                                              offset, n, &n);
>>>                /* Finish early if end of backing file has been reached */
>>>                if (ret == 0 && n == 0) {
>>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id,
>>> BlockDriverState *bs,
>>>        BlockDriverState *iter;
>>>        bool bs_read_only;
>>>        int basic_flags = BLK_PERM_CONSISTENT_READ |
>>> BLK_PERM_WRITE_UNCHANGED;
>>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>>> +    BlockDriverState *above_base;
>>>    -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>>> +    if (!base_overlay) {
>>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>>> +                   base->node_name, bs->node_name);
>> Sorry, I am not clear with the error message.
>>
>> In this case, there is no an intermediate COW node but the base, if not
>> NULL, is
>>
>> in the backing chain of bs, isn't it?
>>
>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * Find the node directly above @base.  @base_overlay is a COW
>>> overlay, so
>>> +     * it must have a bdrv_cow_child(), but it is the immediate
>>> overlay of
>>> +     * @base, so between the two there can only be filters.
>>> +     */
>>> +    above_base = base_overlay;
>>> +    if (bdrv_cow_bs(above_base) != base) {
>>> +        above_base = bdrv_cow_bs(above_base);
>>> +        while (bdrv_filter_bs(above_base) != base) {
>>> +            above_base = bdrv_filter_bs(above_base);
>>> +        }
>>> +    }
>>> +
>>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
>> When a concurrent stream job tries to freeze or remove the above_base node,
>>
>> we will encounter the frozen node error. The above_base node is a part
>> of the
>>
>> concurrent job frozen chain.
> Correct.
>
>>>            return;
>>>        }
>>>    @@ -255,14 +279,19 @@ void stream_start(const char *job_id,
>>> BlockDriverState *bs,
>>>         * and resizes. Reassign the base node pointer because the
>>> backing BS of the
>>>         * bottom node might change after the call to
>>> bdrv_reopen_set_read_only()
>>>         * due to parallel block jobs running.
>>> +     * above_base node might change after the call to
>> Yes, if not frozen.
>>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>>         */
>>> -    base = backing_bs(bottom);
>>> -    for (iter = backing_bs(bs); iter && iter != base; iter =
>>> backing_bs(iter)) {
>>> +    base = bdrv_filter_or_cow_bs(above_base);
>>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>>> +         iter = bdrv_filter_or_cow_bs(iter))
>>> +    {
>>>            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>                               basic_flags, &error_abort);
>>>        }
>>>    -    s->bottom = bottom;
>>> +    s->base_overlay = base_overlay;
>>> +    s->above_base = above_base;
>> Generally, being the filter for a concurrent job, the above_base node
>> may be deleted any time
>>
>> and we will keep the dangling pointer. It may happen even earlier if
>> above_base is not frozen.
>>
>> If it is, as it here, we may get the frozen link error then.
> I’m not sure what you mean here.  Freezing it was absolutely
> intentional.  A dangling pointer would be a problem, but that’s why it’s
> frozen, so it stays around and can’t be deleted any time.
>
> Max

The nodes we freeze should all belong to the context of the relevant job:

filter->top_node->intermediate_node(s)

We should not include the base, or any filter above it, in the frozen
chain, because they belong to a different job's context.

Once 'this' job has completed, we set the current backing child of
base_overlay and need not care what kind of node it is.  If it is
another job's filter, it will be replaced with the proper node
afterwards.
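
To make the two candidates concrete, here is a minimal standalone sketch
of the walk the quoted hunk uses to derive above_base from base_overlay.
ToyNode and toy_find_above_base() are made up for illustration and are
not QEMU types; the sketch assumes, as the patch comment does, that only
filters sit between base_overlay and base.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block node; not QEMU's BlockDriverState. */
typedef struct ToyNode {
    struct ToyNode *cow;      /* COW backing child, if any */
    struct ToyNode *filtered; /* filtered child, if this node is a filter */
} ToyNode;

/*
 * Return the node directly above @base, starting from @base_overlay.
 * Assumes @base_overlay is the first COW overlay above @base, so only
 * filters can sit between the two.
 */
static ToyNode *toy_find_above_base(ToyNode *base_overlay, ToyNode *base)
{
    ToyNode *above_base = base_overlay;

    if (above_base->cow != base) {
        /* Step over the COW link, then follow filter links only. */
        above_base = above_base->cow;
        while (above_base->filtered != base) {
            above_base = above_base->filtered;
        }
    }
    return above_base;
}
```

With overlay -> f1 -> f2 -> base, the walk ends at f2, so a freeze up to
above_base also covers the filters f1 and f2, while a freeze up to
base_overlay would not.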


Andrey




* Re: [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen
  2020-06-25 15:21 ` [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen Max Reitz
@ 2020-07-10 19:42   ` Andrey Shinkevich
  2020-07-16 15:04     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-10 19:42 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Reopening a node's backing child needs a bit of special handling because
> the "backing" child has different defaults than all other children
> (among other things).  Adding filter support here is a bit more
> difficult than just using the child access functions.  In fact, we often
> have to directly use bs->backing because these functions are about the
> "backing" child (which may or may not be the COW backing file).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index 712230ef5c..8131d0b5eb 100644
> --- a/block.c
> +++ b/block.c
> @@ -4026,26 +4026,56 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>           }
>       }
>   
> +    /*
> +     * Ensure that @bs can really handle backing files, because we are
> +     * about to give it one (or swap the existing one)
> +     */
> +    if (bs->drv->is_filter) {
> +        /* Filters always have a file or a backing child */
> +        if (!bs->backing) {
> +            error_setg(errp, "'%s' is a %s filter node that does not support a "
> +                       "backing child", bs->node_name, bs->drv->format_name);
> +            return -EINVAL;
> +        }
> +    } else if (!bs->drv->supports_backing) {
> +        error_setg(errp, "Driver '%s' of node '%s' does not support backing "
> +                   "files", bs->drv->format_name, bs->node_name);
> +        return -EINVAL;
> +    }
> +
>       /*
>        * Find the "actual" backing file by skipping all links that point
>        * to an implicit node, if any (e.g. a commit filter node).
> +     * We cannot use any of the bdrv_skip_*() functions here because
> +     * those return the first explicit node, while we are looking for
> +     * its overlay here.
>        */
>       overlay_bs = bs;
> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
> -        overlay_bs = backing_bs(overlay_bs);
> +    while (bdrv_filter_or_cow_bs(overlay_bs) &&
> +           bdrv_filter_or_cow_bs(overlay_bs)->implicit)
> +    {
> +        overlay_bs = bdrv_filter_or_cow_bs(overlay_bs);
>       }

I believe this small optimization would work properly:


for (BlockDriverState *below_bs = bdrv_filter_or_cow_bs(overlay_bs);
     below_bs && below_bs->implicit;
     below_bs = bdrv_filter_or_cow_bs(overlay_bs)) {
    overlay_bs = below_bs;
}
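
As a sanity check, here is a toy model of both loop forms.  ToyNode and
its child pointer are hypothetical stand-ins for BlockDriverState and
bdrv_filter_or_cow_bs(); the sketch only illustrates that the two forms
stop at the same node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyNode {
    struct ToyNode *child; /* stands in for bdrv_filter_or_cow_bs() */
    bool implicit;
} ToyNode;

/* The while-loop form from the patch: two child lookups per iteration. */
static ToyNode *toy_overlay_while(ToyNode *bs)
{
    ToyNode *overlay = bs;

    while (overlay->child && overlay->child->implicit) {
        overlay = overlay->child;
    }
    return overlay;
}

/* The for-loop form suggested above: one child lookup per iteration. */
static ToyNode *toy_overlay_for(ToyNode *bs)
{
    ToyNode *overlay = bs;

    for (ToyNode *below = overlay->child;
         below && below->implicit;
         below = overlay->child) {
        overlay = below;
    }
    return overlay;
}
```

Both return the last implicit node in the chain (or @bs itself when its
child is explicit or absent), i.e. the overlay of the first explicit
node.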
>   
>       /* If we want to replace the backing file we need some extra checks */
> -    if (new_backing_bs != backing_bs(overlay_bs)) {
> +    if (new_backing_bs != bdrv_filter_or_cow_bs(overlay_bs)) {
>           /* Check for implicit nodes between bs and its backing file */
>           if (bs != overlay_bs) {
>               error_setg(errp, "Cannot change backing link if '%s' has "
>                          "an implicit backing file", bs->node_name);
>               return -EPERM;
>           }
> -        /* Check if the backing link that we want to replace is frozen */
> -        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
> -                                         errp)) {
> +        /*
> +         * Check if the backing link that we want to replace is frozen.
> +         * Note that
> +         * bdrv_filter_or_cow_child(overlay_bs) == overlay_bs->backing,
> +         * because we know that overlay_bs == bs, and that @bs
> +         * either is a filter that uses ->backing or a COW format BDS
> +         * with bs->drv->supports_backing == true.
> +         */
> +        if (bdrv_is_backing_chain_frozen(overlay_bs,
> +                                         child_bs(overlay_bs->backing), errp))
What would be wrong with bdrv_filter_or_cow_bs(overlay_bs) here?
> +        {
>               return -EPERM;
>           }
>           reopen_state->replace_backing_bs = true;
> @@ -4196,7 +4226,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>        * its metadata. Otherwise the 'backing' option can be omitted.
>        */
>       if (drv->supports_backing && reopen_state->backing_missing &&
> -        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
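The old operand, backing_bs(reopen_state->bs), is a BlockDriverState *.
> +        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {

The new operand, reopen_state->bs->backing, is a BdrvChild *.

Are we OK with that?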

>           error_setg(errp, "backing is missing for '%s'",
>                      reopen_state->bs->node_name);
>           ret = -EINVAL;
> @@ -4337,7 +4367,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>        * from bdrv_set_backing_hd()) has the new values.
>        */
>       if (reopen_state->replace_backing_bs) {
> -        BlockDriverState *old_backing_bs = backing_bs(bs);
> +        BlockDriverState *old_backing_bs = child_bs(bs->backing);
>           assert(!old_backing_bs || !old_backing_bs->implicit);
>           /* Abort the permission update on the backing bs we're detaching */
>           if (old_backing_bs) {



* Re: [PATCH v7 01/47] block: Add child access functions
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
  2020-07-08 17:22   ` Andrey Shinkevich
@ 2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
  2020-07-16 14:46     ` Max Reitz
  2020-07-28 16:09     ` Christophe de Dinechin
  2020-07-13  9:57   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 2 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-13  9:06 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

25.06.2020 18:21, Max Reitz wrote:
> There are BDS children that the general block layer code can access,
> namely bs->file and bs->backing.  Since the introduction of filters and
> external data files, their meaning is not quite clear.  bs->backing can
> be a COW source, or it can be a filtered child; bs->file can be a
> filtered child, it can be data and metadata storage, or it can be just
> metadata storage.
> 
> This overloading really is not helpful.  This patch adds functions that
> retrieve the correct child for each exact purpose.  Later patches in
> this series will make use of them.  Doing so will allow us to handle
> filter nodes in a meaningful way.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

[..]

> +/*
> + * Return the primary child of this node: For filters, that is the
> + * filtered child.  For other nodes, that is usually the child storing
> + * metadata.
> + * (A generally more helpful description is that this is (usually) the
> + * child that has the same filename as @bs.)
> + *
> + * Drivers do not necessarily have a primary child; for example quorum
> + * does not.
> + */
> +BdrvChild *bdrv_primary_child(BlockDriverState *bs)
> +{
> +    BdrvChild *c;
> +
> +    QLIST_FOREACH(c, &bs->children, next) {
> +        if (c->role & BDRV_CHILD_PRIMARY) {
> +            return c;
> +        }
> +    }
> +
> +    return NULL;
> +}
> 

I suggest squashing in the following, to also assert that there is at most one primary child:
--- a/block.c
+++ b/block.c
@@ -6998,13 +6998,14 @@ BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
   */
  BdrvChild *bdrv_primary_child(BlockDriverState *bs)
  {
-    BdrvChild *c;
+    BdrvChild *c, *found = NULL;
  
      QLIST_FOREACH(c, &bs->children, next) {
          if (c->role & BDRV_CHILD_PRIMARY) {
-            return c;
+            assert(!found);
+            found = c;
          }
      }
  
-    return NULL;
+    return found;
  }


with or without:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
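
For illustration, a minimal standalone sketch of the "at most one
primary child" check.  ToyChild, TOY_CHILD_PRIMARY and the hand-rolled
list are assumptions made for this sketch; the real code uses BdrvChild
and QLIST_FOREACH.  The point is that the result has to come from the
recorded match once the whole list has been walked.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHILD_PRIMARY 0x1u /* hypothetical flag value */

typedef struct ToyChild {
    unsigned role;
    struct ToyChild *next;
} ToyChild;

/* Return the single primary child, or NULL; assert there is at most one. */
static ToyChild *toy_primary_child(ToyChild *head)
{
    ToyChild *found = NULL;

    for (ToyChild *c = head; c; c = c->next) {
        if (c->role & TOY_CHILD_PRIMARY) {
            assert(!found); /* a second primary child would be a bug */
            found = c;
        }
    }
    /* The loop has run to the end of the list; return the recorded match. */
    return found;
}
```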


-- 
Best regards,
Vladimir



* Re: [PATCH v7 01/47] block: Add child access functions
  2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
  2020-07-08 17:22   ` Andrey Shinkevich
  2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
@ 2020-07-13  9:57   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-13  9:57 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

25.06.2020 18:21, Max Reitz wrote:
> There are BDS children that the general block layer code can access,
> namely bs->file and bs->backing.  Since the introduction of filters and
> external data files, their meaning is not quite clear.  bs->backing can
> be a COW source, or it can be a filtered child; bs->file can be a
> filtered child, it can be data and metadata storage, or it can be just
> metadata storage.
> 
> This overloading really is not helpful.  This patch adds functions that
> retrieve the correct child for each exact purpose.  Later patches in
> this series will make use of them.  Doing so will allow us to handle
> filter nodes in a meaningful way.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h | 44 +++++++++++++++++--
>   block.c                   | 90 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 131 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 1b86b59af1..bb3457c5e8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h

[..]

> +/*
> + * If @bs acts as a filter for exactly one of its children, return
> + * that child.
> + */
> +BdrvChild *bdrv_filter_child(BlockDriverState *bs)

Hmm you called it filter_child instead of filterED_child..

> +{
> +    BdrvChild *c;
> +
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (!bs->drv->is_filter) {
> +        return NULL;
> +    }
> +
> +    /* Only one of @backing or @file may be used */
> +    assert(!(bs->backing && bs->file));
> +
> +    c = bs->backing ?: bs->file;
> +    if (!c) {
> +        return NULL;
> +    }
> +
> +    assert(c->role & BDRV_CHILD_FILTERED);

But the role is still called CHILD_FILTERED

> +    return c;
> +}

(just note that it's a bit inconsistent, keep my r-b anyway)


-- 
Best regards,
Vladimir



* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-06-25 15:21 ` [PATCH v7 02/47] block: Add chain helper functions Max Reitz
  2020-07-08 17:20   ` Andrey Shinkevich
@ 2020-07-13 10:18   ` Vladimir Sementsov-Ogievskiy
  2020-07-16 14:50     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-13 10:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

25.06.2020 18:21, Max Reitz wrote:
> Add some helper functions for skipping filters in a chain of block
> nodes.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h |  3 +++
>   block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 58 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index bb3457c5e8..5da793bfc3 100644


This patch raises two questions:

1. How to treat filters at the end of the backing chain?

- the child access functions will return no filtered child for such nodes, which is of course correct
- the filter-skipping functions will return this filter.  How correct that is, I don't know.


Consider a chain

top --- backing ---> filter-with-no-child

If bdrv_backing_chain_next(top) returned NULL, that would be incorrect,
because top actually has a backing child, and on read it will read from
it for unallocated clusters (and this should crash).  So returning the
filter as the backing-chain-next is probably a valid thing to do.  Or we
should assert that we are not in such a situation (which may crash more
often than actually trying to read from a nonexistent child).

So returning NULL may be even less correct than returning a filter.


2. How to treat nodes with drv=NULL but with a filtered child (one with the BDRV_CHILD_FILTERED role)?
- the child access functions return no filtered child for such nodes
- the filter-skipping functions will stop on it.

=======

Wouldn't it be better to drop drv->is_filter altogether, and to call a node a filter if it has a bs->file or bs->backing
child in the BDRV_CHILD_FILTERED role?  This automatically settles both questions:

- a node without a child in the BDRV_CHILD_FILTERED role is automatically a non-filter.  So the filter driver is responsible for having such a child.
- a node without a drv may still be a filter if it has a BDRV_CHILD_FILTERED child.  Still, not very useful.

Anyway, is_filter and BDRV_CHILD_FILTERED can contradict each other, and it seems good to get rid of is_filter.  But I may be missing something.
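
To show where the two notions disagree, a toy model (ToyDriver, ToyNode
and both predicates are hypothetical and only mirror the idea, not the
real structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyDriver {
    bool is_filter;
} ToyDriver;

typedef struct ToyNode {
    const ToyDriver *drv;     /* may be NULL */
    struct ToyNode *filtered; /* child in the "filtered" role, or NULL */
} ToyNode;

/* Driver-flag rule: a node is a filter if its driver says so. */
static bool toy_is_filter_by_flag(const ToyNode *n)
{
    return n->drv && n->drv->is_filter;
}

/* Role rule: a node is a filter iff it has a child in the filtered role. */
static bool toy_is_filter_by_role(const ToyNode *n)
{
    return n->filtered != NULL;
}
```

A filter driver's node without a child is a filter by flag only, and a
drv=NULL node with a filtered child is a filter by role only; these are
exactly the two corner cases from questions 1 and 2.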

[..]

> +
> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
> +                                              bool stop_on_explicit_filter)
> +{
> +    BdrvChild *c;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> +        c = bdrv_filter_child(bs);
> +        if (!c) {
> +            break;
> +        }
> +        bs = c->bs;
> +    }
> +    /*
> +     * Note that this treats nodes with bs->drv == NULL as not being
> +     * filters (bs->drv == NULL should be replaced by something else
> +     * anyway).
> +     * The advantage of this behavior is that this function will thus
> +     * always return a non-NULL value (given a non-NULL @bs).

I don't see how this follows from the first sentence.  We could skip nodes
with a BDRV_CHILD_FILTERED child and drv=NULL as well, and still return
a non-NULL bs at the end...

Didn't you mean "treat nodes without a filtered child as not being filters, even if they have drv->is_filter == true"?  That is the real reason for the second sentence.

... and the disadvantage is that we may return a filter node, which the caller may not expect ...

> +     */
> +
> +    return bs;
> +}


I think I can live with it as is for now.  If we don't drop is_filter, I think it is worth documenting these corner cases somewhere (maybe near the .is_filter definition).
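
A small standalone model of the skipping loop, to make the corner case
visible.  ToyNode and toy_skip_filters() are invented for this sketch;
the is_filter field stands in for drv && drv->is_filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyNode {
    bool is_filter;           /* stands in for drv && drv->is_filter */
    bool implicit;
    struct ToyNode *filtered; /* filtered child, may be NULL */
} ToyNode;

static ToyNode *toy_skip_filters(ToyNode *bs, bool stop_on_explicit_filter)
{
    if (!bs) {
        return NULL;
    }

    while (!(stop_on_explicit_filter && !bs->implicit)) {
        /* Only nodes flagged as filters expose a filtered child. */
        ToyNode *c = bs->is_filter ? bs->filtered : NULL;
        if (!c) {
            break;
        }
        bs = c;
    }
    /* Never NULL for a non-NULL input, but possibly a childless filter. */
    return bs;
}
```

For a chain implicit filter -> explicit filter -> childless filter,
skipping all filters returns the childless filter node itself, which is
the case a caller may not expect.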

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH v7 18/47] block: Flush all children in generic code
  2020-06-25 15:21 ` [PATCH v7 18/47] block: Flush all children in generic code Max Reitz
@ 2020-07-14 12:52   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-14 12:52 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
> itself has to flush the children of the given node, it should not flush
> just bs->file->bs, but in fact all children that might have been written
> to (judging from the permissions taken on them).
>
> This is a bug fix for qcow2 images with an external data file, as they
> so far did not flush that data_file node.
>
> In any case, the BLKDBG_EVENT() should be emitted on the primary child,
> because that is where a blkdebug node would be if there is any.
>
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/io.c | 23 +++++++++++++++++------
>   1 file changed, 17 insertions(+), 6 deletions(-)
...
> @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>       /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
>        * in the case of cache=unsafe, so there are no useless flushes.
>        */
> -flush_parent:
> -    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> +flush_children:
> +    ret = 0;
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
> +            int this_child_ret = bdrv_co_flush(child->bs);
> +            if (!ret) {
> +                ret = this_child_ret;
> +            }
> +        }
> +    }
> +
>   out:
>       /* Notify any pending flushes that we have completed */
>       if (ret == 0) {


I have not tested whether a running application actually reaches the
flush_parent label (renamed to flush_children by this patch) to flush
the data_file, but the code looks OK.
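
For reference, the loop's behaviour in a toy model.  ToyChild, the
TOY_PERM_* values and the canned flush results are assumptions for this
sketch only: every child with a write-type permission is flushed, and
the first non-zero result is the one reported.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PERM_WRITE           0x1u /* hypothetical values */
#define TOY_PERM_WRITE_UNCHANGED 0x2u

typedef struct ToyChild {
    unsigned perm;    /* permissions the parent holds on this child */
    int flush_result; /* what flushing this child would return */
    int flushed;      /* how many times it was flushed */
    struct ToyChild *next;
} ToyChild;

static int toy_flush_children(ToyChild *head)
{
    int ret = 0;

    for (ToyChild *c = head; c; c = c->next) {
        if (c->perm & (TOY_PERM_WRITE | TOY_PERM_WRITE_UNCHANGED)) {
            int this_child_ret = c->flush_result;

            c->flushed++;
            if (!ret) {
                ret = this_child_ret; /* keep the first error only */
            }
        }
    }
    return ret;
}
```

With a qcow2-like node holding WRITE on both file and data_file, both
get flushed; a backing child without write permission is skipped.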

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 19/47] vmdk: Drop vmdk_co_flush()
  2020-06-25 15:21 ` [PATCH v7 19/47] vmdk: Drop vmdk_co_flush() Max Reitz
@ 2020-07-14 14:52   ` Andrey Shinkevich
  2020-07-16 15:08     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-14 14:52 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Before HEAD^, we needed this because bdrv_co_flush() by itself would
> only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
> children on which a WRITE or WRITE_UNCHANGED permission has been taken.
> Thus, vmdk no longer needs to do it itself.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/vmdk.c | 16 ----------------
>   1 file changed, 16 deletions(-)
>
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 62da465126..a23890e6ec 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -2802,21 +2802,6 @@ static void vmdk_close(BlockDriverState *bs)
>       error_free(s->migration_blocker);
>   }
>   
> -static coroutine_fn int vmdk_co_flush(BlockDriverState *bs)
> -{
> -    BDRVVmdkState *s = bs->opaque;
> -    int i, err;
> -    int ret = 0;
> -
> -    for (i = 0; i < s->num_extents; i++) {
> -        err = bdrv_co_flush(s->extents[i].file->bs);
> -        if (err < 0) {
> -            ret = err;
> -        }
> -    }
> -    return ret;
> -}
> -
>   static int64_t vmdk_get_allocated_file_size(BlockDriverState *bs)
>   {
>       int i;
> @@ -3075,7 +3060,6 @@ static BlockDriver bdrv_vmdk = {
>       .bdrv_close                   = vmdk_close,
>       .bdrv_co_create_opts          = vmdk_co_create_opts,
>       .bdrv_co_create               = vmdk_co_create,
> -    .bdrv_co_flush_to_disk        = vmdk_co_flush,


Once HEAD^ is applied, wouldn't we get endless recursion in
bdrv_co_flush() if HEAD (this patch) were not applied on top of HEAD^?

Andrey

>       .bdrv_co_block_status         = vmdk_co_block_status,
>       .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
>       .bdrv_has_zero_init           = vmdk_has_zero_init,



* Re: [PATCH v7 20/47] block: Iterate over children in refresh_limits
  2020-06-25 15:21 ` [PATCH v7 20/47] block: Iterate over children in refresh_limits Max Reitz
@ 2020-07-14 18:37   ` Andrey Shinkevich
  2020-07-16 15:14     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-14 18:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> Instead of looking at just bs->file and bs->backing, we should look at
> all children that could end up receiving forwarded requests.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/io.c | 32 ++++++++++++++++----------------
>   1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index c2af7711d6..37057f13e0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -135,6 +135,8 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BdrvChild *c;
> +    bool have_limits;
>       Error *local_err = NULL;
>   
>       memset(&bs->bl, 0, sizeof(bs->bl));
> @@ -149,14 +151,21 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>                                   drv->bdrv_co_preadv_part) ? 1 : 512;
>   
>       /* Take some limits from the children as a default */
> -    if (bs->file) {
> -        bdrv_refresh_limits(bs->file->bs, &local_err);
> -        if (local_err) {
> -            error_propagate(errp, local_err);
> -            return;
> +    have_limits = false;
> +    QLIST_FOREACH(c, &bs->children, next) {
> +        if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
> +        {
> +            bdrv_refresh_limits(c->bs, &local_err);
> +            if (local_err) {
> +                error_propagate(errp, local_err);
> +                return;
> +            }
> +            bdrv_merge_limits(&bs->bl, &c->bs->bl);
> +            have_limits = true;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
> -    } else {
> +    }
> +
> +    if (!have_limits) {


This conditional block used to run only when bs->file == NULL.

Now it runs only if there is no bs->file, no bs->backing, and no
other data, filtered, or COW child.

Is that OK, and does it keep the logic intact in all cases?
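
To spell out the new control flow, a toy model reduced to a single
limit.  ToyChild and the TOY_CHILD_* bits are hypothetical, and the
sketch assumes that merging keeps the larger alignment; it is not the
real bdrv_merge_limits().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role bits; the real BDRV_CHILD_* values are not used here. */
#define TOY_CHILD_DATA     0x1u
#define TOY_CHILD_FILTERED 0x2u
#define TOY_CHILD_COW      0x4u
#define TOY_CHILD_METADATA 0x8u

typedef struct ToyChild {
    unsigned role;
    unsigned min_mem_alignment;
    struct ToyChild *next;
} ToyChild;

static unsigned toy_min_mem_alignment(const ToyChild *head)
{
    unsigned align = 0;
    bool have_limits = false;

    for (const ToyChild *c = head; c; c = c->next) {
        if (c->role & (TOY_CHILD_DATA | TOY_CHILD_FILTERED | TOY_CHILD_COW)) {
            /* Assumption: merging keeps the larger alignment. */
            if (c->min_mem_alignment > align) {
                align = c->min_mem_alignment;
            }
            have_limits = true;
        }
    }
    if (!have_limits) {
        align = 512; /* fallback default, as in the quoted hunk */
    }
    return align;
}
```

In this model a node whose only child is a COW child takes that child's
value and skips the fallback, whereas a node with only a metadata child
gets the fallback.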

Andrey


>           bs->bl.min_mem_alignment = 512;
>           bs->bl.opt_mem_alignment = qemu_real_host_page_size;
>   
> @@ -164,15 +173,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>           bs->bl.max_iov = IOV_MAX;
>       }
>   
> -    if (bs->backing) {
> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
> -        if (local_err) {
> -            error_propagate(errp, local_err);
> -            return;
> -        }
> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
> -    }
> -
>       /* Then let the driver override it */
>       if (drv->bdrv_refresh_limits) {
>           drv->bdrv_refresh_limits(bs, errp);



* Re: [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename()
  2020-06-25 15:21 ` [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename() Max Reitz
@ 2020-07-15 12:52   ` Andrey Shinkevich
  2020-07-15 12:58     ` Andrey Shinkevich
  2020-07-16 15:21     ` Max Reitz
  0 siblings, 2 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 12:52 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> bdrv_refresh_filename() and the kind of related bdrv_dirname() should
> look to the primary child when they wish to copy the underlying file's
> filename.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 29 +++++++++++++++++++++--------
>   1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index 8131d0b5eb..7c827fefa0 100644
> --- a/block.c
> +++ b/block.c
> @@ -6797,6 +6797,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>   {
>       BlockDriver *drv = bs->drv;
>       BdrvChild *child;
> +    BlockDriverState *primary_child_bs;
>       QDict *opts;
>       bool backing_overridden;
>       bool generate_json_filename; /* Whether our default implementation should
> @@ -6866,20 +6867,30 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>       qobject_unref(bs->full_open_options);
>       bs->full_open_options = opts;
>   
> +    primary_child_bs = bdrv_primary_bs(bs);
> +
>       if (drv->bdrv_refresh_filename) {
>           /* Obsolete information is of no use here, so drop the old file name
>            * information before refreshing it */
>           bs->exact_filename[0] = '\0';
>   
>           drv->bdrv_refresh_filename(bs);
> -    } else if (bs->file) {
> -        /* Try to reconstruct valid information from the underlying file */
> +    } else if (primary_child_bs) {
> +        /*
> +         * Try to reconstruct valid information from the underlying
> +         * file -- this only works for format nodes (filter nodes
> +         * cannot be probed and as such must be selected by the user
> +         * either through an options dict, or through a special
> +         * filename which the filter driver must construct in its
> +         * .bdrv_refresh_filename() implementation).
> +         */


The caller may not be aware of a filter node and may intend to refresh the
name of the underlying format node.

In that case, the filter driver should redirect the call to the format node.

In which situations should the name of the filter itself be refreshed?

If there are any, should we do both actions or choose one of them?

Andrey


>   
>           bs->exact_filename[0] = '\0';
>   
>           /*
>            * We can use the underlying file's filename if:
>            * - it has a filename,
> +         * - the current BDS is not a filter,


Should we check whether the function's input parameter is a filter's BS
here, in this function, and handle that case here?  Or should we let the
filter driver's function do that, or should the caller check it?
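
For discussion, the condition as I read it, in a standalone toy form.
ToyNode and toy_inherit_filename() are invented here; is_protocol stands
in for drv->bdrv_file_open being set, and the sketch covers only the
branch taken when the driver has no .bdrv_refresh_filename().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct ToyNode {
    bool is_filter;
    bool is_protocol;        /* stands in for drv->bdrv_file_open != NULL */
    char exact_filename[64];
    struct ToyNode *primary; /* primary child, may be NULL */
} ToyNode;

static void toy_inherit_filename(ToyNode *bs, bool generate_json_filename)
{
    ToyNode *p = bs->primary;

    /* Drop stale information first, as the quoted code does. */
    bs->exact_filename[0] = '\0';

    if (p && p->exact_filename[0] && p->is_protocol &&
        !bs->is_filter && !generate_json_filename) {
        snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s",
                 p->exact_filename);
    }
}
```

A format node on top of a protocol node inherits the filename; a filter
on top of the same protocol node ends up with an empty exact_filename
unless its driver builds one itself.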

Andrey


>            * - the file is a protocol BDS, and
>            * - opening that file (as this BDS's format) will automatically create
>            *   the BDS tree we have right now, that is:
> @@ -6888,11 +6899,11 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>            *   - no non-file child of this BDS has been overridden by the user
>            *   Both of these conditions are represented by generate_json_filename.
>            */
> -        if (bs->file->bs->exact_filename[0] &&
> -            bs->file->bs->drv->bdrv_file_open &&
> -            !generate_json_filename)
> +        if (primary_child_bs->exact_filename[0] &&
> +            primary_child_bs->drv->bdrv_file_open &&
> +            !drv->is_filter && !generate_json_filename)
>           {
> -            strcpy(bs->exact_filename, bs->file->bs->exact_filename);
> +            strcpy(bs->exact_filename, primary_child_bs->exact_filename);
>           }
>       }
>   
> @@ -6912,6 +6923,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>   char *bdrv_dirname(BlockDriverState *bs, Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *child_bs;
>   
>       if (!drv) {
>           error_setg(errp, "Node '%s' is ejected", bs->node_name);
> @@ -6922,8 +6934,9 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
>           return drv->bdrv_dirname(bs, errp);
>       }
>   
> -    if (bs->file) {
> -        return bdrv_dirname(bs->file->bs, errp);
> +    child_bs = bdrv_primary_bs(bs);
> +    if (child_bs) {
> +        return bdrv_dirname(child_bs, errp);
>       }
>   
>       bdrv_refresh_filename(bs);



* Re: [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename()
  2020-07-15 12:52   ` Andrey Shinkevich
@ 2020-07-15 12:58     ` Andrey Shinkevich
  2020-07-16 15:21     ` Max Reitz
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 12:58 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 15.07.2020 15:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> bdrv_refresh_filename() and the kind of related bdrv_dirname() should
>> look to the primary child when they wish to copy the underlying file's
>> filename.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 8131d0b5eb..7c827fefa0 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6797,6 +6797,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>   {
>>       BlockDriver *drv = bs->drv;
>>       BdrvChild *child;
>> +    BlockDriverState *primary_child_bs;
>>       QDict *opts;
>>       bool backing_overridden;
>>       bool generate_json_filename; /* Whether our default 
>> implementation should
>> @@ -6866,20 +6867,30 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>       qobject_unref(bs->full_open_options);
>>       bs->full_open_options = opts;
>>   +    primary_child_bs = bdrv_primary_bs(bs);
>> +
>>       if (drv->bdrv_refresh_filename) {
>>           /* Obsolete information is of no use here, so drop the old 
>> file name
>>            * information before refreshing it */
>>           bs->exact_filename[0] = '\0';
>>             drv->bdrv_refresh_filename(bs);
>> -    } else if (bs->file) {
>> -        /* Try to reconstruct valid information from the underlying 
>> file */
>> +    } else if (primary_child_bs) {
>> +        /*
>> +         * Try to reconstruct valid information from the underlying
>> +         * file -- this only works for format nodes (filter nodes
>> +         * cannot be probed and as such must be selected by the user
>> +         * either through an options dict, or through a special
>> +         * filename which the filter driver must construct in its
>> +         * .bdrv_refresh_filename() implementation).
>> +         */
>
>
> The caller may not be aware of a filter node and intend to refresh the 
> name of underlying format node.
>
> In that case, the filter driver should redirect the call to the format 
> node.
>
> In what situations should the name of the filter itself be refreshed?
>
> If there are any, should we do both actions or choose one of them?
>
> Andrey
>

I meant the node FILE name.

Andrey


>
>>             bs->exact_filename[0] = '\0';
>>             /*
>>            * We can use the underlying file's filename if:
>>            * - it has a filename,
>> +         * - the current BDS is not a filter,
>
>
> Should we check whether the function's input parameter is a filter's BS
> here, in this function, and handle that case here, or let the filter
> driver function do that, or else should the caller check it?
>
> Andrey
>
>
>>            * - the file is a protocol BDS, and
>>            * - opening that file (as this BDS's format) will automatically create
>>            *   the BDS tree we have right now, that is:
>> @@ -6888,11 +6899,11 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>            *   - no non-file child of this BDS has been overridden by the user
>>            *   Both of these conditions are represented by generate_json_filename.
>>            */
>> -        if (bs->file->bs->exact_filename[0] &&
>> -            bs->file->bs->drv->bdrv_file_open &&
>> -            !generate_json_filename)
>> +        if (primary_child_bs->exact_filename[0] &&
>> +            primary_child_bs->drv->bdrv_file_open &&
>> +            !drv->is_filter && !generate_json_filename)
>>           {
>> -            strcpy(bs->exact_filename, bs->file->bs->exact_filename);
>> +            strcpy(bs->exact_filename, primary_child_bs->exact_filename);
>>           }
>>       }
>>   @@ -6912,6 +6923,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>   char *bdrv_dirname(BlockDriverState *bs, Error **errp)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BlockDriverState *child_bs;
>>         if (!drv) {
>>           error_setg(errp, "Node '%s' is ejected", bs->node_name);
>> @@ -6922,8 +6934,9 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
>>           return drv->bdrv_dirname(bs, errp);
>>       }
>>   -    if (bs->file) {
>> -        return bdrv_dirname(bs->file->bs, errp);
>> +    child_bs = bdrv_primary_bs(bs);
>> +    if (child_bs) {
>> +        return bdrv_dirname(child_bs, errp);
>>       }
>>         bdrv_refresh_filename(bs);
>
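
For illustration, the condition in the bdrv_refresh_filename() hunk quoted
above can be modelled outside of QEMU. This is only a sketch under
assumptions: ToyNode and toy_refresh_filename() are made-up names, and the
boolean fields merely stand in for drv->is_filter and for "the child's
driver has .bdrv_file_open". It is not the code in block.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    char exact_filename[64];
    bool is_filter;          /* models drv->is_filter */
    bool is_protocol;        /* models drv->bdrv_file_open != NULL */
    struct ToyNode *primary; /* models bdrv_primary_bs(); may be NULL */
} ToyNode;

/*
 * Inherit the primary child's filename only if every condition from the
 * patch comment holds; otherwise leave the node's filename empty.
 */
static void toy_refresh_filename(ToyNode *n, bool generate_json_filename)
{
    ToyNode *child = n->primary;

    n->exact_filename[0] = '\0';
    if (child &&
        child->exact_filename[0] &&   /* it has a filename */
        child->is_protocol &&         /* the child is a protocol node */
        !n->is_filter &&              /* the current node is not a filter */
        !generate_json_filename)      /* plain filename recreates the tree */
    {
        /* Both buffers have the same size, so this cannot overflow. */
        strcpy(n->exact_filename, child->exact_filename);
    }
}
```

In this model a format node on top of a protocol node inherits the
filename, while a filter on top of the same protocol node ends up with an
empty exact_filename, which matches the "the current BDS is not a filter"
condition in the comment.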


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate()
  2020-06-25 15:21 ` [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
@ 2020-07-15 13:39   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 13:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> If a node whose driver does not provide VM state functions has a
> metadata child, the VM state should probably go there; if it is a
> filter, the VM state should probably go there.  It follows that we
> should generally go down to the primary child.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/io.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 37057f13e0..9e802804bb 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2646,6 +2646,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
>                      bool is_read)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *child_bs = bdrv_primary_bs(bs);
>       int ret = -ENOTSUP;
>   
>       bdrv_inc_in_flight(bs);
> @@ -2658,8 +2659,8 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
>           } else {
>               ret = drv->bdrv_save_vmstate(bs, qiov, pos);
>           }
> -    } else if (bs->file) {
> -        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
> +    } else if (child_bs) {
> +        ret = bdrv_co_rw_vmstate(child_bs, qiov, pos, is_read);
>       }
>   
>       bdrv_dec_in_flight(bs);


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
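
A minimal sketch of the fallback in the hunk above, assuming a toy node
type: if a node's driver has no VM state handler, the request is passed
down to the primary child, and -ENOTSUP is returned when there is nowhere
to go. ToyNode, toy_save_here() and toy_rw_vmstate() are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct ToyNode ToyNode;

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
struct ToyNode {
    int (*save_vmstate)(ToyNode *n); /* NULL: driver has no VM state support */
    ToyNode *primary;                /* models bdrv_primary_bs(); may be NULL */
    int saved;                       /* counts how often state landed here */
};

/* Example handler for a node that can store VM state itself. */
static int toy_save_here(ToyNode *n)
{
    n->saved++;
    return 0;
}

/* Use the node's own handler if present, else recurse into the primary child. */
static int toy_rw_vmstate(ToyNode *n)
{
    if (n->save_vmstate) {
        return n->save_vmstate(n);
    }
    if (n->primary) {
        return toy_rw_vmstate(n->primary);
    }
    return -ENOTSUP;
}
```

With a filter placed on top of a node that has a handler, calling
toy_rw_vmstate() on the filter stores the state in the node below it.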



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 23/47] block/snapshot: Fix fallback
  2020-06-25 15:21 ` [PATCH v7 23/47] block/snapshot: Fix fallback Max Reitz
@ 2020-07-15 21:22   ` Andrey Shinkevich
  2020-07-15 22:18     ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 21:22 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> If the top node's driver does not provide snapshot functionality and we
> want to fall back to a node down the chain, we need to snapshot all
> non-COW children.  For simplicity's sake, just do not fall back if there

I guess this refers to COW children like BDRV_CHILD_DATA |
BDRV_CHILD_METADATA rather than non-COW ones, doesn't it?

Andrey


> is more than one such child.  Furthermore, we really only can fall back
> to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
> the child link (notably, set it to NULL).
>
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/snapshot.c | 104 +++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 83 insertions(+), 21 deletions(-)
>
> diff --git a/block/snapshot.c b/block/snapshot.c
...
> +    /*
> +     * Check that there are no other children that would need to be
> +     * snapshotted.  If there are, it is not safe to fall back to
> +     * *fallback.
> +     */
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> +                           BDRV_CHILD_FILTERED) &&
> +            child != *fallback)
> +        {
> +            return NULL;
> +        }
> +    }
> +
> +    return fallback;
> +}

...

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
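
The check quoted above can be sketched as follows. The role flags are toy
values chosen for this example (they are assumptions, not copied from
QEMU's headers), and ToyChild / toy_snapshot_fallback() are made-up names.
The idea is the same as in the hunk: falling back is only considered safe
if no child other than the fallback candidate stores data, stores
metadata, or is a filtered child.

```c
#include <assert.h>
#include <stddef.h>

/* Toy role bits; the numeric values are assumptions for this sketch. */
enum {
    TOY_DATA     = 1 << 0,
    TOY_METADATA = 1 << 1,
    TOY_FILTERED = 1 << 2,
    TOY_COW      = 1 << 3,
};

typedef struct ToyChild {
    unsigned role;
} ToyChild;

/*
 * Return @fallback if it is the only non-COW child in @children,
 * or NULL if some other child would need to be snapshotted as well.
 */
static const ToyChild *toy_snapshot_fallback(const ToyChild *children,
                                             size_t n,
                                             const ToyChild *fallback)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if ((children[i].role &
             (TOY_DATA | TOY_METADATA | TOY_FILTERED)) &&
            &children[i] != fallback)
        {
            return NULL;
        }
    }
    return fallback;
}
```

A node with one DATA | METADATA child plus a COW child keeps its fallback,
whereas a node with separate METADATA and DATA children (as with an
external data file) does not.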



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 24/47] block: Use CAFs for debug breakpoints
  2020-06-25 15:21 ` [PATCH v7 24/47] block: Use CAFs for debug breakpoints Max Reitz
@ 2020-07-15 21:43   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 21:43 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> When looking for a blkdebug node (which implements debug breakpoints),
> use bdrv_primary_bs() to iterate through the graph, because that is
> where a blkdebug node would be.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 16 +++-------------
>   1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/block.c b/block.c
> index 7c827fefa0..1c71ecab7c 100644
> --- a/block.c
> +++ b/block.c
> @@ -5562,17 +5562,7 @@ void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
>   static BlockDriverState *bdrv_find_debug_node(BlockDriverState *bs)
>   {
>       while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
> -        if (bs->file) {
> -            bs = bs->file->bs;
> -            continue;
> -        }
> -
> -        if (bs->drv->is_filter && bs->backing) {
> -            bs = bs->backing->bs;
> -            continue;
> -        }
> -
> -        break;
> +        bs = bdrv_primary_bs(bs);
>       }
>   
>       if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
> @@ -5607,7 +5597,7 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
>   int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
>   {
>       while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
> -        bs = bs->file ? bs->file->bs : NULL;
> +        bs = bdrv_primary_bs(bs);
>       }
>   
>       if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
> @@ -5620,7 +5610,7 @@ int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
>   bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
>   {
>       while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
> -        bs = bs->file ? bs->file->bs : NULL;
> +        bs = bdrv_primary_bs(bs);
>       }
>   
>       if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 23/47] block/snapshot: Fix fallback
  2020-07-15 21:22   ` Andrey Shinkevich
@ 2020-07-15 22:18     ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 22:18 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 16.07.2020 00:22, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> If the top node's driver does not provide snapshot functionality and we
>> want to fall back to a node down the chain, we need to snapshot all
>> non-COW children.  For simplicity's sake, just do not fall back if there
>
> I guess this refers to COW children like BDRV_CHILD_DATA |
> BDRV_CHILD_METADATA rather than non-COW ones, doesn't it?
>

BDRV_CHILD_COW is, by definition, mutually exclusive with DATA, METADATA
and FILTERED.

Sorry about the question.

> Andrey
>
>
>> is more than one such child.  Furthermore, we really only can fall back
>> to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
>> the child link (notably, set it to NULL).
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/snapshot.c | 104 +++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 83 insertions(+), 21 deletions(-)
>>
>> diff --git a/block/snapshot.c b/block/snapshot.c
> ...
>> +    /*
>> +     * Check that there are no other children that would need to be
>> +     * snapshotted.  If there are, it is not safe to fall back to
>> +     * *fallback.
>> +     */
>> +    QLIST_FOREACH(child, &bs->children, next) {
>> +        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
>> +                           BDRV_CHILD_FILTERED) &&
>> +            child != *fallback)
>> +        {
>> +            return NULL;
>> +        }
>> +    }
>> +
>> +    return fallback;
>> +}
>
> ...
>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>
>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size
  2020-06-25 15:21 ` [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size Max Reitz
@ 2020-07-15 22:56   ` Andrey Shinkevich
  2020-08-19 10:57   ` Kevin Wolf
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-15 22:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> If every BlockDriver were to implement bdrv_get_allocated_file_size(),
> there are basically three ways it would be handled:
> (1) For protocol drivers: Figure out the actual allocated file size in
>      some protocol-specific way
> (2) For protocol drivers: If that is not possible (or we just have not
>      bothered to implement it yet), return -ENOTSUP
> (3) For drivers with children: Return the sum of some or all their
>      children's sizes
>
> For the drivers we have, case (3) boils down to either:
> (a) The sum of all children's sizes
> (b) The size of the primary child
>
> (2), (3a) and (3b) can be implemented generically, so this patch adds
> such generic implementations for drivers to use.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h |  5 ++++
>   block.c                   | 51 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 56 insertions(+)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5da793bfc3..c963ee9f28 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1318,6 +1318,11 @@ int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
>                                                      int64_t *pnum,
>                                                      int64_t *map,
>                                                      BlockDriverState **file);
> +
> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
> +
>   const char *bdrv_get_parent_name(const BlockDriverState *bs);
>   void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
>   bool blk_dev_has_removable_media(BlockBackend *blk);
> diff --git a/block.c b/block.c
> index 1c71ecab7c..fc01ce90b3 100644
> --- a/block.c
> +++ b/block.c
> @@ -5003,6 +5003,57 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
>       return -ENOTSUP;
>   }
>   
> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * block drivers that want it to sum all children they store data on.
> + * (This excludes backing children.)
> + */
> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    int64_t child_size, sum = 0;
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> +                           BDRV_CHILD_FILTERED))
> +        {
> +            child_size = bdrv_get_allocated_file_size(child->bs);
> +            if (child_size < 0) {
> +                return child_size;
> +            }
> +            sum += child_size;
> +        }
> +    }
> +
> +    return sum;
> +}
> +
> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * block drivers that want it to return only the size of a node's
> + * primary child.
> + */
> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs)
> +{
> +    BlockDriverState *primary_bs;
> +
> +    primary_bs = bdrv_primary_bs(bs);
> +    if (!primary_bs) {
> +        return -ENOTSUP;
> +    }
> +
> +    return bdrv_get_allocated_file_size(primary_bs);
> +}
> +
> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * protocol block drivers that just do not support it.
> + */
> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs)
> +{
> +    return -ENOTSUP;
> +}
> +
>   /*
>    * bdrv_measure:
>    * @drv: Format driver


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
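
As a rough model of case (3a) from the commit message, the sketch below
sums the sizes of all children a node stores data on and skips COW
(backing) children. Everything here is hypothetical: the ToyNode/ToyChild
types, the role bit values and the function names exist only for this
example, and leaf nodes simply report a fixed own_size (or a negative
errno) in place of a protocol-specific query.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy role bits; the numeric values are assumptions for this sketch. */
enum {
    TOY_DATA     = 1 << 0,
    TOY_METADATA = 1 << 1,
    TOY_FILTERED = 1 << 2,
    TOY_COW      = 1 << 3,
};

typedef struct ToyNode ToyNode;

typedef struct ToyChild {
    unsigned role;
    ToyNode *bs;
} ToyChild;

struct ToyNode {
    int64_t own_size;         /* leaf nodes: size, or a negative errno */
    const ToyChild *children; /* NULL for leaf (protocol) nodes */
    size_t n_children;
};

static int64_t toy_allocated_size(const ToyNode *n);

/* Case (3a): sum over data-bearing children, propagate the first error. */
static int64_t toy_sum_allocated_size(const ToyNode *n)
{
    int64_t sum = 0;
    size_t i;

    for (i = 0; i < n->n_children; i++) {
        const ToyChild *c = &n->children[i];

        if (c->role & (TOY_DATA | TOY_METADATA | TOY_FILTERED)) {
            int64_t child_size = toy_allocated_size(c->bs);

            if (child_size < 0) {
                return child_size;
            }
            sum += child_size;
        }
    }
    return sum;
}

/* Leaves report their own size; inner nodes use the summing helper. */
static int64_t toy_allocated_size(const ToyNode *n)
{
    return n->n_children ? toy_sum_allocated_size(n) : n->own_size;
}
```

With a 100-byte metadata child, a 4000-byte data child and a backing child
of any size, the model reports 4100; if one of the data-bearing children
returns -ENOTSUP, so does the parent.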



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 01/47] block: Add child access functions
  2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
@ 2020-07-16 14:46     ` Max Reitz
  2020-07-28 16:09     ` Christophe de Dinechin
  1 sibling, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-16 14:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2393 bytes --]

On 13.07.20 11:06, Vladimir Sementsov-Ogievskiy wrote:
> 25.06.2020 18:21, Max Reitz wrote:
>> There are BDS children that the general block layer code can access,
>> namely bs->file and bs->backing.  Since the introduction of filters and
>> external data files, their meaning is not quite clear.  bs->backing can
>> be a COW source, or it can be a filtered child; bs->file can be a
>> filtered child, it can be data and metadata storage, or it can be just
>> metadata storage.
>>
>> This overloading really is not helpful.  This patch adds functions that
>> retrieve the correct child for each exact purpose.  Later patches in
>> this series will make use of them.  Doing so will allow us to handle
>> filter nodes in a meaningful way.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
> 
> [..]
> 
>> +/*
>> + * Return the primary child of this node: For filters, that is the
>> + * filtered child.  For other nodes, that is usually the child storing
>> + * metadata.
>> + * (A generally more helpful description is that this is (usually) the
>> + * child that has the same filename as @bs.)
>> + *
>> + * Drivers do not necessarily have a primary child; for example quorum
>> + * does not.
>> + */
>> +BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>> +{
>> +    BdrvChild *c;
>> +
>> +    QLIST_FOREACH(c, &bs->children, next) {
>> +        if (c->role & BDRV_CHILD_PRIMARY) {
>> +            return c;
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>>
> 
> Suggest squash-in to also assert that not more than one primary child:
> --- a/block.c
> +++ b/block.c
> @@ -6998,13 +6998,14 @@ BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
>   */
>  BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *found = NULL;
>  
>      QLIST_FOREACH(c, &bs->children, next) {
>          if (c->role & BDRV_CHILD_PRIMARY) {
> -            return c;
> +            assert(!found);

Hm, why not.

> +            found = c;
>          }
>      }
>  
> -    return NULL;
> +    return found;
>  }
> 
> 
> with or without:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

In any case, thanks for reviewing!

Max
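
P.S.: A self-contained sketch of the "at most one primary child" idea,
using a hand-rolled singly linked list instead of QLIST. ToyChild,
TOY_CHILD_PRIMARY (including its numeric value) and toy_primary_child()
are assumptions made for this example. Note that the sketch returns
`found`: once the loop has run to completion, the loop variable itself is
NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy role bit; the numeric value is an assumption for this sketch. */
enum { TOY_CHILD_PRIMARY = 1 << 4 };

typedef struct ToyChild {
    unsigned role;
    struct ToyChild *next; /* next sibling in the parent's child list */
} ToyChild;

/*
 * Return the single child carrying the PRIMARY role, or NULL if there
 * is none.  Abort if more than one child claims to be primary.
 */
static ToyChild *toy_primary_child(ToyChild *head)
{
    ToyChild *c, *found = NULL;

    for (c = head; c; c = c->next) {
        if (c->role & TOY_CHILD_PRIMARY) {
            assert(!found);
            found = c;
        }
    }

    /* c is NULL here, so the result has to come from found. */
    return found;
}
```

The price for the assertion is that the loop can no longer return early;
for the short child lists we have, that should not matter.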


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-07-13 10:18   ` Vladimir Sementsov-Ogievskiy
@ 2020-07-16 14:50     ` Max Reitz
  2020-07-16 15:24       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-16 14:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 4387 bytes --]

On 13.07.20 12:18, Vladimir Sementsov-Ogievskiy wrote:
> 25.06.2020 18:21, Max Reitz wrote:
>> Add some helper functions for skipping filters in a chain of block
>> nodes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h |  3 +++
>>   block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index bb3457c5e8..5da793bfc3 100644
> 
> 
> This patch raises two questions:
> 
> 1. How to treat filters at the end of the backing chain?

It was my understanding that this configuration would be impossible.

> - the child-access function will return no filter child for such nodes,
> which is correct of course
> - the filter-skipping functions will return this filter. How correct
> that is, I don't know.
> 
> 
> Consider a chain
> 
> top --- backing ---> filter-with-no-child

How would it be possible to have a filter without a child?

> if bdrv_backing_chain_next(top) returns NULL, it's incorrect, because
> top actually has a backing node, and on read it will read from it for
> unallocated clusters (and this should crash). So, probably, returning
> the filter as the backing-chain-next is a valid thing to do. Or we should
> assert that we are not in such a situation (which may crash more often
> than trying to really read from a nonexistent child).
> 
> So returning NULL may be even less correct than returning a filter.
> 
> 
> 2. How to treat nodes with drv=NULL, but with a filter child (with the
> BDRV_CHILD_FILTERED role)?

drv=NULL is a bug on its own that should be fixed...  (The idea we had
at some point was to introduce a special driver that just always returns
-EIO on everything, and to replace corrupt qcow2 nodes by that.  Or,
well, we might just return -EIO from the qcow2 driver, actually, if we
never use drv=NULL anywhere else.)

In any case, drv=NULL is an edge case that I think never has anything to
do with filters.

> - child-access functions returns no filtered child for such nodes
> - filter skipping functions will stop on it..
> 
> =======
> 
> Isn't it better to drop drv->is_filter altogether, and to call nodes
> with a bs->file or bs->backing child in the BDRV_CHILD_FILTERED role
> filter nodes? This automatically closes the two questions:
> 
> - a node without a child in BDRV_CHILD_FILTERED is automatically a
> non-filter. So the filter driver is responsible for having such a child.
> - a node without a drv may still be a filter if it has
> BDRV_CHILD_FILTERED. Still, not very useful.
> 
> Anyway, is_filter and BDRV_CHILD_FILTERED are in contradiction, and it
> seems good to get rid of is_filter. But I may be missing something.
> 
> [..]
> 
>> +
>> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
>> +                                              bool
>> stop_on_explicit_filter)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (!bs) {
>> +        return NULL;
>> +    }
>> +
>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>> +        c = bdrv_filter_child(bs);
>> +        if (!c) {
>> +            break;
>> +        }
>> +        bs = c->bs;
>> +    }
>> +    /*
>> +     * Note that this treats nodes with bs->drv == NULL as not being
>> +     * filters (bs->drv == NULL should be replaced by something else
>> +     * anyway).
>> +     * The advantage of this behavior is that this function will thus
>> +     * always return a non-NULL value (given a non-NULL @bs).
> 
> I don't see how that follows from the first sentence. We can skip nodes
> with a child of BDRV_CHILD_FILTERED and drv=NULL as well, and still return
> a non-NULL bs at the end...

My idea was that nodes with bs->drv == NULL might not even have
children.  If we treated them like filters and skipped through them, we
would have to return NULL if there is no child.

> Didn't you mean "treat nodes without filter child as not being filters,
> even if they have drv->is_filter == true"? This is a real reason for the
> second sentence.

Hm.  I implicitly always assume that filters always have a filter child,
so I tend to not even question that part.

Max
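
P.S.: To make the "always returns non-NULL for a non-NULL @bs" property
concrete, here is a toy version of the loop. It is a sketch under
assumptions: ToyNode and toy_do_skip_filters() are made-up names, and
filter_child stands in for what bdrv_filter_child() would return, which
is NULL for anything that is not a filter with a filtered child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    bool implicit;                /* models bs->implicit */
    struct ToyNode *filter_child; /* NULL for non-filters in this model */
} ToyNode;

/*
 * Walk down through filters.  With @stop_on_explicit_filter, stop at the
 * first node that is not implicit.  The loop only ever moves to a child
 * that exists, so a non-NULL input yields a non-NULL result.
 */
static ToyNode *toy_do_skip_filters(ToyNode *n, bool stop_on_explicit_filter)
{
    if (!n) {
        return NULL;
    }

    while (!(stop_on_explicit_filter && !n->implicit)) {
        if (!n->filter_child) {
            break;
        }
        n = n->filter_child;
    }

    return n;
}
```

For a chain "implicit filter -> explicit filter -> format node", skipping
all filters yields the format node, and skipping only implicit ones stops
at the explicit filter.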


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-10 17:41       ` Andrey Shinkevich
@ 2020-07-16 14:59         ` Max Reitz
  2020-08-07 10:29           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-16 14:59 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 13369 bytes --]

On 10.07.20 19:41, Andrey Shinkevich wrote:
> On 10.07.2020 18:24, Max Reitz wrote:
>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>> Because of the (not so recent anymore) changes that make the stream job
>>>> independent of the base node and instead track the node above it, we
>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>> and the node directly above the base node (which may be an R/W filter
>>>> or the bottom COW node).
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    qapi/block-core.json |  4 +++
>>>>    block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>>    blockdev.c           |  4 ++-
>>>>    3 files changed, 53 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index b20332e592..df87855429 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2486,6 +2486,10 @@
>>>>    # On successful completion the image file is updated to drop the backing file
>>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>    #
>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>> +# not specified) instead of modifying @device itself.
>>>> +#
>>>> +#
>>>>    # @job-id: identifier for the newly-created block job. If
>>>>    #          omitted, the device name will be used. (Since 2.7)
>>>>    #
>>>> diff --git a/block/stream.c b/block/stream.c
>>>> index aa2e7af98e..b9c1141656 100644
>>>> --- a/block/stream.c
>>>> +++ b/block/stream.c
>>>> @@ -31,7 +31,8 @@ enum {
>>>>      typedef struct StreamBlockJob {
>>>>        BlockJob common;
>>>> -    BlockDriverState *bottom;
>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from
>>>> this) */
>>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>>> Keeping the base_overlay is enough to complete the stream job.
>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>> enough.
>>
>>> The above_base may disappear during the job and we can't rely on it.
>> In this version of this series, it may not, because the chain is frozen.
>> So the above_base cannot disappear.
> 
> Once we insert a filter above the top bs of the stream job, the parallel
> jobs in iotests #030 will fail with a 'frozen link' error. That is because
> the independent parallel stream or commit jobs insert/remove their
> filters asynchronously.

I’m not sure whether that’s a problem with this series specifically.

>> We can discuss whether we should allow it to disappear, but I think not.
>>
>> The problem is, we need something to set as the backing file after
>> streaming.  How do we figure out what that should be?  My proposal is we
>> keep above_base and use its immediate child.
> 
> We can do the same with the base_overlay.
> 
> If the backing node turns out to be a filter, the proper backing child will
> be set after the filter is removed. So, we shouldn't care.

And what if the user manually added some filter above the base (i.e.
below base_overlay) that they want to keep after the job?

>> If we don’t keep above_base, then we’re basically left guessing as to
>> what should be the backing file after the stream job.
>>
>>>>        BlockdevOnError on_error;
>>>>        char *backing_file_str;
>>>>        bool bs_read_only;
>>>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>>>          if (s->chain_frozen) {
>>>>            BlockJob *bjob = &s->common;
>>>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>>>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>>>        }
>>>>    }
>>>>    @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob,
>>>> common.job);
>>>>        BlockJob *bjob = &s->common;
>>>>        BlockDriverState *bs = blk_bs(bjob->blk);
>>>> -    BlockDriverState *base = backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>>> The initial base node may be a top node for a concurrent commit job and
>>> may disappear.
>> Then it would just be replaced by another node, though, so above_base
>> keeps a child.  The @base here is not necessarily the initial @base, and
>> that’s intentional.
> 
> Not really. In my example, above_base becomes a dangling pointer because
> after the commit job finishes, its filter, which should belong to the
> commit job's frozen chain, will be deleted. If we freeze the link to the
> above_base for this job, iotests #030 will not pass.

So it doesn’t become a dangling pointer, because it’s frozen.

030 passes after this series, so I’m not sure whether I can consider
that problem part of this series.

I think if adding a filter node becomes a problem, we have to consider
relaxing the restrictions when we do that, not now.
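
To make the role of above_base a bit more concrete, here is a toy model
of the search that stream_start() does in this patch, plus the lookup of
the node below it. It is only a sketch: ToyNode and the toy_*() helpers
are made-up names, and the two pointers stand in for bdrv_cow_bs() and
bdrv_filter_bs(). As in the patch, it assumes that only filters sit
between the COW overlay and the base.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block node; at most one of the two links is set. */
typedef struct ToyNode {
    struct ToyNode *cow;      /* models bdrv_cow_bs(): COW backing node */
    struct ToyNode *filtered; /* models bdrv_filter_bs(): filtered child */
} ToyNode;

/*
 * Find the node directly above @base, starting from @base_overlay.
 * Precondition: below @base_overlay's COW link there are only filters
 * until @base is reached; otherwise the walk would run off the chain.
 */
static ToyNode *toy_find_above_base(ToyNode *base_overlay, ToyNode *base)
{
    ToyNode *above_base = base_overlay;

    if (above_base->cow != base) {
        above_base = above_base->cow;
        while (above_base->filtered != base) {
            above_base = above_base->filtered;
        }
    }
    return above_base;
}

/* Models bdrv_filter_or_cow_bs(): whichever of the two links exists. */
static ToyNode *toy_filter_or_cow(ToyNode *n)
{
    return n->cow ? n->cow : n->filtered;
}
```

For "overlay -> filter1 -> filter2 -> base", above_base is filter2, and
its filter-or-COW child is whatever node currently sits in the base
position; without any filters, above_base is the overlay itself.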

>>> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
>> But also wrong.  The point of keeping above_base around is to get its
>> child here to use that child as the new backing child of the top node.
>>
>>>>        Error *local_err = NULL;
>>>>        int ret = 0;
>>>>    -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>>>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>>>        s->chain_frozen = false;
>>>>    -    if (bs->backing) {
>>>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>>>            const char *base_id = NULL, *base_fmt = NULL;
>>>>            if (base) {
>>>>                base_id = s->backing_file_str;
>>>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>>>                    base_fmt = base->drv->format_name;
>>>>                }
>>>>            }
>>>> -        bdrv_set_backing_hd(bs, base, &local_err);
>>>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>>>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>>>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>>>> base_fmt);
>>>>            if (local_err) {
>>>>                error_report_err(local_err);
>>>>                return -EPERM;
>>>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>>>> Error **errp)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob,
>>>> common.job);
>>>>        BlockBackend *blk = s->common.blk;
>>>>        BlockDriverState *bs = blk_bs(blk);
>>>> -    bool enable_cor = !backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>>>        int64_t len;
>>>>        int64_t offset = 0;
>>>>        uint64_t delay_ns = 0;
>>>>        int error = 0;
>>>>        int64_t n = 0; /* bytes */
>>>>    -    if (bs == s->bottom) {
>>>> +    if (unfiltered_bs == s->base_overlay) {
>>>>            /* Nothing to stream */
>>>>            return 0;
>>>>        }
>>>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job,
>>>> Error **errp)
>>>>              copy = false;
>>>>    -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>>>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK,
>>>> &n);
>>>>            if (ret == 1) {
>>>>                /* Allocated in the top, no need to copy.  */
>>>>            } else if (ret >= 0) {
>>>>                /* Copy if allocated in the intermediate images.  Limit
>>>> to the
>>>>                 * known-unallocated area [offset,
>>>> offset+n*BDRV_SECTOR_SIZE).  */
>>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom,
>>>> true,
>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>>>> +                                          s->base_overlay, true,
>>>>                                              offset, n, &n);
>>>>                /* Finish early if end of backing file has been
>>>> reached */
>>>>                if (ret == 0 && n == 0) {
>>>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id,
>>>> BlockDriverState *bs,
>>>>        BlockDriverState *iter;
>>>>        bool bs_read_only;
>>>>        int basic_flags = BLK_PERM_CONSISTENT_READ |
>>>> BLK_PERM_WRITE_UNCHANGED;
>>>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *above_base;
>>>>    -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>>>> +    if (!base_overlay) {
>>>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>>>> +                   base->node_name, bs->node_name);
>>> Sorry, the error message is not clear to me.
>>>
>>> In this case, there is no intermediate COW node, but the base, if not
>>> NULL, is in the backing chain of bs, isn't it?
>>>
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Find the node directly above @base.  @base_overlay is a COW
>>>> overlay, so
>>>> +     * it must have a bdrv_cow_child(), but it is the immediate
>>>> overlay of
>>>> +     * @base, so between the two there can only be filters.
>>>> +     */
>>>> +    above_base = base_overlay;
>>>> +    if (bdrv_cow_bs(above_base) != base) {
>>>> +        above_base = bdrv_cow_bs(above_base);
>>>> +        while (bdrv_filter_bs(above_base) != base) {
>>>> +            above_base = bdrv_filter_bs(above_base);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
>>> When a concurrent stream job tries to freeze or remove the above_base
>>> node, we will encounter the frozen node error. The above_base node is
>>> part of the concurrent job's frozen chain.
>> Correct.
>>
>>>>            return;
>>>>        }
>>>>    @@ -255,14 +279,19 @@ void stream_start(const char *job_id,
>>>> BlockDriverState *bs,
>>>>         * and resizes. Reassign the base node pointer because the
>>>> backing BS of the
>>>>         * bottom node might change after the call to
>>>> bdrv_reopen_set_read_only()
>>>>         * due to parallel block jobs running.
>>>> +     * above_base node might change after the call to
>>> Yes, if not frozen.
>>>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>>>         */
>>>> -    base = backing_bs(bottom);
>>>> -    for (iter = backing_bs(bs); iter && iter != base; iter =
>>>> backing_bs(iter)) {
>>>> +    base = bdrv_filter_or_cow_bs(above_base);
>>>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>>>> +         iter = bdrv_filter_or_cow_bs(iter))
>>>> +    {
>>>>            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>>                               basic_flags, &error_abort);
>>>>        }
>>>>    -    s->bottom = bottom;
>>>> +    s->base_overlay = base_overlay;
>>>> +    s->above_base = above_base;
>>> Generally, being the filter for a concurrent job, the above_base node
>>> may be deleted at any time, and we would keep a dangling pointer. That
>>> may happen even earlier if above_base is not frozen.
>>>
>>> If it is frozen, as it is here, we may get the frozen link error instead.
>> I’m not sure what you mean here.  Freezing it was absolutely
>> intentional.  A dangling pointer would be a problem, but that’s why it’s
>> frozen, so it stays around and can’t be deleted any time.
>>
>> Max
> 
> The nodes we freeze should all belong to the context of the relevant job:
>
> filter->top_node->intermediate_node(s)
>
> We would not include the base or any filter above it in the frozen chain,
> because they belong to a different job's context.

They aren’t really, because we need to know the backing node of @device
after the job.

> Once 'this' job is completed, we set the current backing child of the
> base_overlay and need not care what kind of node it is. If it is another
> job's filter, it will be replaced with the proper node afterwards.

But what if there are filters above the base that the user wants to keep
after the job?
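
To make the above_base walk from the quoted hunk concrete, here is a
minimal standalone sketch. MockNode and the mock_*() helpers are
hypothetical stand-ins for BlockDriverState, bdrv_cow_bs() and
bdrv_filter_bs(); they are not QEMU code, and the sketch assumes that
base_overlay really is the COW overlay of base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU's real struct. */
typedef struct MockNode {
    int is_filter;            /* nonzero for filter nodes */
    struct MockNode *below;   /* filtered child or COW backing node */
} MockNode;

/* Mock of bdrv_cow_bs(): COW backing node of a non-filter, else NULL. */
static MockNode *mock_cow_bs(MockNode *n)
{
    return (n && !n->is_filter) ? n->below : NULL;
}

/* Mock of bdrv_filter_bs(): filtered node of a filter, else NULL. */
static MockNode *mock_filter_bs(MockNode *n)
{
    return (n && n->is_filter) ? n->below : NULL;
}

/*
 * Return the node directly above @base.  @base_overlay must be the COW
 * overlay of @base, so only filters can sit between the two.
 */
static MockNode *mock_find_above_base(MockNode *base_overlay, MockNode *base)
{
    MockNode *above_base = base_overlay;

    if (mock_cow_bs(above_base) != base) {
        /* Step over the COW link, then walk down through the filters. */
        above_base = mock_cow_bs(above_base);
        while (mock_filter_bs(above_base) != base) {
            above_base = mock_filter_bs(above_base);
            assert(above_base); /* fails if the precondition is violated */
        }
    }
    return above_base;
}
```

With a chain top -> filter1 -> filter2 -> base, the walk ends at filter2,
so freezing from the top down to that node also keeps the filters above
the base in place.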

Max



* Re: [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen
  2020-07-10 19:42   ` Andrey Shinkevich
@ 2020-07-16 15:04     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-16 15:04 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 10.07.20 21:42, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Reopening a node's backing child needs a bit of special handling because
>> the "backing" child has different defaults than all other children
>> (among other things).  Adding filter support here is a bit more
>> difficult than just using the child access functions.  In fact, we often
>> have to directly use bs->backing because these functions are about the
>> "backing" child (which may or may not be the COW backing file).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 38 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 712230ef5c..8131d0b5eb 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4026,26 +4026,56 @@ static int
>> bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>           }
>>       }
>>   +    /*
>> +     * Ensure that @bs can really handle backing files, because we are
>> +     * about to give it one (or swap the existing one)
>> +     */
>> +    if (bs->drv->is_filter) {
>> +        /* Filters always have a file or a backing child */
>> +        if (!bs->backing) {
>> +            error_setg(errp, "'%s' is a %s filter node that does not
>> support a "
>> +                       "backing child", bs->node_name,
>> bs->drv->format_name);
>> +            return -EINVAL;
>> +        }
>> +    } else if (!bs->drv->supports_backing) {
>> +        error_setg(errp, "Driver '%s' of node '%s' does not support
>> backing "
>> +                   "files", bs->drv->format_name, bs->node_name);
>> +        return -EINVAL;
>> +    }
>> +
>>       /*
>>        * Find the "actual" backing file by skipping all links that point
>>        * to an implicit node, if any (e.g. a commit filter node).
>> +     * We cannot use any of the bdrv_skip_*() functions here because
>> +     * those return the first explicit node, while we are looking for
>> +     * its overlay here.
>>        */
>>       overlay_bs = bs;
>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>> -        overlay_bs = backing_bs(overlay_bs);
>> +    while (bdrv_filter_or_cow_bs(overlay_bs) &&
>> +           bdrv_filter_or_cow_bs(overlay_bs)->implicit)
>> +    {
>> +        overlay_bs = bdrv_filter_or_cow_bs(overlay_bs);
>>       }
> 
> I believe this little optimization would work properly:
> 
> 
> for (BlockDriverState *below_bs = bdrv_filter_or_cow_bs(overlay_bs);
>        below_bs && below_bs->implicit;
>        below_bs = bdrv_filter_or_cow_bs(overlay_bs)) {
>          overlay_bs = below_bs;
> }

Looks good, thanks.

>>         /* If we want to replace the backing file we need some extra
>> checks */
>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>> +    if (new_backing_bs != bdrv_filter_or_cow_bs(overlay_bs)) {
>>           /* Check for implicit nodes between bs and its backing file */
>>           if (bs != overlay_bs) {
>>               error_setg(errp, "Cannot change backing link if '%s' has "
>>                          "an implicit backing file", bs->node_name);
>>               return -EPERM;
>>           }
>> -        /* Check if the backing link that we want to replace is
>> frozen */
>> -        if (bdrv_is_backing_chain_frozen(overlay_bs,
>> backing_bs(overlay_bs),
>> -                                         errp)) {
>> +        /*
>> +         * Check if the backing link that we want to replace is frozen.
>> +         * Note that
>> +         * bdrv_filter_or_cow_child(overlay_bs) == overlay_bs->backing,
>> +         * because we know that overlay_bs == bs, and that @bs
>> +         * either is a filter that uses ->backing or a COW format BDS
>> +         * with bs->drv->supports_backing == true.
>> +         */
>> +        if (bdrv_is_backing_chain_frozen(overlay_bs,
>> +                                        
>> child_bs(overlay_bs->backing), errp))
> What would be wrong with bdrv_filter_or_cow_bs(overlay_bs) here?

As the comment says, it’s the same thing.

I prefer ->backing here, because this function is about reopening the
->backing child.

>> +        {
>>               return -EPERM;
>>           }
>>           reopen_state->replace_backing_bs = true;
>> @@ -4196,7 +4226,7 @@ int bdrv_reopen_prepare(BDRVReopenState
>> *reopen_state, BlockReopenQueue *queue,
>>        * its metadata. Otherwise the 'backing' option can be omitted.
>>        */
>>       if (drv->supports_backing && reopen_state->backing_missing &&
>> -        (backing_bs(reopen_state->bs) ||
>> reopen_state->bs->backing_file[0])) {
> = BlockDriverState*
>> +        (reopen_state->bs->backing ||
>> reopen_state->bs->backing_file[0])) {
> 
> = BdrvChild*
> 
> Are we OK with that?

Sure, the question is whether it’s non-NULL, and BdrvChild.bs is always
non-NULL.

Max



* Re: [PATCH v7 19/47] vmdk: Drop vmdk_co_flush()
  2020-07-14 14:52   ` Andrey Shinkevich
@ 2020-07-16 15:08     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-16 15:08 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 14.07.20 16:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Before HEAD^, we needed this because bdrv_co_flush() by itself would
>> only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
>> children on which a WRITE or WRITE_UNCHANGED permission has been taken.
>> Thus, vmdk no longer needs to do it itself.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/vmdk.c | 16 ----------------
>>   1 file changed, 16 deletions(-)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index 62da465126..a23890e6ec 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -2802,21 +2802,6 @@ static void vmdk_close(BlockDriverState *bs)
>>       error_free(s->migration_blocker);
>>   }
>>   -static coroutine_fn int vmdk_co_flush(BlockDriverState *bs)
>> -{
>> -    BDRVVmdkState *s = bs->opaque;
>> -    int i, err;
>> -    int ret = 0;
>> -
>> -    for (i = 0; i < s->num_extents; i++) {
>> -        err = bdrv_co_flush(s->extents[i].file->bs);
>> -        if (err < 0) {
>> -            ret = err;
>> -        }
>> -    }
>> -    return ret;
>> -}
>> -
>>   static int64_t vmdk_get_allocated_file_size(BlockDriverState *bs)
>>   {
>>       int i;
>> @@ -3075,7 +3060,6 @@ static BlockDriver bdrv_vmdk = {
>>       .bdrv_close                   = vmdk_close,
>>       .bdrv_co_create_opts          = vmdk_co_create_opts,
>>       .bdrv_co_create               = vmdk_co_create,
>> -    .bdrv_co_flush_to_disk        = vmdk_co_flush,
> 
> 
> After HEAD^ applied, wouldn't we get an endless recursion in
> bdrv_co_flush() if the HEAD (this patch) had not been merged into HEAD^?

Hm, how so?  HEAD^ just flushes all children, just like vmdk_co_flush()
does.  So it seems to me just like double the work.  (Which is
unfortunate but shouldn’t be a problem.)
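
As a rough illustration of the "double the work" point, here is a
standalone sketch. All types and names (MockNode, MockChild, mock_flush(),
mock_vmdk_flush()) are hypothetical simplifications; the real
bdrv_co_flush() is a coroutine function with more steps than shown.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MOCK_PERM_WRITE           = 1 << 0,
    MOCK_PERM_WRITE_UNCHANGED = 1 << 1,
};

typedef struct MockNode MockNode;

typedef struct MockChild {
    MockNode *bs;
    unsigned perm;   /* permissions the parent has taken on this child */
} MockChild;

struct MockNode {
    MockChild *children;
    size_t num_children;
    int (*driver_flush)(MockNode *bs);   /* optional driver callback */
    int flush_count;                     /* how often this node was flushed */
};

/* Generic flush: driver callback first, then every writable child. */
static int mock_flush(MockNode *bs)
{
    int ret = 0;

    assert(bs);
    bs->flush_count++;
    if (bs->driver_flush) {
        int err = bs->driver_flush(bs);
        if (err < 0) {
            ret = err;
        }
    }
    for (size_t i = 0; i < bs->num_children; i++) {
        if (bs->children[i].perm &
            (MOCK_PERM_WRITE | MOCK_PERM_WRITE_UNCHANGED)) {
            int err = mock_flush(bs->children[i].bs);
            if (err < 0) {
                ret = err;
            }
        }
    }
    return ret;
}

/* Driver callback that flushes every extent, like the removed function. */
static int mock_vmdk_flush(MockNode *bs)
{
    int ret = 0;

    for (size_t i = 0; i < bs->num_children; i++) {
        int err = mock_flush(bs->children[i].bs);
        if (err < 0) {
            ret = err;
        }
    }
    return ret;
}
```

With the callback installed, each extent is flushed twice per call (once
by the callback, once by the generic loop), and the recursion still ends
at the leaf nodes. Without the callback, each extent is flushed once.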

Max



* Re: [PATCH v7 20/47] block: Iterate over children in refresh_limits
  2020-07-14 18:37   ` Andrey Shinkevich
@ 2020-07-16 15:14     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-16 15:14 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 14.07.20 20:37, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Instead of looking at just bs->file and bs->backing, we should look at
>> all children that could end up receiving forwarded requests.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/io.c | 32 ++++++++++++++++----------------
>>   1 file changed, 16 insertions(+), 16 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index c2af7711d6..37057f13e0 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -135,6 +135,8 @@ static void bdrv_merge_limits(BlockLimits *dst,
>> const BlockLimits *src)
>>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>   {
>>       BlockDriver *drv = bs->drv;
>> +    BdrvChild *c;
>> +    bool have_limits;
>>       Error *local_err = NULL;
>>         memset(&bs->bl, 0, sizeof(bs->bl));
>> @@ -149,14 +151,21 @@ void bdrv_refresh_limits(BlockDriverState *bs,
>> Error **errp)
>>                                   drv->bdrv_co_preadv_part) ? 1 : 512;
>>         /* Take some limits from the children as a default */
>> -    if (bs->file) {
>> -        bdrv_refresh_limits(bs->file->bs, &local_err);
>> -        if (local_err) {
>> -            error_propagate(errp, local_err);
>> -            return;
>> +    have_limits = false;
>> +    QLIST_FOREACH(c, &bs->children, next) {
>> +        if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED |
>> BDRV_CHILD_COW))
>> +        {
>> +            bdrv_refresh_limits(c->bs, &local_err);
>> +            if (local_err) {
>> +                error_propagate(errp, local_err);
>> +                return;
>> +            }
>> +            bdrv_merge_limits(&bs->bl, &c->bs->bl);
>> +            have_limits = true;
>>           }
>> -        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
>> -    } else {
>> +    }
>> +
>> +    if (!have_limits) {
> 
> 
> This conditional block used to run only when bs->file == NULL.
>
> Now it runs only if there is neither bs->file, nor bs->backing, nor any
> other filtered child.
>
> Is that OK, and does it preserve the logic in all cases?

Hm.  Good question.

I think the answer is it’s OK.

For DATA and FILTERED, it makes absolute sense to just use their
alignments.  For COW, maybe not so much?  But if there’s a COW child,
there has to be a DATA child as well (in practice).  So we’ll always
consider its alignment, too.

(And hypothetically speaking, if there was a COW child but no DATA
child, then the only alignment we need to observe is in fact the one of
the COW child.)
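
A small standalone sketch of that reasoning follows. The role bits, the
MockLimits fields and the 512-byte fallback are simplified assumptions
for illustration; the real BlockLimits has more fields and the real
fallback sets more than one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical role bits, modelled on BdrvChildRole. */
enum {
    MOCK_CHILD_DATA     = 1 << 0,
    MOCK_CHILD_METADATA = 1 << 1,
    MOCK_CHILD_FILTERED = 1 << 2,
    MOCK_CHILD_COW      = 1 << 3,
};

typedef struct MockLimits {
    uint32_t min_mem_alignment;   /* merged with MAX */
    uint32_t max_transfer;        /* merged with MIN, where 0 = no limit */
} MockLimits;

typedef struct MockChild {
    unsigned role;
    MockLimits bl;
} MockChild;

static uint32_t min_non_zero(uint32_t a, uint32_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Merge the limits of all children that may receive forwarded requests. */
static MockLimits mock_refresh_limits(const MockChild *children, size_t n)
{
    MockLimits bl = {0, 0};
    int have_limits = 0;

    assert(children || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (children[i].role &
            (MOCK_CHILD_DATA | MOCK_CHILD_FILTERED | MOCK_CHILD_COW)) {
            const MockLimits *src = &children[i].bl;

            if (src->min_mem_alignment > bl.min_mem_alignment) {
                bl.min_mem_alignment = src->min_mem_alignment;
            }
            bl.max_transfer = min_non_zero(bl.max_transfer,
                                           src->max_transfer);
            have_limits = 1;
        }
    }
    if (!have_limits) {
        /* Fallback default, simplified for this sketch. */
        bl.min_mem_alignment = 512;
    }
    return bl;
}
```

A node with a DATA and a COW child takes the stricter of the two
alignments, a metadata-only child is ignored, and the fallback applies
only when no request-carrying child exists.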

Max



* Re: [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename()
  2020-07-15 12:52   ` Andrey Shinkevich
  2020-07-15 12:58     ` Andrey Shinkevich
@ 2020-07-16 15:21     ` Max Reitz
  1 sibling, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-16 15:21 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel


On 15.07.20 14:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> bdrv_refresh_filename() and the kind of related bdrv_dirname() should
>> look to the primary child when they wish to copy the underlying file's
>> filename.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block.c | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 8131d0b5eb..7c827fefa0 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6797,6 +6797,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>   {
>>       BlockDriver *drv = bs->drv;
>>       BdrvChild *child;
>> +    BlockDriverState *primary_child_bs;
>>       QDict *opts;
>>       bool backing_overridden;
>>       bool generate_json_filename; /* Whether our default
>> implementation should
>> @@ -6866,20 +6867,30 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>>       qobject_unref(bs->full_open_options);
>>       bs->full_open_options = opts;
>>   +    primary_child_bs = bdrv_primary_bs(bs);
>> +
>>       if (drv->bdrv_refresh_filename) {
>>           /* Obsolete information is of no use here, so drop the old
>> file name
>>            * information before refreshing it */
>>           bs->exact_filename[0] = '\0';
>>             drv->bdrv_refresh_filename(bs);
>> -    } else if (bs->file) {
>> -        /* Try to reconstruct valid information from the underlying
>> file */
>> +    } else if (primary_child_bs) {
>> +        /*
>> +         * Try to reconstruct valid information from the underlying
>> +         * file -- this only works for format nodes (filter nodes
>> +         * cannot be probed and as such must be selected by the user
>> +         * either through an options dict, or through a special
>> +         * filename which the filter driver must construct in its
>> +         * .bdrv_refresh_filename() implementation).
>> +         */
> 
> 
> The caller may not be aware of a filter node and may intend to refresh
> the name of the underlying format node.
>
> In that case, the filter driver should redirect the call to the format
> node.

It shouldn’t.  We can only return a plain filename if passing that
filename to qemu (e.g. to -drive) will result in the same block graph
configuration.

This is what the comment means by “filter nodes cannot be probed”: If
there is a filter node, we must generate a json:{} filename, because
otherwise reopening the block device with -drive by passing the filename
generated here would result in a configuration where the filter is missing.

> In which situations should the name of the filter itself be refreshed?

Hypothetically, if a filename could specify a filter.  For example, say
the filename “filter[copy-on-read]:foo.qcow2” would result in qemu
creating a COR filter on top of a qcow2 node, then we could generate
such a filename.

In practice, filters cannot be configured through plain filenames (apart
from blkverify, but that’s a debugging feature, so it doesn’t really
matter), so there is no such situation.  All filter nodes should have an
empty exact_filename and thus get a json:{} pseudo-filename.

> If there are any, should we do both actions or choose one of them?
> 
> Andrey
> 
> 
>>             bs->exact_filename[0] = '\0';
>>             /*
>>            * We can use the underlying file's filename if:
>>            * - it has a filename,
>> +         * - the current BDS is not a filter,
> 
> 
> Should this function itself check whether its input parameter is a
> filter's BS and handle that case, should the filter driver's function do
> that, or should the caller check it?

bdrv_refresh_filename() is called whenever some node in the block graph
has changed, just to refresh its filename (after that change).  The
caller generally doesn’t really care about the result, so it doesn’t
matter whether the node is a filter or not (i.e., whether it gets a
plain filename or not).

I don’t think the caller should check it, and in this implementation we
simply have to handle filter nodes correctly: That is, not give them a
plain filename.
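
The rule can be sketched in a standalone form like this. MockNode and
mock_refresh_filename() are hypothetical, and the generated json:{}
string is heavily simplified: the real one is built from the node's full
option dict, and the real function checks more conditions before
inheriting a name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU's real struct. */
typedef struct MockNode {
    const char *driver;         /* e.g. "qcow2" or "copy-on-read" */
    int is_filter;
    struct MockNode *primary;   /* primary child, NULL for leaf nodes */
    char exact_filename[64];    /* plain filename, empty if there is none */
    char filename[128];         /* plain or json:{} pseudo-filename */
} MockNode;

static void mock_refresh_filename(MockNode *bs)
{
    assert(bs);
    if (bs->primary) {
        /* Drop stale information before trying to reconstruct it. */
        bs->exact_filename[0] = '\0';
        /*
         * Only non-filter nodes may inherit the child's plain filename:
         * opening that filename again would not recreate a filter.
         */
        if (!bs->is_filter && bs->primary->exact_filename[0]) {
            snprintf(bs->exact_filename, sizeof(bs->exact_filename),
                     "%s", bs->primary->exact_filename);
        }
    }
    /* Leaf nodes keep whatever name they were given when opened. */

    if (bs->exact_filename[0]) {
        snprintf(bs->filename, sizeof(bs->filename), "%s",
                 bs->exact_filename);
    } else {
        snprintf(bs->filename, sizeof(bs->filename),
                 "json:{\"driver\": \"%s\"}", bs->driver);
    }
}
```

A qcow2 node on top of a file named foo.qcow2 ends up with the plain
name, while a copy-on-read filter on top of it keeps an empty
exact_filename and gets a json:{} pseudo-filename.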

Max



* Re: [PATCH v7 02/47] block: Add chain helper functions
  2020-07-16 14:50     ` Max Reitz
@ 2020-07-16 15:24       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-16 15:24 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

16.07.2020 17:50, Max Reitz wrote:
> On 13.07.20 12:18, Vladimir Sementsov-Ogievskiy wrote:
>> 25.06.2020 18:21, Max Reitz wrote:
>>> Add some helper functions for skipping filters in a chain of block
>>> nodes.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    include/block/block_int.h |  3 +++
>>>    block.c                   | 55 +++++++++++++++++++++++++++++++++++++++
>>>    2 files changed, 58 insertions(+)
>>>
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index bb3457c5e8..5da793bfc3 100644
>>
>>
>> This patch raises two questions:
>>
>> 1. How to treat filters at the end of the backing chain?
> 
> It was my understanding that this configuration would be impossible.

OK for me, but I'd prefer to have a comment and assertions.

> 
>> - child-access functions will return no filter child for such nodes,
>> which is correct of course
>> - filter-skipping functions will return this filter. How correct that
>> is, I don't know.
>>
>>
>> Consider a chain
>>
>> top --- backing ---> filter-with-no-child
> 
> How would it be possible to have filter without a child?
> 
>> If bdrv_backing_chain_next(top) returns NULL, that is incorrect, because
>> top actually has a backing child, and on read it will read from it for
>> unallocated clusters (and this should crash). So returning the filter as
>> the backing-chain-next is probably a valid thing to do. Or we should
>> assert that we are not in such a situation (which may crash more often
>> than trying to really read from a nonexistent child).
>>
>> So returning NULL may be even less correct than returning a filter.
>>
>>
>> 2. How to treat nodes with drv=NULL but with a filter child (one with
>> the BDRV_CHILD_FILTERED role)?
> 
> drv=NULL is a bug on its own that should be fixed...  (The idea we had
> at some point was to introduce a special driver that just always returns
> -EIO on everything, and to replace corrupt qcow2 nodes by that.  Or,
> well, we might just return -EIO from the qcow2 driver, actually, if we
> never use drv=NULL anywhere else.)
> 
> In any case, drv=NULL is an edge case that I think never has anything to
> do with filters.

So, again some good comment and assertion won't hurt.

> 
>> - child-access functions return no filtered child for such nodes
>> - filter-skipping functions will stop on it
>>
>> =======
>>
>> Wouldn't it be better to drop drv->is_filter altogether, and to call a
>> node a filter if it has a bs->file or bs->backing child in the
>> BDRV_CHILD_FILTERED role? This automatically settles the two questions:
>>
>> - A node without a child in the BDRV_CHILD_FILTERED role is automatically
>> a non-filter. So the filter driver is responsible for having such a child.
>> - A node without a drv may still be a filter if it has a
>> BDRV_CHILD_FILTERED child. Still, not very useful.
>>
>> Anyway, is_filter and BDRV_CHILD_FILTERED can contradict each other, and
>> it seems good to get rid of is_filter. But I may be missing something.
>>
>> [..]
>>
>>> +
>>> +static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
>>> +                                              bool
>>> stop_on_explicit_filter)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (!bs) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
>>> +        c = bdrv_filter_child(bs);
>>> +        if (!c) {
>>> +            break;
>>> +        }
>>> +        bs = c->bs;
>>> +    }
>>> +    /*
>>> +     * Note that this treats nodes with bs->drv == NULL as not being
>>> +     * filters (bs->drv == NULL should be replaced by something else
>>> +     * anyway).
>>> +     * The advantage of this behavior is that this function will thus
>>> +     * always return a non-NULL value (given a non-NULL @bs).
>>
>> I don't see how this follows from the first sentence. We can skip nodes
>> with a BDRV_CHILD_FILTERED child and drv=NULL as well, and still return
>> a non-NULL bs at the end...
> 
> My idea was that nodes with bs->drv == NULL might not even have
> children.  If we treated them like filters and skipped through them, we
> would have to return NULL if there is no child.
> 
>> Didn't you mean "treat nodes without filter child as not being filters,
>> even if they have drv->is_filter == true"? This is a real reason for the
>> second sentence.
> 
> Hm.  I implicitly always assume that filters always have a filter child,
> so I tend to not even question that part.
> 

Hmm. Still, the relationship in the comment seems unclear to me, the code itself is simpler :)
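
For reference, the loop under discussion can be exercised in isolation
with a sketch like the one below. MockNode and mock_filter_child_bs() are
hypothetical stand-ins for BlockDriverState and bdrv_filter_child(); the
assertion at the end is the kind of check meant above, not something the
patch contains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU's real struct. */
typedef struct MockNode {
    int is_filter;
    int implicit;
    struct MockNode *filtered;   /* filtered child, may be NULL */
} MockNode;

/* Mock of bdrv_filter_child(): only filters have a filtered child. */
static MockNode *mock_filter_child_bs(MockNode *bs)
{
    return bs->is_filter ? bs->filtered : NULL;
}

static MockNode *mock_do_skip_filters(MockNode *bs,
                                      int stop_on_explicit_filter)
{
    if (!bs) {
        return NULL;
    }

    while (!(stop_on_explicit_filter && !bs->implicit)) {
        MockNode *c = mock_filter_child_bs(bs);
        if (!c) {
            /* Not a filter, or a filter without a child: stop here. */
            break;
        }
        bs = c;
    }

    /* We only ever step to an existing child, so @bs stays non-NULL. */
    assert(bs);
    return bs;
}
```

In this sketch a node without a filtered child is returned as it is,
whatever the reason for the missing child, which is why the result is
non-NULL for any non-NULL input.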

Okay, I'm actually OK with this all. I'd prefer to have assertions and comments on corner-cases I mentioned, but patch is OK as is:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



-- 
Best regards,
Vladimir



* Re: [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size
  2020-06-25 15:21 ` [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size Max Reitz
@ 2020-07-20 15:10   ` Andrey Shinkevich
  2020-07-24  8:58     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-20 15:10 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> It is trivial, so we might as well do it.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/null.c               | 7 +++++++
>   tests/qemu-iotests/153.out | 2 +-
>   tests/qemu-iotests/184.out | 6 ++++--
>   3 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/block/null.c b/block/null.c
> index 15e1d56746..cc9b1d4ea7 100644
> --- a/block/null.c
> +++ b/block/null.c
> @@ -262,6 +262,11 @@ static void null_refresh_filename(BlockDriverState *bs)
>                bs->drv->format_name);
>   }
>   
> +static int64_t null_allocated_file_size(BlockDriverState *bs)
> +{
> +    return 0;
> +}
> +
>   static const char *const null_strong_runtime_opts[] = {
>       BLOCK_OPT_SIZE,
>       NULL_OPT_ZEROES,
> @@ -277,6 +282,7 @@ static BlockDriver bdrv_null_co = {
>       .bdrv_file_open         = null_file_open,
>       .bdrv_parse_filename    = null_co_parse_filename,
>       .bdrv_getlength         = null_getlength,
> +    .bdrv_get_allocated_file_size = null_allocated_file_size,
>   
>       .bdrv_co_preadv         = null_co_preadv,
>       .bdrv_co_pwritev        = null_co_pwritev,
> @@ -297,6 +303,7 @@ static BlockDriver bdrv_null_aio = {
>       .bdrv_file_open         = null_file_open,
>       .bdrv_parse_filename    = null_aio_parse_filename,
>       .bdrv_getlength         = null_getlength,
> +    .bdrv_get_allocated_file_size = null_allocated_file_size,
>   
>       .bdrv_aio_preadv        = null_aio_preadv,
>       .bdrv_aio_pwritev       = null_aio_pwritev,
> diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
> index b2a90caa6b..8659e6463b 100644
> --- a/tests/qemu-iotests/153.out
> +++ b/tests/qemu-iotests/153.out
> @@ -461,7 +461,7 @@ No conflict:
>   image: null-co://
>   file format: null-co
>   virtual size: 1 GiB (1073741824 bytes)
> -disk size: unavailable
> +disk size: 0 B
>   
>   Conflict:
>   qemu-img: --force-share/-U conflicts with image options
> diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
> index 3deb3cfb94..28b104da89 100644
> --- a/tests/qemu-iotests/184.out
> +++ b/tests/qemu-iotests/184.out
> @@ -29,7 +29,8 @@ Testing:
>               "image": {
>                   "virtual-size": 1073741824,
>                   "filename": "json:{\"throttle-group\": \"group0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
> -                "format": "throttle"
> +                "format": "throttle",
> +                "actual-size": SIZE


If we remove the _filter_generated_node_ids() in the current 
implementation of the test #184, we will get '"actual-size": 0'. It 
might be more informative when analyzing the output in 184.out.

Andrey


>               },
>               "iops_wr": 0,
>               "ro": false,
> @@ -56,7 +57,8 @@ Testing:
>               "image": {
>                   "virtual-size": 1073741824,
>                   "filename": "null-co://",
> -                "format": "null-co"
> +                "format": "null-co",
> +                "actual-size": SIZE
>               },
>               "iops_wr": 0,
>               "ro": false,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size()
  2020-06-25 15:21 ` [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size() Max Reitz
@ 2020-07-20 15:10   ` Andrey Shinkevich
  2020-08-19 10:46   ` Kevin Wolf
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-20 15:10 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> blkverify is a filter, so bdrv_get_allocated_file_size()'s default
> implementation will return only the size of its filtered child.
> However, because both of its children are disk images, it makes more
> sense to sum both of their allocated sizes.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/blkverify.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/block/blkverify.c b/block/blkverify.c
> index 2f261de24b..64858c8df0 100644
> --- a/block/blkverify.c
> +++ b/block/blkverify.c
> @@ -323,6 +323,7 @@ static BlockDriver bdrv_blkverify = {
>       .bdrv_getlength                   = blkverify_getlength,
>       .bdrv_refresh_filename            = blkverify_refresh_filename,
>       .bdrv_dirname                     = blkverify_dirname,
> +    .bdrv_get_allocated_file_size     = bdrv_sum_allocated_file_size,
>   
>       .bdrv_co_preadv                   = blkverify_co_preadv,
>       .bdrv_co_pwritev                  = blkverify_co_pwritev,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




* Re: [PATCH v7 26/47] block: Improve get_allocated_file_size's default
  2020-06-25 15:21 ` [PATCH v7 26/47] block: Improve get_allocated_file_size's default Max Reitz
@ 2020-07-20 15:12   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-20 15:12 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> There are two practical problems with bdrv_get_allocated_file_size()'s
> default right now:
> (1) For drivers with children, we should generally sum all their sizes
>      instead of just passing the request through to bs->file.  The latter
>      is good for filters, but not so much for format drivers.
>
> (2) Filters need not have bs->file, so we should actually go to the
>      filtered child instead of hard-coding bs->file.
>
> And we can make the whole default implementation more idiomatic by using
> the three generic functions added by the previous patch.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 17 ++++++++++++++---
>   1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/block.c b/block.c
> index fc01ce90b3..a19f243997 100644
> --- a/block.c
> +++ b/block.c
> @@ -4997,10 +4997,21 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
>       if (drv->bdrv_get_allocated_file_size) {
>           return drv->bdrv_get_allocated_file_size(bs);
>       }
> -    if (bs->file) {
> -        return bdrv_get_allocated_file_size(bs->file->bs);
> +
> +    if (drv->bdrv_file_open) {
> +        /*
> +         * Protocol drivers default to -ENOTSUP (most of their data is
> +         * not stored in any of their children (if they even have any),
> +         * so there is no generic way to figure it out).
> +         */
> +        return bdrv_notsup_allocated_file_size(bs);
> +    } else if (drv->is_filter) {
> +        /* Filter drivers default to the size of their primary child */
> +        return bdrv_primary_allocated_file_size(bs);
> +    } else {
> +        /* Other drivers default to summing their children's sizes */
> +        return bdrv_sum_allocated_file_size(bs);
>       }
> -    return -ENOTSUP;
>   }
>   
>   /**


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
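
For readers following along, the three-way default in the hunk above can
be sketched standalone as follows. MockNode, MockKind and
mock_allocated_size() are hypothetical; in particular, the real summing
helper looks at child roles, whereas this sketch simply sums all
children.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MOCK_PROTOCOL, MOCK_FILTER, MOCK_FORMAT } MockKind;

/* Hypothetical stand-in for BlockDriverState; not QEMU's real struct. */
typedef struct MockNode {
    MockKind kind;
    int has_own_impl;             /* driver provides its own callback */
    int64_t own_size;             /* value that callback would return */
    struct MockNode **children;
    size_t num_children;
    size_t primary;               /* index of the primary child */
} MockNode;

static int64_t mock_allocated_size(const MockNode *bs)
{
    if (bs->has_own_impl) {
        return bs->own_size;
    }

    if (bs->kind == MOCK_PROTOCOL) {
        /* No generic way to know where a protocol stores its data. */
        return -ENOTSUP;
    } else if (bs->kind == MOCK_FILTER) {
        /* Filters default to the size of their primary child. */
        if (bs->num_children == 0) {
            return -ENOTSUP;
        }
        assert(bs->primary < bs->num_children);
        return mock_allocated_size(bs->children[bs->primary]);
    } else {
        /* Other drivers default to summing their children's sizes. */
        int64_t sum = 0;

        for (size_t i = 0; i < bs->num_children; i++) {
            int64_t sz = mock_allocated_size(bs->children[i]);
            if (sz < 0) {
                return sz;   /* propagate the first error */
            }
            sum += sz;
        }
        return sum;
    }
}
```

A format node over two files of 100 and 200 bytes reports 300, a filter
on top of it reports the same 300, and one child without an
implementation makes the whole sum fail with -ENOTSUP.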





* Re: [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare()
  2020-06-25 15:21 ` [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
@ 2020-07-20 16:08   ` Andrey Shinkevich
  2020-07-24  9:23     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-20 16:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> This allows us to differentiate between filters and nodes with COW
> backing files: Filters cannot be used as overlays at all (for this
> function).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   blockdev.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 1eb0fcdea2..aabe51036d 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1549,7 +1549,12 @@ static void external_snapshot_prepare(BlkActionState *common,
>           goto out;
>       }
>   
> -    if (state->new_bs->backing != NULL) {
> +    if (state->new_bs->drv->is_filter) {


Is there a chance of getting a filter here? If so, is that when a user
specifies a file name of the form “filter[filter-name]:foo.qcow2”, or
in some other way?

Andrey


> +        error_setg(errp, "Filters cannot be used as overlays");
> +        goto out;
> +    }
> +
> +    if (bdrv_cow_child(state->new_bs)) {
>           error_setg(errp, "The overlay already has a backing image");
>           goto out;
>       }


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 30/47] block: Report data child for query-blockstats
  2020-06-25 15:21 ` [PATCH v7 30/47] block: Report data child for query-blockstats Max Reitz
@ 2020-07-21 11:48   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-21 11:48 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> It makes no sense to report the block stats of a purely metadata-storing
> child in query-blockstats.  So if the primary child does not have any
> data, try to find a unique data-storing child.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qapi.c | 31 +++++++++++++++++++++++++++++--
>   1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/block/qapi.c b/block/qapi.c
> index 4807a2b344..c57b42d86d 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -526,6 +526,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
>   static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>                                           bool blk_level)
>   {
> +    BdrvChild *parent_child;
>       BlockStats *s = NULL;
>   
>       s = g_malloc0(sizeof(*s));
> @@ -555,9 +556,35 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
>           s->has_driver_specific = true;
>       }
>   
> -    if (bs->file) {
> +    parent_child = bdrv_primary_child(bs);
> +    if (!parent_child ||
> +        !(parent_child->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED)))
> +    {
> +        BdrvChild *c;
> +
> +        /*
> +         * Look for a unique data-storing child.  We do not need to look for
> +         * filtered children, as there would be only one and it would have been
> +         * the primary child.
> +         */
> +        parent_child = NULL;
> +        QLIST_FOREACH(c, &bs->children, next) {
> +            if (c->role & BDRV_CHILD_DATA) {
> +                if (parent_child) {
> +                    /*
> +                     * There are multiple data-storing children and we cannot
> +                     * choose between them.
> +                     */
> +                    parent_child = NULL;
> +                    break;
> +                }
> +                parent_child = c;
> +            }
> +        }
> +    }
> +    if (parent_child) {
>           s->has_parent = true;
> -        s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
> +        s->parent = bdrv_query_bds_stats(parent_child->bs, blk_level);
>       }
>   
>       if (blk_level && bs->backing) {


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 31/47] block: Use child access functions for QAPI queries
  2020-06-25 15:21 ` [PATCH v7 31/47] block: Use child access functions for QAPI queries Max Reitz
@ 2020-07-21 12:30   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-21 12:30 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:21, Max Reitz wrote:
> query-block, query-named-block-nodes, and query-blockstats now return
> any filtered child under "backing", not just bs->backing or COW
> children.  This is so that filters do not interrupt the reported backing
> chain.  This changes the output for iotest 184, as the throttled node
> now appears as a backing child.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qapi.c               | 33 ++++++++++++++++++++-------------
>   tests/qemu-iotests/184.out |  8 +++++++-
>   2 files changed, 27 insertions(+), 14 deletions(-)

...


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 32/47] block-copy: Use CAF to find sync=top base
  2020-06-25 15:22 ` [PATCH v7 32/47] block-copy: Use CAF to find sync=top base Max Reitz
@ 2020-07-21 12:42   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-21 12:42 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/block-copy.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/block-copy.c b/block/block-copy.c
> index f7428a7c08..5e80569bb8 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -437,8 +437,8 @@ static int block_copy_block_status(BlockCopyState *s, int64_t offset,
>       BlockDriverState *base;
>       int ret;
>   
> -    if (s->skip_unallocated && s->source->bs->backing) {
> -        base = s->source->bs->backing->bs;
> +    if (s->skip_unallocated) {
> +        base = bdrv_backing_chain_next(s->source->bs);
>       } else {
>           base = NULL;
>       }


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 33/47] mirror: Deal with filters
  2020-06-25 15:22 ` [PATCH v7 33/47] mirror: Deal with filters Max Reitz
@ 2020-07-22 18:31   ` Andrey Shinkevich
  2020-07-24  9:49     ` Max Reitz
  2020-08-19 16:50   ` Kevin Wolf
  1 sibling, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-22 18:31 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission for active commits where the base is smaller
> than the top).
>
> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
> "target_backing_bs", because that is what it really refers to.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json |   6 ++-
>   block/mirror.c       | 118 +++++++++++++++++++++++++++++++++----------
>   blockdev.c           |  36 +++++++++----
>   3 files changed, 121 insertions(+), 39 deletions(-)
>
...
> diff --git a/block/mirror.c b/block/mirror.c
> index 469acf4600..770de3b34e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>       BlockBackend *target;
>       BlockDriverState *mirror_top_bs;
>       BlockDriverState *base;
> +    BlockDriverState *base_overlay;
>   
>       /* The name of the graph node to replace */
>       char *replaces;
> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>                                &error_abort);
>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -        if (backing_bs(target_bs) != backing) {
> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
> +
> +        if (bdrv_cow_bs(unfiltered_target) != backing) {


I just worry about a filter node of the concurrent job right below the 
unfiltered_target. The filter has unfiltered_target in its parent list. 
Will that filter node be replaced correctly then?


Andrey

...

> +        /*
> +         * The topmost node with
> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
> +         */
> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
> +
> +        assert(bdrv_skip_filters(filtered_target) ==
> +               bdrv_skip_filters(target));
> +
> +        /*
> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
> +         * ourselves at s->base (if writes are blocked for a node, they are
> +         * also blocked for its backing file). The other options would be a
> +         * second filter driver above s->base (== target).
> +         */
> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
> +
> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
> +             iter = bdrv_filter_or_cow_bs(iter))
> +        {
> +            if (iter == filtered_target) {


For one filter node only?


> +                /*
> +                 * From here on, all nodes are filters on the base.
> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
> +                 */
> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
> +            }
> +
>               ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> -                                     errp);
> +                                     iter_shared_perms, errp);
>               if (ret < 0) {
>                   goto fail;
>               }
...
> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>                                " named node of the graph");
>               goto out;
>           }
> +        replaces_node_name = arg->replaces;


What is the idea behind the variable substitution?

Perhaps the patch could be split up.

Andrey




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 34/47] backup: Deal with filters
  2020-06-25 15:22 ` [PATCH v7 34/47] backup: " Max Reitz
@ 2020-07-23 15:51   ` Andrey Shinkevich
  2020-07-24  9:55     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-23 15:51 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/backup-top.c |  2 +-
>   block/backup.c     |  9 +++++----
>   blockdev.c         | 19 +++++++++++++++----
>   3 files changed, 21 insertions(+), 9 deletions(-)
>
>
> diff --git a/block/backup.c b/block/backup.c
> index 4f13bb20a5..9afa0bf3b4 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -297,6 +297,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
>   {
>       int ret;
>       BlockDriverInfo bdi;
> +    bool target_does_cow = bdrv_backing_chain_next(target);
>   


Wouldn't it be better to use an explicit type conversion or the "!!"
approach?

Andrey


>       /*
>        * If there is no backing file on the target, we cannot rely on COW if our
> @@ -304,7 +305,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
>        * targets with a backing file, try to avoid COW if possible.
>        */
>       ret = bdrv_get_info(target, &bdi);
> -    if (ret == -ENOTSUP && !target->backing) {
> +    if (ret == -ENOTSUP && !target_does_cow) {
>           /* Cluster size is not defined */

...

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 35/47] commit: Deal with filters
  2020-06-25 15:22 ` [PATCH v7 35/47] commit: " Max Reitz
@ 2020-07-23 17:15   ` Andrey Shinkevich
  2020-07-24 10:36     ` Andrey Shinkevich
  2020-08-19 17:58   ` Kevin Wolf
  1 sibling, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-23 17:15 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission if the base is smaller than the top).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/block-backend.c          |  9 +++-
>   block/commit.c                 | 96 +++++++++++++++++++++++++---------
>   block/monitor/block-hmp-cmds.c |  2 +-
>   blockdev.c                     |  4 +-
>   4 files changed, 81 insertions(+), 30 deletions(-)
>
...
> +    /*
> +     * Block all nodes between top and base, because they will
> +     * disappear from the chain after this operation.
> +     * Note that this assumes that the user is fine with removing all
> +     * nodes (including R/W filters) between top and base.  Assuring
> +     * this is the responsibility of the interface (i.e. whoever calls
> +     * commit_start()).
> +     */
> +    s->base_overlay = bdrv_find_overlay(top, base);
> +    assert(s->base_overlay);
> +
> +    /*
> +     * The topmost node with
> +     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
> +     */
> +    filtered_base = bdrv_cow_bs(s->base_overlay);
> +    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
> +
> +    /*
> +     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> +     * at s->base (if writes are blocked for a node, they are also blocked
> +     * for its backing file). The other options would be a second filter
> +     * driver above s->base.
> +     */
> +    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
> +
> +    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
> +        if (iter == filtered_base) {


The same question as for mirroring:

With multiple filters stacked one above another, the permission is
extended for the filtered_base only.

Andrey


> +            /*
> +             * From here on, all nodes are filters on the base.  This
> +             * allows us to share BLK_PERM_CONSISTENT_READ.
> +             */
> +            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
> +        }
> +
>           ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> -                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> -                                 errp);
> +                                 iter_shared_perms, errp);
>           if (ret < 0) {
>               goto fail;
>           }

...

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap
  2020-06-25 15:22 ` [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap Max Reitz
@ 2020-07-23 17:21   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-23 17:21 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> When looking for a dirty bitmap to share, we should handle filters by
> just including them in the search (so they do not break backing chains).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   nbd/server.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..b504a79435 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1561,13 +1561,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>       if (bitmap) {
>           BdrvDirtyBitmap *bm = NULL;
>   
> -        while (true) {
> +        while (bs) {
>               bm = bdrv_find_dirty_bitmap(bs, bitmap);
> -            if (bm != NULL || bs->backing == NULL) {
> +            if (bm != NULL) {
>                   break;
>               }
>   
> -            bs = bs->backing->bs;
> +            bs = bdrv_filter_or_cow_bs(bs);
>           }
>   
>           if (bm == NULL) {


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size
  2020-07-20 15:10   ` Andrey Shinkevich
@ 2020-07-24  8:58     ` Max Reitz
  2020-07-24  9:49       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-24  8:58 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 20.07.20 17:10, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> It is trivial, so we might as well do it.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/null.c               | 7 +++++++
>>   tests/qemu-iotests/153.out | 2 +-
>>   tests/qemu-iotests/184.out | 6 ++++--
>>   3 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/null.c b/block/null.c
>> index 15e1d56746..cc9b1d4ea7 100644
>> --- a/block/null.c
>> +++ b/block/null.c
>> @@ -262,6 +262,11 @@ static void null_refresh_filename(BlockDriverState *bs)
>>                bs->drv->format_name);
>>   }
>>   
>> +static int64_t null_allocated_file_size(BlockDriverState *bs)
>> +{
>> +    return 0;
>> +}
>> +
>>   static const char *const null_strong_runtime_opts[] = {
>>       BLOCK_OPT_SIZE,
>>       NULL_OPT_ZEROES,
>> @@ -277,6 +282,7 @@ static BlockDriver bdrv_null_co = {
>>       .bdrv_file_open         = null_file_open,
>>       .bdrv_parse_filename    = null_co_parse_filename,
>>       .bdrv_getlength         = null_getlength,
>> +    .bdrv_get_allocated_file_size = null_allocated_file_size,
>>   
>>       .bdrv_co_preadv         = null_co_preadv,
>>       .bdrv_co_pwritev        = null_co_pwritev,
>> @@ -297,6 +303,7 @@ static BlockDriver bdrv_null_aio = {
>>       .bdrv_file_open         = null_file_open,
>>       .bdrv_parse_filename    = null_aio_parse_filename,
>>       .bdrv_getlength         = null_getlength,
>> +    .bdrv_get_allocated_file_size = null_allocated_file_size,
>>   
>>       .bdrv_aio_preadv        = null_aio_preadv,
>>       .bdrv_aio_pwritev       = null_aio_pwritev,
>> diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
>> index b2a90caa6b..8659e6463b 100644
>> --- a/tests/qemu-iotests/153.out
>> +++ b/tests/qemu-iotests/153.out
>> @@ -461,7 +461,7 @@ No conflict:
>>   image: null-co://
>>   file format: null-co
>>   virtual size: 1 GiB (1073741824 bytes)
>> -disk size: unavailable
>> +disk size: 0 B
>>   
>>   Conflict:
>>   qemu-img: --force-share/-U conflicts with image options
>> diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
>> index 3deb3cfb94..28b104da89 100644
>> --- a/tests/qemu-iotests/184.out
>> +++ b/tests/qemu-iotests/184.out
>> @@ -29,7 +29,8 @@ Testing:
>>               "image": {
>>                   "virtual-size": 1073741824,
>>                   "filename": "json:{\"throttle-group\": \"group0\",
>> \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
>> -                "format": "throttle"
>> +                "format": "throttle",
>> +                "actual-size": SIZE
> 
> 
> If we remove the _filter_generated_node_ids() in the current
> implementation of the test #184, we will get '"actual-size": 0'. It
> might be more informative when analyzing the output in 184.out.

You mean _filter_actual_image_size?  Yeah, why not, it doesn’t seem
necessary here.

Max



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare()
  2020-07-20 16:08   ` Andrey Shinkevich
@ 2020-07-24  9:23     ` Max Reitz
  2020-07-24 10:37       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-24  9:23 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 20.07.20 18:08, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> This allows us to differentiate between filters and nodes with COW
>> backing files: Filters cannot be used as overlays at all (for this
>> function).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   blockdev.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 1eb0fcdea2..aabe51036d 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -1549,7 +1549,12 @@ static void external_snapshot_prepare(BlkActionState *common,
>>           goto out;
>>       }
>>   
>> -    if (state->new_bs->backing != NULL) {
>> +    if (state->new_bs->drv->is_filter) {
> 
> 
> Is there a chance of getting a filter here? If so, is that when a user
> specifies a file name of the form “filter[filter-name]:foo.qcow2”, or
> in some other way?

It would be with blockdev-snapshot and by specifying a filter for
@overlay.  Technically that’s already caught by the check whether the
overlay supports backing images (whether drv->supports_backing is true),
but we might as well give a nicer error message.

Example:

{"execute":"qmp_capabilities"}

{"execute":"blockdev-add","arguments":
 {"node-name":"overlay","driver":"copy-on-read",
  "file":{"driver":"null-co"}}}

{"execute":"blockdev-add","arguments":
 {"node-name":"base","driver":"null-co"}}

{"execute":"blockdev-snapshot","arguments":
 {"node":"base","overlay":"overlay"}}
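
To illustrate why the extra check only changes which error is reported,
here is a minimal standalone sketch. The ToyDriver/ToyNode types and
toy_check_overlay() are hypothetical stand-ins, not QEMU's structures,
and the wording and position of the supports_backing check are
assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; only the two driver flags discussed above are modelled. */
typedef struct ToyDriver {
    bool is_filter;
    bool supports_backing;
} ToyDriver;

typedef struct ToyNode {
    const ToyDriver *drv;
    struct ToyNode *cow_child;   /* non-NULL if a COW backing file is attached */
} ToyNode;

/* Returns NULL if @overlay is acceptable, else a static error string. */
static const char *toy_check_overlay(const ToyNode *overlay)
{
    assert(overlay && overlay->drv);

    /* Checked first, so a filter gets the more specific message. */
    if (overlay->drv->is_filter) {
        return "Filters cannot be used as overlays";
    }
    if (overlay->cow_child) {
        return "The overlay already has a backing image";
    }
    if (!overlay->drv->supports_backing) {
        /* Wording and position of this check are assumptions. */
        return "overlay driver does not support backing images";
    }
    return NULL;
}
```

A copy-on-read-like driver (is_filter set, supports_backing unset) would
be rejected by the last check anyway; the first check just reports the
reason more precisely.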

Max



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 33/47] mirror: Deal with filters
  2020-07-22 18:31   ` Andrey Shinkevich
@ 2020-07-24  9:49     ` Max Reitz
  2020-07-24 10:27       ` Andrey Shinkevich
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-07-24  9:49 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 22.07.20 20:31, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission for active commits where the base is smaller
>> than the top).
>>
>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>> "target_backing_bs", because that is what it really refers to.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |   6 ++-
>>   block/mirror.c       | 118 +++++++++++++++++++++++++++++++++----------
>>   blockdev.c           |  36 +++++++++----
>>   3 files changed, 121 insertions(+), 39 deletions(-)
>>
> ...
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 469acf4600..770de3b34e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>>       BlockBackend *target;
>>       BlockDriverState *mirror_top_bs;
>>       BlockDriverState *base;
>> +    BlockDriverState *base_overlay;
>>   
>>       /* The name of the graph node to replace */
>>       char *replaces;
>> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>>                                &error_abort);
>>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
>> -        if (backing_bs(target_bs) != backing) {
>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
>> +
>> +        if (bdrv_cow_bs(unfiltered_target) != backing) {
> 
> 
> I just worry about a filter node of the concurrent job right below the
> unfiltered_target.

Having a concurrent job on the target sounds extremely problematic in
itself (because at least for most of the mirror job, the target isn’t in
a consistent state).  Is that a real use case?

> The filter has unfiltered_target in its parent list.
> Will that filter node be replaced correctly then?

I’m also not quite sure what you mean.  We need to attach the source’s
backing chain to the target here, so we go down to the first node that
might accept COW backing files (by invoking bdrv_skip_filters()).  That
should be correct no matter what kind of filters are on it.
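
As a rough standalone sketch of that idea (ToyNode and the toy_*
helpers are hypothetical and only mimic what bdrv_skip_filters() and
bdrv_cow_bs() are described as doing here; they are not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node: a filter has a filtered child, a format node may have a COW child. */
typedef struct ToyNode {
    bool is_filter;
    struct ToyNode *filtered;   /* child of a filter (NULL otherwise) */
    struct ToyNode *cow;        /* COW backing child of a non-filter */
} ToyNode;

/* Walk down through any number of stacked filters to the first non-filter. */
static ToyNode *toy_skip_filters(ToyNode *node)
{
    while (node && node->is_filter) {
        node = node->filtered;
    }
    return node;
}

/*
 * True if the first non-filter node below @target does not yet have
 * @backing as its COW child, i.e. the backing chain must be attached.
 */
static bool toy_needs_new_backing(ToyNode *target, ToyNode *backing)
{
    ToyNode *unfiltered = toy_skip_filters(target);

    return unfiltered && unfiltered->cow != backing;
}
```

However many filters sit on top of the target, the comparison is always
made at the first node that could carry a COW backing file.
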
>> +        /*
>> +         * The topmost node with
>> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
>> +         */
>> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
>> +
>> +        assert(bdrv_skip_filters(filtered_target) ==
>> +               bdrv_skip_filters(target));
>> +
>> +        /*
>> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
>> +         * ourselves at s->base (if writes are blocked for a node, they are
>> +         * also blocked for its backing file). The other options would be a
>> +         * second filter driver above s->base (== target).
>> +         */
>> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>> +
>> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
>> +             iter = bdrv_filter_or_cow_bs(iter))
>> +        {
>> +            if (iter == filtered_target) {
> 
> 
> For one filter node only?

No, iter_shared_perms is never reset, so it retains the
BLK_PERM_CONSISTENT_READ flag until the end of the loop.
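
A minimal standalone sketch of that behaviour, assuming a toy singly
linked chain (ToyNode, the TOY_PERM_* values and
toy_assign_shared_perms() are made up for illustration and do not match
QEMU's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Toy permission bits; the values are illustrative, not QEMU's. */
enum {
    TOY_PERM_CONSISTENT_READ = 1 << 0,
    TOY_PERM_WRITE           = 1 << 1,
    TOY_PERM_WRITE_UNCHANGED = 1 << 2,
};

/* Toy node in a chain; @next stands in for bdrv_filter_or_cow_bs(). */
typedef struct ToyNode {
    struct ToyNode *next;
    unsigned shared_perms;    /* what the loop assigned to this node */
} ToyNode;

/*
 * Walk from @first up to (not including) @target.  Once @filtered_target
 * is reached, CONSISTENT_READ is OR-ed into the accumulator and never
 * cleared, so every later node in the walk receives it as well.
 */
static void toy_assign_shared_perms(ToyNode *first, ToyNode *target,
                                    ToyNode *filtered_target)
{
    unsigned iter_shared_perms = TOY_PERM_WRITE_UNCHANGED | TOY_PERM_WRITE;
    ToyNode *iter;

    for (iter = first; iter != NULL && iter != target; iter = iter->next) {
        if (iter == filtered_target) {
            iter_shared_perms |= TOY_PERM_CONSISTENT_READ;
        }
        iter->shared_perms = iter_shared_perms;
    }
}
```

With a chain a -> f1 -> f2 -> target and filtered_target == f1, node a
does not share CONSISTENT_READ, while both f1 and f2 do.
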

>> +                /*
>> +                 * From here on, all nodes are filters on the base.
>> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
>> +                 */
>> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>> +            }
>> +
>>               ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>> -                                     errp);
>> +                                     iter_shared_perms, errp);
>>               if (ret < 0) {
>>                   goto fail;
>>               }
> ...
>> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>>                                " named node of the graph");
>>               goto out;
>>           }
>> +        replaces_node_name = arg->replaces;
> 
> 
> What is the idea behind the variable substitution?

Looks like a remnant from v6, where there was an

if (arg->has_replaces) {
    ...
    replaces_node_name = arg->replaces;
} else if (unfiltered_bs != bs) {
    replaces_node_name = unfiltered_bs->node_name;
}

But I moved that logic to blockdev_mirror_common() in this version.

So it’s just useless now and replaces_node_name shouldn’t exist.

Max



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size
  2020-07-24  8:58     ` Max Reitz
@ 2020-07-24  9:49       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24  9:49 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 24.07.2020 11:58, Max Reitz wrote:
> On 20.07.20 17:10, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> It is trivial, so we might as well do it.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block/null.c               | 7 +++++++
>>>    tests/qemu-iotests/153.out | 2 +-
>>>    tests/qemu-iotests/184.out | 6 ++++--
>>>    3 files changed, 12 insertions(+), 3 deletions(-)
...
>>> diff --git a/tests/qemu-iotests/184.out b/tests/qemu-iotests/184.out
>>> index 3deb3cfb94..28b104da89 100644
>>> --- a/tests/qemu-iotests/184.out
>>> +++ b/tests/qemu-iotests/184.out
>>> @@ -29,7 +29,8 @@ Testing:
>>>                "image": {
>>>                    "virtual-size": 1073741824,
>>>                    "filename": "json:{\"throttle-group\": \"group0\",
>>> \"driver\": \"throttle\", \"file\": {\"driver\": \"null-co\"}}",
>>> -                "format": "throttle"
>>> +                "format": "throttle",
>>> +                "actual-size": SIZE
>>
>> If we remove the _filter_generated_node_ids() in the current
>> implementation of the test #184, we will get '"actual-size": 0'. It
>> might be more informative when analyzing the output in 184.out.
> You mean _filter_actual_image_size?  Yeah, why not, it doesn’t seem
> necessary here.
>
> Max
>

Yes, Max, you are right, I meant _filter_actual_image_size. It was my
copy/paste mistake.

Andrey



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 34/47] backup: Deal with filters
  2020-07-23 15:51   ` Andrey Shinkevich
@ 2020-07-24  9:55     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-24  9:55 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 23.07.20 17:51, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/backup-top.c |  2 +-
>>   block/backup.c     |  9 +++++----
>>   blockdev.c         | 19 +++++++++++++++----
>>   3 files changed, 21 insertions(+), 9 deletions(-)
>>
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 4f13bb20a5..9afa0bf3b4 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -297,6 +297,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
>>   {
>>       int ret;
>>       BlockDriverInfo bdi;
>> +    bool target_does_cow = bdrv_backing_chain_next(target);
>>   
> 
> 
> Wouldn't it be better to use an explicit type conversion or the "!!"
> approach?

I don’t know. O:)

I personally don’t like too many exclamation marks because I feel like
the code is screaming at me.  So I tend to use them only where necessary.

As for doing an explicit cast...  I think I’ll keep that in mind to
reduce my future use of !!.  But in this case, the type name is in the
same line, so I feel like it’s sufficiently clear.
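
For reference, a small standalone sketch of the three spellings (the
toy_* function names are made up for this example).  In C99 and later,
converting a pointer to bool yields false for a null pointer and true
otherwise, so all three give the same result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Implicit conversion: the declared type "bool" does the normalisation. */
static bool toy_implicit(const void *p)
{
    bool b = p;
    return b;
}

/* Double negation: normalises to 0 or 1 before the conversion. */
static bool toy_double_negation(const void *p)
{
    return !!p;
}

/* Explicit comparison against NULL. */
static bool toy_explicit_compare(const void *p)
{
    return p != NULL;
}
```

The explicit forms mainly matter when the destination is an int or a
narrower integer type rather than bool.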

Max



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 33/47] mirror: Deal with filters
  2020-07-24  9:49     ` Max Reitz
@ 2020-07-24 10:27       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 10:27 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 24.07.2020 12:49, Max Reitz wrote:
> On 22.07.20 20:31, Andrey Shinkevich wrote:
>> On 25.06.2020 18:22, Max Reitz wrote:
>>> This includes some permission limiting (for example, we only need to
>>> take the RESIZE permission for active commits where the base is smaller
>>> than the top).
>>>
>>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>>> "target_backing_bs", because that is what it really refers to.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    qapi/block-core.json |   6 ++-
>>>    block/mirror.c       | 118 +++++++++++++++++++++++++++++++++----------
>>>    blockdev.c           |  36 +++++++++----
>>>    3 files changed, 121 insertions(+), 39 deletions(-)
>>>
>> ...
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 469acf4600..770de3b34e 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
>>>        BlockBackend *target;
>>>        BlockDriverState *mirror_top_bs;
>>>        BlockDriverState *base;
>>> +    BlockDriverState *base_overlay;
>>>          /* The name of the graph node to replace */
>>>        char *replaces;
>>> @@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
>>>                                 &error_abort);
>>>        if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>>>            BlockDriverState *backing = s->is_none_mode ? src : s->base;
>>> -        if (backing_bs(target_bs) != backing) {
>>> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
>>> +        BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
>>> +
>>> +        if (bdrv_cow_bs(unfiltered_target) != backing) {
>>
>> I just worry about a filter node of the concurrent job right below the
>> unfiltered_target.
> Having a concurrent job on the target sounds extremely problematic in
> itself (because at least for most of the mirror job, the target isn’t in
> a consistent state).  Is that a real use case?


It might be in TestParallelOps of iotest #30, but I am not sure now.
I am going to apply my series with the copy-on-read filter for the stream
job on top of this one and will see then.

Andrey


>
>> The filter has unfiltered_target in its parent list.
>> Will that filter node be replaced correctly then?
> I’m also not quite sure what you mean.  We need to attach the source’s
> backing chain to the target here, so we go down to the first node that
> might accept COW backing files (by invoking bdrv_skip_filters()).  That
> should be correct no matter what kind of filters are on it.


I meant the case when a filter is removed with bdrv_replace_node() afterwards.
As I mentioned above, I am going to test that case later.

Andrey


>>> +        /*
>>> +         * The topmost node with
>>> +         * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
>>> +         */
>>> +        filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
>>> +
>>> +        assert(bdrv_skip_filters(filtered_target) ==
>>> +               bdrv_skip_filters(target));
>>> +
>>> +        /*
>>> +         * XXX BLK_PERM_WRITE needs to be allowed so we don't block
>>> +         * ourselves at s->base (if writes are blocked for a node, they are
>>> +         * also blocked for its backing file). The other options would be a
>>> +         * second filter driver above s->base (== target).
>>> +         */
>>> +        iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>>> +
>>> +        for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
>>> +             iter = bdrv_filter_or_cow_bs(iter))
>>> +        {
>>> +            if (iter == filtered_target) {
>>
>> For one filter node only?
> No, iter_shared_perms is never reset, so it retains the
> BLK_PERM_CONSISTENT_READ flag until the end of the loop.


Yes, that's right. Clear.
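
To spell out why the accumulator covers every filter below
filtered_target and not just one node, here is a small standalone model of
the loop.  The Node type, the PERM_* values and assign_shared_perms() are
hypothetical and only mimic the shape of the quoted code; this is not the
mirror job's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; the values are illustrative only. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2,
};

/* Hypothetical chain node; 'child' is the next node down the chain. */
typedef struct Node {
    struct Node *child;
    uint64_t shared_perms;  /* filled in by the walk below */
} Node;

/*
 * Walk from 'top' down to (but excluding) 'target'.  The accumulator is
 * never reset, so once 'filtered_target' is reached, that node and every
 * node visited after it also share PERM_CONSISTENT_READ.
 */
static void assign_shared_perms(Node *top, Node *target, Node *filtered_target)
{
    uint64_t iter_shared_perms = PERM_WRITE_UNCHANGED | PERM_WRITE;

    assert(top != NULL);

    for (Node *iter = top; iter && iter != target; iter = iter->child) {
        if (iter == filtered_target) {
            iter_shared_perms |= PERM_CONSISTENT_READ;
        }
        iter->shared_perms = iter_shared_perms;
    }
}
```

With a chain top -> mid -> f1 -> f2 -> target and filtered_target == f1,
both f1 and f2 end up sharing the extra permission, while top and mid do
not.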

Andrey


>
>>> +                /*
>>> +                 * From here on, all nodes are filters on the base.
>>> +                 * This allows us to share BLK_PERM_CONSISTENT_READ.
>>> +                 */
>>> +                iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>>> +            }
>>> +
>>>                ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>> -                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>>> -                                     errp);
>>> +                                     iter_shared_perms, errp);
>>>                if (ret < 0) {
>>>                    goto fail;
>>>                }
>> ...
>>> @@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>>>                                 " named node of the graph");
>>>                goto out;
>>>            }
>>> +        replaces_node_name = arg->replaces;
>>
>> What is the idea behind the variables substitution?
> Looks like a remnant from v6, where there was an
>
> if (arg->has_replaces) {
>      ...
>      replaces_node_name = arg->replaces;
> } else if (unfiltered_bs != bs) {
>      replaces_node_name = unfiltered_bs->node_name;
> }
>
> But I moved that logic to blockdev_mirror_common() in this version.
>
> So it’s just useless now and replaces_node_name shouldn’t exist.
>
> Max
>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 35/47] commit: Deal with filters
  2020-07-23 17:15   ` Andrey Shinkevich
@ 2020-07-24 10:36     ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 10:36 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 23.07.2020 20:15, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission if the base is smaller than the top).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/block-backend.c          |  9 +++-
>>   block/commit.c                 | 96 +++++++++++++++++++++++++---------
>>   block/monitor/block-hmp-cmds.c |  2 +-
>>   blockdev.c                     |  4 +-
>>   4 files changed, 81 insertions(+), 30 deletions(-)
>>
> ...
>> +    /*
>> +     * Block all nodes between top and base, because they will
>> +     * disappear from the chain after this operation.
>> +     * Note that this assumes that the user is fine with removing all
>> +     * nodes (including R/W filters) between top and base. Assuring
>> +     * this is the responsibility of the interface (i.e. whoever calls
>> +     * commit_start()).
>> +     */
>> +    s->base_overlay = bdrv_find_overlay(top, base);
>> +    assert(s->base_overlay);
>> +
>> +    /*
>> +     * The topmost node with
>> +     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
>> +     */
>> +    filtered_base = bdrv_cow_bs(s->base_overlay);
>> +    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
>> +
>> +    /*
>> +     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
>> +     * at s->base (if writes are blocked for a node, they are also blocked
>> +     * for its backing file). The other options would be a second filter
>> +     * driver above s->base.
>> +     */
>> +    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>> +
>> +    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
>> +        if (iter == filtered_base) {
>
>
> The same question as for the mirror patch:
>
> in case of multiple filters, one above another, the permission is 
> extended for the filtered_base only.
>
> Andrey
>

The question has been answered already.

Andrey


>
>> +            /*
>> +             * From here on, all nodes are filters on the base.  This
>> +             * allows us to share BLK_PERM_CONSISTENT_READ.
>> +             */
>> +            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>> +        }
>> +
>>           ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>> -                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>> -                                 errp);
>> +                                 iter_shared_perms, errp);
>>           if (ret < 0) {
>>               goto fail;
>>           }
>
> ...
>
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>
>
>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare()
  2020-07-24  9:23     ` Max Reitz
@ 2020-07-24 10:37       ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 10:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 24.07.2020 12:23, Max Reitz wrote:
> On 20.07.20 18:08, Andrey Shinkevich wrote:
>> On 25.06.2020 18:21, Max Reitz wrote:
>>> This allows us to differentiate between filters and nodes with COW
>>> backing files: Filters cannot be used as overlays at all (for this
>>> function).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    blockdev.c | 7 ++++++-
>>>    1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 1eb0fcdea2..aabe51036d 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -1549,7 +1549,12 @@ static void external_snapshot_prepare(BlkActionState *common,
>>>            goto out;
>>>        }
>>>    -    if (state->new_bs->backing != NULL) {
>>> +    if (state->new_bs->drv->is_filter) {
>>
>> Is there a chance to get a filter here? If so, is that when a user
>> specifies a file name of the form “filter[filter-name]:foo.qcow2”,
>> or in some other way?
> It would be with blockdev-snapshot and by specifying a filter for
> @overlay.  Technically that’s already caught by the check whether the
> overlay supports backing images (whether drv->supports_backing is true),
> but we might as well give a nicer error message.
>
> Example:
>
> {"execute":"qmp_capabilities"}
>
> {"execute":"blockdev-add","arguments":
>   {"node-name":"overlay","driver":"copy-on-read",
>    "file":{"driver":"null-co"}}}
>
> {"execute":"blockdev-add","arguments":
>   {"node-name":"base","driver":"null-co"}}
>
> {"execute":"blockdev-snapshot","arguments":
>   {"node":"base","overlay":"overlay"}}
>
> Max
>

Thank you for the example.

Andrey



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 37/47] qemu-img: Use child access functions
  2020-06-25 15:22 ` [PATCH v7 37/47] qemu-img: Use child access functions Max Reitz
@ 2020-07-24 15:51   ` Andrey Shinkevich
  2020-08-21 15:29   ` Kevin Wolf
  1 sibling, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 15:51 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> This changes iotest 204's output, because blkdebug on top of a COW node
> used to make qemu-img map disregard the rest of the backing chain (the
> backing chain was broken by the filter).  With this patch, the
> allocation in the base image is reported correctly.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qemu-img.c                 | 36 ++++++++++++++++++++++--------------
>   tests/qemu-iotests/204.out |  1 +
>   2 files changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 381271a74e..947be6ffac 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1089,7 +1089,7 @@ static int img_commit(int argc, char **argv)
>           /* This is different from QMP, which by default uses the deepest file in
>            * the backing chain (i.e., the very base); however, the traditional
>            * behavior of qemu-img commit is using the immediate backing file. */
> -        base_bs = backing_bs(bs);
> +        base_bs = bdrv_backing_chain_next(bs);
>           if (!base_bs) {
>               error_setg(&local_err, "Image does not have a backing file");
>               goto done;
> @@ -1737,18 +1737,20 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
>       if (s->sector_next_status <= sector_num) {
>           uint64_t offset = (sector_num - src_cur_offset) * BDRV_SECTOR_SIZE;
>           int64_t count;
> +        BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
> +        BlockDriverState *base;
> +
> +        if (s->target_has_backing) {
> +            base = bdrv_cow_bs(bdrv_skip_filters(src_bs));
> +        } else {
> +            base = NULL;
> +        }
>   
>           do {
>               count = n * BDRV_SECTOR_SIZE;
>   
> -            if (s->target_has_backing) {
> -                ret = bdrv_block_status(blk_bs(s->src[src_cur]), offset,
> -                                        count, &count, NULL, NULL);
> -            } else {
> -                ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
> -                                              offset, count, &count, NULL,
> -                                              NULL);
> -            }
> +            ret = bdrv_block_status_above(src_bs, base, offset, count, &count,
> +                                          NULL, NULL);
>   
>               if (ret < 0) {
>                   if (s->salvage) {
> @@ -2673,7 +2675,8 @@ static int img_convert(int argc, char **argv)
>            * s.target_backing_sectors has to be negative, which it will
>            * be automatically).  The backing file length is used only
>            * for optimizations, so such a case is not fatal. */
> -        s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
> +        s.target_backing_sectors =
> +            bdrv_nb_sectors(bdrv_backing_chain_next(out_bs));
>       } else {
>           s.target_backing_sectors = -1;
>       }
> @@ -3044,6 +3047,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>   
>       depth = 0;
>       for (;;) {
> +        bs = bdrv_skip_filters(bs);
>           ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
>           if (ret < 0) {
>               return ret;
> @@ -3052,7 +3056,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
>           if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
>               break;
>           }
> -        bs = backing_bs(bs);
> +        bs = bdrv_cow_bs(bs);
>           if (bs == NULL) {
>               ret = 0;
>               break;
> @@ -3437,6 +3441,7 @@ static int img_rebase(int argc, char **argv)
>       uint8_t *buf_old = NULL;
>       uint8_t *buf_new = NULL;
>       BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
> +    BlockDriverState *unfiltered_bs;
>       char *filename;
>       const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>       int c, flags, src_flags, ret;
> @@ -3571,6 +3576,8 @@ static int img_rebase(int argc, char **argv)
>       }
>       bs = blk_bs(blk);
>   
> +    unfiltered_bs = bdrv_skip_filters(bs);
> +
>       if (out_basefmt != NULL) {
>           if (bdrv_find_format(out_basefmt) == NULL) {
>               error_report("Invalid format name: '%s'", out_basefmt);
> @@ -3582,7 +3589,7 @@ static int img_rebase(int argc, char **argv)
>       /* For safe rebasing we need to compare old and new backing file */
>       if (!unsafe) {
>           QDict *options = NULL;
> -        BlockDriverState *base_bs = backing_bs(bs);
> +        BlockDriverState *base_bs = bdrv_cow_bs(unfiltered_bs);
>   
>           if (base_bs) {
>               blk_old_backing = blk_new(qemu_get_aio_context(),
> @@ -3738,8 +3745,9 @@ static int img_rebase(int argc, char **argv)
>                    * If cluster wasn't changed since prefix_chain, we don't need
>                    * to take action
>                    */
> -                ret = bdrv_is_allocated_above(backing_bs(bs), prefix_chain_bs,
> -                                              false, offset, n, &n);
> +                ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
> +                                              prefix_chain_bs, false,
> +                                              offset, n, &n);
>                   if (ret < 0) {
>                       error_report("error while reading image metadata: %s",
>                                    strerror(-ret));
> diff --git a/tests/qemu-iotests/204.out b/tests/qemu-iotests/204.out
> index f3a10fbe90..684774d763 100644
> --- a/tests/qemu-iotests/204.out
> +++ b/tests/qemu-iotests/204.out
> @@ -59,5 +59,6 @@ Offset          Length          File
>   0x900000        0x2400000       TEST_DIR/t.IMGFMT
>   0x3c00000       0x1100000       TEST_DIR/t.IMGFMT
>   0x6a00000       0x400000        TEST_DIR/t.IMGFMT
> +0x6e00000       0x1200000       TEST_DIR/t.IMGFMT.base
>   No errors were found on the image.
>   *** done


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
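
As an illustration of why the filter-aware walk in get_block_status()
changes the qemu-img map output, here is a toy model.  The Node layout and
both functions are hypothetical simplifications, not qemu-img code: a
filter only has a 'filtered' child, a COW node only has a 'backing' child,
and 'allocated' says whether the node itself holds the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified graph node. */
typedef struct Node {
    bool is_filter;
    bool allocated;          /* does this node itself hold the data? */
    struct Node *filtered;   /* child of a filter node */
    struct Node *backing;    /* COW backing child of a non-filter node */
} Node;

/* Descend through filters to the first non-filter node (or NULL). */
static Node *skip_filters(Node *n)
{
    while (n && n->is_filter) {
        n = n->filtered;
    }
    return n;
}

/* COW child of a non-filter node. */
static Node *cow_child(Node *n)
{
    assert(n && !n->is_filter);
    return n->backing;
}

/* Old-style walk: follows only 'backing', so a filter on top ends it. */
static int find_data_depth_old(Node *n)
{
    for (int depth = 0; n; n = n->backing, depth++) {
        if (n->allocated) {
            return depth;
        }
    }
    return -1;
}

/* Filter-aware walk: skip filters first, then follow the COW child. */
static int find_data_depth(Node *n)
{
    for (int depth = 0; ; depth++) {
        n = skip_filters(n);
        if (!n) {
            return -1;
        }
        if (n->allocated) {
            return depth;
        }
        n = cow_child(n);
    }
}
```

For a filter on top of an unallocated overlay whose backing file holds the
data, the old-style walk finds nothing, while the filter-aware walk reports
the data at depth 1, which in this model corresponds to the additional
TEST_DIR/t.IMGFMT.base line in 204.out.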




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 38/47] block: Drop backing_bs()
  2020-06-25 15:22 ` [PATCH v7 38/47] block: Drop backing_bs() Max Reitz
@ 2020-07-24 15:55   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 15:55 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> We want to make it explicit where bs->backing is used, and we have done
> so.  The old role of backing_bs() is now effectively taken by
> bdrv_cow_bs().
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block_int.h | 5 -----
>   1 file changed, 5 deletions(-)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c963ee9f28..6e09e15ed4 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -991,11 +991,6 @@ typedef enum BlockMirrorBackingMode {
>       MIRROR_LEAVE_BACKING_CHAIN,
>   } BlockMirrorBackingMode;
>   
> -static inline BlockDriverState *backing_bs(BlockDriverState *bs)
> -{
> -    return bs->backing ? bs->backing->bs : NULL;
> -}
> -
>   
>   /* Essential block drivers which must always be statically linked into qemu, and
>    * which therefore can be accessed without using bdrv_find_format() */


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*()
  2020-06-25 15:22 ` [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*() Max Reitz
@ 2020-07-24 18:00   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-24 18:00 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> With bdrv_filter_bs(), we can easily handle this default filter behavior
> in bdrv_co_block_status().
>
> blkdebug wants to have an additional assertion, so it keeps its own
> implementation, except bdrv_co_block_status_from_file() needs to be
> inlined there.
>
> Suggested-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h | 23 ------------------
>   block/backup-top.c        |  2 --
>   block/blkdebug.c          |  7 ++++--
>   block/blklogwrites.c      |  1 -
>   block/commit.c            |  1 -
>   block/copy-on-read.c      |  2 --
>   block/filter-compress.c   |  2 --
>   block/io.c                | 51 +++++++++++++--------------------------
>   block/mirror.c            |  1 -
>   block/throttle.c          |  1 -
>   10 files changed, 22 insertions(+), 69 deletions(-)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 6e09e15ed4..e5a328c389 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1291,29 +1291,6 @@ void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
>                           uint64_t perm, uint64_t shared,
>                           uint64_t *nperm, uint64_t *nshared);
>   
> -/*
> - * Default implementation for drivers to pass bdrv_co_block_status() to
> - * their file.
> - */
> -int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
> -                                                bool want_zero,
> -                                                int64_t offset,
> -                                                int64_t bytes,
> -                                                int64_t *pnum,
> -                                                int64_t *map,
> -                                                BlockDriverState **file);
> -/*
> - * Default implementation for drivers to pass bdrv_co_block_status() to
> - * their backing file.
> - */
> -int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
> -                                                   bool want_zero,
> -                                                   int64_t offset,
> -                                                   int64_t bytes,
> -                                                   int64_t *pnum,
> -                                                   int64_t *map,
> -                                                   BlockDriverState **file);
> -
>   int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
>   int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
>   int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
> diff --git a/block/backup-top.c b/block/backup-top.c
> index 89bd3937d0..bf5fc22fc7 100644
> --- a/block/backup-top.c
> +++ b/block/backup-top.c
> @@ -185,8 +185,6 @@ BlockDriver bdrv_backup_top_filter = {
>       .bdrv_co_pwritev_compressed = backup_top_co_pwritev_compressed,
>       .bdrv_co_flush              = backup_top_co_flush,
>   
> -    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
> -
>       .bdrv_refresh_filename      = backup_top_refresh_filename,
>   
>       .bdrv_child_perm            = backup_top_child_perm,
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 7194bc7f06..cf78d8809e 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -757,8 +757,11 @@ static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs,
>           return err;
>       }
>   
> -    return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
> -                                          pnum, map, file);
> +    assert(bs->file && bs->file->bs);
> +    *pnum = bytes;
> +    *map = offset;
> +    *file = bs->file->bs;
> +    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
>   }
>   
>   static void blkdebug_close(BlockDriverState *bs)
> diff --git a/block/blklogwrites.c b/block/blklogwrites.c
> index 6753bd9a3e..c6b2711fe5 100644
> --- a/block/blklogwrites.c
> +++ b/block/blklogwrites.c
> @@ -517,7 +517,6 @@ static BlockDriver bdrv_blk_log_writes = {
>       .bdrv_co_pwrite_zeroes  = blk_log_writes_co_pwrite_zeroes,
>       .bdrv_co_flush_to_disk  = blk_log_writes_co_flush_to_disk,
>       .bdrv_co_pdiscard       = blk_log_writes_co_pdiscard,
> -    .bdrv_co_block_status   = bdrv_co_block_status_from_file,
>   
>       .is_filter              = true,
>       .strong_runtime_opts    = blk_log_writes_strong_runtime_opts,
> diff --git a/block/commit.c b/block/commit.c
> index 4122b6736d..ea9282daea 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -238,7 +238,6 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>   static BlockDriver bdrv_commit_top = {
>       .format_name                = "commit_top",
>       .bdrv_co_preadv             = bdrv_commit_top_preadv,
> -    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
>       .bdrv_refresh_filename      = bdrv_commit_top_refresh_filename,
>       .bdrv_child_perm            = bdrv_commit_top_child_perm,
>   
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index a6a864f147..2816e61afe 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -146,8 +146,6 @@ static BlockDriver bdrv_copy_on_read = {
>       .bdrv_eject                         = cor_eject,
>       .bdrv_lock_medium                   = cor_lock_medium,
>   
> -    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
> -
>       .has_variable_length                = true,
>       .is_filter                          = true,
>   };
> diff --git a/block/filter-compress.c b/block/filter-compress.c
> index 8ec1991c1f..5136371bf8 100644
> --- a/block/filter-compress.c
> +++ b/block/filter-compress.c
> @@ -146,8 +146,6 @@ static BlockDriver bdrv_compress = {
>       .bdrv_eject                         = compress_eject,
>       .bdrv_lock_medium                   = compress_lock_medium,
>   
> -    .bdrv_co_block_status               = bdrv_co_block_status_from_file,
> -
>       .has_variable_length                = true,
>       .is_filter                          = true,
>   };
> diff --git a/block/io.c b/block/io.c
> index 9e802804bb..e2196d438c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2253,36 +2253,6 @@ typedef struct BdrvCoBlockStatusData {
>       BlockDriverState **file;
>   } BdrvCoBlockStatusData;
>   
> -int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
> -                                                bool want_zero,
> -                                                int64_t offset,
> -                                                int64_t bytes,
> -                                                int64_t *pnum,
> -                                                int64_t *map,
> -                                                BlockDriverState **file)
> -{
> -    assert(bs->file && bs->file->bs);
> -    *pnum = bytes;
> -    *map = offset;
> -    *file = bs->file->bs;
> -    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> -}
> -
> -int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
> -                                                   bool want_zero,
> -                                                   int64_t offset,
> -                                                   int64_t bytes,
> -                                                   int64_t *pnum,
> -                                                   int64_t *map,
> -                                                   BlockDriverState **file)
> -{
> -    assert(bs->backing && bs->backing->bs);
> -    *pnum = bytes;
> -    *map = offset;
> -    *file = bs->backing->bs;
> -    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> -}
> -
>   /*
>    * Returns the allocation status of the specified sectors.
>    * Drivers not implementing the functionality are assumed to not support
> @@ -2323,6 +2293,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       BlockDriverState *local_file = NULL;
>       int64_t aligned_offset, aligned_bytes;
>       uint32_t align;
> +    bool has_filtered_child;
>   
>       assert(pnum);
>       *pnum = 0;
> @@ -2348,7 +2319,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>   
>       /* Must be non-NULL or bdrv_getlength() would have failed */
>       assert(bs->drv);
> -    if (!bs->drv->bdrv_co_block_status) {
> +    has_filtered_child = bdrv_filter_child(bs);
> +    if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
>           *pnum = bytes;
>           ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
>           if (offset + bytes == total_size) {
> @@ -2369,9 +2341,20 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>       aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>   
> -    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
> -                                        aligned_bytes, pnum, &local_map,
> -                                        &local_file);
> +    if (bs->drv->bdrv_co_block_status) {
> +        ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
> +                                            aligned_bytes, pnum, &local_map,
> +                                            &local_file);
> +    } else {
> +        /* Default code for filters */
> +
> +        local_file = bdrv_filter_bs(bs);
> +        assert(local_file);
> +
> +        *pnum = aligned_bytes;
> +        local_map = aligned_offset;
> +        ret = BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> +    }
>       if (ret < 0) {
>           *pnum = 0;
>           goto out;
> diff --git a/block/mirror.c b/block/mirror.c
> index 770de3b34e..5a9e42e488 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1541,7 +1541,6 @@ static BlockDriver bdrv_mirror_top = {
>       .bdrv_co_pdiscard           = bdrv_mirror_top_pdiscard,
>       .bdrv_co_pwritev_compressed = bdrv_mirror_top_pwritev_compressed,
>       .bdrv_co_flush              = bdrv_mirror_top_flush,
> -    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
>       .bdrv_refresh_filename      = bdrv_mirror_top_refresh_filename,
>       .bdrv_child_perm            = bdrv_mirror_top_child_perm,
>   
> diff --git a/block/throttle.c b/block/throttle.c
> index f6e619aca2..473ea758df 100644
> --- a/block/throttle.c
> +++ b/block/throttle.c
> @@ -263,7 +263,6 @@ static BlockDriver bdrv_throttle = {
>       .bdrv_reopen_prepare                =   throttle_reopen_prepare,
>       .bdrv_reopen_commit                 =   throttle_reopen_commit,
>       .bdrv_reopen_abort                  =   throttle_reopen_abort,
> -    .bdrv_co_block_status               =   bdrv_co_block_status_from_file,
>   
>       .bdrv_co_drain_begin                =   throttle_co_drain_begin,
>       .bdrv_co_drain_end                  =   throttle_co_drain_end,


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
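
The dispatch that this patch adds to bdrv_co_block_status() can be
summarized with a small model.  Everything below (Node, StatusFn, the
STATUS_* bits, node_block_status()) is a hypothetical sketch of the
three-way decision only; alignment handling and the rest of the real
function are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; the values are illustrative only. */
enum {
    STATUS_DATA         = 1 << 0,
    STATUS_ALLOCATED    = 1 << 1,
    STATUS_RAW          = 1 << 2,
    STATUS_OFFSET_VALID = 1 << 3,
};

typedef struct Node Node;
typedef int (*StatusFn)(Node *n, int64_t offset, int64_t bytes,
                        int64_t *pnum, int64_t *map, Node **file);

struct Node {
    StatusFn status_cb;      /* driver callback, may be NULL */
    Node *filtered_child;    /* non-NULL only for filter nodes */
};

static int node_block_status(Node *n, int64_t offset, int64_t bytes,
                             int64_t *pnum, int64_t *map, Node **file)
{
    assert(n && pnum && map && file);

    if (!n->status_cb && !n->filtered_child) {
        /* No callback and not a filter: report everything as data. */
        *pnum = bytes;
        return STATUS_DATA | STATUS_ALLOCATED;
    }

    if (n->status_cb) {
        /* The driver has its own implementation (as blkdebug does). */
        return n->status_cb(n, offset, bytes, pnum, map, file);
    }

    /* Default for filters: defer to the filtered child at the same offset. */
    *pnum = bytes;
    *map = offset;
    *file = n->filtered_child;
    return STATUS_RAW | STATUS_OFFSET_VALID;
}
```

A filter driver that has nothing special to do therefore no longer needs
to set a status callback at all, which is why the per-driver assignments
are dropped in the diff above.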




^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant
  2020-06-25 15:22 ` [PATCH v7 41/47] block: Leave BDS.backing_file constant Max Reitz
@ 2020-07-27 12:27   ` Andrey Shinkevich
  2020-07-28 14:10     ` Max Reitz
  2020-08-24 13:14   ` Kevin Wolf
  1 sibling, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-27 12:27 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Parts of the block layer treat BDS.backing_file as if it were whatever
> the image header says (i.e., if it is a relative path, it is relative to
> the overlay), other parts treat it like a cache for
> bs->backing->bs->filename (relative paths are relative to the CWD).
> Considering bs->backing->bs->filename exists, let us make it mean the
> former.
>
> Among other things, this now allows the user to specify a base when
> using qemu-img to commit an image file in a directory that is not the
> CWD (assuming everything uses relative filenames).
>
> Before this patch:
>
> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>
> After this patch:
>
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> Image committed.
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> Image committed.
>
> With this change, bdrv_find_backing_image() must look at whether the
> user has overridden a BDS's backing file.  If so, it can no longer use
> bs->backing_file, but must instead compare the given filename against
> the backing node's filename directly.
>
> Note that this changes the QAPI output for a node's backing_file.  We
> had very inconsistent output there (sometimes what the image header
> said, sometimes the actual filename of the backing image).  This
> inconsistent output was effectively useless, so we have to decide one
> way or the other.  Considering that bs->backing_file usually at runtime
> contained the path to the image relative to qemu's CWD (or absolute),
> this patch changes QAPI's backing_file to always report the
> bs->backing->bs->filename from now on.  If you want to receive the image
> header information, you have to refer to full-backing-filename.
>
> This necessitates a change to iotest 228.  The interesting information
> it really wanted is the image header, and it can get that now, but it
> has to use full-backing-filename instead of backing_file.  Because of
> this patch's changes to bs->backing_file's behavior, we also need some
> reference output changes.
>
> Along with the changes to bs->backing_file, stop updating
> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
> change our externally visible behavior (incompatibly), we have to let
> bdrv_query_image_info() try to get the image format from bs->backing if
> bs->backing_format is unset.  (The QAPI schema describes
> backing-filename-format as "the format of the backing file", so it is
> not necessarily what the image header says, but just the format of the
> file referenced by backing-filename (if known).)
>
> iotest 245 changes in behavior: With the backing node no longer
> overriding the parent node's backing_file string, you can now omit the
> @backing option when reopening a node with neither a default nor a
> current backing file even if it used to have a backing node at some
> point.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h  | 21 ++++++++++++++++-----
>   block.c                    | 35 +++++++++++++++++++++++++++--------
>   block/qapi.c               | 17 +++++++++++++----
>   tests/qemu-iotests/228     |  6 +++---
>   tests/qemu-iotests/228.out |  6 +++---
>   tests/qemu-iotests/245     |  4 +++-
>   6 files changed, 65 insertions(+), 24 deletions(-)
>
...
> diff --git a/block/qapi.c b/block/qapi.c
> index 2628323b63..5da6d7e6e0 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -47,7 +47,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>                                           Error **errp)
>   {
>       ImageInfo **p_image_info;
> -    BlockDriverState *bs0;
> +    BlockDriverState *bs0, *backing;
>       BlockDeviceInfo *info;
>   
>       if (!bs->drv) {
> @@ -76,9 +76,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>           info->node_name = g_strdup(bs->node_name);
>       }
>   
> -    if (bs->backing_file[0]) {
> +    backing = bdrv_cow_bs(bs);
> +    if (backing) {
>           info->has_backing_file = true;
> -        info->backing_file = g_strdup(bs->backing_file);
> +        info->backing_file = g_strdup(backing->filename);
>       }
>   
>       if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
> @@ -314,6 +315,8 @@ void bdrv_query_image_info(BlockDriverState *bs,
>       backing_filename = bs->backing_file;
>       if (backing_filename[0] != '\0') {
>           char *backing_filename2;
> +        const char *backing_format = NULL;
> +
>           info->backing_filename = g_strdup(backing_filename);
>           info->has_backing_filename = true;
>           backing_filename2 = bdrv_get_full_backing_filename(bs, NULL);
> @@ -326,7 +329,13 @@ void bdrv_query_image_info(BlockDriverState *bs,
>           }
>   
>           if (bs->backing_format[0]) {
> -            info->backing_filename_format = g_strdup(bs->backing_format);
> +            backing_format = bs->backing_format;
> +        } else if (bs->backing && bs->backing->bs->drv &&
> +                   !bdrv_backing_overridden(bs)) {
> +            backing_format = bs->backing->bs->drv->format_name;
> +        }


In case bdrv_backing_overridden() returns true, should we invoke
bdrv_refresh_filename() and assign the format_name then?

Andrey


> +        if (backing_format) {
> +            info->backing_filename_format = g_strdup(backing_format);
>               info->has_backing_filename_format = true;
>           }
>           g_free(backing_filename2);

...


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed
  2020-06-25 15:22 ` [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed Max Reitz
@ 2020-07-27 13:28   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-27 13:28 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Flushing a qcow2 node must lead to the data-file node being flushed as
> well.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/244     | 49 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/244.out |  7 ++++++
>   2 files changed, 56 insertions(+)
>
> diff --git a/tests/qemu-iotests/244 b/tests/qemu-iotests/244
> index efe3c0428b..f2b2dddf1c 100755
> --- a/tests/qemu-iotests/244
> +++ b/tests/qemu-iotests/244
> @@ -217,6 +217,55 @@ $QEMU_IMG amend -f $IMGFMT -o "data_file=blkdebug::$TEST_IMG.data" "$TEST_IMG"
>   $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
>   $QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
>   
> +echo
> +echo "=== Flushing should flush the data file ==="
> +echo
> +
> +# We are going to flush a qcow2 file with a blkdebug node inserted
> +# between the qcow2 node and its data file node.  The blkdebug node
> +# will return an error for all flushes, so if the data file is
> +# flushed, we will see qemu-io return an error.
> +
> +# We need to write something or the flush will not do anything; we
> +# also need -t writeback so the write is not done as a FUA write
> +# (which would then fail thanks to the implicit flush)
> +$QEMU_IO -c 'write 0 512' -c flush \
> +    -t writeback \
> +    "json:{
> +         'driver': 'qcow2',
> +         'file': {
> +             'driver': 'file',
> +             'filename': '$TEST_IMG'
> +         },
> +         'data-file': {
> +             'driver': 'blkdebug',
> +             'inject-error': [{
> +                 'event': 'none',
> +                 'iotype': 'flush'
> +             }],
> +             'image': {
> +                 'driver': 'file',
> +                 'filename': '$TEST_IMG.data'
> +             }
> +         }
> +     }" \
> +    | _filter_qemu_io
> +
> +result=${PIPESTATUS[0]}
> +echo
> +
> +case $result in
> +    0)
> +        echo "ERROR: qemu-io succeeded, so the data file was not flushed"
> +        ;;
> +    1)
> +        echo "Success: qemu-io failed, so the data file was flushed"
> +        ;;
> +    *)
> +        echo "ERROR: qemu-io returned unknown exit code $result"
> +        ;;
> +esac
> +
>   # success, all done
>   echo "*** done"
>   rm -f $seq.full
> diff --git a/tests/qemu-iotests/244.out b/tests/qemu-iotests/244.out
> index dbab7359a9..7269b4295a 100644
> --- a/tests/qemu-iotests/244.out
> +++ b/tests/qemu-iotests/244.out
> @@ -131,4 +131,11 @@ Offset          Length          Mapped to       File
>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 data_file=TEST_DIR/t.IMGFMT.data
>   Images are identical.
>   Images are identical.
> +
> +=== Flushing should flush the data file ===
> +
> +wrote 512/512 bytes at offset 0
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +Success: qemu-io failed, so the data file was flushed
>   *** done


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit
  2020-06-25 15:22 ` [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit Max Reitz
@ 2020-07-27 13:35   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-27 13:35 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> complete_and_wait() and wait_ready() currently only work for mirror
> jobs.  Let them work for active commit jobs, too.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 5ea4c4df8b..57b32d8ad3 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -932,8 +932,12 @@ class QMPTestCase(unittest.TestCase):
>   
>       def wait_ready(self, drive='drive0'):
>           """Wait until a BLOCK_JOB_READY event, and return the event."""
> -        f = {'data': {'type': 'mirror', 'device': drive}}
> -        return self.vm.event_wait(name='BLOCK_JOB_READY', match=f)
> +        return self.vm.events_wait([
> +            ('BLOCK_JOB_READY',
> +             {'data': {'type': 'mirror', 'device': drive}}),
> +            ('BLOCK_JOB_READY',
> +             {'data': {'type': 'commit', 'device': drive}})
> +        ])
>   
>       def wait_ready_and_cancel(self, drive='drive0'):
>           self.wait_ready(drive=drive)
> @@ -952,7 +956,7 @@ class QMPTestCase(unittest.TestCase):
>           self.assert_qmp(result, 'return', {})
>   
>           event = self.wait_until_completed(drive=drive, error=completion_error)
> -        self.assert_qmp(event, 'data/type', 'mirror')
> +        self.assertTrue(event['data']['type'] in ['mirror', 'commit'])
>   
>       def pause_wait(self, job_id='job0'):
>           with Timeout(3, "Timeout waiting for job to pause"):


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
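
The matching that wait_ready() relies on after this change can be
sketched as follows.  This is only an illustrative model: dict_matches()
and first_matching() are hypothetical helpers written for this example,
not the actual iotests implementation of events_wait().

```python
def dict_matches(event, pattern):
    """Return True if every key in pattern is present in event with an
    equal value; nested dicts are compared recursively as subsets."""
    for key, want in pattern.items():
        if key not in event:
            return False
        have = event[key]
        if isinstance(want, dict):
            if not isinstance(have, dict) or not dict_matches(have, want):
                return False
        elif have != want:
            return False
    return True


def first_matching(events, wanted):
    """Return the first event that matches any (name, pattern) pair."""
    for ev in events:
        for name, pattern in wanted:
            if ev.get('event') == name and dict_matches(ev, pattern):
                return ev
    return None


# Accept a READY event from either a mirror or an active commit job.
wanted = [
    ('BLOCK_JOB_READY', {'data': {'type': 'mirror', 'device': 'drive0'}}),
    ('BLOCK_JOB_READY', {'data': {'type': 'commit', 'device': 'drive0'}}),
]
```

Listing both job types keeps the device filter intact while no longer
assuming that the job is a mirror.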





* Re: [PATCH v7 44/47] iotests: Add filter commit test cases
  2020-06-25 15:22 ` [PATCH v7 44/47] iotests: Add filter commit test cases Max Reitz
@ 2020-07-27 17:45   ` Andrey Shinkevich
  2020-07-28 14:00     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Andrey Shinkevich @ 2020-07-27 17:45 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> This patch adds some tests on how commit copes with filter nodes.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/040     | 177 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/040.out |   4 +-
>   2 files changed, 179 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
> index 32c82b4ec6..e7fa244738 100755
> --- a/tests/qemu-iotests/040
> +++ b/tests/qemu-iotests/040
> @@ -713,6 +713,183 @@ class TestErrorHandling(iotests.QMPTestCase):
>           self.assertTrue(iotests.compare_images(mid_img, backing_img, fmt2='raw'),
>                           'target image does not match source after commit')
>   
> +class TestCommitWithFilters(iotests.QMPTestCase):
> +    img0 = os.path.join(iotests.test_dir, '0.img')
> +    img1 = os.path.join(iotests.test_dir, '1.img')
> +    img2 = os.path.join(iotests.test_dir, '2.img')
> +    img3 = os.path.join(iotests.test_dir, '3.img')
> +
> +    def do_test_io(self, read_or_write):


The method definition could be moved down after those of setUp() and
tearDown().


> +        for index, pattern_file in enumerate(self.pattern_files):
> +            result = qemu_io('-f', iotests.imgfmt,
> +                             '-c', '{} -P {} {}M 1M'.format(read_or_write,
> +                                                            index + 1, index),


The Python 3 f-string f'{read_or_write} ..' might be used instead of
the .format() one.

Andrey


> +                             pattern_file)
> +            self.assertFalse('Pattern verification failed' in result)
> +
> +    def setUp(self):

...


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 44/47] iotests: Add filter commit test cases
  2020-07-27 17:45   ` Andrey Shinkevich
@ 2020-07-28 14:00     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-28 14:00 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 27.07.20 19:45, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> This patch adds some tests on how commit copes with filter nodes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/040     | 177 +++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/040.out |   4 +-
>>   2 files changed, 179 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
>> index 32c82b4ec6..e7fa244738 100755
>> --- a/tests/qemu-iotests/040
>> +++ b/tests/qemu-iotests/040
>> @@ -713,6 +713,183 @@ class TestErrorHandling(iotests.QMPTestCase):
>>           self.assertTrue(iotests.compare_images(mid_img, backing_img, fmt2='raw'),
>>                           'target image does not match source after commit')
>>   
>> +class TestCommitWithFilters(iotests.QMPTestCase):
>> +    img0 = os.path.join(iotests.test_dir, '0.img')
>> +    img1 = os.path.join(iotests.test_dir, '1.img')
>> +    img2 = os.path.join(iotests.test_dir, '2.img')
>> +    img3 = os.path.join(iotests.test_dir, '3.img')
>> +
>> +    def do_test_io(self, read_or_write):
> 
> 
> The method definition could be moved down after those of setUp() and
> tearDown().

Yes, but it’s used by setUp(), so I thought maybe it’s nicer to place it
first.

>> +        for index, pattern_file in enumerate(self.pattern_files):
>> +            result = qemu_io('-f', iotests.imgfmt,
>> +                             '-c', '{} -P {} {}M 1M'.format(read_or_write,
>> +                                                            index + 1, index),
> 
> 
> The Python 3 f-string f'{read_or_write} ..' might be used instead of
> the .format() one.

Ah, sure.  The test is a bit older already, from when we didn’t yet use
format strings as often in the iotests. :)
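
For reference, the two spellings discussed here produce the same qemu-io
command string.  A minimal sketch; the two function names are made up
for this comparison and do not appear in the test:

```python
def io_command_format(read_or_write, index):
    """Build the qemu-io command with str.format(), as in the patch."""
    return '{} -P {} {}M 1M'.format(read_or_write, index + 1, index)


def io_command_fstring(read_or_write, index):
    """Build the same command with an f-string, as suggested in review."""
    return f'{read_or_write} -P {index + 1} {index}M 1M'
```

The f-string keeps each expression next to the place where it is
substituted, which avoids counting positional arguments.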

>> +                             pattern_file)
>> +            self.assertFalse('Pattern verification failed' in result)
>> +
>> +    def setUp(self):
> 
> ...
> 
> 
> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> 
> 





* Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant
  2020-07-27 12:27   ` Andrey Shinkevich
@ 2020-07-28 14:10     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-07-28 14:10 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel



On 27.07.20 14:27, Andrey Shinkevich wrote:
> On 25.06.2020 18:22, Max Reitz wrote:
>> Parts of the block layer treat BDS.backing_file as if it were whatever
>> the image header says (i.e., if it is a relative path, it is relative to
>> the overlay), other parts treat it like a cache for
>> bs->backing->bs->filename (relative paths are relative to the CWD).
>> Considering bs->backing->bs->filename exists, let us make it mean the
>> former.
>>
>> Among other things, this now allows the user to specify a base when
>> using qemu-img to commit an image file in a directory that is not the
>> CWD (assuming everything uses relative filenames).
>>
>> Before this patch:
>>
>> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
>> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
>> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>>
>> After this patch:
>>
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> Image committed.
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> Image committed.
>>
>> With this change, bdrv_find_backing_image() must look at whether the
>> user has overridden a BDS's backing file.  If so, it can no longer use
>> bs->backing_file, but must instead compare the given filename against
>> the backing node's filename directly.
>>
>> Note that this changes the QAPI output for a node's backing_file.  We
>> had very inconsistent output there (sometimes what the image header
>> said, sometimes the actual filename of the backing image).  This
>> inconsistent output was effectively useless, so we have to decide one
>> way or the other.  Considering that bs->backing_file usually at runtime
>> contained the path to the image relative to qemu's CWD (or absolute),
>> this patch changes QAPI's backing_file to always report the
>> bs->backing->bs->filename from now on.  If you want to receive the image
>> header information, you have to refer to full-backing-filename.
>>
>> This necessitates a change to iotest 228.  The interesting information
>> it really wanted is the image header, and it can get that now, but it
>> has to use full-backing-filename instead of backing_file.  Because of
>> this patch's changes to bs->backing_file's behavior, we also need some
>> reference output changes.
>>
>> Along with the changes to bs->backing_file, stop updating
>> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
>> change our externally visible behavior (incompatibly), we have to let
>> bdrv_query_image_info() try to get the image format from bs->backing if
>> bs->backing_format is unset.  (The QAPI schema describes
>> backing-filename-format as "the format of the backing file", so it is
>> not necessarily what the image header says, but just the format of the
>> file referenced by backing-filename (if known).)
>>
>> iotest 245 changes in behavior: With the backing node no longer
>> overriding the parent node's backing_file string, you can now omit the
>> @backing option when reopening a node with neither a default nor a
>> current backing file even if it used to have a backing node at some
>> point.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h  | 21 ++++++++++++++++-----
>>   block.c                    | 35 +++++++++++++++++++++++++++--------
>>   block/qapi.c               | 17 +++++++++++++----
>>   tests/qemu-iotests/228     |  6 +++---
>>   tests/qemu-iotests/228.out |  6 +++---
>>   tests/qemu-iotests/245     |  4 +++-
>>   6 files changed, 65 insertions(+), 24 deletions(-)
>>
> ...
>> diff --git a/block/qapi.c b/block/qapi.c
>> index 2628323b63..5da6d7e6e0 100644
>> --- a/block/qapi.c
>> +++ b/block/qapi.c
>> @@ -47,7 +47,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>>                                           Error **errp)
>>   {
>>       ImageInfo **p_image_info;
>> -    BlockDriverState *bs0;
>> +    BlockDriverState *bs0, *backing;
>>       BlockDeviceInfo *info;
>>   
>>       if (!bs->drv) {
>> @@ -76,9 +76,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
>>           info->node_name = g_strdup(bs->node_name);
>>       }
>>   
>> -    if (bs->backing_file[0]) {
>> +    backing = bdrv_cow_bs(bs);
>> +    if (backing) {
>>           info->has_backing_file = true;
>> -        info->backing_file = g_strdup(bs->backing_file);
>> +        info->backing_file = g_strdup(backing->filename);
>>       }
>>   
>>       if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
>> @@ -314,6 +315,8 @@ void bdrv_query_image_info(BlockDriverState *bs,
>>       backing_filename = bs->backing_file;
>>       if (backing_filename[0] != '\0') {
>>           char *backing_filename2;
>> +        const char *backing_format = NULL;
>> +
>>           info->backing_filename = g_strdup(backing_filename);
>>           info->has_backing_filename = true;
>>           backing_filename2 = bdrv_get_full_backing_filename(bs, NULL);
>> @@ -326,7 +329,13 @@ void bdrv_query_image_info(BlockDriverState *bs,
>>           }
>>   
>>           if (bs->backing_format[0]) {
>> -            info->backing_filename_format = g_strdup(bs->backing_format);
>> +            backing_format = bs->backing_format;
>> +        } else if (bs->backing && bs->backing->bs->drv &&
>> +                   !bdrv_backing_overridden(bs)) {
>> +            backing_format = bs->backing->bs->drv->format_name;
>> +        }
> 
> 
> In case bdrv_backing_overridden() returns true, should we invoke
> bdrv_refresh_filename() and assign the format_name then?

I don’t think so.  The format we return in info->backing_filename_format
should be the format of the file returned in info->backing_filename.
The latter is bs->backing_file, which (as of this patch), is the backing
file as reported by the image header.  Therefore, if the backing file
was overridden, we cannot assume that bs->backing->bs refers to the same
file as bs->backing_file, and so we cannot assume
bs->backing->bs->drv->format_name to be info->backing_filename’s format.
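
The selection described above can be modelled roughly like this.  It is
a sketch only: reported_backing_format() and its parameters are invented
for illustration and simply mirror the three cases of the hunk in
bdrv_query_image_info().

```python
def reported_backing_format(header_format, backing_driver_format,
                            backing_overridden):
    """Pick the format to report alongside the header's backing filename.

    header_format:         format string from the image header ('' if unset)
    backing_driver_format: format name of the attached backing node's
                           driver, or None if there is no such node/driver
    backing_overridden:    True if the user replaced the backing file
    """
    if header_format:
        # The header names a format explicitly; report it as-is.
        return header_format
    if backing_driver_format is not None and not backing_overridden:
        # The attached node is the file the header refers to, so its
        # driver's format describes that file.
        return backing_driver_format
    # Overridden (or no backing node): the attached node may be a
    # different file, so no format can be claimed for the header's file.
    return None
```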

Max




* Re: [PATCH v7 01/47] block: Add child access functions
  2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
  2020-07-16 14:46     ` Max Reitz
@ 2020-07-28 16:09     ` Christophe de Dinechin
  2020-08-07  9:33       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 173+ messages in thread
From: Christophe de Dinechin @ 2020-07-28 16:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz


On 2020-07-13 at 11:06 CEST, Vladimir Sementsov-Ogievskiy wrote...
> 25.06.2020 18:21, Max Reitz wrote:
>> There are BDS children that the general block layer code can access,
>> namely bs->file and bs->backing.  Since the introduction of filters and
>> external data files, their meaning is not quite clear.  bs->backing can
>> be a COW source, or it can be a filtered child; bs->file can be a
>> filtered child, it can be data and metadata storage, or it can be just
>> metadata storage.
>>
>> This overloading really is not helpful.  This patch adds functions that
>> retrieve the correct child for each exact purpose.  Later patches in
>> this series will make use of them.  Doing so will allow us to handle
>> filter nodes in a meaningful way.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>
> [..]
>
>> +/*
>> + * Return the primary child of this node: For filters, that is the
>> + * filtered child.  For other nodes, that is usually the child storing
>> + * metadata.
>> + * (A generally more helpful description is that this is (usually) the
>> + * child that has the same filename as @bs.)
>> + *
>> + * Drivers do not necessarily have a primary child; for example quorum
>> + * does not.
>> + */
>> +BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>> +{
>> +    BdrvChild *c;
>> +
>> +    QLIST_FOREACH(c, &bs->children, next) {
>> +        if (c->role & BDRV_CHILD_PRIMARY) {
>> +            return c;
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>>
>
> Suggest squash-in to also assert that not more than one primary child:
> --- a/block.c
> +++ b/block.c
> @@ -6998,13 +6998,14 @@ BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
>    */
>   BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>   {
> -    BdrvChild *c;
> +    BdrvChild *c, *found = NULL;
>
>       QLIST_FOREACH(c, &bs->children, next) {
>           if (c->role & BDRV_CHILD_PRIMARY) {
> -            return c;
> +            assert(!found);
> +            found = c;
>           }
>       }
>
> -    return NULL;
> +    return c;

Shouldn't that be "return found"?
>   }
>
>
> with or without:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
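
To illustrate the point: QLIST_FOREACH only stops once its iteration
variable is NULL, so returning c after a complete loop would always
return NULL.  Below is a small model of the intended logic, with
"return found".  The Child class and the flag value are hypothetical
stand-ins for this sketch, not the QEMU definitions.

```python
from dataclasses import dataclass

# Hypothetical flag value, chosen only for this example.
BDRV_CHILD_PRIMARY = 1 << 0


@dataclass
class Child:
    name: str
    role: int


def primary_child(children):
    """Return the single child carrying the primary flag, or None.

    Asserts that at most one child is marked primary.
    """
    found = None
    for c in children:
        if c.role & BDRV_CHILD_PRIMARY:
            assert found is None, 'more than one primary child'
            found = c
    # Return the remembered match, not the loop variable.
    return found
```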


--
Cheers,
Christophe de Dinechin (IRC c3d)




* Re: [PATCH v7 45/47] iotests: Add filter mirror test cases
  2020-06-25 15:22 ` [PATCH v7 45/47] iotests: Add filter mirror " Max Reitz
@ 2020-08-02 11:05   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-08-02 11:05 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> This patch adds some test cases for how mirroring relates to filters.  One
> of them tests what happens when you mirror off a filtered COW node, two
> others use the mirror filter node as basically our only example of an
> implicitly created filter node so far (besides the commit filter).
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/041     | 146 ++++++++++++++++++++++++++++++++++++-
>   tests/qemu-iotests/041.out |   4 +-
>   2 files changed, 147 insertions(+), 3 deletions(-)
>
> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
> index b843f88a66..588bb76626 100755
> --- a/tests/qemu-iotests/041
> +++ b/tests/qemu-iotests/041
...
> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
> index 53abe11d73..46651953e8 100644
> --- a/tests/qemu-iotests/041.out
> +++ b/tests/qemu-iotests/041.out
> @@ -1,5 +1,5 @@
> -........................................................................................................
> +...........................................................................................................
>   ----------------------------------------------------------------------
> -Ran 104 tests
> +Ran 107 tests
>   
>   OK


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 47/47] iotests: Test committing to overridden backing
  2020-06-25 15:22 ` [PATCH v7 47/47] iotests: Test committing to overridden backing Max Reitz
@ 2020-08-02 11:43   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-08-02 11:43 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/040     | 61 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/040.out |  4 +--
>   2 files changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
> index e7fa244738..dfd46ddcbe 100755
> --- a/tests/qemu-iotests/040
> +++ b/tests/qemu-iotests/040
> @@ -890,6 +890,67 @@ class TestCommitWithFilters(iotests.QMPTestCase):
>           # 3 has been committed into 2
>           self.pattern_files[3] = self.img2
>   
> +class TestCommitWithOverriddenBacking(iotests.QMPTestCase):
> +    img_base_a = os.path.join(iotests.test_dir, 'base_a.img')
> +    img_base_b = os.path.join(iotests.test_dir, 'base_b.img')
...
> diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
> index 4823c113d5..1bb1dc5f0e 100644
> --- a/tests/qemu-iotests/040.out
> +++ b/tests/qemu-iotests/040.out
> @@ -1,5 +1,5 @@
> -...............................................................
> +.................................................................
>   ----------------------------------------------------------------------
> -Ran 63 tests
> +Ran 65 tests
>   
>   OK


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>





* Re: [PATCH v7 46/47] iotests: Add test for commit in sub directory
  2020-06-25 15:22 ` [PATCH v7 46/47] iotests: Add test for commit in sub directory Max Reitz
@ 2020-08-02 12:13   ` Andrey Shinkevich
  0 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-08-02 12:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 25.06.2020 18:22, Max Reitz wrote:
> Add a test for committing an overlay in a sub directory to one of the
> images in its backing chain, using both relative and absolute filenames.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/020     | 44 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/020.out | 10 +++++++++
>   2 files changed, 54 insertions(+)
>
> diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
> index 20f8f185d0..d5b5d34058 100755
> --- a/tests/qemu-iotests/020
> +++ b/tests/qemu-iotests/020
> @@ -31,6 +31,11 @@ _cleanup()
>       _cleanup_test_img
>       _rm_test_img "$TEST_IMG.base"
>       _rm_test_img "$TEST_IMG.orig"
> +
> +    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.base"
> +    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT.mid"
> +    _rm_test_img "$TEST_DIR/subdir/t.$IMGFMT"
> +    rmdir "$TEST_DIR/subdir" &> /dev/null
>   }
>   trap "_cleanup; exit \$status" 0 1 2 3 15
>   
> @@ -134,6 +139,45 @@ $QEMU_IO -c 'writev 0 64k' "$TEST_IMG" | _filter_qemu_io
>   $QEMU_IMG commit "$TEST_IMG"
>   _cleanup
>   
> +
> +echo
> +echo 'Testing commit in sub-directory with relative filenames'
> +echo
> +
> +pushd "$TEST_DIR" > /dev/null
> +
> +mkdir subdir
> +
> +TEST_IMG="subdir/t.$IMGFMT.base" _make_test_img 1M
> +TEST_IMG="subdir/t.$IMGFMT.mid" _make_test_img -b "t.$IMGFMT.base"
> +TEST_IMG="subdir/t.$IMGFMT" _make_test_img -b "t.$IMGFMT.mid"
> +
> +# Should work
> +$QEMU_IMG commit -b "t.$IMGFMT.mid" "subdir/t.$IMGFMT"
> +
> +# Might theoretically work, but does not in practice (we have to
> +# decide between this and the above; and since we always represent
> +# backing file names as relative to the overlay, we go for the above)
> +$QEMU_IMG commit -b "subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT" 2>&1 | \
> +    _filter_imgfmt
> +
> +# This should work as well
> +$QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" "subdir/t.$IMGFMT"
> +
> +popd > /dev/null
> +
> +# Now let's try with just absolute filenames
> +# (This will not work with external data files, though, because when
> +# using relative paths for those, qemu will always resolve them
> +# relative to its CWD.  Therefore, it cannot find those data files now
> +# that we left $TEST_DIR.)
> +if _get_data_file '' > /dev/null; then
> +    echo 'Image committed.' # Skip test
> +else
> +    $QEMU_IMG commit -b "$TEST_DIR/subdir/t.$IMGFMT.mid" \
> +        "$TEST_DIR/subdir/t.$IMGFMT"
> +fi
> +
>   # success, all done
>   echo "*** done"
>   rm -f $seq.full
> diff --git a/tests/qemu-iotests/020.out b/tests/qemu-iotests/020.out
> index 4b722b2dd0..228c37dded 100644
> --- a/tests/qemu-iotests/020.out
> +++ b/tests/qemu-iotests/020.out
> @@ -1094,4 +1094,14 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=json:{'driv
>   wrote 65536/65536 bytes at offset 0
>   64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   qemu-img: Block job failed: No space left on device
> +
> +Testing commit in sub-directory with relative filenames
> +
> +Formatting 'subdir/t.IMGFMT.base', fmt=IMGFMT size=1048576
> +Formatting 'subdir/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.base
> +Formatting 'subdir/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=t.IMGFMT.mid
> +Image committed.
> +qemu-img: Did not find 'subdir/t.IMGFMT.mid' in the backing chain of 'subdir/t.IMGFMT'
> +Image committed.
> +Image committed.
>   *** done


Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
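
The three outcomes in the reference output can be reproduced with a
simplified model in which the name given to -b is interpreted relative
to the overlay's directory, the same way as the backing file name in the
image header.  This is an assumption-laden sketch, not the code of
bdrv_find_backing_image(); both function names and the '/work' directory
are made up for this example.

```python
import posixpath


def resolve_against_overlay(overlay, name):
    """Interpret name relative to the overlay's directory.

    Absolute names are kept unchanged (posixpath.join() discards the
    preceding components when the last one is absolute)."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(overlay), name))


def names_same_image(overlay, header_backing, user_base, cwd):
    """Return True if user_base denotes the image named in the header."""
    header_full = posixpath.join(
        cwd, resolve_against_overlay(overlay, header_backing))
    user_full = posixpath.join(
        cwd, resolve_against_overlay(overlay, user_base))
    return posixpath.normpath(header_full) == posixpath.normpath(user_full)


# Overlay subdir/t.img whose header names t.mid; qemu-img runs in /work.
overlay, header, cwd = 'subdir/t.img', 't.mid', '/work'
relative_to_overlay = names_same_image(overlay, header, 't.mid', cwd)
relative_to_cwd = names_same_image(overlay, header, 'subdir/t.mid', cwd)
absolute = names_same_image(overlay, header, '/work/subdir/t.mid', cwd)
```

In this model the name relative to the CWD resolves to
subdir/subdir/t.mid, which is why only the overlay-relative and the
absolute spelling find the base.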




* Re: [PATCH v7 01/47] block: Add child access functions
  2020-07-28 16:09     ` Christophe de Dinechin
@ 2020-08-07  9:33       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-07  9:33 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

28.07.2020 19:09, Christophe de Dinechin wrote:
> 
> On 2020-07-13 at 11:06 CEST, Vladimir Sementsov-Ogievskiy wrote...
>> 25.06.2020 18:21, Max Reitz wrote:
>>> There are BDS children that the general block layer code can access,
>>> namely bs->file and bs->backing.  Since the introduction of filters and
>>> external data files, their meaning is not quite clear.  bs->backing can
>>> be a COW source, or it can be a filtered child; bs->file can be a
>>> filtered child, it can be data and metadata storage, or it can be just
>>> metadata storage.
>>>
>>> This overloading really is not helpful.  This patch adds functions that
>>> retrieve the correct child for each exact purpose.  Later patches in
>>> this series will make use of them.  Doing so will allow us to handle
>>> filter nodes in a meaningful way.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>
>> [..]
>>
>>> +/*
>>> + * Return the primary child of this node: For filters, that is the
>>> + * filtered child.  For other nodes, that is usually the child storing
>>> + * metadata.
>>> + * (A generally more helpful description is that this is (usually) the
>>> + * child that has the same filename as @bs.)
>>> + *
>>> + * Drivers do not necessarily have a primary child; for example quorum
>>> + * does not.
>>> + */
>>> +BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    QLIST_FOREACH(c, &bs->children, next) {
>>> +        if (c->role & BDRV_CHILD_PRIMARY) {
>>> +            return c;
>>> +        }
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>>
>>
>> Suggest squash-in to also assert that not more than one primary child:
>> --- a/block.c
>> +++ b/block.c
>> @@ -6998,13 +6998,14 @@ BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
>>     */
>>    BdrvChild *bdrv_primary_child(BlockDriverState *bs)
>>    {
>> -    BdrvChild *c;
>> +    BdrvChild *c, *found = NULL;
>>
>>        QLIST_FOREACH(c, &bs->children, next) {
>>            if (c->role & BDRV_CHILD_PRIMARY) {
>> -            return c;
>> +            assert(!found);
>> +            found = c;
>>            }
>>        }
>>
>> -    return NULL;
>> +    return c;
> 
> Shouldn't that be "return found"?

Oops, you are right of course!

>>    }
>>
>>
>> with or without:
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 
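
For reference, here is a minimal standalone sketch of the corrected
squash-in. It is not QEMU code: Child, CHILD_PRIMARY and primary_child()
are hypothetical stand-ins for BdrvChild, BDRV_CHILD_PRIMARY and
bdrv_primary_child(), and a plain linked list replaces QLIST. It shows
why the loop variable cannot be returned: once the loop has run to
completion it is always NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BdrvChild and its role bits; not QEMU types. */
enum { CHILD_PRIMARY = 1 << 0, CHILD_DATA = 1 << 1 };

typedef struct Child {
    unsigned role;
    struct Child *next;
} Child;

/*
 * Return the only child flagged CHILD_PRIMARY, or NULL if there is none.
 * Asserts that there is no second primary child.
 */
static Child *primary_child(Child *head)
{
    Child *c, *found = NULL;

    for (c = head; c; c = c->next) {
        if (c->role & CHILD_PRIMARY) {
            assert(!found);
            found = c;
        }
    }

    /* Must be "found": "c" is always NULL once the loop has finished. */
    return found;
}
```

With "return c" this function would return NULL for every input, which
is what Christophe spotted.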


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init()
  2020-06-25 15:21 ` [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init() Max Reitz
  2020-07-08 17:23   ` Andrey Shinkevich
@ 2020-08-07  9:37   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-07  9:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

25.06.2020 18:21, Max Reitz wrote:
> bdrv_has_zero_init() and the related bdrv_unallocated_blocks_are_zero()
> should use bdrv_cow_child() if they want to check whether the given BDS
> has a COW backing file.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0a0b855261..f3e2aae49c 100644
> --- a/block.c
> +++ b/block.c
> @@ -5394,7 +5394,7 @@ int bdrv_has_zero_init(BlockDriverState *bs)
>   
>       /* If BS is a copy on write image, it is initialized to
>          the contents of the base image, which may not be zeroes.  */
> -    if (bs->backing) {
> +    if (bdrv_cow_child(bs)) {
>           return 0;
>       }
>       if (bs->drv->bdrv_has_zero_init) {
> @@ -5412,7 +5412,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)

This function has been removed, so this hunk should be dropped or rebased.

>   {
>       BlockDriverInfo bdi;
>   
> -    if (bs->backing) {
> +    if (bdrv_cow_child(bs)) {
>           return false;
>       }
>   
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-07-16 14:59         ` Max Reitz
@ 2020-08-07 10:29           ` Vladimir Sementsov-Ogievskiy
  2020-08-10  8:12             ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-07 10:29 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block; +Cc: Kevin Wolf, qemu-devel

16.07.2020 17:59, Max Reitz wrote:
> On 10.07.20 19:41, Andrey Shinkevich wrote:
>> On 10.07.2020 18:24, Max Reitz wrote:
>>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>> Because of the (not so recent anymore) changes that make the stream job
>>>>> independent of the base node and instead track the node above it, we
>>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>>> and the node directly above the base node (which may be an R/W filter
>>>>> or the bottom COW node).
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>     qapi/block-core.json |  4 +++
>>>>>     block/stream.c       | 63
>>>>> ++++++++++++++++++++++++++++++++------------
>>>>>     blockdev.c           |  4 ++-
>>>>>     3 files changed, 53 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index b20332e592..df87855429 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -2486,6 +2486,10 @@
>>>>>     # On successful completion the image file is updated to drop the
>>>>> backing file
>>>>>     # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>     #
>>>>> +# In case @device is a filter node, block-stream modifies the first
>>>>> non-filter
>>>>> +# overlay node below it to point to base's backing node (or NULL if
>>>>> @base was
>>>>> +# not specified) instead of modifying @device itself.
>>>>> +#
>>>>>     # @job-id: identifier for the newly-created block job. If
>>>>>     #          omitted, the device name will be used. (Since 2.7)
>>>>>     #
>>>>> diff --git a/block/stream.c b/block/stream.c
>>>>> index aa2e7af98e..b9c1141656 100644
>>>>> --- a/block/stream.c
>>>>> +++ b/block/stream.c
>>>>> @@ -31,7 +31,8 @@ enum {
>>>>>       typedef struct StreamBlockJob {
>>>>>         BlockJob common;
>>>>> -    BlockDriverState *bottom;
>>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from
>>>>> this) */
>>>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>>>> Keeping the base_overlay is enough to complete the stream job.
>>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>>> enough.
>>>
>>>> The above_base may disappear during the job and we can't rely on it.
>>> In this version of this series, it may not, because the chain is frozen.
>>>    So the above_base cannot disappear.
>>
>> Once we insert a filter above the top bs of the stream job, the parallel
>> jobs in
>>
>> the iotests #030 will fail with 'frozen link error'. It is because of the
>>
>> independent parallel stream or commit jobs that insert/remove their filters
>>
>> asynchroniously.
> 
> I’m not sure whether that’s a problem with this series specifically.
> 
>>> We can discuss whether we should allow it to disappear, but I think not.
>>>
>>> The problem is, we need something to set as the backing file after
>>> streaming.  How do we figure out what that should be?  My proposal is we
>>> keep above_base and use its immediate child.
>>
>> We can do the same with the base_overlay.
>>
>> If the backing node turns out to be a filter, the proper backing child will
>>
>> be set after the filter is removed. So, we shouldn't care.
> 
> And what if the user manually added some filter above the base (i.e.
> below base_overlay) that they want to keep after the job?


It's automatically kept, if we use base_overlay->backing->bs as final backing node.

You mean, that they want it to be dropped?


so, assuming the following:

top -(backing)-> manually-inserted-filter -(file)-> base

and the user does a stream with base=base, and expects the filter to be removed by the stream job?

Hmm, yes, such a use case is broken with our proposed way...

====

Let me now clarify the problem we'll have with your way.

When stream doesn't have any filter, we can easily imagine two parallel stream jobs:

top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base

stream1: top=top, base=mid2
stream2: top=mid2, base=NULL

final picture is obvious:

top (merged with mid1) -(backing)-> mid2 (merged with base)

But we want the stream job to have its own filter, like mirror. So the picture becomes more complex.

Assume stream2 starts first.

top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2 -(backing)-> base

Now, when we run stream1, with your solution, stream1 will freeze stream2-filter
(which is wrong: stream2 will fail to remove it if it finishes first), and stream1 will
remove stream2-filter on finish (which is wrong as well: stream2 is not prepared for
the removal of its filter).

But with our proposed way (freeze the chain only up to and including base_overlay, and use backing(base_overlay) as the final backing), everything will work as expected, and the two parallel jobs will work.

====

So, these are two mutually exclusive cases. I vote for freezing up to base_overlay and using backing(base_overlay) as the final backing, because:

1. I can't imagine another way to fix the case of parallel streams with filters (it's not a problem in current master, but we have a pending series which will introduce a stream job filter, and then the problem will appear and even break iotest 030)

2. I don't think that having the stream job remove filters above the base node is an important enough case to break parallel stream jobs in the future:

  - The stream job is not intended to remove filters, but to stream data. Filters between base_overlay and base don't contain any data and are unrelated to the stream process
  - I think that filters are "more related" to their children than to their parents. So removing filters related to the base node, when we just remove all data-containing nodes between top and base (and are not going to remove the base node), is at least questionable. On the contrary, removing all intermediate data-containing nodes _together_ with their filters is absolutely the correct thing to do.

Next, with your way, what about filters inserted above base during the stream job? They will be between above_base and base, and will not be removed. So with your way, filters above base that exist before the job starts will be frozen during the job and removed after it, but filters added above base during the job will be untouched. With our way, simply all filters related to the base node are left untouched by the job. That seems to me a simpler definition, and simpler to document.
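
To make the difference between the two proposals concrete, here is a
toy model of it. This is only a sketch and not QEMU code: Node and all
helper names are made up for illustration, and every node has a single
child, so filter links and COW (backing) links are not distinguished.
It computes the final backing node both ways for the chain
top -> manually-inserted-filter -> base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *child;   /* filtered child or COW backing, whichever exists */
} Node;

/* Lowest non-filter node above base; NULL if base is not below top. */
static Node *find_base_overlay(Node *top, Node *base)
{
    Node *overlay = NULL, *n;

    for (n = top; n && n != base; n = n->child) {
        if (!n->is_filter) {
            overlay = n;
        }
    }
    return n == base ? overlay : NULL;
}

/* Node whose immediate child is base (may be a filter); NULL if none. */
static Node *find_above_base(Node *top, Node *base)
{
    Node *n;

    for (n = top; n && n->child != base; n = n->child) {
        /* keep descending */
    }
    return n;
}

/* Max's way: the new backing node is the child of above_base. */
static Node *backing_via_above_base(Node *top, Node *base)
{
    Node *above = find_above_base(top, base);
    return above ? above->child : NULL;
}

/* Our way: the new backing node is the child of base_overlay. */
static Node *backing_via_base_overlay(Node *top, Node *base)
{
    Node *overlay = find_base_overlay(top, base);
    return overlay ? overlay->child : NULL;
}
```

For top -> filter -> base, backing_via_above_base() yields base (the
filter is dropped), while backing_via_base_overlay() yields the filter
(it is kept). That is exactly the behavioural difference discussed
above.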

> 
>>> If we don’t keep above_base, then we’re basically left guessing as to
>>> what should be the backing file after the stream job.
>>>
>>>>>         BlockdevOnError on_error;
>>>>>         char *backing_file_str;
>>>>>         bool bs_read_only;
>>>>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>>>>           if (s->chain_frozen) {
>>>>>             BlockJob *bjob = &s->common;
>>>>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>>>>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>>>>         }
>>>>>     }
>>>>>     @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>>>>         StreamBlockJob *s = container_of(job, StreamBlockJob,
>>>>> common.job);
>>>>>         BlockJob *bjob = &s->common;
>>>>>         BlockDriverState *bs = blk_bs(bjob->blk);
>>>>> -    BlockDriverState *base = backing_bs(s->bottom);
>>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>>>> The initial base node may be a top node for a concurrent commit job and
>>>>
>>>> may disappear.
>>> Then it would just be replaced by another node, though, so above_base
>>> keeps a child.  The @base here is not necessarily the initial @base, and
>>> that’s intentional.
>>
>> Not really. In my example, above_base becomes a dangling
>>
>> pointer because after the commit job finishes, its filter that should
>> belong to the
>>
>> commit job frozen chain will be deleted. If we freeze the link to the
>> above_base
>>
>> for this job, the iotests #30 will not pass.
> 
> So it doesn’t become a dangling pointer, because it’s frozen.
> 
> 030 passes after this series, so I’m not sure whether I can consider
> that problem part of this series.
> 
> I think if adding a filter node becomes a problem, we have to consider
> relaxing the restrictions when we do that, not now.
> 
>>>> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
>>> But also wrong.  The point of keeping above_base around is to get its
>>> child here to use that child as the new backing child of the top node.
>>>
>>>>>         Error *local_err = NULL;
>>>>>         int ret = 0;
>>>>>     -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>>>>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>>>>         s->chain_frozen = false;
>>>>>     -    if (bs->backing) {
>>>>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>>>>             const char *base_id = NULL, *base_fmt = NULL;
>>>>>             if (base) {
>>>>>                 base_id = s->backing_file_str;
>>>>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>>>>                     base_fmt = base->drv->format_name;
>>>>>                 }
>>>>>             }
>>>>> -        bdrv_set_backing_hd(bs, base, &local_err);
>>>>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>>>>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>>>>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>>>>> base_fmt);
>>>>>             if (local_err) {
>>>>>                 error_report_err(local_err);
>>>>>                 return -EPERM;
>>>>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>>>>> Error **errp)
>>>>>         StreamBlockJob *s = container_of(job, StreamBlockJob,
>>>>> common.job);
>>>>>         BlockBackend *blk = s->common.blk;
>>>>>         BlockDriverState *bs = blk_bs(blk);
>>>>> -    bool enable_cor = !backing_bs(s->bottom);
>>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>>>>         int64_t len;
>>>>>         int64_t offset = 0;
>>>>>         uint64_t delay_ns = 0;
>>>>>         int error = 0;
>>>>>         int64_t n = 0; /* bytes */
>>>>>     -    if (bs == s->bottom) {
>>>>> +    if (unfiltered_bs == s->base_overlay) {
>>>>>             /* Nothing to stream */
>>>>>             return 0;
>>>>>         }
>>>>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job,
>>>>> Error **errp)
>>>>>               copy = false;
>>>>>     -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>>>>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK,
>>>>> &n);
>>>>>             if (ret == 1) {
>>>>>                 /* Allocated in the top, no need to copy.  */
>>>>>             } else if (ret >= 0) {
>>>>>                 /* Copy if allocated in the intermediate images.  Limit
>>>>> to the
>>>>>                  * known-unallocated area [offset,
>>>>> offset+n*BDRV_SECTOR_SIZE).  */
>>>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom,
>>>>> true,
>>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>>>>> +                                          s->base_overlay, true,
>>>>>                                               offset, n, &n);
>>>>>                 /* Finish early if end of backing file has been
>>>>> reached */
>>>>>                 if (ret == 0 && n == 0) {
>>>>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id,
>>>>> BlockDriverState *bs,
>>>>>         BlockDriverState *iter;
>>>>>         bool bs_read_only;
>>>>>         int basic_flags = BLK_PERM_CONSISTENT_READ |
>>>>> BLK_PERM_WRITE_UNCHANGED;
>>>>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>>>>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>>>>> +    BlockDriverState *above_base;
>>>>>     -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>>>>> +    if (!base_overlay) {
>>>>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>>>>> +                   base->node_name, bs->node_name);
>>>> Sorry, I am not clear with the error message.
>>>>
>>>> In this case, there is no an intermediate COW node but the base, if not
>>>> NULL, is
>>>>
>>>> in the backing chain of bs, isn't it?
>>>>
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Find the node directly above @base.  @base_overlay is a COW
>>>>> overlay, so
>>>>> +     * it must have a bdrv_cow_child(), but it is the immediate
>>>>> overlay of
>>>>> +     * @base, so between the two there can only be filters.
>>>>> +     */
>>>>> +    above_base = base_overlay;
>>>>> +    if (bdrv_cow_bs(above_base) != base) {
>>>>> +        above_base = bdrv_cow_bs(above_base);
>>>>> +        while (bdrv_filter_bs(above_base) != base) {
>>>>> +            above_base = bdrv_filter_bs(above_base);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
>>>> When a concurrent stream job tries to freeze or remove the above_base
>>>> node,
>>>>
>>>> we will encounter the frozen node error. The above_base node is a part
>>>> of the
>>>>
>>>> concurrent job frozen chain.
>>> Correct.
>>>
>>>>>             return;
>>>>>         }
>>>>>     @@ -255,14 +279,19 @@ void stream_start(const char *job_id,
>>>>> BlockDriverState *bs,
>>>>>          * and resizes. Reassign the base node pointer because the
>>>>> backing BS of the
>>>>>          * bottom node might change after the call to
>>>>> bdrv_reopen_set_read_only()
>>>>>          * due to parallel block jobs running.
>>>>> +     * above_base node might change after the call to
>>>> Yes, if not frozen.
>>>>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>>>>          */
>>>>> -    base = backing_bs(bottom);
>>>>> -    for (iter = backing_bs(bs); iter && iter != base; iter =
>>>>> backing_bs(iter)) {
>>>>> +    base = bdrv_filter_or_cow_bs(above_base);
>>>>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>>>>> +         iter = bdrv_filter_or_cow_bs(iter))
>>>>> +    {
>>>>>             block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>>>                                basic_flags, &error_abort);
>>>>>         }
>>>>>     -    s->bottom = bottom;
>>>>> +    s->base_overlay = base_overlay;
>>>>> +    s->above_base = above_base;
>>>> Generally, being the filter for a concurrent job, the above_base node
>>>> may be deleted any time
>>>>
>>>> and we will keep the dangling pointer. It may happen even earlier if
>>>> above_base is not frozen.
>>>>
>>>> If it is, as it here, we may get the frozen link error then.
>>> I’m not sure what you mean here.  Freezing it was absolutely
>>> intentional.  A dangling pointer would be a problem, but that’s why it’s
>>> frozen, so it stays around and can’t be deleted any time.
>>>
>>> Max
>>
>> The nodes we freeze should be in one context of the relevant job:
>>
>> filter->top_node->intermediate_node(s)
>>
>> We would not include the base or any filter above it to the frozen chain
>>
>> because they are of a different job context.
> 
> They aren’t really, because we need to know the backing node of @device
> after the job.
> 
>> Once 'this' job is completed, we set the current backing child of the
>> base_overlay
>>
>> and may not care of its character. If that is another job filter, it
>> will be replaced
>>
>> with the proper node afterwards.
> 
> But what if there are filters above the base that the user wants to keep
> after the job?
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-07 10:29           ` Vladimir Sementsov-Ogievskiy
@ 2020-08-10  8:12             ` Max Reitz
  2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-10  8:12 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 13019 bytes --]

On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:
> 16.07.2020 17:59, Max Reitz wrote:
>> On 10.07.20 19:41, Andrey Shinkevich wrote:
>>> On 10.07.2020 18:24, Max Reitz wrote:
>>>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>>> Because of the (not so recent anymore) changes that make the
>>>>>> stream job
>>>>>> independent of the base node and instead track the node above it, we
>>>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>>>> and the node directly above the base node (which may be an R/W filter
>>>>>> or the bottom COW node).
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     qapi/block-core.json |  4 +++
>>>>>>     block/stream.c       | 63
>>>>>> ++++++++++++++++++++++++++++++++------------
>>>>>>     blockdev.c           |  4 ++-
>>>>>>     3 files changed, 53 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>> index b20332e592..df87855429 100644
>>>>>> --- a/qapi/block-core.json
>>>>>> +++ b/qapi/block-core.json
>>>>>> @@ -2486,6 +2486,10 @@
>>>>>>     # On successful completion the image file is updated to drop the
>>>>>> backing file
>>>>>>     # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>>     #
>>>>>> +# In case @device is a filter node, block-stream modifies the first
>>>>>> non-filter
>>>>>> +# overlay node below it to point to base's backing node (or NULL if
>>>>>> @base was
>>>>>> +# not specified) instead of modifying @device itself.
>>>>>> +#
>>>>>>     # @job-id: identifier for the newly-created block job. If
>>>>>>     #          omitted, the device name will be used. (Since 2.7)
>>>>>>     #
>>>>>> diff --git a/block/stream.c b/block/stream.c
>>>>>> index aa2e7af98e..b9c1141656 100644
>>>>>> --- a/block/stream.c
>>>>>> +++ b/block/stream.c
>>>>>> @@ -31,7 +31,8 @@ enum {
>>>>>>       typedef struct StreamBlockJob {
>>>>>>         BlockJob common;
>>>>>> -    BlockDriverState *bottom;
>>>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from
>>>>>> this) */
>>>>>> +    BlockDriverState *above_base;   /* Node directly above the
>>>>>> base */
>>>>> Keeping the base_overlay is enough to complete the stream job.
>>>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>>>> enough.
>>>>
>>>>> The above_base may disappear during the job and we can't rely on it.
>>>> In this version of this series, it may not, because the chain is
>>>> frozen.
>>>>    So the above_base cannot disappear.
>>>
>>> Once we insert a filter above the top bs of the stream job, the parallel
>>> jobs in
>>>
>>> the iotests #030 will fail with 'frozen link error'. It is because of
>>> the
>>>
>>> independent parallel stream or commit jobs that insert/remove their
>>> filters
>>>
>>> asynchroniously.
>>
>> I’m not sure whether that’s a problem with this series specifically.
>>
>>>> We can discuss whether we should allow it to disappear, but I think
>>>> not.
>>>>
>>>> The problem is, we need something to set as the backing file after
>>>> streaming.  How do we figure out what that should be?  My proposal
>>>> is we
>>>> keep above_base and use its immediate child.
>>>
>>> We can do the same with the base_overlay.
>>>
>>> If the backing node turns out to be a filter, the proper backing
>>> child will
>>>
>>> be set after the filter is removed. So, we shouldn't care.
>>
>> And what if the user manually added some filter above the base (i.e.
>> below base_overlay) that they want to keep after the job?
> 
> 
> It's automatically kept, if we use base_overlay->backing->bs as final
> backing node.
> 
> You mean, that they want it to be dropped?

Er, yes.  Point is, the graph structure below with @base at the root may
be different than the one right below @base_overlay.

> so, assuming the following:
> 
> top -(backing)-> manually-inserted-filter -(file)-> base
> 
> and user do stream with base=base, and expects filter to be removed by
> stream job?
> 
> Hmm, yes, such use-case is broken with our proposed way...
> 
> ====
> 
> Let me now clarify the problem we'll have with your way.
> 
> When stream don't have any filter, we can easily imagine two parallel
> stream jobs:
> 
> top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base
> 
> stream1: top=top, base=mid2
> stream2: top=mid2, base=NULL
> 
> final picture is obvious:
> 
> top (merged with mid1) -(backing)-> mid2 (merged with base)

Yes, and I don’t think this currently working case is broken by this series.

> But we want stream job has own filter, like mirror.

Which it does not have yet, right?  Which is why I was saying that I
don’t think this is a problem with this series.  We could try to address
it later.

Or do you think we can’t address it later because right now all filter
cases are broken anyway so now would be the time to make a breaking
change (which the suggestion to not use @base as the final backing node is)?

> So the picture becomes more complex.
> 
> Assume stream2 starts first.
> 
> top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2
> -(backing)-> base

stream2-filter would be on top of mid2, right?

> Now, when we run stream1, with your solution, stream1 will freeze
> stream2-filter
> (wrong thing, stream2 will fail to remove it if it finished first), and
> stream1 will
> remove stream2-filter on finish (which is wrong as well, stream2 is not
> prepared to
> removing of its filter)..

Note that the user first needs to pass “mid2” as the base to the stream
job stream1.  Why don’t they just pass “stream2-filter”?  In my model,
the user should specify exactly which node they want not to be touched
by this stream job, and so that would be stream2-filter, not mid1.

I feel like the answer to this question has to do with implicit nodes.
AFAIU you wanted to remove them, so I don’t think we’d want to
special-case them here.

If you think that we can’t expect users to pass “stream2-filter” because
currently it should work with “mid2”, then that’s a case of implicit
nodes and it means we should ascend from @base up to the first
non-implicit node to get the @above_base we want.
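
A rough sketch of what I mean by that, as a toy model rather than a
patch: Node, its "implicit" flag and above_base_skipping_implicit() are
made-up names, and the chain is walked downwards from the top because
the toy nodes only know their child. The result is the lowest node
above @base that is not an implicit filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    bool is_filter;
    bool implicit;        /* filter inserted by a job, not by the user */
    struct Node *child;
} Node;

/*
 * Return the lowest node above base that is not an implicit filter,
 * or NULL if base is not below top.
 */
static Node *above_base_skipping_implicit(Node *top, Node *base)
{
    Node *candidate = NULL, *n;

    for (n = top; n && n != base; n = n->child) {
        if (!(n->is_filter && n->implicit)) {
            candidate = n;
        }
    }
    return n == base ? candidate : NULL;
}
```

In the example above (top -> mid1 -> stream2-filter -> mid2 with
base=mid2), this gives mid1 if stream2-filter is implicit, so stream1
would neither freeze nor remove stream2's filter.  A filter the user
inserted explicitly would still be picked as @above_base.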

> But, with our proposed way (freeze only chain up to base_overlay
> inclusively, and use backing(base_overlay) as final backing), all will
> work as expected, and two parallel jobs will work..

I don’t think it will work as expected because users can no longer
specify which node should be the base node after streaming.  And the
QAPI schema says that base-node is to become the backing file of the top
node after streaming.

I suppose you’re arguing that streaming through filters basically just
doesn’t work at all right now, so we’re free to do whatever?

Well, that still leaves the problem that users should be able to specify
which node is to become the base after streaming, and that that node
maybe shouldn’t be restricted to immediate children of COW images.

> ====
> 
> So, these are two mutually exclusive cases.. I vote for freezing up to
> base_overlay, and use backing(base_overlay) as final backing, because:
> 
> 1. I can't imaging other way to fix the case with parallel streams with
> filters (it's not a problem of current master, but we have pending
> series which will introduce stream job filter, and the problem will
> appear and even break iotest 30)

Besides the question of whether the top job could just use the bottom
job’s filter node as the base, there’s also the alternative of admitting
defeat and declaring that you just cannot use a single node in two
streams, because we didn’t find a way to make it work after all.

You could still create a temporary overlay in between that’s never used
and then drop it with a trivial stream afterwards.

(But that just in case specifying the bottom job’s stream node somehow
wouldn’t work.)

> 2. I don't think that removing filters above base node by stream job is
> so important case to break parallel stream jobs in future:
> 
>  - Stream job is not intended to remove filters, but to stream data.
> Filters between base_overlay and base don't contain any data and
> unrelated to stream process

Well, it is intended to remove nodes.  You can only remove data-bearing
nodes by moving data around.  I suspect if there was a way to get the
to-be-removed nodes removed without having to move their data around,
that would be popular.

>  - I think, that filters are "more related" to their children than to
> their parents. So, removing filters related to base node, when we just
> remove all data-containing nodes between top and base (and are not going
> to remove base node) is at least questionable.

Yes.

Although it could be argued that it is a handy way to remove filters, in
a backing chain at least.  (Thanks to bdrv_find_overlay(), @base and
@top still need to refer to different levels of the backing chain, but
if we lifted that restriction, I suppose it could work for any filter
chain.)

*shrug*

> On the contrary, removing
> all intermediate data containing nodes _together_ with their filters is
> absolutely correct thing to do.

I don’t think so, actually.  Like, you have a throttle node somewhere in
the chain, shouldn’t you maybe want to move it down below the chain?  Or
a COR node, shouldn’t that go above the chain after streaming?

I’m not making an argument here, I just don’t quite understand why you’d
bring up what happens with intermediate filters here.  The only reason
to drop them is because that’s what I expect users to expect of the
stream job.

> Next, with your way, what about filters, inserted above base during
> stream job? They will be between above_base and base, and will not be
> removed. So with your way, filters above base, existing before job start
> will be frozen during the job and removed after it, but filters appended
> above base during the job will be untouched. With our way, just all base
> node related filters are untouched by the job. It seems simpler
> definition for me and simpler to document.

Hm.  The documentation seems the same to me.  Either it’s “The backing
node (at the end of the job) of @base’s parent node (when starting the
job)” or “The backing node (at the end of the job) of the next
non-filter node above @base (when starting the job)”.

The problem you describe (that @above_base at the end of the job isn’t
necessarily above @base anymore) also exists with your suggestion,
namely that you can add overlays above @base after the job has started,
so @base_overlay at the end of the job isn’t necessarily the first
non-filter node above @base anymore.


OK, so after all this text, maybe some more original problem searching.
I think the root of the problem is that the stream job takes a @base
parameter, but as of c624b015bf14fe01, it doesn’t really matter anymore.
Maybe c624b015bf14fe01 should have introduced a new parameter for users
to specify the bottom node instead of @base.

Well, that would have made everything a parameter mess, but it would
have saved us the trouble now.

In any case, the problem we have now is that we want a way to
automagically find out which node the bottom node should be, because the
user can’t specify it.  So the documentation is always going to be
written as “The backing node (at the end of the job) of $bottom”, where
“$bottom” is what we’re interested in figuring out.

I thought it would be best if we stick as close as possible to the
spirit of the current documentation, which basically requires @base-node
to be the backing node of the top after streaming.  (If you do graph
modifications during the job, that’s on you, because since
c624b015bf14fe01 we can’t keep the base frozen.)

Your suggestion is to do basically what you consider to be right, which
comes with the caveat of being untruthful to the current documentation
even if there are no graph modifications during the job.  Luckily, the
stream job right now doesn’t work in the cases we’re looking at, so it
wouldn’t be a breaking change.  The problem I have with it is that
you’re assuming what is right and what isn’t (i.e. “Who would want to
remove filter nodes directly above @base”) without giving the user a
chance to specify.


I think it would be nice if we could have something that remains
truthful to the current documentation.  If just ignoring implicit
filters above @base would work, then I’d find that nice.  If it doesn’t,
I suppose there’s indeed little we can do but forego
@above_base and just use @base_overlay for all cases.

Max




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-10  8:12             ` Max Reitz
@ 2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
  2020-08-14 15:18                 ` Andrey Shinkevich
                                   ` (2 more replies)
  0 siblings, 3 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-10 11:04 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.08.2020 11:12, Max Reitz wrote:
> On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:
>> 16.07.2020 17:59, Max Reitz wrote:
>>> On 10.07.20 19:41, Andrey Shinkevich wrote:
>>>> On 10.07.2020 18:24, Max Reitz wrote:
>>>>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>>>> Because of the (not so recent anymore) changes that make the
>>>>>>> stream job
>>>>>>> independent of the base node and instead track the node above it, we
>>>>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>>>>> and the node directly above the base node (which may be an R/W filter
>>>>>>> or the bottom COW node).
>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>> ---
>>>>>>>      qapi/block-core.json |  4 +++
>>>>>>>      block/stream.c       | 63
>>>>>>> ++++++++++++++++++++++++++++++++------------
>>>>>>>      blockdev.c           |  4 ++-
>>>>>>>      3 files changed, 53 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>>> index b20332e592..df87855429 100644
>>>>>>> --- a/qapi/block-core.json
>>>>>>> +++ b/qapi/block-core.json
>>>>>>> @@ -2486,6 +2486,10 @@
>>>>>>>      # On successful completion the image file is updated to drop the
>>>>>>> backing file
>>>>>>>      # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>>>      #
>>>>>>> +# In case @device is a filter node, block-stream modifies the first
>>>>>>> non-filter
>>>>>>> +# overlay node below it to point to base's backing node (or NULL if
>>>>>>> @base was
>>>>>>> +# not specified) instead of modifying @device itself.
>>>>>>> +#
>>>>>>>      # @job-id: identifier for the newly-created block job. If
>>>>>>>      #          omitted, the device name will be used. (Since 2.7)
>>>>>>>      #
>>>>>>> diff --git a/block/stream.c b/block/stream.c
>>>>>>> index aa2e7af98e..b9c1141656 100644
>>>>>>> --- a/block/stream.c
>>>>>>> +++ b/block/stream.c
>>>>>>> @@ -31,7 +31,8 @@ enum {
>>>>>>>        typedef struct StreamBlockJob {
>>>>>>>          BlockJob common;
>>>>>>> -    BlockDriverState *bottom;
>>>>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from
>>>>>>> this) */
>>>>>>> +    BlockDriverState *above_base;   /* Node directly above the
>>>>>>> base */
>>>>>> Keeping the base_overlay is enough to complete the stream job.
>>>>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>>>>> enough.
>>>>>
>>>>>> The above_base may disappear during the job and we can't rely on it.
>>>>> In this version of this series, it may not, because the chain is
>>>>> frozen.
>>>>>     So the above_base cannot disappear.
>>>>
>>>> Once we insert a filter above the top bs of the stream job, the parallel
>>>> jobs in
>>>>
>>>> the iotests #030 will fail with 'frozen link error'. It is because of
>>>> the
>>>>
>>>> independent parallel stream or commit jobs that insert/remove their
>>>> filters
>>>>
>>>> asynchroniously.
>>>
>>> I’m not sure whether that’s a problem with this series specifically.
>>>
>>>>> We can discuss whether we should allow it to disappear, but I think
>>>>> not.
>>>>>
>>>>> The problem is, we need something to set as the backing file after
>>>>> streaming.  How do we figure out what that should be?  My proposal
>>>>> is we
>>>>> keep above_base and use its immediate child.
>>>>
>>>> We can do the same with the base_overlay.
>>>>
>>>> If the backing node turns out to be a filter, the proper backing
>>>> child will
>>>>
>>>> be set after the filter is removed. So, we shouldn't care.
>>>
>>> And what if the user manually added some filter above the base (i.e.
>>> below base_overlay) that they want to keep after the job?
>>
>>
>> It's automatically kept, if we use base_overlay->backing->bs as final
>> backing node.
>>
>> You mean, that they want it to be dropped?
> 
> Er, yes.  Point is, the graph structure below with @base at the root may
> be different than the one right below @base_overlay.
> 
>> so, assuming the following:
>>
>> top -(backing)-> manually-inserted-filter -(file)-> base
>>
>> and user do stream with base=base, and expects filter to be removed by
>> stream job?
>>
>> Hmm, yes, such use-case is broken with our proposed way...
>>
>> ====
>>
>> Let me now clarify the problem we'll have with your way.
>>
>> When stream don't have any filter, we can easily imagine two parallel
>> stream jobs:
>>
>> top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base
>>
>> stream1: top=top, base=mid2
>> stream2: top=mid2, base=NULL
>>
>> final picture is obvious:
>>
>> top (merged with mid1) -(backing)-> mid2 (merged with base)
> 
> Yes, and I don’t think this currently working case is broken by this series.
> 
>> But we want stream job has own filter, like mirror.
> 
> Which it does not have yet, right?  Which is why I was saying that I
> don’t think this is a problem with this series.  We could try to address
> it later.
> 
> Or do you think we can’t address it later because right now all filter
> cases are broken anyway so now would be the time to make a breaking
> change (which the suggestion to not use @base as the final backing node is)?

I think we can address it later, but it would be good to fit it into the same release cycle as this series, so that we don't make incompatible behavior changes later.

> 
>> So the picture becomes more complex.
>>
>> Assume stream2 starts first.
>>
>> top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2
>> -(backing)-> base
> 
> stream2-filter would be on top of mid2, right?

Right. In my picture, "-(backing)->" means a backing link. Hmm, most probably the stream filter is COR, which actually has a file child. It doesn't matter here.

> 
>> Now, when we run stream1, with your solution, stream1 will freeze
>> stream2-filter
>> (wrong thing, stream2 will fail to remove it if it finished first), and
>> stream1 will
>> remove stream2-filter on finish (which is wrong as well, stream2 is not
>> prepared to
>> removing of its filter)..
> 
> Note that the user first needs to pass “mid2” as the base to the stream
> job stream1.  Why don’t they just pass “stream2-filter”?  In my model,
> the user should specify exactly which node they want not to be touched
> by this stream job, and so that would be stream2-filter, not mid1.

Hmm. I'm sure we already tried/discussed this, but I don't remember the results. It seems such logic should work as well. Andrey, do you remember? I think the only difficulty is updating iotest 30 (in our stream-filter series) to use node names instead of filenames. Can we go this way?

> 
> I feel like the answer to this question has to do with implicit nodes.
> AFAIU you wanted to remove them, so I don’t think we’d want to
> special-case them here.

Agreed. I think the stream filter should be explicit, like the recently introduced backup-top.

> 
> If you think that we can’t expect users to pass “stream2-filter” because
> currently it should work with “mid2”, then that’s a case of implicit
> nodes and it means we should ascend from @base up to the first
> non-implicit node to get the @above_base we want.
> 
>> But, with our proposed way (freeze only chain up to base_overlay
>> inclusively, and use backing(base_overlay) as final backing), all will
>> work as expected, and two parallel jobs will work..
> 
> I don’t think it will work as expected because users can no longer
> specify which node should be the base node after streaming.  And the
> QAPI schema says that base-node is to become the backing file of the top
> node after streaming.

But this will never work either way: the base node may disappear during the stream. Even with your way, the only stable thing is "above-base", whose backing child may be a completely different node when the stream finishes.

> 
> I suppose you’re arguing that streaming through filters basically just
> doesn’t work at all right now, so we’re free to do whatever?

I don't, but I like the idea :)

> 
> Well, that still leaves the problem that users should be able to specify
> which node is to become the base after streaming, and that that node
> maybe shouldn’t be restricted to immediate children of COW images.

And again, this is impossible even with your way. I have an idea:

What about making the whole thing explicit?

We add an optional parameter to the stream job, bottom-node, which is mutually exclusive with specifying base.

Then, if the user specifies the base node, we freeze base as well, so it can't disappear. The user will not be able to start a parallel stream with this base node as top (because the new stream cannot insert a filter into a frozen chain), but surely that is a rare case, used only in iotest 30 :)). Benefit: the user has a guarantee of what the final backing node will be.

Otherwise, if the user specifies bottom-node, we use the way of this patch. So the user can run parallel streams (iotest 30 will have to use the bottom-node argument). There is no guarantee of the final base node; it will be the backing node of bottom-node when the job finishes.
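
The proposal above could be sketched roughly as follows. This is only a toy model in Python under stated assumptions, not QEMU code and not an existing QMP interface: `plan_stream()` and its result keys are hypothetical, the chain is a plain list of node names, and "frozen" is modeled per node although QEMU freezes links between nodes.

```python
from typing import List, Optional


def plan_stream(chain: List[str],
                base: Optional[str] = None,
                bottom_node: Optional[str] = None) -> dict:
    """Decide what to freeze and what the final backing node will be.

    chain lists node names from the top node downwards (hypothetical model).
    """
    if base is not None and bottom_node is not None:
        raise ValueError("'base' and 'bottom-node' are mutually exclusive")

    if base is not None:
        # Freeze everything down to and including base, so base cannot
        # disappear and is guaranteed to be the final backing node.
        last = chain.index(base)
        return {"frozen": chain[:last + 1],
                "final_backing": base,
                "final_backing_known": True}

    if bottom_node is not None:
        # Freeze only down to bottom-node; whatever is its backing node
        # when the job finishes becomes the final backing node.
        last = chain.index(bottom_node)
        return {"frozen": chain[:last + 1],
                "final_backing": None,
                "final_backing_known": False}

    # Neither given: stream the whole chain, the top ends up without backing.
    return {"frozen": list(chain),
            "final_backing": None,
            "final_backing_known": True}


chain = ["top", "mid1", "mid2", "base"]
with_base = plan_stream(chain, base="mid2")
with_bottom = plan_stream(chain, bottom_node="mid1")
```

In this model a second stream job with "mid2" as its top can only run alongside the first one in the bottom-node variant, because "mid2" stays outside the frozen part of the chain.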

But this is an incompatible change, and we should probably wait 3 releases for the deprecation of the old behavior..

Anyway, I feel now that you have convinced me. I'm not sure that we won't have to change it to make the filter work, but there is no reason to change anything now. Andrey, could you try to rebase your series on top of this and fix iotest 30 by just specifying exact node names in it?


Hmmm. My thought goes further. It seems that this way, introducing an explicit filter would be an incompatible change anyway: it will break the scenario with parallel stream jobs when the user specifies filenames, not node names (the user will have to specify the filter node name as base for the other stream job, as you said). So it's incompatible anyway.

What do you think of it? Could we break this scenario in one release without deprecation and not care? Then I think my idea about base vs. bottom-node arguments for the stream job may be applied. Or what should we do?

If we can't break this scenario without a deprecation, we'll have to implement an "implicit" filter, like for mirror, when filter-node-name is not specified. And for this implicit filter we'll need additional logic (closer to what I proposed in a previous mail). Or we try to keep stream without a filter (not insert it at all and behave the old way) when filter-node-name is not specified. Then new features based on the filter will be available only when filter-node-name is specified, but this is OK. The latter seems better to me.

> 
>> ====
>>
>> So, these are two mutually exclusive cases.. I vote for freezing up to
>> base_overlay, and use backing(base_overlay) as final backing, because:
>>
>> 1. I can't imaging other way to fix the case with parallel streams with
>> filters (it's not a problem of current master, but we have pending
>> series which will introduce stream job filter, and the problem will
>> appear and even break iotest 30)
> 
> Besides the question of whether the top job could just use the bottom
> job’s filter node as the base, there’s also the alternative of admitting
> defeat and declaring that you just cannot use a single node in two
> streams, because we didn’t find a way to make it work after all.
> 
> You could still create a temporary overlay in between that’s never used
> and then drop it with a trivial stream afterwards.
> 
> (But that just in case specifying the bottom job’s stream node somehow
> wouldn’t work.)

We can break existing user scenarios.. Seems unlikely. I don't know.

> 
>> 2. I don't think that removing filters above base node by stream job is
>> so important case to break parallel stream jobs in future:
>>
>>   - Stream job is not intended to remove filters, but to stream data.
>> Filters between base_overlay and base don't contain any data and
>> unrelated to stream process
> 
> Well, it is intended to remove nodes.  You can only remove data-bearing
> nodes by moving data around.  I suspect if there was a way to get the
> to-be-removed nodes removed without having to move their data around,
> that would be popular.
> 
>>   - I think, that filters are "more related" to their children than to
>> their parents. So, removing filters related to base node, when we just
>> remove all data-containing nodes between top and base (and are not going
>> to remove base node) is at least questionable.
> 
> Yes.
> 
> Although it could be argued that it is a handy way to remove filters, in
> a backing chain at least.  (Thanks to bdrv_find_overlay(), @base and
> @top still need to refer to different levels of the backing chain, but
> if we lifted that restriction, I suppose it could work for any filter
> chain.)
> 
> *shrug*
> 
>> On the contrary, removing
>> all intermediate data containing nodes _together_ with their filters is
>> absolutely correct thing to do.
> 
> I don’t think so, actually.  Like, you have a throttle node somewhere in
> the chain, shouldn’t you maybe want to move it down below the chain?  Or
> a COR node, shouldn’t that go above the chain after streaming?
> 
> I’m not making an argument here, I just don’t quite understand why you’d
> bring up what happens with intermediate filters here.  The only reason
> to drop them is because that’s what I expect users to expect of the
> stream job.
> 
>> Next, with your way, what about filters, inserted above base during
>> stream job? They will be between above_base and base, and will not be
>> removed. So with your way, filters above base, existing before job start
>> will be frozen during the job and removed after it, but filters appended
>> above base during the job will be untouched. With our way, just all base
>> node related filters are untouched by the job. It seems simpler
>> definition for me and simpler to document.
> 
> Hm.  The documentation seems the same to me.  Either it’s “The backing
> node (at the end of the job) of @base’s parent node (when starting the
> job)” or “The backing node (at the end of the job) of the next
> non-filter node above @base (when starting the job)”.
> 
> The problem you describe (that @above_base at the end of the job isn’t
> necessarily above @base anymore) also exists with your suggestion,
> namely that you can add overlays above @base after the job has started,
> so @base_overlay at the end of the job isn’t necessarily the first
> non-filter node above @base anymore.
> 
> 
> OK, so after all this text, maybe some more original problem searching.
>   I think it the root of the problem is that the stream job takes a @base
> parameter, but as of c624b015bf14fe01, it doesn’t really matter anymore.
>   Maybe c624b015bf14fe01 should have introduced a new parameter for users
> to specify the bottom node instead of @base.

Yes, absolutely agree. If we do it now, would it be an incompatible change or not?

> 
> Well, that would have made everything a parameter mess, but it would
> have saved us the trouble now.
> 
> In any case, the problem we have now is that we want a way to
> automagically find out which node the bottom node should be, because the
> user can’t specify it.  So the documentation is always going to be
> written as “The backing node (at the end of the job) of $bottom”, where
> “$bottom” is what we’re interested in figuring out.
> 
> I thought it would be best if we stick as close as possible to the
> spirit of the current documentation, which basically requires @base-node
> to be the backing node of the top after streaming.  (If you do graph
> modifications during the job, that’s on you, because since
> c624b015bf14fe01 we can’t keep the base frozen.)
> 
> Your suggestion to do basically what you consider to be right, which
> comes at the caveat of being untruthful to the current documentation
> even if there are no graph modifications during the job.  Luckily, the
> stream job right now doesn’t work in the cases we’re looking at, so it
> wouldn’t be a breaking change.

I am also trying to think about the future introduction of the stream filter. It shouldn't be a breaking change either. But now I think it can be done the hard way if needed: just work without a filter if filter-node-name is not given. But probably I care too much. In the end we can just drop the test case from 030 or insert an additional intermediate node into it..

>  The problem I have with it is that
> you’re assuming what is right and what isn’t (i.e. “Who would want to
> remove filter nodes directly above @base”) without giving the user a
> chance to specify.
> 
> 
> I think it would be nice if we could have something that remains
> truthful to the current documentation.

Then we should freeze the base node again, so just revert c624b015bf14fe01.

I went and looked at the cover letter of the series that introduced c624b015bf14fe01:

   This series introduces a bottom intermediate node that eliminates the
   dependency on the base that may change while stream job is running.
   It happens when stream/commit parallel jobs are running on the same
   backing chain. The base node of the stream job may be a top node of
   the parallel commit job and can change before the stream job is
   completed. We avoid that dependency by introducing the bottom node.

Hmm. It's bad that we didn't add an iotest, but the series solved an existing problem:
parallel stream and commit, as commit already has a filter. Commit has a filter
with a "backing" child, so I assume that the case worked prior to the introduction of
frozen chains, was then broken by frozen chains, and then fixed by c624b015bf14fe01.
Still, I don't know whether there are any real users of such parallel jobs..

>  If just ignoring implicit
> filters above @base would work, then I’d find that nice.  If it doesn’t,
> I suppose there’s indeed little we can do but to indeed forego
> @above_base and just use @base_overlay for all cases.
> 

OK, at this point I think I'm OK with your patch in the context of this series.

But I feel that something more should be done. Could we just revert c624b015bf14fe01 as "not corresponding to the specification"?


-- 
Best regards,
Vladimir



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
@ 2020-08-14 15:18                 ` Andrey Shinkevich
  2020-08-18 20:45                 ` Andrey Shinkevich
  2020-08-19 12:39                 ` Max Reitz
  2 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-08-14 15:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: Kevin Wolf, den, qemu-devel

On 10.08.2020 14:04, Vladimir Sementsov-Ogievskiy wrote:
> 10.08.2020 11:12, Max Reitz wrote:
>> On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:
>>> 16.07.2020 17:59, Max Reitz wrote:
>>>> On 10.07.20 19:41, Andrey Shinkevich wrote:
>>>>> On 10.07.2020 18:24, Max Reitz wrote:
>>>>>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>>>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>>>>> Because of the (not so recent anymore) changes that make the
>>>>>>>> stream job
>>>>>>>> independent of the base node and instead track the node above 
>>>>>>>> it, we
>>>>>>>> have to split that "bottom" node into two cases: The bottom COW 
>>>>>>>> node,
>>>>>>>> and the node directly above the base node (which may be an R/W 
>>>>>>>> filter
>>>>>>>> or the bottom COW node).
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> ---
>>>>>>>>      qapi/block-core.json |  4 +++
>>>>>>>>      block/stream.c       | 63
>>>>>>>> ++++++++++++++++++++++++++++++++------------
>>>>>>>>      blockdev.c           |  4 ++-
>>>>>>>>      3 files changed, 53 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>>>> index b20332e592..df87855429 100644
>>>>>>>> --- a/qapi/block-core.json
>>>>>>>> +++ b/qapi/block-core.json
>>>>>>>> @@ -2486,6 +2486,10 @@
>>>>>>>>      # On successful completion the image file is updated to 
>>>>>>>> drop the
>>>>>>>> backing file
>>>>>>>>      # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>>>>      #
>>>>>>>> +# In case @device is a filter node, block-stream modifies the 
>>>>>>>> first
>>>>>>>> non-filter
>>>>>>>> +# overlay node below it to point to base's backing node (or 
>>>>>>>> NULL if
>>>>>>>> @base was
>>>>>>>> +# not specified) instead of modifying @device itself.
>>>>>>>> +#
>>>>>>>>      # @job-id: identifier for the newly-created block job. If
>>>>>>>>      #          omitted, the device name will be used. (Since 2.7)
>>>>>>>>      #
>>>>>>>> diff --git a/block/stream.c b/block/stream.c
>>>>>>>> index aa2e7af98e..b9c1141656 100644
>>>>>>>> --- a/block/stream.c
>>>>>>>> +++ b/block/stream.c
>>>>>>>> @@ -31,7 +31,8 @@ enum {
>>>>>>>>        typedef struct StreamBlockJob {
>>>>>>>>          BlockJob common;
>>>>>>>> -    BlockDriverState *bottom;
>>>>>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from
>>>>>>>> this) */
>>>>>>>> +    BlockDriverState *above_base;   /* Node directly above the
>>>>>>>> base */
>>>>>>> Keeping the base_overlay is enough to complete the stream job.
>>>>>> Depends on the definition.  If we decide it isn’t enough, then it 
>>>>>> isn’t
>>>>>> enough.
>>>>>>
>>>>>>> The above_base may disappear during the job and we can't rely on 
>>>>>>> it.
>>>>>> In this version of this series, it may not, because the chain is
>>>>>> frozen.
>>>>>>     So the above_base cannot disappear.
>>>>>
>>>>> Once we insert a filter above the top bs of the stream job, the 
>>>>> parallel
>>>>> jobs in
>>>>>
>>>>> the iotests #030 will fail with 'frozen link error'. It is because of
>>>>> the
>>>>>
>>>>> independent parallel stream or commit jobs that insert/remove their
>>>>> filters
>>>>>
>>>>> asynchroniously.
>>>>
>>>> I’m not sure whether that’s a problem with this series specifically.
>>>>
>>>>>> We can discuss whether we should allow it to disappear, but I think
>>>>>> not.
>>>>>>
>>>>>> The problem is, we need something to set as the backing file after
>>>>>> streaming.  How do we figure out what that should be? My proposal
>>>>>> is we
>>>>>> keep above_base and use its immediate child.
>>>>>
>>>>> We can do the same with the base_overlay.
>>>>>
>>>>> If the backing node turns out to be a filter, the proper backing
>>>>> child will
>>>>>
>>>>> be set after the filter is removed. So, we shouldn't care.
>>>>
>>>> And what if the user manually added some filter above the base (i.e.
>>>> below base_overlay) that they want to keep after the job?
>>>
>>>
>>> It's automatically kept, if we use base_overlay->backing->bs as final
>>> backing node.
>>>
>>> You mean, that they want it to be dropped?
>>
>> Er, yes.  Point is, the graph structure below with @base at the root may
>> be different than the one right below @base_overlay.
>>
>>> so, assuming the following:
>>>
>>> top -(backing)-> manually-inserted-filter -(file)-> base
>>>
>>> and user do stream with base=base, and expects filter to be removed by
>>> stream job?
>>>
>>> Hmm, yes, such use-case is broken with our proposed way...
>>>
>>> ====
>>>
>>> Let me now clarify the problem we'll have with your way.
>>>
>>> When stream don't have any filter, we can easily imagine two parallel
>>> stream jobs:
>>>
>>> top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base
>>>
>>> stream1: top=top, base=mid2
>>> stream2: top=mid2, base=NULL
>>>
>>> final picture is obvious:
>>>
>>> top (merged with mid1) -(backing)-> mid2 (merged with base)
>>
>> Yes, and I don’t think this currently working case is broken by this 
>> series.
>>
>>> But we want stream job has own filter, like mirror.
>>
>> Which it does not have yet, right?  Which is why I was saying that I
>> don’t think this is a problem with this series.  We could try to address
>> it later.
>>
>> Or do you think we can’t address it later because right now all filter
>> cases are broken anyway so now would be the time to make a breaking
>> change (which the suggestion to not use @base as the final backing 
>> node is)?
>
> I think, we can address it later, but it would be good to fit into one 
> release cycle with these series, to not make incompatible behavior 
> changes later.
>
>>
>>> So the picture becomes more complex.
>>>
>>> Assume stream2 starts first.
>>>
>>> top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2
>>> -(backing)-> base
>>
>> stream2-filter would be on top of mid2, right?
>
> Right. In my picture, "-(backing)->" means backing link. Hmm, most 
> probably stream-filter is COR, which actually have file child. It 
> doesn't matter here.
>
>>
>>> Now, when we run stream1, with your solution, stream1 will freeze
>>> stream2-filter
>>> (wrong thing, stream2 will fail to remove it if it finished first), and
>>> stream1 will
>>> remove stream2-filter on finish (which is wrong as well, stream2 is not
>>> prepared to
>>> removing of its filter)..
>>
>> Note that the user first needs to pass “mid2” as the base to the stream
>> job stream1.  Why don’t they just pass “stream2-filter”?  In my model,
>> the user should specify exactly which node they want not to be touched
>> by this stream job, and so that would be stream2-filter, not mid1.
>
> Hmm. I'm sure we already tried/discussed this.. But I don't remember 
> the results. Seems, such logic should work as well. Andrey don't you 
> remember? I think, the only difficulty is to update iotests 30 (in our 
> stream-filter series) to use node-names, not filename.. Can we go this 
> way?
>
>>
>> I feel like the answer to this question has to do with implicit nodes.
>> AFAIU you wanted to remove them, so I don’t think we’d want to
>> special-case them here.
>
> Agree. I think stream-filter should be explicit, like recently 
> introduced backup-top.
>
>>
>> If you think that we can’t expect users to pass “stream2-filter” because
>> currently it should work with “mid2”, then that’s a case of implicit
>> nodes and it means we should ascend from @base up to the first
>> non-implicit node to get the @above_base we want.
>>
>>> But, with our proposed way (freeze only chain up to base_overlay
>>> inclusively, and use backing(base_overlay) as final backing), all will
>>> work as expected, and two parallel jobs will work..
>>
>> I don’t think it will work as expected because users can no longer
>> specify which node should be the base node after streaming.  And the
>> QAPI schema says that base-node is to become the backing file of the top
>> node after streaming.
>
> But this will never work with either way: base node may disappear 
> during stream. Even with you way, they only stable thing is 
> "above-base", which backing child may be completely another node at 
> stream finish.
>
>>
>> I suppose you’re arguing that streaming through filters basically just
>> doesn’t work at all right now, so we’re free to do whatever?
>
> I don't, but I like the idea :)
>
>>
>> Well, that still leaves the problem that users should be able to specify
>> which node is to become the base after streaming, and that that node
>> maybe shouldn’t be restricted to immediate children of COW images.
>
> And again, this is impossible even with your way. I have an idea:
>
> What about making the whole thing explicit?
>
> We add an optional parameter to stream-job: bottom-node, which is 
> mutally exclusive with specifying base.
>
> Then, if user specified base node, we freeze base as well, so it can't 
> disappear. User will not be able to start parallel stream with this 
> base node as top (because new stream can not insert a filter into 
> frozen chain), but for sure it's rare case, used only in iotest 30 
> :)). Benefit: user have guarantee of what would be final backing node.
>
> Otherwise, if user specified bottom-node, we use the way of this 
> patch. So user can run parallel streams (iotest 30 will have to use 
> bottom-node argument). No guarantee of final base-node, it would be 
> backing of bottom-node at job finish.
>
> But, this is incompatible change, and we probably should wait for 3 
> releases for deprecation of old behavior..
>
> Anyway, I feel now, that you convinced me. I'm not sure that we will 
> not have to change it make filter work, but not reason to change 
> something now. Andrey, could you try to rebase your series on top of 
> this and fix iotest 30 by just specifying  exact node-names in it?..
>

That is what I am doing now, but I have not finished yet. Actually, when
we insert the COR filter for a stream job, the cases with concurrent
jobs fail in iotest 030. I am trying to cope with the issues and this is
still in progress. The problem I foresee is that if a user
specifies the filter node as the base for a job, the name of the filter
will be written to the QCOW2 header on disk. That can happen if
bdrv_refresh_filename() returns the json-name of the filter. With this
series, Max separated the backing file in the QCOW2 header from the one,
let's say, in the running program. Imagine that the user doesn't
specify the backing file in a QAPI command: the program will then use the
name of the filter read from the QCOW2 header. Please let me finish the
experiment and I will get back to you with results or issues...

Andrey


>
> Hmmm. My thought goes further. Seems, that in this way, introducing 
> explicit filter would be incompatible change anyway: it will break 
> scenario with parallel stream jobs, when user specifies filenames, not 
> node names (user will have to specify filter-node name as base for 
> another stream job, as you said). So, it's incompatible anyway.
>
> What do you think of it? Could we break this scenario in one release 
> without deprecation and don't care? Than I think my idea about base vs 
> bottom-node arguments for stream job may be applied. Or what to do?
>
> If we can't break this scenario without a deprecation, we'll have to 
> implement "implicit" filter, like for mirror, when filter-node-name is 
> not specified. And for this implicit filter we'll need additional 
> logic (closer to what I've proposed in a previous mail). Or, try to 
> keep stream without a filter (not insert it at all and behave the old 
> way), when filter-node-name is not specified. Than new features based 
> on filter will be available only when filter-node-name is specified, 
> but this is OK. The latter seems better for me.
>
>>
>>> ====
>>>
>>> So, these are two mutually exclusive cases.. I vote for freezing up to
>>> base_overlay, and use backing(base_overlay) as final backing, because:
>>>
>>> 1. I can't imagine another way to fix the case with parallel streams with
>>> filters (it's not a problem of current master, but we have pending
>>> series which will introduce stream job filter, and the problem will
>>> appear and even break iotest 30)
>>
>> Besides the question of whether the top job could just use the bottom
>> job’s filter node as the base, there’s also the alternative of admitting
>> defeat and declaring that you just cannot use a single node in two
>> streams, because we didn’t find a way to make it work after all.
>>
>> You could still create a temporary overlay in between that’s never used
>> and then drop it with a trivial stream afterwards.
>>
>> (But that just in case specifying the bottom job’s stream node somehow
>> wouldn’t work.)
>
> We can break existing user scenarios.. Seems unlikely. I don't know.
>
>>
>>> 2. I don't think that removing filters above base node by stream job is
>>> so important case to break parallel stream jobs in future:
>>>
>>>   - Stream job is not intended to remove filters, but to stream data.
>>> Filters between base_overlay and base don't contain any data and
>>> unrelated to stream process
>>
>> Well, it is intended to remove nodes.  You can only remove data-bearing
>> nodes by moving data around.  I suspect if there was a way to get the
>> to-be-removed nodes removed without having to move their data around,
>> that would be popular.
>>
>>>   - I think, that filters are "more related" to their children than to
>>> their parents. So, removing filters related to base node, when we just
>>> remove all data-containing nodes between top and base (and are not
>>> going to remove base node) is at least questionable.
>>
>> Yes.
>>
>> Although it could be argued that it is a handy way to remove filters, in
>> a backing chain at least.  (Thanks to bdrv_find_overlay(), @base and
>> @top still need to refer to different levels of the backing chain, but
>> if we lifted that restriction, I suppose it could work for any filter
>> chain.)
>>
>> *shrug*
>>
>>> On the contrary, removing
>>> all intermediate data containing nodes _together_ with their filters is
>>> absolutely correct thing to do.
>>
>> I don’t think so, actually.  Like, you have a throttle node somewhere in
>> the chain, shouldn’t you maybe want to move it down below the chain?  Or
>> a COR node, shouldn’t that go above the chain after streaming?
>>
>> I’m not making an argument here, I just don’t quite understand why you’d
>> bring up what happens with intermediate filters here.  The only reason
>> to drop them is because that’s what I expect users to expect of the
>> stream job.
>>
>>> Next, with your way, what about filters, inserted above base during
>>> stream job? They will be between above_base and base, and will not be
>>> removed. So with your way, filters above base, existing before job
>>> start will be frozen during the job and removed after it, but filters
>>> appended above base during the job will be untouched. With our way,
>>> just all base node related filters are untouched by the job. It seems
>>> a simpler definition to me and simpler to document.
>>
>> Hm.  The documentation seems the same to me.  Either it’s “The backing
>> node (at the end of the job) of @base’s parent node (when starting the
>> job)” or “The backing node (at the end of the job) of the next
>> non-filter node above @base (when starting the job)”.
>>
>> The problem you describe (that @above_base at the end of the job isn’t
>> necessarily above @base anymore) also exists with your suggestion,
>> namely that you can add overlays above @base after the job has started,
>> so @base_overlay at the end of the job isn’t necessarily the first
>> non-filter node above @base anymore.
>>
>>
>> OK, so after all this text, maybe some more original problem searching.
>> I think the root of the problem is that the stream job takes a @base
>> parameter, but as of c624b015bf14fe01, it doesn’t really matter anymore.
>> Maybe c624b015bf14fe01 should have introduced a new parameter for users
>> to specify the bottom node instead of @base.
>
> Yes, absolutely agree. If we do it now, would it be incompatible 
> change or not?
>
>>
>> Well, that would have made everything a parameter mess, but it would
>> have saved us the trouble now.
>>
>> In any case, the problem we have now is that we want a way to
>> automagically find out which node the bottom node should be, because the
>> user can’t specify it.  So the documentation is always going to be
>> written as “The backing node (at the end of the job) of $bottom”, where
>> “$bottom” is what we’re interested in figuring out.
>>
>> I thought it would be best if we stick as close as possible to the
>> spirit of the current documentation, which basically requires @base-node
>> to be the backing node of the top after streaming.  (If you do graph
>> modifications during the job, that’s on you, because since
>> c624b015bf14fe01 we can’t keep the base frozen.)
>>
>> Your suggestion to do basically what you consider to be right, which
>> comes at the caveat of being untruthful to the current documentation
>> even if there are no graph modifications during the job. Luckily, the
>> stream job right now doesn’t work in the cases we’re looking at, so it
>> wouldn’t be a breaking change.
>
> I also try to think about the future introduction of a stream filter. It 
> shouldn't be a breaking change either. But now I think it can be done 
> the hard way if needed: just work without a filter if filter-node-name 
> is not given. But probably I care too much. Finally, we can just drop 
> the test case from 030 or insert an additional intermediate node into it..
>
>>  The problem I have with it is that
>> you’re assuming what is right and what isn’t (i.e. “Who would want to
>> remove filter nodes directly above @base”) without giving the user a
>> chance to specify.
>>
>>
>> I think it would be nice if we could have something that remains
>> truthful to the current documentation.
>
> Then, we should freeze base node again, so just revert c624b015bf14fe01
>
> I went and looked at the cover letter of the series that introduced
> c624b015bf14fe01:
>
>   This series introduces a bottom intermediate node that eliminates the
>   dependency on the base that may change while stream job is running.
>   It happens when stream/commit parallel jobs are running on the same
>   backing chain. The base node of the stream job may be a top node of
>   the parallel commit job and can change before the stream job is
>   completed. We avoid that dependency by introducing the bottom node.
>
> Hmm. It's bad that we didn't add an iotest, but the series solved an
> existing problem: parallel stream and commit, as commit already has a
> filter. Commit has a filter with a "backing" child, so I assume that the
> case worked prior to introducing frozen chains, was then broken by
> frozen chains, and was then fixed by c624b015bf14fe01.
> Still, I don't know whether there are real users of such parallel jobs..
>
>>  If just ignoring implicit
>> filters above @base would work, then I’d find that nice.  If it doesn’t,
>> I suppose there’s indeed little we can do but to indeed forego
>> @above_base and just use @base_overlay for all cases.
>>
>
> OK, at this point, I think I'm OK with your patch in the context of this 
> series.
>
> But I feel that something more should be done. Could we just revert 
> c624b015bf14fe01 as "not corresponding to the specification"?
>
>



* Re: [PATCH v7 10/47] mirror-top: Support compressed writes
  2020-06-25 15:21 ` [PATCH v7 10/47] mirror-top: " Max Reitz
  2020-07-08 17:58   ` Andrey Shinkevich
@ 2020-08-18 10:27   ` Kevin Wolf
  2020-08-19 15:35     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-18 10:27 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index e8e8844afc..469acf4600 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1480,6 +1480,15 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
>                                      NULL, 0);
>  }
>  
> +static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState *bs,
> +                                                           uint64_t offset,
> +                                                           uint64_t bytes,
> +                                                           QEMUIOVector *qiov)
> +{
> +    return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
> +                                   BDRV_REQ_WRITE_COMPRESSED);
> +}

Hm, not sure if it's a problem, but bdrv_supports_compressed_writes()
will now return true for mirror-top. However, with an active mirror to a
target that doesn't support compression, trying to actually do a
compressed write will always return -ENOTSUP.

So I guess the set of nodes patch 7 looks at still isn't quite complete.
However, it's not obvious how to make it more complete without
delegating to the driver.

Maybe we need to use bs->supported_write_flags, which is set by the
driver, instead of looking at the presence of callbacks.

Of course, in the general case, we also should make sure that graph
changes will be reflected in bs->supported_write_flags, but we already
fail to do this in raw-format, so I guess ignoring it for now is good
enough here, too...
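
To make the difference between the two checks concrete, here is a minimal
sketch. It is a toy model only, not QEMU code: ToyDriver, ToyNode,
TOY_REQ_WRITE_COMPRESSED and all toy_* functions are hypothetical
stand-ins for BlockDriver, BlockDriverState, BDRV_REQ_WRITE_COMPRESSED
and the real helpers. It assumes a filter that derives its advertised
write flags from its target when it is opened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value standing in for BDRV_REQ_WRITE_COMPRESSED. */
#define TOY_REQ_WRITE_COMPRESSED 0x1u

/* Toy stand-in for a block driver; not the QEMU struct. */
typedef struct ToyDriver {
    bool has_pwritev_compressed; /* models a non-NULL callback pointer */
} ToyDriver;

/* Toy stand-in for a block node; not the QEMU struct. */
typedef struct ToyNode {
    const ToyDriver *drv;
    unsigned supported_write_flags; /* filled in by the driver on open */
} ToyNode;

/* Callback-based check: true as soon as the driver has the callback. */
static bool toy_supports_compressed_by_callback(const ToyNode *bs)
{
    assert(bs != NULL && bs->drv != NULL);
    return bs->drv->has_pwritev_compressed;
}

/* Flag-based check: true only if the driver advertised the flag. */
static bool toy_supports_compressed_by_flags(const ToyNode *bs)
{
    assert(bs != NULL);
    return (bs->supported_write_flags & TOY_REQ_WRITE_COMPRESSED) != 0;
}

/*
 * Assumed behaviour of a mirror-top-like filter on open: advertise
 * compression only if the target node advertises it as well.
 */
static unsigned toy_filter_write_flags(const ToyNode *target)
{
    assert(target != NULL);
    return target->supported_write_flags & TOY_REQ_WRITE_COMPRESSED;
}
```

With a target that does not advertise compression, the callback-based
check still reports support for the filter, while the flag-based check
does not. As noted above, the flags would also have to be refreshed on
graph changes, which this sketch does not model.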

Kevin




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
  2020-07-09 14:52   ` Andrey Shinkevich
  2020-07-09 15:13   ` Andrey Shinkevich
@ 2020-08-18 14:28   ` Kevin Wolf
  2020-08-19 14:47     ` Max Reitz
  2 siblings, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-18 14:28 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> Because of the (not so recent anymore) changes that make the stream job
> independent of the base node and instead track the node above it, we
> have to split that "bottom" node into two cases: The bottom COW node,
> and the node directly above the base node (which may be an R/W filter
> or the bottom COW node).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  qapi/block-core.json |  4 +++
>  block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>  blockdev.c           |  4 ++-
>  3 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b20332e592..df87855429 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2486,6 +2486,10 @@
>  # On successful completion the image file is updated to drop the backing file
>  # and the BLOCK_JOB_COMPLETED event is emitted.
>  #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.

Not to @base's backing node, but to @base itself (or actually, to
above_base's backing node, which is initially @base, but may have
changed when the job is completed).
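
As a rough illustration of that distinction, consider the following toy
model. It is a sketch under simplifying assumptions, not the code in
block/stream.c: ToyNode and the toy_* functions are hypothetical, every
node has a single child, and filters are not distinguished from COW
nodes. The job remembers above_base at start and only looks at its child
when it completes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node with a single child link (backing or filtered child). */
typedef struct ToyNode {
    const char *name;
    struct ToyNode *child;
} ToyNode;

/*
 * At job start: walk down from top and return the node whose child is
 * base. With base == NULL this is the last node of the chain. Returns
 * NULL if base is not reachable from top.
 */
static ToyNode *toy_find_above_base(ToyNode *top, const ToyNode *base)
{
    ToyNode *n = top;

    while (n != NULL && n->child != base) {
        n = n->child;
    }
    return n;
}

/*
 * At job completion: the new backing node is whatever above_base points
 * to at that moment, which need not be the original base anymore.
 */
static ToyNode *toy_final_backing(const ToyNode *above_base)
{
    assert(above_base != NULL);
    return above_base->child;
}
```

If nothing changes during the job, toy_final_backing() returns the
original base. If a node is inserted between above_base and base while
the job runs, it returns the inserted node instead.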

Should we also document what using a filter node for @base means?

The code changes look good.

Kevin




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
  2020-08-14 15:18                 ` Andrey Shinkevich
@ 2020-08-18 20:45                 ` Andrey Shinkevich
  2020-08-19 12:39                 ` Max Reitz
  2 siblings, 0 replies; 173+ messages in thread
From: Andrey Shinkevich @ 2020-08-18 20:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: Kevin Wolf, qemu-devel

Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>


On 10.08.2020 14:04, Vladimir Sementsov-Ogievskiy wrote:
> 10.08.2020 11:12, Max Reitz wrote:
>> On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:
>>> 16.07.2020 17:59, Max Reitz wrote:
>>>> On 10.07.20 19:41, Andrey Shinkevich wrote:
>>>>> On 10.07.2020 18:24, Max Reitz wrote:
>>>>>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>>>>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>>>>>> Because of the (not so recent anymore) changes that make the
>>>>>>>> stream job independent of the base node and instead track the
>>>>>>>> node above it, we have to split that "bottom" node into two
>>>>>>>> cases: The bottom COW node, and the node directly above the base
>>>>>>>> node (which may be an R/W filter or the bottom COW node).
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> ---
>>>>>>>>      qapi/block-core.json |  4 +++
>>>>>>>>      block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>>>>>>      blockdev.c           |  4 ++-
>>>>>>>>      3 files changed, 53 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>>>>> index b20332e592..df87855429 100644
>>>>>>>> --- a/qapi/block-core.json
>>>>>>>> +++ b/qapi/block-core.json
>>>>>>>> @@ -2486,6 +2486,10 @@
>>>>>>>>      # On successful completion the image file is updated to drop the backing file
>>>>>>>>      # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>>>>      #
>>>>>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>>>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>>>>>> +# not specified) instead of modifying @device itself.
>>>>>>>> +#
>>>>>>>>      # @job-id: identifier for the newly-created block job. If
>>>>>>>>      #          omitted, the device name will be used. (Since 2.7)
>>>>>>>>      #
>>>>>>>> diff --git a/block/stream.c b/block/stream.c
>>>>>>>> index aa2e7af98e..b9c1141656 100644
>>>>>>>> --- a/block/stream.c
>>>>>>>> +++ b/block/stream.c
>>>>>>>> @@ -31,7 +31,8 @@ enum {
>>>>>>>>        typedef struct StreamBlockJob {
>>>>>>>>          BlockJob common;
>>>>>>>> -    BlockDriverState *bottom;
>>>>>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>>>>>>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>>>>>>> Keeping the base_overlay is enough to complete the stream job.
>>>>>> Depends on the definition.  If we decide it isn’t enough, then it
>>>>>> isn’t enough.
>>>>>>
>>>>>>> The above_base may disappear during the job and we can't rely on it.
>>>>>> In this version of this series, it may not, because the chain is
>>>>>> frozen.  So the above_base cannot disappear.
>>>>>
>>>>> Once we insert a filter above the top bs of the stream job, the
>>>>> parallel jobs in iotest 030 will fail with a 'frozen link' error.
>>>>> This is because the independent parallel stream or commit jobs
>>>>> insert/remove their filters asynchronously.
>>>>
>>>> I’m not sure whether that’s a problem with this series specifically.
>>>>
>>>>>> We can discuss whether we should allow it to disappear, but I think
>>>>>> not.
>>>>>>
>>>>>> The problem is, we need something to set as the backing file after
>>>>>> streaming.  How do we figure out what that should be? My proposal
>>>>>> is we keep above_base and use its immediate child.
>>>>>
>>>>> We can do the same with the base_overlay.
>>>>>
>>>>> If the backing node turns out to be a filter, the proper backing
>>>>> child will be set after the filter is removed. So, we shouldn't care.
>>>>
>>>> And what if the user manually added some filter above the base (i.e.
>>>> below base_overlay) that they want to keep after the job?
>>>
>>>
>>> It's automatically kept, if we use base_overlay->backing->bs as final
>>> backing node.
>>>
>>> You mean, that they want it to be dropped?
>>
>> Er, yes.  Point is, the graph structure below with @base at the root may
>> be different than the one right below @base_overlay.
>>
>>> so, assuming the following:
>>>
>>> top -(backing)-> manually-inserted-filter -(file)-> base
>>>
>>> and user do stream with base=base, and expects filter to be removed by
>>> stream job?
>>>
>>> Hmm, yes, such use-case is broken with our proposed way...
>>>
>>> ====
>>>
>>> Let me now clarify the problem we'll have with your way.
>>>
>>> When stream don't have any filter, we can easily imagine two parallel
>>> stream jobs:
>>>
>>> top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base
>>>
>>> stream1: top=top, base=mid2
>>> stream2: top=mid2, base=NULL
>>>
>>> final picture is obvious:
>>>
>>> top (merged with mid1) -(backing)-> mid2 (merged with base)
>>
>> Yes, and I don’t think this currently working case is broken by this 
>> series.
>>
>>> But we want stream job has own filter, like mirror.
>>
>> Which it does not have yet, right?  Which is why I was saying that I
>> don’t think this is a problem with this series.  We could try to address
>> it later.
>>
>> Or do you think we can’t address it later because right now all filter
>> cases are broken anyway so now would be the time to make a breaking
>> change (which the suggestion to not use @base as the final backing 
>> node is)?
>
> I think, we can address it later, but it would be good to fit into one 
> release cycle with these series, to not make incompatible behavior 
> changes later.
>
>>
>>> So the picture becomes more complex.
>>>
>>> Assume stream2 starts first.
>>>
>>> top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2
>>> -(backing)-> base
>>
>> stream2-filter would be on top of mid2, right?
>
> Right. In my picture, "-(backing)->" means backing link. Hmm, most 
> probably stream-filter is COR, which actually have file child. It 
> doesn't matter here.
>
>>
>>> Now, when we run stream1, with your solution, stream1 will freeze
>>> stream2-filter
>>> (wrong thing, stream2 will fail to remove it if it finished first), and
>>> stream1 will
>>> remove stream2-filter on finish (which is wrong as well, stream2 is not
>>> prepared to
>>> removing of its filter)..
>>
>> Note that the user first needs to pass “mid2” as the base to the stream
>> job stream1.  Why don’t they just pass “stream2-filter”?  In my model,
>> the user should specify exactly which node they want not to be touched
>> by this stream job, and so that would be stream2-filter, not mid1.
>
> Hmm. I'm sure we already tried/discussed this.. But I don't remember 
> the results. Seems, such logic should work as well. Andrey don't you 
> remember? I think, the only difficulty is to update iotests 30 (in our 
> stream-filter series) to use node-names, not filename.. Can we go this 
> way?
>
>>
>> I feel like the answer to this question has to do with implicit nodes.
>> AFAIU you wanted to remove them, so I don’t think we’d want to
>> special-case them here.
>
> Agree. I think stream-filter should be explicit, like recently 
> introduced backup-top.
>
>>
>> If you think that we can’t expect users to pass “stream2-filter” because
>> currently it should work with “mid2”, then that’s a case of implicit
>> nodes and it means we should ascend from @base up to the first
>> non-implicit node to get the @above_base we want.
>>
>>> But, with our proposed way (freeze only chain up to base_overlay
>>> inclusively, and use backing(base_overlay) as final backing), all will
>>> work as expected, and two parallel jobs will work..
>>
>> I don’t think it will work as expected because users can no longer
>> specify which node should be the base node after streaming.  And the
>> QAPI schema says that base-node is to become the backing file of the top
>> node after streaming.
>
> But this will never work either way: the base node may disappear 
> during the stream. Even with your way, the only stable thing is 
> "above-base", whose backing child may be a completely different node at 
> stream finish.
>
>>
>> I suppose you’re arguing that streaming through filters basically just
>> doesn’t work at all right now, so we’re free to do whatever?
>
> I don't, but I like the idea :)
>
>>
>> Well, that still leaves the problem that users should be able to specify
>> which node is to become the base after streaming, and that that node
>> maybe shouldn’t be restricted to immediate children of COW images.
>
> And again, this is impossible even with your way. I have an idea:
>
> What about making the whole thing explicit?
>
> We add an optional parameter to stream-job: bottom-node, which is 
> mutually exclusive with specifying base.
>
> Then, if user specified base node, we freeze base as well, so it can't 
> disappear. User will not be able to start parallel stream with this 
> base node as top (because new stream can not insert a filter into 
> frozen chain), but for sure it's rare case, used only in iotest 30 
> :)). Benefit: user have guarantee of what would be final backing node.
>
> Otherwise, if user specified bottom-node, we use the way of this 
> patch. So user can run parallel streams (iotest 30 will have to use 
> bottom-node argument). No guarantee of final base-node, it would be 
> backing of bottom-node at job finish.
>
> But, this is incompatible change, and we probably should wait for 3 
> releases for deprecation of old behavior..
>
> Anyway, I feel now that you have convinced me. I'm not sure that we
> won't have to change it to make the filter work, but there is no reason
> to change something now. Andrey, could you try to rebase your series on
> top of this and fix iotest 30 by just specifying exact node-names in it?..
>
>
> Hmmm. My thought goes further. Seems, that in this way, introducing 
> explicit filter would be incompatible change anyway: it will break 
> scenario with parallel stream jobs, when user specifies filenames, not 
> node names (user will have to specify filter-node name as base for 
> another stream job, as you said). So, it's incompatible anyway.
>
> What do you think of it? Could we break this scenario in one release 
> without deprecation and not care? Then I think my idea about base vs 
> bottom-node arguments for stream job may be applied. Or what to do?
>
> If we can't break this scenario without a deprecation, we'll have to 
> implement "implicit" filter, like for mirror, when filter-node-name is 
> not specified. And for this implicit filter we'll need additional 
> logic (closer to what I've proposed in a previous mail). Or, try to 
> keep stream without a filter (not insert it at all and behave the old 
> way), when filter-node-name is not specified. Then new features based 
> on filter will be available only when filter-node-name is specified, 
> but this is OK. The latter seems better for me.
>
>>
>>> ====
>>>
>>> So, these are two mutually exclusive cases.. I vote for freezing up to
>>> base_overlay, and use backing(base_overlay) as final backing, because:
>>>
>>> 1. I can't imagine another way to fix the case with parallel streams with
>>> filters (it's not a problem of current master, but we have pending
>>> series which will introduce stream job filter, and the problem will
>>> appear and even break iotest 30)
>>
>> Besides the question of whether the top job could just use the bottom
>> job’s filter node as the base, there’s also the alternative of admitting
>> defeat and declaring that you just cannot use a single node in two
>> streams, because we didn’t find a way to make it work after all.
>>
>> You could still create a temporary overlay in between that’s never used
>> and then drop it with a trivial stream afterwards.
>>
>> (But that just in case specifying the bottom job’s stream node somehow
>> wouldn’t work.)
>
> We can break existing user scenarios.. Seems unlikely. I don't know.
>
>>
>>> 2. I don't think that removing filters above base node by stream job is
>>> so important case to break parallel stream jobs in future:
>>>
>>>   - Stream job is not intended to remove filters, but to stream data.
>>> Filters between base_overlay and base don't contain any data and
>>> unrelated to stream process
>>
>> Well, it is intended to remove nodes.  You can only remove data-bearing
>> nodes by moving data around.  I suspect if there was a way to get the
>> to-be-removed nodes removed without having to move their data around,
>> that would be popular.
>>
>>>   - I think, that filters are "more related" to their children than to
>>> their parents. So, removing filters related to base node, when we just
>>> remove all data-containing nodes between top and base (and are not
>>> going to remove base node) is at least questionable.
>>
>> Yes.
>>
>> Although it could be argued that it is a handy way to remove filters, in
>> a backing chain at least.  (Thanks to bdrv_find_overlay(), @base and
>> @top still need to refer to different levels of the backing chain, but
>> if we lifted that restriction, I suppose it could work for any filter
>> chain.)
>>
>> *shrug*
>>
>>> On the contrary, removing
>>> all intermediate data containing nodes _together_ with their filters is
>>> absolutely correct thing to do.
>>
>> I don’t think so, actually.  Like, you have a throttle node somewhere in
>> the chain, shouldn’t you maybe want to move it down below the chain?  Or
>> a COR node, shouldn’t that go above the chain after streaming?
>>
>> I’m not making an argument here, I just don’t quite understand why you’d
>> bring up what happens with intermediate filters here.  The only reason
>> to drop them is because that’s what I expect users to expect of the
>> stream job.
>>
>>> Next, with your way, what about filters, inserted above base during
>>> stream job? They will be between above_base and base, and will not be
>>> removed. So with your way, filters above base, existing before job
>>> start will be frozen during the job and removed after it, but filters
>>> appended above base during the job will be untouched. With our way,
>>> just all base node related filters are untouched by the job. It seems
>>> a simpler definition to me and simpler to document.
>>
>> Hm.  The documentation seems the same to me.  Either it’s “The backing
>> node (at the end of the job) of @base’s parent node (when starting the
>> job)” or “The backing node (at the end of the job) of the next
>> non-filter node above @base (when starting the job)”.
>>
>> The problem you describe (that @above_base at the end of the job isn’t
>> necessarily above @base anymore) also exists with your suggestion,
>> namely that you can add overlays above @base after the job has started,
>> so @base_overlay at the end of the job isn’t necessarily the first
>> non-filter node above @base anymore.
>>
>>
>> OK, so after all this text, maybe some more original problem searching.
>> I think the root of the problem is that the stream job takes a @base
>> parameter, but as of c624b015bf14fe01, it doesn’t really matter anymore.
>> Maybe c624b015bf14fe01 should have introduced a new parameter for users
>> to specify the bottom node instead of @base.
>
> Yes, absolutely agree. If we do it now, would it be incompatible 
> change or not?
>
>>
>> Well, that would have made everything a parameter mess, but it would
>> have saved us the trouble now.
>>
>> In any case, the problem we have now is that we want a way to
>> automagically find out which node the bottom node should be, because the
>> user can’t specify it.  So the documentation is always going to be
>> written as “The backing node (at the end of the job) of $bottom”, where
>> “$bottom” is what we’re interested in figuring out.
>>
>> I thought it would be best if we stick as close as possible to the
>> spirit of the current documentation, which basically requires @base-node
>> to be the backing node of the top after streaming.  (If you do graph
>> modifications during the job, that’s on you, because since
>> c624b015bf14fe01 we can’t keep the base frozen.)
>>
>> Your suggestion to do basically what you consider to be right, which
>> comes at the caveat of being untruthful to the current documentation
>> even if there are no graph modifications during the job. Luckily, the
>> stream job right now doesn’t work in the cases we’re looking at, so it
>> wouldn’t be a breaking change.
>
> I also try to think about the future introduction of a stream filter. It 
> shouldn't be a breaking change either. But now I think it can be done 
> the hard way if needed: just work without a filter if filter-node-name 
> is not given. But probably I care too much. Finally, we can just drop 
> the test case from 030 or insert an additional intermediate node into it..
>
>>  The problem I have with it is that
>> you’re assuming what is right and what isn’t (i.e. “Who would want to
>> remove filter nodes directly above @base”) without giving the user a
>> chance to specify.
>>
>>
>> I think it would be nice if we could have something that remains
>> truthful to the current documentation.
>
> Then, we should freeze base node again, so just revert c624b015bf14fe01
>
> I went and looked at the cover letter of the series that introduced
> c624b015bf14fe01:
>
>   This series introduces a bottom intermediate node that eliminates the
>   dependency on the base that may change while stream job is running.
>   It happens when stream/commit parallel jobs are running on the same
>   backing chain. The base node of the stream job may be a top node of
>   the parallel commit job and can change before the stream job is
>   completed. We avoid that dependency by introducing the bottom node.
>
> Hmm. It's bad that we didn't add an iotest, but the series solved an
> existing problem: parallel stream and commit, as commit already has a
> filter. Commit has a filter with a "backing" child, so I assume that the
> case worked prior to introducing frozen chains, was then broken by
> frozen chains, and was then fixed by c624b015bf14fe01.
> Still, I don't know whether there are real users of such parallel jobs..
>
>>  If just ignoring implicit
>> filters above @base would work, then I’d find that nice.  If it doesn’t,
>> I suppose there’s indeed little we can do but to indeed forego
>> @above_base and just use @base_overlay for all cases.
>>
>
> OK, at this point, I think I'm OK with your patch in the context of this 
> series.
>
> But I feel that something more should be done. Could we just revert 
> c624b015bf14fe01 as "not corresponding to the specification"?
>
>



* Re: [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size()
  2020-06-25 15:21 ` [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size() Max Reitz
  2020-07-20 15:10   ` Andrey Shinkevich
@ 2020-08-19 10:46   ` Kevin Wolf
  2020-08-19 15:50     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 10:46 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> blkverify is a filter, so bdrv_get_allocated_file_size()'s default
> implementation will return only the size of its filtered child.
> However, because both of its children are disk images, it makes more
> sense to sum both of their allocated sizes.

Hm, but so are the children of, say, backup-top. I don't think you're
suggesting that backup-top should add the sizes of both images, even
though the backup job is actively increasing the allocated size of the
non-primary node, much like blkverify.

So I believe returning only the allocated size of the primary child in
blkverify would be more consistent with what we do elsewhere.

Kevin



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size
  2020-06-25 15:21 ` [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size Max Reitz
  2020-07-15 22:56   ` Andrey Shinkevich
@ 2020-08-19 10:57   ` Kevin Wolf
  2020-08-19 15:53     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 10:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> If every BlockDriver were to implement bdrv_get_allocated_file_size(),
> there are basically three ways it would be handled:
> (1) For protocol drivers: Figure out the actual allocated file size in
>     some protocol-specific way
> (2) For protocol drivers: If that is not possible (or we just have not
>     bothered to implement it yet), return -ENOTSUP
> (3) For drivers with children: Return the sum of some or all their
>     children's sizes
> 
> For the drivers we have, case (3) boils down to either:
> (a) The sum of all children's sizes
> (b) The size of the primary child
> 
> (2), (3a) and (3b) can be implemented generically, so this patch adds
> such generic implementations for drivers to use.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/block/block_int.h |  5 ++++
>  block.c                   | 51 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 5da793bfc3..c963ee9f28 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1318,6 +1318,11 @@ int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
>                                                     int64_t *pnum,
>                                                     int64_t *map,
>                                                     BlockDriverState **file);
> +
> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
> +
>  const char *bdrv_get_parent_name(const BlockDriverState *bs);
>  void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
>  bool blk_dev_has_removable_media(BlockBackend *blk);
> diff --git a/block.c b/block.c
> index 1c71ecab7c..fc01ce90b3 100644
> --- a/block.c
> +++ b/block.c
> @@ -5003,6 +5003,57 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
>      return -ENOTSUP;
>  }
>  
> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * block drivers that want it to sum all children they store data on.
> + * (This excludes backing children.)
> + */
> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    int64_t child_size, sum = 0;
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> +                           BDRV_CHILD_FILTERED))
> +        {
> +            child_size = bdrv_get_allocated_file_size(child->bs);
> +            if (child_size < 0) {
> +                return child_size;
> +            }
> +            sum += child_size;
> +        }
> +    }
> +
> +    return sum;
> +}

The only user apart from bdrv_get_allocated_file_size() is blkverify. As
I argued that blkverify shouldn't use it, this can become static.

> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * block drivers that want it to return only the size of a node's
> + * primary child.
> + */
> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs)
> +{
> +    BlockDriverState *primary_bs;
> +
> +    primary_bs = bdrv_primary_bs(bs);
> +    if (!primary_bs) {
> +        return -ENOTSUP;
> +    }
> +
> +    return bdrv_get_allocated_file_size(primary_bs);
> +}

This can become static, too (never used as a callback), and possibly
even be inlined.

> +/**
> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
> + * protocol block drivers that just do not support it.
> + */
> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs)
> +{
> +    return -ENOTSUP;
> +}

Also never used as a callback. I think inlining it would almost
certainly make more sense.
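
For illustration only (this is not code from the patch): a minimal,
self-contained sketch of the shape suggested above, where the summing
helper is file-local and the "primary child" and -ENOTSUP cases are folded
into the single caller. The Node/Child types, the ROLE_* constants and the
function names are hypothetical stand-ins, not QEMU's BlockDriverState API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for child roles; values are illustrative only. */
enum {
    ROLE_DATA     = 1 << 0,
    ROLE_METADATA = 1 << 1,
    ROLE_FILTERED = 1 << 2,
    ROLE_COW      = 1 << 3, /* backing child: never counted in the sum */
    ROLE_PRIMARY  = 1 << 4,
};

typedef struct Node Node;

typedef struct Child {
    const Node *node;
    unsigned role;
} Child;

struct Node {
    int64_t own_size;      /* >= 0 if the node itself knows its size */
    int sum_children;      /* non-zero selects case (3a), zero case (3b) */
    const Child *children;
    size_t nb_children;
};

int64_t allocated_size(const Node *n);

/* Case (3a): only needed by allocated_size(), so it can stay static. */
static int64_t sum_allocated_size(const Node *n)
{
    int64_t sum = 0;

    for (size_t i = 0; i < n->nb_children; i++) {
        const Child *c = &n->children[i];

        if (c->role & (ROLE_DATA | ROLE_METADATA | ROLE_FILTERED)) {
            int64_t child_size = allocated_size(c->node);

            if (child_size < 0) {
                return child_size; /* propagate the child's error */
            }
            sum += child_size;
        }
    }
    return sum;
}

int64_t allocated_size(const Node *n)
{
    assert(n != NULL);

    if (n->nb_children == 0) {
        /* Cases (1) and (2): leaf node, size known or not supported. */
        return n->own_size >= 0 ? n->own_size : -ENOTSUP;
    }
    if (n->sum_children) {
        return sum_allocated_size(n);
    }
    /* Case (3b), inlined: forward to the primary child, if there is one. */
    for (size_t i = 0; i < n->nb_children; i++) {
        if (n->children[i].role & ROLE_PRIMARY) {
            return allocated_size(n->children[i].node);
        }
    }
    return -ENOTSUP;
}
```

In this toy model a node with a data child of size 100, a metadata child
of size 50 and a backing child reports 150 when summing and 100 when only
the primary (data) child is consulted.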

Kevin



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
  2020-08-14 15:18                 ` Andrey Shinkevich
  2020-08-18 20:45                 ` Andrey Shinkevich
@ 2020-08-19 12:39                 ` Max Reitz
  2020-08-19 13:18                   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-19 12:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: Kevin Wolf, qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 4492 bytes --]

On 10.08.20 13:04, Vladimir Sementsov-Ogievskiy wrote:
> 10.08.2020 11:12, Max Reitz wrote:
>> On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:

[...]

>>> But, with our proposed way (freeze only chain up to base_overlay
>>> inclusively, and use backing(base_overlay) as final backing), all will
>>> work as expected, and two parallel jobs will work..
>>
>> I don’t think it will work as expected because users can no longer
>> specify which node should be the base node after streaming.  And the
>> QAPI schema says that base-node is to become the backing file of the top
>> node after streaming.
> 
> But this will never work either way: the base node may disappear during
> stream. Even with your way, the only stable thing is "above-base", whose
> backing child may be a completely different node at stream finish.

Yeah, but after c624b015bf, that’s just how it is.  I thought the best
would be an approach where if there are no graph changes during the job,
you would indeed end up with @base as the backing file afterwards.

[...]

>> Well, that still leaves the problem that users should be able to specify
>> which node is to become the base after streaming, and that that node
>> maybe shouldn’t be restricted to immediate children of COW images.
> 
> And again, this is impossible even with your way. I have an idea:
> 
> What about making the whole thing explicit?
> 
> We add an optional parameter to stream-job: bottom-node, which is
> mutually exclusive with specifying base.
> 
> Then, if user specified base node, we freeze base as well, so it can't
> disappear. User will not be able to start parallel stream with this base
> node as top (because new stream can not insert a filter into frozen
> chain), but for sure it's rare case, used only in iotest 30 :)).
> Benefit: user have guarantee of what would be final backing node.

Sounds very nice to me, but...

> Otherwise, if user specified bottom-node, we use the way of this patch.
> So user can run parallel streams (iotest 30 will have to use bottom-node
> argument). No guarantee of final base-node, it would be backing of
> bottom-node at job finish.
> 
> But, this is incompatible change, and we probably should wait for 3
> releases for deprecation of old behavior..

...yeah.  Hm.  What a pain, but right, we can just deprecate it.

Unfortunately, I don’t think there’s any way we could issue a warning
(we’d want to deprecate using the @base node in something outside of the
stream job, and we can’t really detect this case to issue a warning).
So it would be a deprecation found only in the deprecation notes and the
QAPI spec...

> Anyway, I feel now that you have convinced me. I'm not sure that we will not
> have to change it to make the filter work, but that is no reason to change
> something now. Andrey, could you try to rebase your series on top of this and
> fix iotest 30 by just specifying exact node-names in it?..
> 
> 
> Hmmm. My thought goes further. Seems, that in this way, introducing
> explicit filter would be incompatible change anyway: it will break
> scenario with parallel stream jobs, when user specifies filenames, not
> node names (user will have to specify filter-node name as base for
> another stream job, as you said). So, it's incompatible anyway.
> 
> What do you think of it? Could we break this scenario in one release
> without deprecation and don't care?

I don’t know, but I’m afraid I don’t think we can.

> Then I think my idea about base vs
> bottom-node arguments for stream job may be applied. Or what to do?
> 
> If we can't break this scenario without a deprecation, we'll have to
> implement "implicit" filter, like for mirror, when filter-node-name is
> not specified. And for this implicit filter we'll need additional logic
> (closer to what I've proposed in a previous mail). Or, try to keep
> stream without a filter (not insert it at all and behave the old way),
> when filter-node-name is not specified. Then new features based on
> filter will be available only when filter-node-name is specified, but
> this is OK. The latter seems better for me.

If that works...


OK.  So what I think we can do is first just take this patch as part of
this series.  Then, we add @bottom-node separately and deprecate @base
not being frozen.

If it’s feasible to not add a stream filter node until the deprecation
period is over, then that should work.
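
A toy model of the two modes discussed above, offered only as a sketch
under stated assumptions: a backing chain is a singly linked list, and
"freezing" is a per-link flag. The Img type and the functions
freeze_links(), set_backing() and stream_finish() are hypothetical and do
not correspond to functions in block/stream.c. With @base, the link that
points at @base is frozen too; with @bottom-node, links are frozen only
above the bottom node, so whatever it points at when the job finishes
becomes the final backing node.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct Img Img;
struct Img {
    const char *name;
    Img *backing;
    int backing_frozen; /* non-zero: the link to ->backing must not change */
};

/*
 * Freeze the backing links of every node from @top down to, but not
 * including, @last. Returns -EINVAL if @last is not below @top.
 */
int freeze_links(Img *top, const Img *last)
{
    Img *n;

    assert(top != NULL);

    /* First pass: make sure @last is reachable before touching anything. */
    for (n = top; n != last; n = n->backing) {
        if (n == NULL) {
            return -EINVAL;
        }
    }
    /* Second pass: freeze the links. */
    for (n = top; n != last; n = n->backing) {
        n->backing_frozen = 1;
    }
    return 0;
}

/* Replace @node's backing link unless that link is frozen. */
int set_backing(Img *node, Img *new_backing)
{
    if (node->backing_frozen) {
        return -EPERM;
    }
    node->backing = new_backing;
    return 0;
}

/*
 * Finish the job: @top ends up on top of whatever @bottom points at
 * right now. @bottom is assumed to be @top or a node below it.
 */
Img *stream_finish(Img *top, Img *bottom)
{
    Img *new_backing = bottom->backing;
    Img *n;

    for (n = top; n != NULL; n = n->backing) {
        n->backing_frozen = 0;
        if (n == bottom) {
            break;
        }
    }
    top->backing = new_backing;
    return new_backing;
}
```

Calling freeze_links(top, base) models the @base mode: the node directly
above @base cannot be re-pointed, so the job is guaranteed to end on
@base. Calling freeze_links(top, bottom) models the @bottom-node mode: the
bottom node's own backing link stays changeable, which is what lets a
parallel job rearrange the chain below it.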

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-19 12:39                 ` Max Reitz
@ 2020-08-19 13:18                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-19 13:18 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block; +Cc: Kevin Wolf, qemu-devel

19.08.2020 15:39, Max Reitz wrote:
> On 10.08.20 13:04, Vladimir Sementsov-Ogievskiy wrote:
>> 10.08.2020 11:12, Max Reitz wrote:
>>> On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
>>>> But, with our proposed way (freeze only chain up to base_overlay
>>>> inclusively, and use backing(base_overlay) as final backing), all will
>>>> work as expected, and two parallel jobs will work..
>>>
>>> I don’t think it will work as expected because users can no longer
>>> specify which node should be the base node after streaming.  And the
>>> QAPI schema says that base-node is to become the backing file of the top
>>> node after streaming.
>>
>> But this will never work either way: the base node may disappear during
>> stream. Even with your way, the only stable thing is "above-base", whose
>> backing child may be a completely different node at stream finish.
> 
> Yeah, but after c624b015bf, that’s just how it is.  I thought the best
> would be an approach where if there are no graph changes during the job,
> you would indeed end up with @base as the backing file afterwards.
> 
> [...]
> 
>>> Well, that still leaves the problem that users should be able to specify
>>> which node is to become the base after streaming, and that that node
>>> maybe shouldn’t be restricted to immediate children of COW images.
>>
>> And again, this is impossible even with your way. I have an idea:
>>
>> What about making the whole thing explicit?
>>
>> We add an optional parameter to stream-job: bottom-node, which is
>> mutually exclusive with specifying base.
>>
>> Then, if user specified base node, we freeze base as well, so it can't
>> disappear. User will not be able to start parallel stream with this base
>> node as top (because new stream can not insert a filter into frozen
>> chain), but for sure it's rare case, used only in iotest 30 :)).
>> Benefit: user have guarantee of what would be final backing node.
> 
> Sounds very nice to me, but...
> 
>> Otherwise, if user specified bottom-node, we use the way of this patch.
>> So user can run parallel streams (iotest 30 will have to use bottom-node
>> argument). No guarantee of final base-node, it would be backing of
>> bottom-node at job finish.
>>
>> But, this is incompatible change, and we probably should wait for 3
>> releases for deprecation of old behavior..
> 
> ...yeah.  Hm.  What a pain, but right, we can just deprecate it.
> 
> Unfortunately, I don’t think there’s any way we could issue a warning
> (we’d want to deprecate using the @base node in something outside of the
> stream job, and we can’t really detect this case to issue a warning).
> So it would be a deprecation found only in the deprecation notes and the
> QAPI spec...

Hmm.. add a "bool frozen_only_warn" field to BdrvChild and print a warning
instead of failing where c->frozen causes a failure?

I can make a patch, if you don't think that it's too much.
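
Roughly what I have in mind, as a standalone sketch rather than a real
patch: the Link struct and link_check_changeable() below are hypothetical
stand-ins for BdrvChild and for the places that currently fail on
c->frozen, and the warning text is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Link {
    bool frozen;
    bool frozen_only_warn; /* deprecated case: warn instead of failing */
} Link;

/* Returns 0 if the link may be changed, -EPERM if it is hard-frozen. */
int link_check_changeable(const Link *l, const char *name)
{
    assert(l != NULL);

    if (!l->frozen) {
        return 0;
    }
    if (l->frozen_only_warn) {
        /* Still allowed during the deprecation period, but be loud about it. */
        fprintf(stderr, "warning: changing frozen link '%s' is deprecated\n",
                name);
        return 0;
    }
    return -EPERM;
}
```

Once the deprecation period is over, the frozen_only_warn branch would
simply go away and the link would behave like any other frozen one.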

> 
>> Anyway, I feel now that you have convinced me. I'm not sure that we will not
>> have to change it to make the filter work, but that is no reason to change
>> something now. Andrey, could you try to rebase your series on top of this and
>> fix iotest 30 by just specifying exact node-names in it?..
>>
>>
>> Hmmm. My thought goes further. Seems, that in this way, introducing
>> explicit filter would be incompatible change anyway: it will break
>> scenario with parallel stream jobs, when user specifies filenames, not
>> node names (user will have to specify filter-node name as base for
>> another stream job, as you said). So, it's incompatible anyway.
>>
>> What do you think of it? Could we break this scenario in one release
>> without deprecation and don't care?
> 
> I don’t know, but I’m afraid I don’t think we can.

Actually, we have already done so once:

Freezing of base was introduced in 4.0, while c624b015bf came in 4.1
(there were no real user bugs or feature requests, it was just my [bad]
idea, related to the problem around our stream-filter and parallel
jobs in iotest 30).
So, at least for 4.0 we had a frozen base.. Were there any bugs?
But now we have lived with a non-frozen base for several releases..

> 
>> Then I think my idea about base vs
>> bottom-node arguments for stream job may be applied. Or what to do?
>>
>> If we can't break this scenario without a deprecation, we'll have to
>> implement "implicit" filter, like for mirror, when filter-node-name is
>> not specified. And for this implicit filter we'll need additional logic
>> (closer to what I've proposed in a previous mail). Or, try to keep
>> stream without a filter (not insert it at all and behave the old way),
>> when filter-node-name is not specified. Then new features based on
>> filter will be available only when filter-node-name is specified, but
>> this is OK. The latter seems better for me.
> 
> If that works...
> 
> 
> OK.  So what I think we can do is first just take this patch as part of
> this series.

Yes, let's start with it.

>  Then, we add @bottom-node separately and deprecate @base
> not being frozen.

I think we can just deprecate, and not add the bottom-node. If someone
comes along during the deprecation period and says that they have such a
use case, we'll add @bottom-node. Otherwise we'll just start to freeze the
base node again.

> 
> If it’s feasible to not add a stream filter node until the deprecation
> period is over, then that should work.
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-18 14:28   ` Kevin Wolf
@ 2020-08-19 14:47     ` Max Reitz
  2020-08-19 15:16       ` Kevin Wolf
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-19 14:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 2079 bytes --]

On 18.08.20 16:28, Kevin Wolf wrote:
> Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  qapi/block-core.json |  4 +++
>>  block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>  blockdev.c           |  4 ++-
>>  3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>  # On successful completion the image file is updated to drop the backing file
>>  # and the BLOCK_JOB_COMPLETED event is emitted.
>>  #
>> +# In case @device is a filter node, block-stream modifies the first non-filter
>> +# overlay node below it to point to base's backing node (or NULL if @base was
>> +# not specified) instead of modifying @device itself.
> 
> Not to @base's backing node, but to @base itself (or actually, to
> above_base's backing node, which is initially @base, but may have
> changed when the job is completed).

Oh, yes.

(I thought I had noticed that already at some point and fixed it
locally...  But apparently not.)

> Should we also document what using a filter node for @base means?

Hm.  What does it mean?  I think the more interesting case is what it
means if above_base is a filter, right?

Maybe we can put it somewhere in the “If a base file is specified then
sectors are not copied from that base file and its backing chain.”  But
the more I think about it, the less I know what we could add to it.
What happens if there are filters above @base is that their data isn’t
copied, because that’s exactly the data in @base.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-19 14:47     ` Max Reitz
@ 2020-08-19 15:16       ` Kevin Wolf
  2020-08-20  8:31         ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 15:16 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

Am 19.08.2020 um 16:47 hat Max Reitz geschrieben:
> On 18.08.20 16:28, Kevin Wolf wrote:
> > Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> >> Because of the (not so recent anymore) changes that make the stream job
> >> independent of the base node and instead track the node above it, we
> >> have to split that "bottom" node into two cases: The bottom COW node,
> >> and the node directly above the base node (which may be an R/W filter
> >> or the bottom COW node).
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  qapi/block-core.json |  4 +++
> >>  block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
> >>  blockdev.c           |  4 ++-
> >>  3 files changed, 53 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> index b20332e592..df87855429 100644
> >> --- a/qapi/block-core.json
> >> +++ b/qapi/block-core.json
> >> @@ -2486,6 +2486,10 @@
> >>  # On successful completion the image file is updated to drop the backing file
> >>  # and the BLOCK_JOB_COMPLETED event is emitted.
> >>  #
> >> +# In case @device is a filter node, block-stream modifies the first non-filter
> >> +# overlay node below it to point to base's backing node (or NULL if @base was
> >> +# not specified) instead of modifying @device itself.
> > 
> > Not to @base's backing node, but to @base itself (or actually, to
> > above_base's backing node, which is initially @base, but may have
> > changed when the job is completed).
> 
> Oh, yes.
> 
> (I thought I had noticed that already at some point and fixed it
> locally...  But apparently not.)
> 
> > Should we also document what using a filter node for @base means?
> 
> Hm.  What does it mean?  I think the more interesting case is what it
> means if above_base is a filter, right?
> 
> Maybe we can put it somewhere in the “If a base file is specified then
> sectors are not copied from that base file and its backing chain.”  But
> the more I think about it, the less I know what we could add to it.
> What happens if there are filters above @base is that their data isn’t
> copied, because that’s exactly the data in @base.

The interesting part is probably the graph reconfiguration at the end of
the job. Which is actually already documented:

# When streaming completes the image file will have the base
# file as its backing file.

Of course, this is not entirely correct any more (because the base may
have changed).

If @base is a filter, what backing file path do we write into the top
layer? A json: filename including the filter? Is this worth mentioning
or do you consider it obvious?

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 10/47] mirror-top: Support compressed writes
  2020-08-18 10:27   ` Kevin Wolf
@ 2020-08-19 15:35     ` Max Reitz
  2020-08-19 16:00       ` Kevin Wolf
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-19 15:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 2098 bytes --]

On 18.08.20 12:27, Kevin Wolf wrote:
> Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/mirror.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index e8e8844afc..469acf4600 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1480,6 +1480,15 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
>>                                      NULL, 0);
>>  }
>>  
>> +static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState *bs,
>> +                                                           uint64_t offset,
>> +                                                           uint64_t bytes,
>> +                                                           QEMUIOVector *qiov)
>> +{
>> +    return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
>> +                                   BDRV_REQ_WRITE_COMPRESSED);
>> +}
> 
> Hm, not sure if it's a problem, but bdrv_supports_compressed_writes()
> will now return true for mirror-top. However, with an active mirror to a
> target that doesn't support compression, trying to actually do a
> compressed write will always return -ENOTSUP.

Right.

> So I guess the set of nodes patch 7 looks at still isn't quite complete.
> However, it's not obvious how to make it more complete without
> delegating to the driver.
> 
> Maybe we need to use bs->supported_write_flags, which is set by the
> driver, instead of looking at the presence of callbacks.

Hm, yes, that would work better.  Not sure if it’s worth it for this series.

The only problem we’d have is late failure when trying to do a
compressed backup to a target that’s running an active mirror.  (Late as
in “first write fails without explanation”, as opposed to “job fails
during set-up”.)

Which I hope is not a case anyone would ever encounter, and even if they
do, the failure doesn’t seem catastrophic to me.  So I don’t think it’s
really a problem.
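
To make the difference between the two checks concrete, here is a small
standalone sketch. It is a simplified model, not QEMU code: the Node
struct, REQ_WRITE_COMPRESSED value and all function names are
hypothetical, and the real supported_write_flags handling may differ. It
only shows why a check based on the presence of a callback reports
support that an active mirror to a non-compressing target cannot deliver,
while a per-instance flag set at open time can reflect the target.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_WRITE_COMPRESSED (1u << 0) /* illustrative flag value */

typedef struct Node Node;
struct Node {
    bool has_compressed_cb;         /* driver defines a compressed-write callback */
    unsigned supported_write_flags; /* filled in by the driver when it opens */
    Node *mirror_target;            /* non-NULL only for the mirror-like filter */
};

/* Check 1: static, looks only at the presence of the callback. */
bool supports_compression_by_callback(const Node *n)
{
    return n->has_compressed_cb;
}

/* Check 2: dynamic, trusts what the driver advertised for this instance. */
bool supports_compression_by_flags(const Node *n)
{
    return (n->supported_write_flags & REQ_WRITE_COMPRESSED) != 0;
}

/* The filter advertises compression only if its active target can take it. */
void filter_open(Node *filter, Node *target)
{
    filter->has_compressed_cb = true; /* the callback always exists */
    filter->mirror_target = target;
    filter->supported_write_flags =
        target->supported_write_flags & REQ_WRITE_COMPRESSED;
}

/* An active mirror also writes to the target, so the target decides. */
int filter_write_compressed(const Node *filter)
{
    assert(filter->mirror_target != NULL);

    if (!(filter->mirror_target->supported_write_flags &
          REQ_WRITE_COMPRESSED)) {
        return -ENOTSUP;
    }
    return 0;
}
```

With a target that does not advertise the flag, the callback-based check
still says "yes" while the write fails with -ENOTSUP, which is the late
failure described above. The flag-based check says "no" up front.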

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size()
  2020-08-19 10:46   ` Kevin Wolf
@ 2020-08-19 15:50     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-19 15:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 2001 bytes --]

On 19.08.20 12:46, Kevin Wolf wrote:
> Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
>> blkverify is a filter, so bdrv_get_allocated_file_size()'s default
>> implementation will return only the size of its filtered child.
>> However, because both of its children are disk images, it makes more
>> sense to sum both of their allocated sizes.
> 
> Hm, but so are the children of, say, backup-top. I don't think you're
> suggesting that backup-top should add the sizes of both images,

Can be argued either way.

I lean toward thinking that it should not, because: The backup is
external.  The job is copying data off.  So it isn’t really directly
data that serves qemu, which can be seen from the fact that it’s never read.

Which is not the case for quorum and blkverify.  Though one can argue
that blkverify is different from quorum here in that it doesn’t read
data from the non-filtered child to serve a guest device, but just to
verify it node-internally.

> even
> though the backup job is actively increasing the allocated size of the
> non-primary node, much like blkverify.
> 
> So I believe returning only the allocated size of the primary child in
> blkverify would be more consistent with what we do elsewhere.

For me, blkverify is basically an archaic mode of quorum, and for quorum
it’s clear that we should sum the sizes.  Which is why I thought summing
the sizes would be more consistent.

But honestly, I just don’t care about blkverify whatsoever.  I don’t
believe anyone actually cares about whether what blkverify returns for
.bdrv_get_allocated_file_size() is consistent.  I believe we could
return 42 and nobody would bat an eyelash.

(But that’s the curse of this series.  I have to touch stuff that nobody
cares about, and then we have discussions on stuff nobody cares about.)

So from that POV I’m happy to drop this patch if it means there’s just
one less opportunity to have a discussion on blkverify.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size
  2020-08-19 10:57   ` Kevin Wolf
@ 2020-08-19 15:53     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-19 15:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 4345 bytes --]

On 19.08.20 12:57, Kevin Wolf wrote:
> Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
>> If every BlockDriver were to implement bdrv_get_allocated_file_size(),
>> there are basically three ways it would be handled:
>> (1) For protocol drivers: Figure out the actual allocated file size in
>>     some protocol-specific way
>> (2) For protocol drivers: If that is not possible (or we just have not
>>     bothered to implement it yet), return -ENOTSUP
>> (3) For drivers with children: Return the sum of some or all their
>>     children's sizes
>>
>> For the drivers we have, case (3) boils down to either:
>> (a) The sum of all children's sizes
>> (b) The size of the primary child
>>
>> (2), (3a) and (3b) can be implemented generically, so this patch adds
>> such generic implementations for drivers to use.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  include/block/block_int.h |  5 ++++
>>  block.c                   | 51 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 56 insertions(+)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 5da793bfc3..c963ee9f28 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1318,6 +1318,11 @@ int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
>>                                                     int64_t *pnum,
>>                                                     int64_t *map,
>>                                                     BlockDriverState **file);
>> +
>> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs);
>> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs);
>> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs);
>> +
>>  const char *bdrv_get_parent_name(const BlockDriverState *bs);
>>  void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
>>  bool blk_dev_has_removable_media(BlockBackend *blk);
>> diff --git a/block.c b/block.c
>> index 1c71ecab7c..fc01ce90b3 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -5003,6 +5003,57 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
>>      return -ENOTSUP;
>>  }
>>  
>> +/**
>> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
>> + * block drivers that want it to sum all children they store data on.
>> + * (This excludes backing children.)
>> + */
>> +int64_t bdrv_sum_allocated_file_size(BlockDriverState *bs)
>> +{
>> +    BdrvChild *child;
>> +    int64_t child_size, sum = 0;
>> +
>> +    QLIST_FOREACH(child, &bs->children, next) {
>> +        if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
>> +                           BDRV_CHILD_FILTERED))
>> +        {
>> +            child_size = bdrv_get_allocated_file_size(child->bs);
>> +            if (child_size < 0) {
>> +                return child_size;
>> +            }
>> +            sum += child_size;
>> +        }
>> +    }
>> +
>> +    return sum;
>> +}
> 
> The only user apart from bdrv_get_allocated_file_size() is blkverify. As
> I argued that blkverify shouldn't use it, this can become static.
> 
>> +/**
>> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
>> + * block drivers that want it to return only the size of a node's
>> + * primary child.
>> + */
>> +int64_t bdrv_primary_allocated_file_size(BlockDriverState *bs)
>> +{
>> +    BlockDriverState *primary_bs;
>> +
>> +    primary_bs = bdrv_primary_bs(bs);
>> +    if (!primary_bs) {
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    return bdrv_get_allocated_file_size(primary_bs);
>> +}
> 
> This can become static, too (never used as a callback), and possibly
> even be inlined.
> 
>> +/**
>> + * Implementation of BlockDriver.bdrv_get_allocated_file_size() for
>> + * protocol block drivers that just do not support it.
>> + */
>> +int64_t bdrv_notsup_allocated_file_size(BlockDriverState *bs)
>> +{
>> +    return -ENOTSUP;
>> +}
> 
> Also never used as a callback. I think inlining it would almost
> certainly make more sense.

I think they’re all artifacts from the development process, yeah.

Originally, I wanted to make .bdrv_get_allocated_file_size() mandatory,
but then I saw that led nowhere and could be done well generically.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [PATCH v7 10/47] mirror-top: Support compressed writes
  2020-08-19 15:35     ` Max Reitz
@ 2020-08-19 16:00       ` Kevin Wolf
  0 siblings, 0 replies; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 16:00 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2633 bytes --]

Am 19.08.2020 um 17:35 hat Max Reitz geschrieben:
> On 18.08.20 12:27, Kevin Wolf wrote:
> > Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/mirror.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/block/mirror.c b/block/mirror.c
> >> index e8e8844afc..469acf4600 100644
> >> --- a/block/mirror.c
> >> +++ b/block/mirror.c
> >> @@ -1480,6 +1480,15 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
> >>                                      NULL, 0);
> >>  }
> >>  
> >> +static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState *bs,
> >> +                                                           uint64_t offset,
> >> +                                                           uint64_t bytes,
> >> +                                                           QEMUIOVector *qiov)
> >> +{
> >> +    return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
> >> +                                   BDRV_REQ_WRITE_COMPRESSED);
> >> +}
> > 
> > Hm, not sure if it's a problem, but bdrv_supports_compressed_writes()
> > will now return true for mirror-top. However, with an active mirror to a
> > target that doesn't support compression, trying to actually do a
> > compressed write will always return -ENOTSUP.
> 
> Right.
> 
> > So I guess the set of nodes patch 7 looks at still isn't quite complete.
> > However, it's not obvious how to make it more complete without
> > delegating to the driver.
> > 
> > Maybe we need to use bs->supported_write_flags, which is set by the
> > driver, instead of looking at the presence of callbacks.
> 
> Hm, yes, that would work better.  Not sure if it’s worth it for this
> series.

This patch looks like a feature addition that is only marginally related
to the goal of the series anyway. Maybe it should be a separate small
series on top?

The other compression related patches in the series don't seem to have
this problem, so they could stay there anyway.

> The only problem we’d have is late failure when trying to do a
> compressed backup to a target that’s running an active mirror.  (Late as
> in “first write fails without explanation”, as opposed to “job fails
> during set-up”.)
> 
> Which I hope is not a case anyone would ever encounter, and even if they
> do, the failure doesn’t seem catastrophic to me.  So I don’t think it’s
> really a problem.

Yeah, it's just a bit unfortunate to add a new function that we know
doesn't do what it promises.
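
To make the difference concrete, here is a minimal self-contained sketch.
Everything in it (the ToyNode type, the toy_* functions, the
TOY_REQ_WRITE_COMPRESSED flag) is made up for illustration and only loosely
modelled on "presence of callbacks" versus bs->supported_write_flags; it is
not QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real definitions. */
#define TOY_REQ_WRITE_COMPRESSED (1u << 0)

typedef struct ToyNode ToyNode;
typedef int (*ToyCompressedWriteFn)(ToyNode *node);

struct ToyNode {
    ToyCompressedWriteFn pwritev_compressed; /* fixed per driver */
    unsigned supported_write_flags;          /* set per node at runtime */
    ToyNode *target;                         /* where a filter forwards writes */
};

/* Placeholder callback so that the filter "implements" compressed writes. */
int toy_forwarding_write(ToyNode *node)
{
    (void)node;
    return 0;
}

/* Static check: true as soon as the driver provides the callback. */
bool toy_supports_compression_by_callback(const ToyNode *node)
{
    return node->pwritev_compressed != NULL;
}

/* Dynamic check: true only if this particular node advertises the flag. */
bool toy_supports_compression_by_flags(const ToyNode *node)
{
    return (node->supported_write_flags & TOY_REQ_WRITE_COMPRESSED) != 0;
}

/* A filter derives its own flag from the node it forwards writes to. */
void toy_filter_refresh_flags(ToyNode *filter)
{
    assert(filter != NULL);
    filter->supported_write_flags = filter->target
        ? (filter->target->supported_write_flags & TOY_REQ_WRITE_COMPRESSED)
        : 0;
}
```

With a target that lacks the flag, the callback-based check still says yes
for the filter while the flag-based check says no, which is the mirror-top
situation described above.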

Kevin


* Re: [PATCH v7 33/47] mirror: Deal with filters
  2020-06-25 15:22 ` [PATCH v7 33/47] mirror: Deal with filters Max Reitz
  2020-07-22 18:31   ` Andrey Shinkevich
@ 2020-08-19 16:50   ` Kevin Wolf
  2020-08-20 10:28     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 16:50 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission for active commits where the base is smaller
> than the top).
> 
> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
> "target_backing_bs", because that is what it really refers to.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

> @@ -1682,6 +1721,7 @@ static BlockJob *mirror_start_job(
>      s->zero_target = zero_target;
>      s->copy_mode = copy_mode;
>      s->base = base;
> +    s->base_overlay = bdrv_find_overlay(bs, base);
>      s->granularity = granularity;
>      s->buf_size = ROUND_UP(buf_size, granularity);
>      s->unmap = unmap;

Is this valid without freezing the links between base_overlay and base?

Actually, I guess we should freeze everything between bs and base (for
base != NULL) and it's a preexisting problem that just happens to affect
this code, too.

Or maybe freezing everything is too much. We only want to make sure that
no non-filter is inserted between base and base_overlay and that base
(and now base_overlay) always stay in the backing chain of bs. But what
options apart from freezing do we have to achieve this?

Why is using base_overlay even better than using base? Assuming there is
a good reason, maybe the commit message could spell it out.

Kevin




* Re: [PATCH v7 35/47] commit: Deal with filters
  2020-06-25 15:22 ` [PATCH v7 35/47] commit: " Max Reitz
  2020-07-23 17:15   ` Andrey Shinkevich
@ 2020-08-19 17:58   ` Kevin Wolf
  2020-08-20 11:27     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-19 17:58 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> This includes some permission limiting (for example, we only need to
> take the RESIZE permission if the base is smaller than the top).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/block-backend.c          |  9 +++-
>  block/commit.c                 | 96 +++++++++++++++++++++++++---------
>  block/monitor/block-hmp-cmds.c |  2 +-
>  blockdev.c                     |  4 +-
>  4 files changed, 81 insertions(+), 30 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 6936b25c83..7f2c7dbccc 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2271,8 +2271,13 @@ int blk_commit_all(void)
>          AioContext *aio_context = blk_get_aio_context(blk);
>  
>          aio_context_acquire(aio_context);
> -        if (blk_is_inserted(blk) && blk->root->bs->backing) {
> -            int ret = bdrv_commit(blk->root->bs);

The old code didn't try to commit nodes that don't have a backing file.

> +        if (blk_is_inserted(blk)) {
> +            BlockDriverState *non_filter;
> +            int ret;
> +
> +            /* Legacy function, so skip implicit filters */
> +            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
> +            ret = bdrv_commit(non_filter);

The new one tries unconditionally. For nodes without a backing file,
bdrv_commit() will return -ENOTSUP, so the whole function fails.

(First real bug at patch 35. I almost thought I wouldn't find any!)

>              if (ret < 0) {
>                  aio_context_release(aio_context);
>                  return ret;
> diff --git a/block/commit.c b/block/commit.c
> index 7732d02dfe..4122b6736d 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -37,6 +37,7 @@ typedef struct CommitBlockJob {
>      BlockBackend *top;
>      BlockBackend *base;
>      BlockDriverState *base_bs;
> +    BlockDriverState *base_overlay;
>      BlockdevOnError on_error;
>      bool base_read_only;
>      bool chain_frozen;

Hm, again this mysterious base_overlay. I know that stream introduced it
to avoid freezing the link to base, but commit doesn't seem to do that.

Is it to avoid using the block status of filter drivers between
base_overlay and base? If so, I guess that goes back to the question I
raised earlier in this series: What is the block status supposed to tell
for filter nodes?

But anyway, in contrast to mirror, commit actually freezes the chain
between commit_top_bs and base, so it should be safe at least.

> @@ -89,7 +90,7 @@ static void commit_abort(Job *job)
>       * XXX Can (or should) we somehow keep 'consistent read' blocked even
>       * after the failed/cancelled commit job is gone? If we already wrote
>       * something to base, the intermediate images aren't valid any more. */
> -    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
> +    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
>                        &error_abort);
>  
>      bdrv_unref(s->commit_top_bs);
> @@ -153,7 +154,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>              break;
>          }
>          /* Copy if allocated above the base */
> -        ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base), false,
> +        ret = bdrv_is_allocated_above(blk_bs(s->top), s->base_overlay, true,
>                                        offset, COMMIT_BUFFER_SIZE, &n);
>          copy = (ret == 1);
>          trace_commit_one_iteration(s, offset, n, ret);
> @@ -253,15 +254,35 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>      CommitBlockJob *s;
>      BlockDriverState *iter;
>      BlockDriverState *commit_top_bs = NULL;
> +    BlockDriverState *filtered_base;
>      Error *local_err = NULL;
> +    int64_t base_size, top_size;
> +    uint64_t perms, iter_shared_perms;
>      int ret;
>  
>      assert(top != bs);
> -    if (top == base) {
> +    if (bdrv_skip_filters(top) == bdrv_skip_filters(base)) {
>          error_setg(errp, "Invalid files for merge: top and base are the same");
>          return;
>      }
>  
> +    base_size = bdrv_getlength(base);
> +    if (base_size < 0) {
> +        error_setg_errno(errp, -base_size, "Could not inquire base image size");
> +        return;
> +    }
> +
> +    top_size = bdrv_getlength(top);
> +    if (top_size < 0) {
> +        error_setg_errno(errp, -top_size, "Could not inquire top image size");
> +        return;
> +    }
> +
> +    perms = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
> +    if (base_size < top_size) {
> +        perms |= BLK_PERM_RESIZE;
> +    }

Naming this base_perms would make it clear which permissions these are
(particularly because the variable isn't consumed by the next thing that
requires permissions, but only further down in the function).

>      s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, BLK_PERM_ALL,
>                           speed, creation_flags, NULL, NULL, errp);
>      if (!s) {
> @@ -301,17 +322,43 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>  
>      s->commit_top_bs = commit_top_bs;
>  
> -    /* Block all nodes between top and base, because they will
> -     * disappear from the chain after this operation. */
> -    assert(bdrv_chain_contains(top, base));
> -    for (iter = top; iter != base; iter = backing_bs(iter)) {
> -        /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> -         * at s->base (if writes are blocked for a node, they are also blocked
> -         * for its backing file). The other options would be a second filter
> -         * driver above s->base. */
> +    /*
> +     * Block all nodes between top and base, because they will
> +     * disappear from the chain after this operation.
> +     * Note that this assumes that the user is fine with removing all
> +     * nodes (including R/W filters) between top and base.  Assuring
> +     * this is the responsibility of the interface (i.e. whoever calls
> +     * commit_start()).
> +     */
> +    s->base_overlay = bdrv_find_overlay(top, base);
> +    assert(s->base_overlay);
> +
> +    /*
> +     * The topmost node with
> +     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
> +     */
> +    filtered_base = bdrv_cow_bs(s->base_overlay);
> +    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
> +
> +    /*
> +     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> +     * at s->base (if writes are blocked for a node, they are also blocked
> +     * for its backing file). The other options would be a second filter
> +     * driver above s->base.
> +     */
> +    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
> +
> +    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
> +        if (iter == filtered_base) {
> +            /*
> +             * From here on, all nodes are filters on the base.  This
> +             * allows us to share BLK_PERM_CONSISTENT_READ.
> +             */
> +            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
> +        }
> +
>          ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> -                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> -                                 errp);
> +                                 iter_shared_perms, errp);
>          if (ret < 0) {
>              goto fail;
>          }
> @@ -328,9 +375,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>      }
>  
>      s->base = blk_new(s->common.job.aio_context,
> -                      BLK_PERM_CONSISTENT_READ
> -                      | BLK_PERM_WRITE
> -                      | BLK_PERM_RESIZE,
> +                      perms,
>                        BLK_PERM_CONSISTENT_READ
>                        | BLK_PERM_GRAPH_MOD
>                        | BLK_PERM_WRITE_UNCHANGED);
> @@ -398,19 +443,22 @@ int bdrv_commit(BlockDriverState *bs)
>      if (!drv)
>          return -ENOMEDIUM;
>  
> -    if (!bs->backing) {
> +    backing_file_bs = bdrv_cow_bs(bs);
> +
> +    if (!backing_file_bs) {
>          return -ENOTSUP;
>      }
>  
>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
> +    {
>          return -EBUSY;
>      }
>  
> -    ro = bs->backing->bs->read_only;
> +    ro = backing_file_bs->read_only;
>  
>      if (ro) {
> -        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
> +        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
>              return -EACCES;
>          }
>      }
> @@ -428,8 +476,6 @@ int bdrv_commit(BlockDriverState *bs)
>      }
>  
>      /* Insert commit_top block node above backing, so we can write to it */
> -    backing_file_bs = backing_bs(bs);
> -
>      commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
>                                           &local_err);
>      if (commit_top_bs == NULL) {
> @@ -515,15 +561,13 @@ ro_cleanup:
>      qemu_vfree(buf);
>  
>      blk_unref(backing);
> -    if (backing_file_bs) {
> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> -    }
> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);

This means that bdrv_set_backing_hd() is now called to undo a change
that hasn't even been made yet. This fails (with &error_abort) if the
backing chain is frozen.

On the other hand, the other bdrv_set_backing_hd() calls in the same
function would fail the same way.

>      bdrv_unref(commit_top_bs);
>      blk_unref(src);
>  
>      if (ro) {
>          /* ignoring error return here */
> -        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
> +        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
>      }
>  
>      return ret;

Kevin




* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-19 15:16       ` Kevin Wolf
@ 2020-08-20  8:31         ` Max Reitz
  2020-08-20  9:22           ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-20  8:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 19.08.20 17:16, Kevin Wolf wrote:
> Am 19.08.2020 um 16:47 hat Max Reitz geschrieben:
>> On 18.08.20 16:28, Kevin Wolf wrote:
>>> Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
>>>> Because of the (not so recent anymore) changes that make the stream job
>>>> independent of the base node and instead track the node above it, we
>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>> and the node directly above the base node (which may be an R/W filter
>>>> or the bottom COW node).
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  qapi/block-core.json |  4 +++
>>>>  block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>>  blockdev.c           |  4 ++-
>>>>  3 files changed, 53 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index b20332e592..df87855429 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2486,6 +2486,10 @@
>>>>  # On successful completion the image file is updated to drop the backing file
>>>>  # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>  #
>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>> +# not specified) instead of modifying @device itself.
>>>
>>> Not to @base's backing node, but to @base itself (or actually, to
>>> above_base's backing node, which is initially @base, but may have
>>> changed when the job is completed).
>>
>> Oh, yes.
>>
>> (I thought I had noticed that already at some point and fixed it
>> locally...  But apparently not.)
>>
>>> Should we also document what using a filter node for @base means?
>>
>> Hm.  What does it mean?  I think the more interesting case is what it
>> means if above_base is a filter, right?
>>
>> Maybe we can put it somewhere in the “If a base file is specified then
>> sectors are not copied from that base file and its backing chain.”  But
>> the more I think about it, the less I know what we could add to it.
>> What happens if there are filters above @base is that their data isn’t
>> copied, because that’s exactly the data in @base.
> 
> The interesting part is probably the graph reconfiguration at the end of
> the job. Which is actually already documented:
> 
> # When streaming completes the image file will have the base
> # file as its backing file.
> 
> Of course, this is not entirely correct any more (because the base may
> have changed).
> 
> If @base is a filter, what backing file path do we write into the top
> layer? A json: filename including the filter?

Yes.

Or, actually.  Now that I read the code...  It takes @base’s filename
before the stream job and then uses that.  So if @base has changed
during the job, then it still uses the old filename.

But that’s not really due to this series.

> Is this worth mentioning
> or do you consider it obvious?

Hm.  I consider it obvious, yes.  @base becomes @top’s backing file (at
least without any graph changes while the job is running), so naturally
what’s written into the image header is @base’s filename – which is a
json:{} filename.

On second thought, @backing-file’s description mysteriously states that
“QEMU will automatically determine the backing file string to use”.
Which makes sense because it would clearly not make sense to describe
what’s actually happening, which is to use @base’s filename at job start
regardless of whether it’s still there at the end of the job.

So I suppose I have the choice of either documenting exactly what’s
happening, even though it doesn’t make much sense, or just not, keeping
it mysterious.

So all in all, I believe the biggest surprise about what’s written into
the top layer isn’t that it may be a json:{} filename, but the filename
of a node that maybe doesn’t even exist anymore?  (Oh, no, please don’t
tell me you can delete it and get an invalid pointer read...)

The more I think about it, the more I think there are problems beyond
the scope of this series here. :/

Max



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-20  8:31         ` Max Reitz
@ 2020-08-20  9:22           ` Max Reitz
  2020-08-20 10:49             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-20  9:22 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 20.08.20 10:31, Max Reitz wrote:

[...]

> So all in all, I believe the biggest surprise about what’s written into
> the top layer isn’t that it may be a json:{} filename, but the filename
> of a node that maybe doesn’t even exist anymore?  (Oh, no, please don’t
> tell me you can delete it and get an invalid pointer read...)

(I tried triggering that, but, oh, it’s strdup()’ed in stream_start().
I’m a bit daft.)
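
In other words, something like this simplified sketch, where ToyNode,
ToyStreamJob and the toy_* functions are hypothetical and toy_strdup()
stands in for strdup(): the job keeps its own copy of the name taken at
start, so the pointer stays valid, but the string can go stale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real BlockDriverState or StreamBlockJob. */
typedef struct ToyNode {
    char filename[64];
} ToyNode;

typedef struct ToyStreamJob {
    char *backing_file_str; /* private copy, owned by the job */
} ToyStreamJob;

/* Portable equivalent of strdup() for this sketch. */
char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Snapshot the base's filename when the job starts. */
int toy_stream_start(ToyStreamJob *job, const ToyNode *base)
{
    assert(job != NULL && base != NULL);
    job->backing_file_str = toy_strdup(base->filename);
    return job->backing_file_str ? 0 : -1;
}

/* Release the snapshot when the job is done. */
void toy_stream_finish(ToyStreamJob *job)
{
    free(job->backing_file_str);
    job->backing_file_str = NULL;
}
```

So whatever happens to the node afterwards, the job never reads through a
dangling pointer; it just may write a name that no longer matches the graph.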



* Re: [PATCH v7 33/47] mirror: Deal with filters
  2020-08-19 16:50   ` Kevin Wolf
@ 2020-08-20 10:28     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-20 10:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 19.08.20 18:50, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission for active commits where the base is smaller
>> than the top).
>>
>> Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
>> "target_backing_bs", because that is what it really refers to.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
>> @@ -1682,6 +1721,7 @@ static BlockJob *mirror_start_job(
>>      s->zero_target = zero_target;
>>      s->copy_mode = copy_mode;
>>      s->base = base;
>> +    s->base_overlay = bdrv_find_overlay(bs, base);
>>      s->granularity = granularity;
>>      s->buf_size = ROUND_UP(buf_size, granularity);
>>      s->unmap = unmap;
> 
> Is this valid without freezing the links between base_overlay and base?

Er...

> Actually, I guess we should freeze everything between bs and base (for
> base != NULL) and it's a preexisting problem that just happens to affect
> this code, too.

Yes, that’s how it looks to me, too.  I don’t think that has anything to
do with this patch.

> Or maybe freezing everything is too much. We only want to make sure that
> no non-filter is inserted between base and base_overlay and that base
> (and now base_overlay) always stay in the backing chain of bs. But what
> options apart from freezing do we have to achieve this?

I don’t know of any, and I don’t know whether anyone would actually care
if we were to just freeze everything.

> Why is using base_overlay even better than using base? Assuming there is
> a good reason, maybe the commit message could spell it out.

The problem is that querying the block status for a filter node falls
through to the underlying data-carrying node.  So if there’s a filter on
top of @base, and we query for is_allocated_above above @base, then
we’ll include @base, which we do not want.
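
A toy model of the problem, as a sketch only: the ToyNode type and the
toy_* helpers are hypothetical and merely mimic the behaviour described
above (a filter's block status falling through to its child); they are not
the real bdrv_is_allocated_above() or bdrv_find_overlay().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node in a backing chain (one block only). */
typedef struct ToyNode {
    bool is_filter;        /* filters carry no data of their own */
    bool allocated;        /* does this node itself hold the block? */
    struct ToyNode *child; /* filtered child or COW backing node */
} ToyNode;

/* Block status: a filter reports whatever the node below it reports. */
bool toy_is_allocated(const ToyNode *node)
{
    assert(node != NULL);
    if (node->is_filter) {
        return node->child != NULL && toy_is_allocated(node->child);
    }
    return node->allocated;
}

/* Is the block allocated anywhere from top down to base? */
bool toy_is_allocated_above(ToyNode *top, ToyNode *base, bool include_base)
{
    for (ToyNode *n = top; n != NULL; n = n->child) {
        if (n == base && !include_base) {
            break;
        }
        if (toy_is_allocated(n)) {
            return true;
        }
        if (n == base) {
            break;
        }
    }
    return false;
}

/* Lowest non-filter node above base in the chain starting at top. */
ToyNode *toy_find_base_overlay(ToyNode *top, ToyNode *base)
{
    ToyNode *overlay = NULL;

    for (ToyNode *n = top; n != NULL && n != base; n = n->child) {
        if (!n->is_filter) {
            overlay = n;
        }
    }
    return overlay;
}
```

For a chain top -> filter -> base where only base holds the block, asking
"above base, excluding base" reports the block as allocated because of the
filter, whereas asking "down to base_overlay, including it" does not.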

Max



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-20  9:22           ` Max Reitz
@ 2020-08-20 10:49             ` Vladimir Sementsov-Ogievskiy
  2020-08-20 11:43               ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-08-20 10:49 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf; +Cc: qemu-devel, qemu-block

20.08.2020 12:22, Max Reitz wrote:
> On 20.08.20 10:31, Max Reitz wrote:
> 
> [...]
> 
>> So all in all, I believe the biggest surprise about what’s written into
>> the top layer isn’t that it may be a json:{} filename, but the filename
>> of a node that maybe doesn’t even exist anymore?  (Oh, no, please don’t
>> tell me you can delete it and get an invalid pointer read...)
> 
> (I tried triggering that, but, oh, it’s strdup()’ed in stream_start().
> I’m a bit daft.)
> 


If it's broken anyway, we can probably just revert c624b015bf and start freezing base again?


-- 
Best regards,
Vladimir



* Re: [PATCH v7 35/47] commit: Deal with filters
  2020-08-19 17:58   ` Kevin Wolf
@ 2020-08-20 11:27     ` Max Reitz
  2020-08-20 13:47       ` Kevin Wolf
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-20 11:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 19.08.20 19:58, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> This includes some permission limiting (for example, we only need to
>> take the RESIZE permission if the base is smaller than the top).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/block-backend.c          |  9 +++-
>>  block/commit.c                 | 96 +++++++++++++++++++++++++---------
>>  block/monitor/block-hmp-cmds.c |  2 +-
>>  blockdev.c                     |  4 +-
>>  4 files changed, 81 insertions(+), 30 deletions(-)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 6936b25c83..7f2c7dbccc 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -2271,8 +2271,13 @@ int blk_commit_all(void)
>>          AioContext *aio_context = blk_get_aio_context(blk);
>>  
>>          aio_context_acquire(aio_context);
>> -        if (blk_is_inserted(blk) && blk->root->bs->backing) {
>> -            int ret = bdrv_commit(blk->root->bs);
> 
> The old code didn't try to commit nodes that don't have a backing file.
> 
>> +        if (blk_is_inserted(blk)) {
>> +            BlockDriverState *non_filter;
>> +            int ret;
>> +
>> +            /* Legacy function, so skip implicit filters */
>> +            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
>> +            ret = bdrv_commit(non_filter);
> 
> The new one tries unconditionally. For nodes without a backing file,
> bdrv_commit() will return -ENOTSUP, so the whole function fails.

:(

Hm.  Should I fix it by checking for
bdrv_cow_bs(bdrv_skip_implicit_filters())?  Or bdrv_backing_chain_next()
and change the bdrv_skip_implicit_filters() to a bdrv_skip_filters()?  I
feel like that would make even more sense.
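
The first variant might look roughly like the following toy model.
ToyNode and the toy_* functions are invented stand-ins for the
BlockDriverState helpers named above, so this only sketches the control
flow, not the real blk_commit_all().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node. */
typedef struct ToyNode {
    bool is_filter;
    bool implicit;         /* only meaningful for filters */
    struct ToyNode *child; /* filtered child, or COW backing node */
} ToyNode;

/* Walk down past implicit filters only. */
ToyNode *toy_skip_implicit_filters(ToyNode *node)
{
    while (node != NULL && node->is_filter && node->implicit) {
        node = node->child;
    }
    return node;
}

/* COW child: only non-filter nodes have one. */
ToyNode *toy_cow_child(ToyNode *node)
{
    return (node != NULL && !node->is_filter) ? node->child : NULL;
}

/* Committing a node without a COW child is not supported. */
int toy_commit(ToyNode *node)
{
    if (toy_cow_child(node) == NULL) {
        return -ENOTSUP;
    }
    return 0; /* the actual data copy is omitted in this sketch */
}

/* Commit every root, skipping the ones that have nothing to commit into. */
int toy_commit_all(ToyNode **roots, size_t count)
{
    assert(roots != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        ToyNode *non_filter = toy_skip_implicit_filters(roots[i]);

        if (toy_cow_child(non_filter) == NULL) {
            continue; /* restores the old "no backing file" behaviour */
        }

        int ret = toy_commit(non_filter);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

A standalone node without a backing file is then skipped instead of making
the whole loop fail with -ENOTSUP.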

> (First real bug at patch 35. I almost thought I wouldn't find any!)

:)

>>              if (ret < 0) {
>>                  aio_context_release(aio_context);
>>                  return ret;
>> diff --git a/block/commit.c b/block/commit.c
>> index 7732d02dfe..4122b6736d 100644
>> --- a/block/commit.c
>> +++ b/block/commit.c
>> @@ -37,6 +37,7 @@ typedef struct CommitBlockJob {
>>      BlockBackend *top;
>>      BlockBackend *base;
>>      BlockDriverState *base_bs;
>> +    BlockDriverState *base_overlay;
>>      BlockdevOnError on_error;
>>      bool base_read_only;
>>      bool chain_frozen;
> 
> Hm, again this mysterious base_overlay. I know that stream introduced it
> to avoid freezing the link to base, but commit doesn't seem to do that.
> 
> Is it to avoid using the block status of filter drivers between
> base_overlay and base?

Yes.

> If so, I guess that goes back to the question I
> raised earlier in this series: What is the block status supposed to tell
> for filter nodes?

Honestly, I would really like to get away without having to answer that
question in this series.  Intuitively, I feel like falling through to
the next data-bearing layer is not something most callers want.  But I’d
rather investigate that question separately from this series (even
though that likely means we’ll never do it), and just treat it as it is
in this series.

> But anyway, in contrast to mirror, commit actually freezes the chain
> between commit_top_bs and base, so it should be safe at least.
> 
>> @@ -89,7 +90,7 @@ static void commit_abort(Job *job)
>>       * XXX Can (or should) we somehow keep 'consistent read' blocked even
>>       * after the failed/cancelled commit job is gone? If we already wrote
>>       * something to base, the intermediate images aren't valid any more. */
>> -    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
>> +    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
>>                        &error_abort);
>>  
>>      bdrv_unref(s->commit_top_bs);
>> @@ -153,7 +154,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>              break;
>>          }
>>          /* Copy if allocated above the base */
>> -        ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base), false,
>> +        ret = bdrv_is_allocated_above(blk_bs(s->top), s->base_overlay, true,
>>                                        offset, COMMIT_BUFFER_SIZE, &n);
>>          copy = (ret == 1);
>>          trace_commit_one_iteration(s, offset, n, ret);
>> @@ -253,15 +254,35 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>      CommitBlockJob *s;
>>      BlockDriverState *iter;
>>      BlockDriverState *commit_top_bs = NULL;
>> +    BlockDriverState *filtered_base;
>>      Error *local_err = NULL;
>> +    int64_t base_size, top_size;
>> +    uint64_t perms, iter_shared_perms;
>>      int ret;
>>  
>>      assert(top != bs);
>> -    if (top == base) {
>> +    if (bdrv_skip_filters(top) == bdrv_skip_filters(base)) {
>>          error_setg(errp, "Invalid files for merge: top and base are the same");
>>          return;
>>      }
>>  
>> +    base_size = bdrv_getlength(base);
>> +    if (base_size < 0) {
>> +        error_setg_errno(errp, -base_size, "Could not inquire base image size");
>> +        return;
>> +    }
>> +
>> +    top_size = bdrv_getlength(top);
>> +    if (top_size < 0) {
>> +        error_setg_errno(errp, -top_size, "Could not inquire top image size");
>> +        return;
>> +    }
>> +
>> +    perms = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
>> +    if (base_size < top_size) {
>> +        perms |= BLK_PERM_RESIZE;
>> +    }
> 
> base_perms would indicate which permissions these are (particularly
> because it's not the next thing that requires permissions, but only used
> further down the function).

%s/\<perms\>/base_perms/?  Sure.

>>      s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, BLK_PERM_ALL,
>>                           speed, creation_flags, NULL, NULL, errp);
>>      if (!s) {
>> @@ -301,17 +322,43 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>  
>>      s->commit_top_bs = commit_top_bs;
>>  
>> -    /* Block all nodes between top and base, because they will
>> -     * disappear from the chain after this operation. */
>> -    assert(bdrv_chain_contains(top, base));
>> -    for (iter = top; iter != base; iter = backing_bs(iter)) {
>> -        /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
>> -         * at s->base (if writes are blocked for a node, they are also blocked
>> -         * for its backing file). The other options would be a second filter
>> -         * driver above s->base. */
>> +    /*
>> +     * Block all nodes between top and base, because they will
>> +     * disappear from the chain after this operation.
>> +     * Note that this assumes that the user is fine with removing all
>> +     * nodes (including R/W filters) between top and base.  Assuring
>> +     * this is the responsibility of the interface (i.e. whoever calls
>> +     * commit_start()).
>> +     */
>> +    s->base_overlay = bdrv_find_overlay(top, base);
>> +    assert(s->base_overlay);
>> +
>> +    /*
>> +     * The topmost node with
>> +     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
>> +     */
>> +    filtered_base = bdrv_cow_bs(s->base_overlay);
>> +    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
>> +
>> +    /*
>> +     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
>> +     * at s->base (if writes are blocked for a node, they are also blocked
>> +     * for its backing file). The other options would be a second filter
>> +     * driver above s->base.
>> +     */
>> +    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
>> +
>> +    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
>> +        if (iter == filtered_base) {
>> +            /*
>> +             * From here on, all nodes are filters on the base.  This
>> +             * allows us to share BLK_PERM_CONSISTENT_READ.
>> +             */
>> +            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
>> +        }
>> +
>>          ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>> -                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
>> -                                 errp);
>> +                                 iter_shared_perms, errp);
>>          if (ret < 0) {
>>              goto fail;
>>          }
>> @@ -328,9 +375,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>      }
>>  
>>      s->base = blk_new(s->common.job.aio_context,
>> -                      BLK_PERM_CONSISTENT_READ
>> -                      | BLK_PERM_WRITE
>> -                      | BLK_PERM_RESIZE,
>> +                      perms,
>>                        BLK_PERM_CONSISTENT_READ
>>                        | BLK_PERM_GRAPH_MOD
>>                        | BLK_PERM_WRITE_UNCHANGED);
>> @@ -398,19 +443,22 @@ int bdrv_commit(BlockDriverState *bs)
>>      if (!drv)
>>          return -ENOMEDIUM;
>>  
>> -    if (!bs->backing) {
>> +    backing_file_bs = bdrv_cow_bs(bs);
>> +
>> +    if (!backing_file_bs) {
>>          return -ENOTSUP;
>>      }
>>  
>>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
>> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
>> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
>> +    {
>>          return -EBUSY;
>>      }
>>  
>> -    ro = bs->backing->bs->read_only;
>> +    ro = backing_file_bs->read_only;
>>  
>>      if (ro) {
>> -        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
>> +        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
>>              return -EACCES;
>>          }
>>      }
>> @@ -428,8 +476,6 @@ int bdrv_commit(BlockDriverState *bs)
>>      }
>>  
>>      /* Insert commit_top block node above backing, so we can write to it */
>> -    backing_file_bs = backing_bs(bs);
>> -
>>      commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
>>                                           &local_err);
>>      if (commit_top_bs == NULL) {
>> @@ -515,15 +561,13 @@ ro_cleanup:
>>      qemu_vfree(buf);
>>  
>>      blk_unref(backing);
>> -    if (backing_file_bs) {
>> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
>> -    }
>> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> 
> This means that bdrv_set_backing_hd() is now called to undo a change
> that hasn't even been made yet. This fails (with &error_abort) if the
> backing chain is frozen.
> 
> On the other hand, the other bdrv_set_backing_hd() calls in the same
> function would fail the same way.

True. :)

Still, maybe there’s an op blocker from a concurrent job, so we go to
the failure path and then we’d abort here.  So better to guard it by
checking whether bdrv_cow_bs(bs) != backing_file_bs.
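
A minimal sketch of such a guard, using stand-in types rather than the
real block layer.  Node, cow_child(), set_backing() and
restore_backing() are hypothetical names made up for illustration; only
the comparison itself mirrors the bdrv_cow_bs(bs) != backing_file_bs
check suggested above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BlockDriverState; not the real QEMU structure. */
typedef struct Node {
    struct Node *cow;   /* COW backing child, or NULL */
    bool chain_frozen;  /* models a frozen backing link */
} Node;

/* Hypothetical analogue of bdrv_cow_bs(). */
static Node *cow_child(Node *bs)
{
    return bs ? bs->cow : NULL;
}

/*
 * Hypothetical analogue of bdrv_set_backing_hd().  It returns -1 for a
 * frozen link, which is where the real call with &error_abort would
 * abort.
 */
static int set_backing(Node *bs, Node *backing)
{
    if (bs->chain_frozen) {
        return -1;
    }
    bs->cow = backing;
    return 0;
}

/* Cleanup path: only restore the link if it was actually changed. */
static int restore_backing(Node *bs, Node *backing_file_bs)
{
    if (cow_child(bs) != backing_file_bs) {
        return set_backing(bs, backing_file_bs);
    }
    return 0;
}
```

With this shape, an early failure (before the backing link was touched)
never reaches set_backing(), so a frozen chain is harmless there.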

Max



* Re: [PATCH v7 14/47] stream: Deal with filters
  2020-08-20 10:49             ` Vladimir Sementsov-Ogievskiy
@ 2020-08-20 11:43               ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-20 11:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf; +Cc: qemu-devel, qemu-block



On 20.08.20 12:49, Vladimir Sementsov-Ogievskiy wrote:
> 20.08.2020 12:22, Max Reitz wrote:
>> On 20.08.20 10:31, Max Reitz wrote:
>>
>> [...]
>>
>>> So all in all, I believe the biggest surprise about what’s written into
>>> the top layer isn’t that it may be a json:{} filename, but the filename
>>> of a node that maybe doesn’t even exist anymore?  (Oh, no, please don’t
>>> tell me you can delete it and get an invalid pointer read...)
>>
>> (I tried triggering that, but, oh, it’s strdup’ed() in stream_start().
>> I’m a bit daft.)
>>
> 
> 
> If it's broken anyway, probably we can just revert c624b015bf and start
> to freeze base again?

Well, it’s only broken if you care about the backing filename string
that’s written to @top.  So it isn’t broken altogether.

Though, well.  If we all agree to just revert it and maybe add a @bottom
parameter instead, then I suppose we could do it.

(Maybe in a follow-up, though.)

Max



* Re: [PATCH v7 35/47] commit: Deal with filters
  2020-08-20 11:27     ` Max Reitz
@ 2020-08-20 13:47       ` Kevin Wolf
  0 siblings, 0 replies; 173+ messages in thread
From: Kevin Wolf @ 2020-08-20 13:47 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


Am 20.08.2020 um 13:27 hat Max Reitz geschrieben:
> On 19.08.20 19:58, Kevin Wolf wrote:
> > Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> >> This includes some permission limiting (for example, we only need to
> >> take the RESIZE permission if the base is smaller than the top).
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/block-backend.c          |  9 +++-
> >>  block/commit.c                 | 96 +++++++++++++++++++++++++---------
> >>  block/monitor/block-hmp-cmds.c |  2 +-
> >>  blockdev.c                     |  4 +-
> >>  4 files changed, 81 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/block/block-backend.c b/block/block-backend.c
> >> index 6936b25c83..7f2c7dbccc 100644
> >> --- a/block/block-backend.c
> >> +++ b/block/block-backend.c
> >> @@ -2271,8 +2271,13 @@ int blk_commit_all(void)
> >>          AioContext *aio_context = blk_get_aio_context(blk);
> >>  
> >>          aio_context_acquire(aio_context);
> >> -        if (blk_is_inserted(blk) && blk->root->bs->backing) {
> >> -            int ret = bdrv_commit(blk->root->bs);
> > 
> > The old code didn't try to commit nodes that don't have a backing file.
> > 
> >> +        if (blk_is_inserted(blk)) {
> >> +            BlockDriverState *non_filter;
> >> +            int ret;
> >> +
> >> +            /* Legacy function, so skip implicit filters */
> >> +            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
> >> +            ret = bdrv_commit(non_filter);
> > 
> > The new one tries unconditionally. For nodes without a backing file,
> > bdrv_commit() will return -ENOTSUP, so the whole function fails.
> 
> :(
> 
> Hm.  Should I fix it by checking for
> bdrv_cow_bs(bdrv_skip_implicit_filters())?  Or bdrv_backing_chain_next()
> and change the bdrv_skip_implicit_filters() to a bdrv_skip_filters()?  I
> feel like that would make even more sense.

I agree that bdrv_skip_filters() makes more sense. If I have a qcow2
image and an explicit throttle filter on top, there is no reason to skip
this image.

bdrv_backing_chain_next() or bdrv_cow_bs() should be the same in a
boolean context, so I'd vote for bdrv_cow_bs() because it has less work
to do to get the same result.
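
A rough sketch of the check being discussed for blk_commit_all(), with
stand-in types.  Node, skip_filters(), cow_child() and
commit_candidate() are hypothetical and only model the idea: skip all
filters first, then commit only if the resulting node has a COW child.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BlockDriverState; not the real QEMU structure. */
typedef struct Node {
    bool is_filter;
    struct Node *filtered;  /* child of a filter node, else NULL */
    struct Node *cow;       /* COW backing child, else NULL */
} Node;

/* Hypothetical analogue of bdrv_skip_filters(). */
static Node *skip_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->filtered;
    }
    return bs;
}

/* Hypothetical analogue of bdrv_cow_bs(). */
static Node *cow_child(Node *bs)
{
    return bs ? bs->cow : NULL;
}

/*
 * Returns the node that should be passed to the commit operation, or
 * NULL if there is nothing to commit (no backing file below the first
 * non-filter node).
 */
static Node *commit_candidate(Node *root)
{
    Node *unfiltered = skip_filters(root);

    return cow_child(unfiltered) ? unfiltered : NULL;
}
```

In this model, a throttle filter on top of a qcow2 image with a backing
file yields the qcow2 node, while a node without a backing file is
skipped instead of making the whole function fail.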

> > (First real bug at patch 35. I almost thought I wouldn't find any!)
> 
> :)
> 
> >>              if (ret < 0) {
> >>                  aio_context_release(aio_context);
> >>                  return ret;
> >> diff --git a/block/commit.c b/block/commit.c
> >> index 7732d02dfe..4122b6736d 100644
> >> --- a/block/commit.c
> >> +++ b/block/commit.c
> >> @@ -37,6 +37,7 @@ typedef struct CommitBlockJob {
> >>      BlockBackend *top;
> >>      BlockBackend *base;
> >>      BlockDriverState *base_bs;
> >> +    BlockDriverState *base_overlay;
> >>      BlockdevOnError on_error;
> >>      bool base_read_only;
> >>      bool chain_frozen;
> > 
> > Hm, again this mysterious base_overlay. I know that stream introduced it
> > to avoid freezing the link to base, but commit doesn't seem to do that.
> > 
> > Is it to avoid using the block status of filter drivers between
> > base_overlay and base?
> 
> Yes.
> 
> > If so, I guess that goes back to the question I
> > raised earlier in this series: What is the block status supposed to tell
> > for filter nodes?
> 
> Honestly, I would really like to get away without having to answer that
> question in this series.  Intuitively, I feel like falling through to
> the next data-bearing layer is not something most callers want.  But I’d
> rather investigate that question separately from this series (even
> though that likely means we’ll never do it), and just treat it as it is
> in this series.

Well, I'm asking the question because not having the answer makes us
jump through hoops in this series to accommodate a behaviour it probably
shouldn't even have. (Because I agree that filters should probably keep
DATA clear, i.e. they are never the layer that defines the content.)

Additional node references (i.e. references that are not edges in the
graph) always make the design more complicated and require us to
consider more things like what happens on graph changes. So it's a
question of maintainability.

> > But anyway, in contrast to mirror, commit actually freezes the chain
> > between commit_top_bs and base, so it should be safe at least.
> > 
> >> @@ -89,7 +90,7 @@ static void commit_abort(Job *job)
> >>       * XXX Can (or should) we somehow keep 'consistent read' blocked even
> >>       * after the failed/cancelled commit job is gone? If we already wrote
> >>       * something to base, the intermediate images aren't valid any more. */
> >> -    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
> >> +    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
> >>                        &error_abort);
> >>  
> >>      bdrv_unref(s->commit_top_bs);
> >> @@ -153,7 +154,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
> >>              break;
> >>          }
> >>          /* Copy if allocated above the base */
> >> -        ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base), false,
> >> +        ret = bdrv_is_allocated_above(blk_bs(s->top), s->base_overlay, true,
> >>                                        offset, COMMIT_BUFFER_SIZE, &n);
> >>          copy = (ret == 1);
> >>          trace_commit_one_iteration(s, offset, n, ret);
> >> @@ -253,15 +254,35 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> >>      CommitBlockJob *s;
> >>      BlockDriverState *iter;
> >>      BlockDriverState *commit_top_bs = NULL;
> >> +    BlockDriverState *filtered_base;
> >>      Error *local_err = NULL;
> >> +    int64_t base_size, top_size;
> >> +    uint64_t perms, iter_shared_perms;
> >>      int ret;
> >>  
> >>      assert(top != bs);
> >> -    if (top == base) {
> >> +    if (bdrv_skip_filters(top) == bdrv_skip_filters(base)) {
> >>          error_setg(errp, "Invalid files for merge: top and base are the same");
> >>          return;
> >>      }
> >>  
> >> +    base_size = bdrv_getlength(base);
> >> +    if (base_size < 0) {
> >> +        error_setg_errno(errp, -base_size, "Could not inquire base image size");
> >> +        return;
> >> +    }
> >> +
> >> +    top_size = bdrv_getlength(top);
> >> +    if (top_size < 0) {
> >> +        error_setg_errno(errp, -top_size, "Could not inquire top image size");
> >> +        return;
> >> +    }
> >> +
> >> +    perms = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
> >> +    if (base_size < top_size) {
> >> +        perms |= BLK_PERM_RESIZE;
> >> +    }
> > 
> > base_perms would indicate which permissions these are (particularly
> > because it's not the next thing that requires permissions, but only used
> > further down the function).
> 
> %s/\<perms\>/base_perms/?  Sure.

Sorry, I admit this wasn't phrased very clearly. But yes, renaming the
variable this way is what I meant.
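
For illustration, the permission computation with the renamed variable
could look roughly like this.  The PERM_* constants and
commit_base_perms() are stand-ins invented for the sketch, not the real
BLK_PERM_* definitions or an existing function.

```c
#include <stdint.h>

/* Illustrative flag values; stand-ins for the BLK_PERM_* constants. */
enum {
    PERM_CONSISTENT_READ = 0x01,
    PERM_WRITE           = 0x02,
    PERM_RESIZE          = 0x08,
};

/*
 * Permissions the job needs on the base node.  RESIZE is only
 * requested when the base is smaller than the top and therefore has to
 * grow.
 */
static uint64_t commit_base_perms(int64_t base_size, int64_t top_size)
{
    uint64_t base_perms = PERM_CONSISTENT_READ | PERM_WRITE;

    if (base_size < top_size) {
        base_perms |= PERM_RESIZE;
    }
    return base_perms;
}
```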

> >>      s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, BLK_PERM_ALL,
> >>                           speed, creation_flags, NULL, NULL, errp);
> >>      if (!s) {
> >> @@ -301,17 +322,43 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> >>  
> >>      s->commit_top_bs = commit_top_bs;
> >>  
> >> -    /* Block all nodes between top and base, because they will
> >> -     * disappear from the chain after this operation. */
> >> -    assert(bdrv_chain_contains(top, base));
> >> -    for (iter = top; iter != base; iter = backing_bs(iter)) {
> >> -        /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> >> -         * at s->base (if writes are blocked for a node, they are also blocked
> >> -         * for its backing file). The other options would be a second filter
> >> -         * driver above s->base. */
> >> +    /*
> >> +     * Block all nodes between top and base, because they will
> >> +     * disappear from the chain after this operation.
> >> +     * Note that this assumes that the user is fine with removing all
> >> +     * nodes (including R/W filters) between top and base.  Assuring
> >> +     * this is the responsibility of the interface (i.e. whoever calls
> >> +     * commit_start()).
> >> +     */
> >> +    s->base_overlay = bdrv_find_overlay(top, base);
> >> +    assert(s->base_overlay);
> >> +
> >> +    /*
> >> +     * The topmost node with
> >> +     * bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base)
> >> +     */
> >> +    filtered_base = bdrv_cow_bs(s->base_overlay);
> >> +    assert(bdrv_skip_filters(filtered_base) == bdrv_skip_filters(base));
> >> +
> >> +    /*
> >> +     * XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> >> +     * at s->base (if writes are blocked for a node, they are also blocked
> >> +     * for its backing file). The other options would be a second filter
> >> +     * driver above s->base.
> >> +     */
> >> +    iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
> >> +
> >> +    for (iter = top; iter != base; iter = bdrv_filter_or_cow_bs(iter)) {
> >> +        if (iter == filtered_base) {
> >> +            /*
> >> +             * From here on, all nodes are filters on the base.  This
> >> +             * allows us to share BLK_PERM_CONSISTENT_READ.
> >> +             */
> >> +            iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
> >> +        }
> >> +
> >>          ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> >> -                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> >> -                                 errp);
> >> +                                 iter_shared_perms, errp);
> >>          if (ret < 0) {
> >>              goto fail;
> >>          }
> >> @@ -328,9 +375,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> >>      }
> >>  
> >>      s->base = blk_new(s->common.job.aio_context,
> >> -                      BLK_PERM_CONSISTENT_READ
> >> -                      | BLK_PERM_WRITE
> >> -                      | BLK_PERM_RESIZE,
> >> +                      perms,
> >>                        BLK_PERM_CONSISTENT_READ
> >>                        | BLK_PERM_GRAPH_MOD
> >>                        | BLK_PERM_WRITE_UNCHANGED);
> >> @@ -398,19 +443,22 @@ int bdrv_commit(BlockDriverState *bs)
> >>      if (!drv)
> >>          return -ENOMEDIUM;
> >>  
> >> -    if (!bs->backing) {
> >> +    backing_file_bs = bdrv_cow_bs(bs);
> >> +
> >> +    if (!backing_file_bs) {
> >>          return -ENOTSUP;
> >>      }
> >>  
> >>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
> >> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
> >> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
> >> +    {
> >>          return -EBUSY;
> >>      }
> >>  
> >> -    ro = bs->backing->bs->read_only;
> >> +    ro = backing_file_bs->read_only;
> >>  
> >>      if (ro) {
> >> -        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
> >> +        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
> >>              return -EACCES;
> >>          }
> >>      }
> >> @@ -428,8 +476,6 @@ int bdrv_commit(BlockDriverState *bs)
> >>      }
> >>  
> >>      /* Insert commit_top block node above backing, so we can write to it */
> >> -    backing_file_bs = backing_bs(bs);
> >> -
> >>      commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
> >>                                           &local_err);
> >>      if (commit_top_bs == NULL) {
> >> @@ -515,15 +561,13 @@ ro_cleanup:
> >>      qemu_vfree(buf);
> >>  
> >>      blk_unref(backing);
> >> -    if (backing_file_bs) {
> >> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> >> -    }
> >> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> > 
> > This means that bdrv_set_backing_hd() is now called to undo a change
> > that hasn't even been made yet. This fails (with &error_abort) if the
> > backing chain is frozen.
> > 
> > On the other hand, the other bdrv_set_backing_hd() calls in the same
> > function would fail the same way.
> 
> True. :)
> 
> Still, maybe there’s an op blocker from a concurrent job, so we go to
> the failure path and then we’d abort here.  So better to guard it by
> checking whether bdrv_cow_bs(bs) != backing_file_bs.

Certainly can't hurt.

Kevin


* Re: [PATCH v7 37/47] qemu-img: Use child access functions
  2020-06-25 15:22 ` [PATCH v7 37/47] qemu-img: Use child access functions Max Reitz
  2020-07-24 15:51   ` Andrey Shinkevich
@ 2020-08-21 15:29   ` Kevin Wolf
  2020-08-24 12:42     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-21 15:29 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> This changes iotest 204's output, because blkdebug on top of a COW node
> used to make qemu-img map disregard the rest of the backing chain (the
> backing chain was broken by the filter).  With this patch, the
> allocation in the base image is reported correctly.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

> @@ -3437,6 +3441,7 @@ static int img_rebase(int argc, char **argv)
>      uint8_t *buf_old = NULL;
>      uint8_t *buf_new = NULL;
>      BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
> +    BlockDriverState *unfiltered_bs;
>      char *filename;
>      const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>      int c, flags, src_flags, ret;
> @@ -3571,6 +3576,8 @@ static int img_rebase(int argc, char **argv)
>      }
>      bs = blk_bs(blk);
>  
> +    unfiltered_bs = bdrv_skip_filters(bs);
> +
>      if (out_basefmt != NULL) {
>          if (bdrv_find_format(out_basefmt) == NULL) {
>              error_report("Invalid format name: '%s'", out_basefmt);
> @@ -3582,7 +3589,7 @@ static int img_rebase(int argc, char **argv)
>      /* For safe rebasing we need to compare old and new backing file */
>      if (!unsafe) {
>          QDict *options = NULL;
> -        BlockDriverState *base_bs = backing_bs(bs);
> +        BlockDriverState *base_bs = bdrv_cow_bs(unfiltered_bs);
>  
>          if (base_bs) {
>              blk_old_backing = blk_new(qemu_get_aio_context(),
> @@ -3738,8 +3745,9 @@ static int img_rebase(int argc, char **argv)
>                   * If cluster wasn't changed since prefix_chain, we don't need
>                   * to take action
>                   */
> -                ret = bdrv_is_allocated_above(backing_bs(bs), prefix_chain_bs,
> -                                              false, offset, n, &n);
> +                ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
> +                                              prefix_chain_bs, false,
> +                                              offset, n, &n);
>                  if (ret < 0) {
>                      error_report("error while reading image metadata: %s",
>                                   strerror(-ret));

img_rebase() has these additional calls:

    /* If the cluster is allocated, we don't need to take action */
    ret = bdrv_is_allocated(bs, offset, n, &n);

And:

    if (out_baseimg && *out_baseimg) {
        ret = bdrv_change_backing_file(bs, out_baseimg, out_basefmt);
    } else {
        ret = bdrv_change_backing_file(bs, NULL, NULL);
    }

Shouldn't they use unfiltered_bs?

(Not that it's likely that anyone would use 'qemu-img rebase' with a
filter, but while you're touching it...)

Kevin




* Re: [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-06-25 15:22 ` [PATCH v7 39/47] blockdev: Fix active commit choice Max Reitz
@ 2020-08-21 15:50   ` Kevin Wolf
  2020-08-24 13:18     ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-21 15:50 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> We have to perform an active commit whenever the top node has a parent
> that has taken the WRITE permission on it.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockdev.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 402f1d1df1..237fffbe53 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>      AioContext *aio_context;
>      Error *local_err = NULL;
>      int job_flags = JOB_DEFAULT;
> +    uint64_t top_perm, top_shared;
>  
>      if (!has_speed) {
>          speed = 0;
> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>          goto out;
>      }
>  
> -    if (top_bs == bs) {
> +    /*
> +     * Active commit is required if and only if someone has taken a
> +     * WRITE permission on the top node.

...or if someone wants to take a WRITE permission while the job is
running.

The user's future intentions are something that we can't know, so maybe
this should become an option in the future (not in this series, of
course).

>                                            Historically, we have always
> +     * used active commit for top nodes, so continue that practice.
> +     * (Active commit is never really wrong.)
> +     */

Changing the practice would break compatibility with clients that start
an active commit job and then attach it to a read-write device, so we
must continue the practice. I think the comment should be clearer about
this; as written, it sounds more like "no reason, but why not".

This is even more problematic because the commit job doesn't unshare
BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
error.

> +    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
> +    if (top_perm & BLK_PERM_WRITE ||
> +        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
> +    {
>          if (has_backing_file) {
>              error_setg(errp, "'backing-file' specified,"
>                               " but 'top' is the active layer");

Hm, this error message isn't accurate any more.

In fact, the implementation isn't consistent with the QAPI documentation
any more, because backing-file is only an error for the top level.

>              goto out;
>          }
> -        commit_active_start(has_job_id ? job_id : NULL, bs, base_bs,
> -                            job_flags, speed, on_error,
> +        if (!has_job_id) {
> +            /*
> +             * Emulate here what block_job_create() does, because it
> +             * is possible that @bs != @top_bs (the block job should
> +             * be named after @bs, even if @top_bs is the actual
> +             * source)
> +             */

Should it? Oh, yes, looks like it. block-commit is weird. :-)

> +            job_id = bdrv_get_device_name(bs);
> +        }
> +        commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
>                              filter_node_name, NULL, NULL, false, &local_err);
>      } else {
>          BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
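
To spell out the condition from the hunk above in isolation, here is a
small sketch with stand-in types.  Node, skip_filters(),
needs_active_commit() and the PERM_WRITE value are hypothetical; the
top_perm argument models what bdrv_get_cumulative_perm() reports for the
top node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value; stand-in for BLK_PERM_WRITE. */
enum { PERM_WRITE = 0x02 };

/* Stand-in for BlockDriverState; not the real QEMU structure. */
typedef struct Node {
    bool is_filter;
    struct Node *filtered;  /* child of a filter node, else NULL */
} Node;

/* Hypothetical analogue of bdrv_skip_filters(). */
static Node *skip_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->filtered;
    }
    return bs;
}

/*
 * Active commit is chosen if some parent has taken WRITE on the top
 * node, or if top is the active layer once filters are ignored (the
 * historical behaviour that clients rely on).
 */
static bool needs_active_commit(Node *bs, Node *top_bs, uint64_t top_perm)
{
    return (top_perm & PERM_WRITE) != 0 ||
           skip_filters(top_bs) == skip_filters(bs);
}
```

Note that this only looks at permissions held right now; it cannot see a
writer that gets attached while the job is running.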

Kevin




* Re: [PATCH v7 37/47] qemu-img: Use child access functions
  2020-08-21 15:29   ` Kevin Wolf
@ 2020-08-24 12:42     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-24 12:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 21.08.20 17:29, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> This changes iotest 204's output, because blkdebug on top of a COW node
>> used to make qemu-img map disregard the rest of the backing chain (the
>> backing chain was broken by the filter).  With this patch, the
>> allocation in the base image is reported correctly.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
>> @@ -3437,6 +3441,7 @@ static int img_rebase(int argc, char **argv)
>>      uint8_t *buf_old = NULL;
>>      uint8_t *buf_new = NULL;
>>      BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
>> +    BlockDriverState *unfiltered_bs;
>>      char *filename;
>>      const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>>      int c, flags, src_flags, ret;
>> @@ -3571,6 +3576,8 @@ static int img_rebase(int argc, char **argv)
>>      }
>>      bs = blk_bs(blk);
>>  
>> +    unfiltered_bs = bdrv_skip_filters(bs);
>> +
>>      if (out_basefmt != NULL) {
>>          if (bdrv_find_format(out_basefmt) == NULL) {
>>              error_report("Invalid format name: '%s'", out_basefmt);
>> @@ -3582,7 +3589,7 @@ static int img_rebase(int argc, char **argv)
>>      /* For safe rebasing we need to compare old and new backing file */
>>      if (!unsafe) {
>>          QDict *options = NULL;
>> -        BlockDriverState *base_bs = backing_bs(bs);
>> +        BlockDriverState *base_bs = bdrv_cow_bs(unfiltered_bs);
>>  
>>          if (base_bs) {
>>              blk_old_backing = blk_new(qemu_get_aio_context(),
>> @@ -3738,8 +3745,9 @@ static int img_rebase(int argc, char **argv)
>>                   * If cluster wasn't changed since prefix_chain, we don't need
>>                   * to take action
>>                   */
>> -                ret = bdrv_is_allocated_above(backing_bs(bs), prefix_chain_bs,
>> -                                              false, offset, n, &n);
>> +                ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>> +                                              prefix_chain_bs, false,
>> +                                              offset, n, &n);
>>                  if (ret < 0) {
>>                      error_report("error while reading image metadata: %s",
>>                                   strerror(-ret));
> 
> img_rebase() has these additional calls:
> 
>     /* If the cluster is allocated, we don't need to take action */
>     ret = bdrv_is_allocated(bs, offset, n, &n);
> 
> And:
> 
>     if (out_baseimg && *out_baseimg) {
>         ret = bdrv_change_backing_file(bs, out_baseimg, out_basefmt);
>     } else {
>         ret = bdrv_change_backing_file(bs, NULL, NULL);
>     }
> 
> Shouldn't they use unfiltered_bs?

Oh, yes, the second one definitely.

As for the first one, I don’t think there’s a difference.  But why not:
we really want to query unfiltered_bs, so it’s better to do so
explicitly than through the implicit fall-through behavior of block_status.
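
As a sketch of why the second call matters, with stand-in types: Node,
skip_filters(), change_backing_file() and rebase_header() are
hypothetical, and the model simply assumes that a filter node has no
image header to update, so the call has to target the unfiltered node.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BlockDriverState; not the real QEMU structure. */
typedef struct Node {
    bool is_filter;
    struct Node *filtered;       /* child of a filter node, else NULL */
    const char *header_backing;  /* backing file name in the image header */
} Node;

/* Hypothetical analogue of bdrv_skip_filters(). */
static Node *skip_filters(Node *bs)
{
    while (bs && bs->is_filter) {
        bs = bs->filtered;
    }
    return bs;
}

/*
 * Hypothetical stand-in for bdrv_change_backing_file().  Assumption for
 * this sketch: filter nodes cannot store a backing file name.
 */
static int change_backing_file(Node *bs, const char *backing)
{
    if (bs->is_filter) {
        return -ENOTSUP;
    }
    bs->header_backing = backing;
    return 0;
}

/* Update the header of the first non-filter node below @bs. */
static int rebase_header(Node *bs, const char *backing)
{
    Node *unfiltered_bs = skip_filters(bs);

    if (!unfiltered_bs) {
        return -EINVAL;
    }
    return change_backing_file(unfiltered_bs, backing);
}
```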

Max



* Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant
  2020-06-25 15:22 ` [PATCH v7 41/47] block: Leave BDS.backing_file constant Max Reitz
  2020-07-27 12:27   ` Andrey Shinkevich
@ 2020-08-24 13:14   ` Kevin Wolf
  2020-08-24 14:29     ` Max Reitz
  1 sibling, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-24 13:14 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> Parts of the block layer treat BDS.backing_file as if it were whatever
> the image header says (i.e., if it is a relative path, it is relative to
> the overlay), other parts treat it like a cache for
> bs->backing->bs->filename (relative paths are relative to the CWD).
> Considering bs->backing->bs->filename exists, let us make it mean the
> former.
> 
> Among other things, this now allows the user to specify a base when
> using qemu-img to commit an image file in a directory that is not the
> CWD (assuming everything uses relative filenames).
> 
> Before this patch:
> 
> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> 
> After this patch:
> 
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> Image committed.
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> Image committed.
> 
> With this change, bdrv_find_backing_image() must look at whether the
> user has overridden a BDS's backing file.  If so, it can no longer use
> bs->backing_file, but must instead compare the given filename against
> the backing node's filename directly.
> 
> Note that this changes the QAPI output for a node's backing_file.  We
> had very inconsistent output there (sometimes what the image header
> said, sometimes the actual filename of the backing image).  This
> inconsistent output was effectively useless, so we have to decide one
> way or the other.  Considering that bs->backing_file usually at runtime
> contained the path to the image relative to qemu's CWD (or absolute),
> this patch changes QAPI's backing_file to always report the
> bs->backing->bs->filename from now on.  If you want to receive the image
> header information, you have to refer to full-backing-filename.
> 
> This necessitates a change to iotest 228.  The interesting information
> it really wanted is the image header, and it can get that now, but it
> has to use full-backing-filename instead of backing_file.  Because of
> this patch's changes to bs->backing_file's behavior, we also need some
> reference output changes.
> 
> Along with the changes to bs->backing_file, stop updating
> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
> change our externally visible behavior (incompatibly), we have to let
> bdrv_query_image_info() try to get the image format from bs->backing if
> bs->backing_format is unset.  (The QAPI schema describes
> backing-filename-format as "the format of the backing file", so it is
> not necessarily what the image header says, but just the format of the
> file referenced by backing-filename (if known).)

Why is it okay to change backing-filename incompatibly, but not
backing-filename-format? I would find it much more consistent if
ImageInfo reported the value from the header in both fields, and
BlockDeviceInfo reported the values actually in use.

The QAPI schema describes ImageInfo as "Information about a QEMU image
file", and runtime state really isn't information about an image file.

If you want to know the probed image format, you can still look at
backing-image.format. I don't think this change is much different from
what you described above for BlockDeviceInfo.backing_file.

> iotest 245 changes in behavior: With the backing node no longer
> overriding the parent node's backing_file string, you can now omit the
> @backing option when reopening a node with neither a default nor a
> current backing file even if it used to have a backing node at some
> point.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Kevin




* Re: [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-08-21 15:50   ` Kevin Wolf
@ 2020-08-24 13:18     ` Max Reitz
  2020-08-24 14:07       ` Kevin Wolf
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-24 13:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 21.08.20 17:50, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> We have to perform an active commit whenever the top node has a parent
>> that has taken the WRITE permission on it.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  blockdev.c | 24 +++++++++++++++++++++---
>>  1 file changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 402f1d1df1..237fffbe53 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>      AioContext *aio_context;
>>      Error *local_err = NULL;
>>      int job_flags = JOB_DEFAULT;
>> +    uint64_t top_perm, top_shared;
>>  
>>      if (!has_speed) {
>>          speed = 0;
>> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>          goto out;
>>      }
>>  
>> -    if (top_bs == bs) {
>> +    /*
>> +     * Active commit is required if and only if someone has taken a
>> +     * WRITE permission on the top node.
> 
> ...or if someone wants to take a WRITE permission while the job is
> running.
> 
> Future intentions of the user are something that we can't know, so maybe
> this should become an option in the future (not in this series, of
> course).
> 
>>                                            Historically, we have always
>> +     * used active commit for top nodes, so continue that practice.
>> +     * (Active commit is never really wrong.)
>> +     */
> 
> Changing the practice would break compatibility with clients that start
> an active commit job and then attach it to a read-write device, so we
> must continue the practice. I think the comment should be clearer about
> this, it sounds more like "no reason, but why not".

I think that’s what I meant by “historically”.  Is “legacily” a word?

But sure, I can make it more explicit.

> This is even more problematic because the commit job doesn't unshare
> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> error.
> 
>> +    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
>> +    if (top_perm & BLK_PERM_WRITE ||
>> +        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
>> +    {
>>          if (has_backing_file) {
>>              error_setg(errp, "'backing-file' specified,"
>>                               " but 'top' is the active layer");
> 
> Hm, this error message isn't accurate any more.
> 
> In fact, the implementation isn't consistent with the QAPI documentation
> any more, because backing-file is only an error for the top level.

Hm.  I wanted to agree, and then I wanted to come up with a QAPI
documentation that fits the new behavior (because I think it makes more
sense to change the QAPI documentation along with the behavior change,
rather than to force us to allow backing-file for anything that isn’t on
the top layer).

But in the process of coming up with a better description, I noticed
that this doesn’t say “is a root node”, it says “is the active layer”.
I would say a node in the active layer is a node that has some parent
that has taken a WRITE permission on it.  So actually I think that the
documentation is right, and this code only now fits.
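
To illustrate the "some parent has taken a WRITE permission" criterion, here is a minimal, self-contained sketch. It is not QEMU code: the SketchParent type, the SKETCH_PERM_* bits and both functions are hypothetical stand-ins, and the only assumption carried over from the patch is that the permissions in use are OR-ed across all parents while the shared permissions are AND-ed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; these are not QEMU's BLK_PERM_* values. */
enum {
    SKETCH_PERM_CONSISTENT_READ = 1u << 0,
    SKETCH_PERM_WRITE           = 1u << 1,
    SKETCH_PERM_ALL             = (1u << 2) - 1,
};

/* One parent edge: what the parent uses, and what it lets others use. */
typedef struct SketchParent {
    uint64_t perm;
    uint64_t shared_perm;
} SketchParent;

/* Combine all parents: used permissions are OR-ed, shared ones AND-ed. */
static void sketch_cumulative_perm(const SketchParent *parents, size_t n,
                                   uint64_t *perm, uint64_t *shared)
{
    assert(perm != NULL && shared != NULL);
    *perm = 0;
    *shared = SKETCH_PERM_ALL;
    for (size_t i = 0; i < n; i++) {
        *perm |= parents[i].perm;
        *shared &= parents[i].shared_perm;
    }
}

/* A node "may be written to" if any parent has taken WRITE on it. */
static bool sketch_has_writer(const SketchParent *parents, size_t n)
{
    uint64_t perm, shared;

    sketch_cumulative_perm(parents, n, &perm, &shared);
    return (perm & SKETCH_PERM_WRITE) != 0;
}
```

Under this reading, a node with no parents at all has no writer, which is why the patch still needs the separate root-node comparison to keep the historical behavior.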

Though I do think this wants for some clarification.  Perhaps “If 'top'
is the active layer (i.e., is a node that may be written to), specifying
a backing [...]”?

There’s more wrong with the specification, namely the whole part under
@backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
think all of that should go to the top level.  (And “If top == active”
should be changed to “If top is active (i.e., may be written to)”.)

Max



* Re: [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-08-24 13:18     ` Max Reitz
@ 2020-08-24 14:07       ` Kevin Wolf
  2020-08-24 14:41         ` Max Reitz
  0 siblings, 1 reply; 173+ messages in thread
From: Kevin Wolf @ 2020-08-24 14:07 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
> On 21.08.20 17:50, Kevin Wolf wrote:
> > Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> >> We have to perform an active commit whenever the top node has a parent
> >> that has taken the WRITE permission on it.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >> ---
> >>  blockdev.c | 24 +++++++++++++++++++++---
> >>  1 file changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/blockdev.c b/blockdev.c
> >> index 402f1d1df1..237fffbe53 100644
> >> --- a/blockdev.c
> >> +++ b/blockdev.c
> >> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> >>      AioContext *aio_context;
> >>      Error *local_err = NULL;
> >>      int job_flags = JOB_DEFAULT;
> >> +    uint64_t top_perm, top_shared;
> >>  
> >>      if (!has_speed) {
> >>          speed = 0;
> >> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> >>          goto out;
> >>      }
> >>  
> >> -    if (top_bs == bs) {
> >> +    /*
> >> +     * Active commit is required if and only if someone has taken a
> >> +     * WRITE permission on the top node.
> > 
> > ...or if someone wants to take a WRITE permission while the job is
> > running.
> > 
> > Future intentions of the user are something that we can't know, so maybe
> > this should become an option in the future (not in this series, of
> > course).
> > 
> >>                                            Historically, we have always
> >> +     * used active commit for top nodes, so continue that practice.
> >> +     * (Active commit is never really wrong.)
> >> +     */
> > 
> > Changing the practice would break compatibility with clients that start
> > an active commit job and then attach it to a read-write device, so we
> > must continue the practice. I think the comment should be clearer about
> > this, it sounds more like "no reason, but why not".
> 
> I think that’s what I meant by “historically”.  Is “legacily” a word?
> 
> But sure, I can make it more explicit.
> 
> > This is even more problematic because the commit job doesn't unshare
> > BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> > error.
> > 
> >> +    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
> >> +    if (top_perm & BLK_PERM_WRITE ||
> >> +        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
> >> +    {
> >>          if (has_backing_file) {
> >>              error_setg(errp, "'backing-file' specified,"
> >>                               " but 'top' is the active layer");
> > 
> > Hm, this error message isn't accurate any more.
> > 
> > In fact, the implementation isn't consistent with the QAPI documentation
> > any more, because backing-file is only an error for the top level.
> 
> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
> documentation that fits the new behavior (because I think it makes more
> sense to change the QAPI documentation along with the behavior change,
> rather than to force us to allow backing-file for anything that isn’t on
> the top layer).
> 
> But in the process of coming up with a better description, I noticed
> that this doesn’t say “is a root node”, it says “is the active layer”.
> I would say a node in the active layer is a node that has some parent
> that has taken a WRITE permission on it.  So actually I think that the
> documentation is right, and this code only now fits.

Then you may have not only "the" active layer, but multiple active
layers. I find this a bit counterintuitive.

There is a simple reason why backing-file is an error for a root node:
It doesn't have overlays, so a value to write to the header of overlay
images just doesn't make sense.

The same reasoning doesn't apply for writable images that do have
overlays. Forbidding backing-file is a more arbitrary restriction there.
I'm not saying that we can't make arbitrary restrictions where allowing
an option is not worth the effort, but I feel they should be spelt out
more explicitly instead of twisting words like "active layer" until they
fit the code.

> Though I do think this wants for some clarification.  Perhaps “If 'top'
> is the active layer (i.e., is a node that may be written to), specifying
> a backing [...]”?

"If 'top' doesn't have an overlay image or is in use by a writer..."?
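
The proposed wording can be read as a simple predicate. The following is only a sketch of that reading, with a hypothetical SketchTop type and function name; it does not reflect how blockdev.c structures the check.

```c
#include <stdbool.h>

/* Hypothetical summary of the state of 'top' that matters for the check. */
typedef struct SketchTop {
    bool has_overlay;  /* some other image uses 'top' as its backing file */
    bool has_writer;   /* some parent holds the WRITE permission on 'top' */
} SketchTop;

/*
 * Returns true if a user-supplied backing-file should be rejected.
 * - No overlay: there is no overlay header to write the value into,
 *   so the option makes no sense.
 * - In use by a writer: the option would be meaningful, but it is
 *   rejected anyway as a deliberate (arbitrary) restriction.
 */
static bool sketch_backing_file_rejected(const SketchTop *top,
                                         bool has_backing_file)
{
    if (!has_backing_file) {
        return false;
    }
    return !top->has_overlay || top->has_writer;
}
```

Keeping the two conditions separate in the wording makes it visible that only the first one has a semantic justification.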

> There’s more wrong with the specification, namely the whole part under
> @backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
> think all of that should go to the top level.  (And “If top == active”
> should be changed to “If top is active (i.e., may be written to)”.)

At least the latter only becomes wrong with this patch, so I think it
needs to be changed by this patch.

Kevin


* Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant
  2020-08-24 13:14   ` Kevin Wolf
@ 2020-08-24 14:29     ` Max Reitz
  0 siblings, 0 replies; 173+ messages in thread
From: Max Reitz @ 2020-08-24 14:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 24.08.20 15:14, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> Parts of the block layer treat BDS.backing_file as if it were whatever
>> the image header says (i.e., if it is a relative path, it is relative to
>> the overlay), other parts treat it like a cache for
>> bs->backing->bs->filename (relative paths are relative to the CWD).
>> Considering bs->backing->bs->filename exists, let us make it mean the
>> former.
>>
>> Among other things, this now allows the user to specify a base when
>> using qemu-img to commit an image file in a directory that is not the
>> CWD (assuming everything uses relative filenames).
>>
>> Before this patch:
>>
>> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
>> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
>> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>>
>> After this patch:
>>
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> Image committed.
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> Image committed.
>>
>> With this change, bdrv_find_backing_image() must look at whether the
>> user has overridden a BDS's backing file.  If so, it can no longer use
>> bs->backing_file, but must instead compare the given filename against
>> the backing node's filename directly.
>>
>> Note that this changes the QAPI output for a node's backing_file.  We
>> had very inconsistent output there (sometimes what the image header
>> said, sometimes the actual filename of the backing image).  This
>> inconsistent output was effectively useless, so we have to decide one
>> way or the other.  Considering that bs->backing_file usually at runtime
>> contained the path to the image relative to qemu's CWD (or absolute),
>> this patch changes QAPI's backing_file to always report the
>> bs->backing->bs->filename from now on.  If you want to receive the image
>> header information, you have to refer to full-backing-filename.
>>
>> This necessitates a change to iotest 228.  The interesting information
>> it really wanted is the image header, and it can get that now, but it
>> has to use full-backing-filename instead of backing_file.  Because of
>> this patch's changes to bs->backing_file's behavior, we also need some
>> reference output changes.
>>
>> Along with the changes to bs->backing_file, stop updating
>> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
>> change our externally visible behavior (incompatibly), we have to let
>> bdrv_query_image_info() try to get the image format from bs->backing if
>> bs->backing_format is unset.  (The QAPI schema describes
>> backing-filename-format as "the format of the backing file", so it is
>> not necessarily what the image header says, but just the format of the
>> file referenced by backing-filename (if known).)
> 
> Why is it okay to change backing-filename incompatibly, but not
> backing-filename-format?

I hope you’re asking the reverse, i.e. why I don’t change
backing-filename-format, too.  The answer to that is yeah, why not. :)

> I would find it much more consistent if
> ImageInfo reported the value from the header in both fields, and
> BlockDeviceInfo reported the values actually in use.
> 
> The QAPI schema described ImageInfo as "Information about a QEMU image
> file" and runtime state really isn't information about an image file.
> 
> If you want to know the probed image format, you can still look at
> backing-image.format. I don't think this change is much different from
> what you described above for BlockDeviceInfo.backing_file.

Well, OK then.

Max



* Re: [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-08-24 14:07       ` Kevin Wolf
@ 2020-08-24 14:41         ` Max Reitz
  2020-08-24 15:06           ` Kevin Wolf
  0 siblings, 1 reply; 173+ messages in thread
From: Max Reitz @ 2020-08-24 14:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block



On 24.08.20 16:07, Kevin Wolf wrote:
> Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
>> On 21.08.20 17:50, Kevin Wolf wrote:
>>> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>>>> We have to perform an active commit whenever the top node has a parent
>>>> that has taken the WRITE permission on it.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>  blockdev.c | 24 +++++++++++++++++++++---
>>>>  1 file changed, 21 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/blockdev.c b/blockdev.c
>>>> index 402f1d1df1..237fffbe53 100644
>>>> --- a/blockdev.c
>>>> +++ b/blockdev.c
>>>> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>>>      AioContext *aio_context;
>>>>      Error *local_err = NULL;
>>>>      int job_flags = JOB_DEFAULT;
>>>> +    uint64_t top_perm, top_shared;
>>>>  
>>>>      if (!has_speed) {
>>>>          speed = 0;
>>>> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>>>          goto out;
>>>>      }
>>>>  
>>>> -    if (top_bs == bs) {
>>>> +    /*
>>>> +     * Active commit is required if and only if someone has taken a
>>>> +     * WRITE permission on the top node.
>>>
>>> ...or if someone wants to take a WRITE permission while the job is
>>> running.
>>>
>>> Future intentions of the user are something that we can't know, so maybe
>>> this should become an option in the future (not in this series, of
>>> course).
>>>
>>>>                                            Historically, we have always
>>>> +     * used active commit for top nodes, so continue that practice.
>>>> +     * (Active commit is never really wrong.)
>>>> +     */
>>>
>>> Changing the practice would break compatibility with clients that start
>>> an active commit job and then attach it to a read-write device, so we
>>> must continue the practice. I think the comment should be clearer about
>>> this, it sounds more like "no reason, but why not".
>>
>> I think that’s what I meant by “historically”.  Is “legacily” a word?
>>
>> But sure, I can make it more explicit.
>>
>>> This is even more problematic because the commit job doesn't unshare
>>> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
>>> error.
>>>
>>>> +    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
>>>> +    if (top_perm & BLK_PERM_WRITE ||
>>>> +        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
>>>> +    {
>>>>          if (has_backing_file) {
>>>>              error_setg(errp, "'backing-file' specified,"
>>>>                               " but 'top' is the active layer");
>>>
>>> Hm, this error message isn't accurate any more.
>>>
>>> In fact, the implementation isn't consistent with the QAPI documentation
>>> any more, because backing-file is only an error for the top level.
>>
>> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
>> documentation that fits the new behavior (because I think it makes more
>> sense to change the QAPI documentation along with the behavior change,
>> rather than to force us to allow backing-file for anything that isn’t on
>> the top layer).
>>
>> But in the process of coming up with a better description, I noticed
>> that this doesn’t say “is a root node”, it says “is the active layer”.
>> I would say a node in the active layer is a node that has some parent
>> that has taken a WRITE permission on it.  So actually I think that the
>> documentation is right, and this code only now fits.
> 
> Then you may have not only "the" active layer, but multiple active
> layers. I find this a bit counterintuitive.

Depends on what you count as a layer.  I don’t think that’s a clearly
defined term, is it?  I only know of “active layer”, “format layer”,
“protocol layer”, and you can at least have multiple format layers above
each other.  So I don’t find it counterintuitive.

But perhaps it’d be best to just get away from the term “active layer”,
as you propose below.

> There is a simple reason why backing-file is an error for a root node:
> It doesn't have overlays, so a value to write to the header of overlay
> images just doesn't make sense.

Ah, yeah...

> The same reasoning doesn't apply for writable images that do have
> overlays. Forbidding backing-file is a more arbitrary restriction there.
> I'm not saying that we can't make arbitrary restrictions where allowing
> an option is not worth the effort, but I feel they should be spelt out
> more explicitly instead of twisting words like "active layer" until they
> fit the code.

I’m all for spelling it out more explicitly.  I just noticed that I
couldn’t clearly distinguish “active layer” from “other” cases of nodes
with writers on them, which is why I noted that “active” to me means the
post-patch behavior already.

You’re right that there is no semantic reason for making it an error.
So I just want it to be an error out of laziness.  I hope you let me do
that.  (I don’t think there’s much of a problem with it, considering that
commits on nodes that have the WRITE permission taken are basically just
completely broken right now.)

>> Though I do think this wants for some clarification.  Perhaps “If 'top'
>> is the active layer (i.e., is a node that may be written to), specifying
>> a backing [...]”?
> 
> "If 'top' doesn't have an overlay image or is in use by a writer..."?

I.e., avoiding the term “active layer” altogether?  Sounds good.  Only,
I don’t know about “writer”...  But it’s already used in
BlockdevOptionsFile.dynamic-auto-read-only’s description, so I suppose
we can use it here, too.  (I just don’t know if as a
non-block-layer-developer I’d know what it means.)

(Also, yes, you’re right, the current behavior of giving all root nodes
an active commit of course remains, even when there are no writers.)

>> There’s more wrong with the specification, namely the whole part under
>> @backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
>> think all of that should go to the top level.  (And “If top == active”
>> should be changed to “If top is active (i.e., may be written to)”.)
> 
> At least the latter only becomes wrong with this patch, so I think it
> needs to be changed by this patch.

Sure.  So I understand you agree with moving the whole chunk, right?

Max



* Re: [PATCH v7 39/47] blockdev: Fix active commit choice
  2020-08-24 14:41         ` Max Reitz
@ 2020-08-24 15:06           ` Kevin Wolf
  0 siblings, 0 replies; 173+ messages in thread
From: Kevin Wolf @ 2020-08-24 15:06 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block


Am 24.08.2020 um 16:41 hat Max Reitz geschrieben:
> On 24.08.20 16:07, Kevin Wolf wrote:
> > Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
> >> On 21.08.20 17:50, Kevin Wolf wrote:
> >>> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> >>>> We have to perform an active commit whenever the top node has a parent
> >>>> that has taken the WRITE permission on it.
> >>>>
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>> ---
> >>>>  blockdev.c | 24 +++++++++++++++++++++---
> >>>>  1 file changed, 21 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/blockdev.c b/blockdev.c
> >>>> index 402f1d1df1..237fffbe53 100644
> >>>> --- a/blockdev.c
> >>>> +++ b/blockdev.c
> >>>> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> >>>>      AioContext *aio_context;
> >>>>      Error *local_err = NULL;
> >>>>      int job_flags = JOB_DEFAULT;
> >>>> +    uint64_t top_perm, top_shared;
> >>>>  
> >>>>      if (!has_speed) {
> >>>>          speed = 0;
> >>>> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> >>>>          goto out;
> >>>>      }
> >>>>  
> >>>> -    if (top_bs == bs) {
> >>>> +    /*
> >>>> +     * Active commit is required if and only if someone has taken a
> >>>> +     * WRITE permission on the top node.
> >>>
> >>> ...or if someone wants to take a WRITE permission while the job is
> >>> running.
> >>>
> >>> Future intentions of the user are something that we can't know, so maybe
> >>> this should become an option in the future (not in this series, of
> >>> course).
> >>>
> >>>>                                            Historically, we have always
> >>>> +     * used active commit for top nodes, so continue that practice.
> >>>> +     * (Active commit is never really wrong.)
> >>>> +     */
> >>>
> >>> Changing the practice would break compatibility with clients that start
> >>> an active commit job and then attach it to a read-write device, so we
> >>> must continue the practice. I think the comment should be clearer about
> >>> this, it sounds more like "no reason, but why not".
> >>
> >> I think that’s what I meant by “historically”.  Is “legacily” a word?
> >>
> >> But sure, I can make it more explicit.
> >>
> >>> This is even more problematic because the commit job doesn't unshare
> >>> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> >>> error.
> >>>
> >>>> +    bdrv_get_cumulative_perm(top_bs, &top_perm, &top_shared);
> >>>> +    if (top_perm & BLK_PERM_WRITE ||
> >>>> +        bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
> >>>> +    {
> >>>>          if (has_backing_file) {
> >>>>              error_setg(errp, "'backing-file' specified,"
> >>>>                               " but 'top' is the active layer");
> >>>
> >>> Hm, this error message isn't accurate any more.
> >>>
> >>> In fact, the implementation isn't consistent with the QAPI documentation
> >>> any more, because backing-file is only an error for the top level.
> >>
> >> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
> >> documentation that fits the new behavior (because I think it makes more
> >> sense to change the QAPI documentation along with the behavior change,
> >> rather than to force us to allow backing-file for anything that isn’t on
> >> the top layer).
> >>
> >> But in the process of coming up with a better description, I noticed
> >> that this doesn’t say “is a root node”, it says “is the active layer”.
> >> I would say a node in the active layer is a node that has some parent
> >> that has taken a WRITE permission on it.  So actually I think that the
> >> documentation is right, and this code only now fits.
> > 
> > Then you may have not only "the" active layer, but multiple active
> > layers. I find this a bit counterintuitive.
> 
> Depends on what you count as a layer.  I don’t think that’s a clearly
> defined term, is it?  I only know of “active layer”, “format layer”,
> “protocol layer”, and you can at least have multiple format layers above
> each other.  So I don’t find it counterintuitive.
> 
> But perhaps it’d be best to just get away from the term “active layer”,
> as you propose below.

Hm, if I needed to describe what a layer is for me intuitively, I guess
it would be something like each non-filter node on a node chain with all
of the filters directly on top of it?

Depending on which link you follow, you get different sets of layers:
For bs->file, you get the format/protocol layer distinction. For
bs->backing, you get essentially what bdrv_backing_chain_next()
iterates.

In this context (which is talking about COW overlays), I expected the
bs->backing link to apply.

The active layer is then the COW layer that is directly referenced by a
guest device, block job or block export.
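
This notion of a layer (a non-filter node plus the filters directly on top of it) can be sketched on a toy node chain. Everything here is hypothetical: SketchNode models a single chain with one child link per node, which is far simpler than a real BlockDriverState graph, and none of the function names exist in QEMU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy node on a single chain; 'below' is the next node down the chain. */
typedef struct SketchNode {
    bool is_filter;
    struct SketchNode *below;
} SketchNode;

/* Skip filters downwards to find the non-filter node that anchors the layer. */
static SketchNode *sketch_layer_base(SketchNode *node)
{
    while (node != NULL && node->is_filter) {
        node = node->below;
    }
    return node;
}

/* Two nodes are in the same layer if they share the same non-filter base. */
static bool sketch_same_layer(SketchNode *a, SketchNode *b)
{
    SketchNode *base = sketch_layer_base(a);

    return base != NULL && base == sketch_layer_base(b);
}

/* Each non-filter node on the chain contributes exactly one layer. */
static size_t sketch_count_layers(SketchNode *top)
{
    size_t layers = 0;

    for (SketchNode *node = top; node != NULL; node = node->below) {
        if (!node->is_filter) {
            layers++;
        }
    }
    return layers;
}
```

Comparing layer bases rather than node pointers mirrors the patch's comparison of bdrv_skip_filters(top_bs) with bdrv_skip_filters(bs): a filter sitting on top of the root node still counts as part of the same layer.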

> > There is a simple reason why backing-file is an error for a root node:
> > It doesn't have overlays, so a value to write to the header of overlay
> > images just doesn't make sense.
> 
> Ah, yeah...
> 
> > The same reasoning doesn't apply for writable images that do have
> > overlays. Forbidding backing-file is a more arbitrary restriction there.
> > I'm not saying that we can't make arbitrary restrictions where allowing
> > an option is not worth the effort, but I feel they should be spelt out
> > more explicitly instead of twisting words like "active layer" until they
> > fit the code.
> 
> I’m all for spelling it out more explicitly.  I just noticed that I
> couldn’t clearly distinguish “active layer” from “other” cases of nodes
> with writers on them, which is why I noted that “active” to me means the
> post-patch behavior already.
> 
> You’re right that there is no semantic reason for making it an error.
> So I just want it to be an error to be lazy.  I hope you let me do that.
>  (I don’t think there’s much of a problem with it, considering that
> commits on nodes that have the WRITE permission taken are basically just
> completely broken right now.)

That I'm happy to allow you to be lazy in this case is what I wanted to
express with "I'm not saying that we can't make arbitrary restrictions".
:-)

> >> Though I do think this wants for some clarification.  Perhaps “If 'top'
> >> is the active layer (i.e., is a node that may be written to), specifying
> >> a backing [...]”?
> > 
> > "If 'top' doesn't have an overlay image or is in use by a writer..."?
> 
> I.e., avoiding the term “active layer” altogether?  Sounds good.  Only,
> I don’t know about “writer”...  But it’s already used in
> BlockdevOptionsFile.dynamic-auto-read-only’s description, so I suppose
> we can use it here, too.  (I just don’t know if as a
> non-block-layer-developer I’d know what it means.)

I was thinking of something like "is used read-write" at first, but then
realised that write-only is possible, too, so it wouldn't be entirely
accurate...

> (Also, yes, you’re right, the current behavior of giving all root nodes
> an active commit of course remains, even when there are no writers.)
> 
> >> There’s more wrong with the specification, namely the whole part under
> >> @backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
> >> think all of that should go to the top level.  (And “If top == active”
> >> should be changed to “If top is active (i.e., may be written to)”.)
> > 
> > At least the latter only becomes wrong with this patch, so I think it
> > needs to be changed by this patch.
> 
> Sure.  So I understand you agree with moving the whole chunk, right?

I don't mind either way.

Kevin


* Re: [PATCH v7 00/47] block: Deal with filters
  2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
                   ` (47 preceding siblings ...)
  2020-07-08 17:20 ` [PATCH v7 00/47] block: Deal with filters Andrey Shinkevich
@ 2020-08-24 15:15 ` Kevin Wolf
  48 siblings, 0 replies; 173+ messages in thread
From: Kevin Wolf @ 2020-08-24 15:15 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> v6: https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
> 
> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
> Branch: https://git.xanclic.moe/XanClic/qemu.git child-access-functions-v7

Okay, finally made it through the series. Sorry for taking so long. You
can add my Reviewed-by to all patches that I didn't comment on. (Yes,
I'm just too lazy to make the list myself. :-))

Kevin




end of thread, other threads:[~2020-08-24 15:17 UTC | newest]

Thread overview: 173+ messages
2020-06-25 15:21 [PATCH v7 00/47] block: Deal with filters Max Reitz
2020-06-25 15:21 ` [PATCH v7 01/47] block: Add child access functions Max Reitz
2020-07-08 17:22   ` Andrey Shinkevich
2020-07-13  9:06   ` Vladimir Sementsov-Ogievskiy
2020-07-16 14:46     ` Max Reitz
2020-07-28 16:09     ` Christophe de Dinechin
2020-08-07  9:33       ` Vladimir Sementsov-Ogievskiy
2020-07-13  9:57   ` Vladimir Sementsov-Ogievskiy
2020-06-25 15:21 ` [PATCH v7 02/47] block: Add chain helper functions Max Reitz
2020-07-08 17:20   ` Andrey Shinkevich
2020-07-09  8:24     ` Max Reitz
2020-07-09  9:07       ` Andrey Shinkevich
2020-07-13 10:18   ` Vladimir Sementsov-Ogievskiy
2020-07-16 14:50     ` Max Reitz
2020-07-16 15:24       ` Vladimir Sementsov-Ogievskiy
2020-06-25 15:21 ` [PATCH v7 03/47] block: bdrv_cow_child() for bdrv_has_zero_init() Max Reitz
2020-07-08 17:23   ` Andrey Shinkevich
2020-08-07  9:37   ` Vladimir Sementsov-Ogievskiy
2020-06-25 15:21 ` [PATCH v7 04/47] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
2020-07-08 17:24   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 05/47] block: Include filters when freezing backing chain Max Reitz
2020-07-08 17:25   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 06/47] block: Drop bdrv_is_encrypted() Max Reitz
2020-07-08 17:41   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 07/47] block: Add bdrv_supports_compressed_writes() Max Reitz
2020-07-08 17:48   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 08/47] throttle: Support compressed writes Max Reitz
2020-07-08 17:52   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 09/47] copy-on-read: " Max Reitz
2020-07-08 17:54   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 10/47] mirror-top: " Max Reitz
2020-07-08 17:58   ` Andrey Shinkevich
2020-08-18 10:27   ` Kevin Wolf
2020-08-19 15:35     ` Max Reitz
2020-08-19 16:00       ` Kevin Wolf
2020-06-25 15:21 ` [PATCH v7 11/47] backup-top: " Max Reitz
2020-07-08 17:59   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 12/47] block: Use bdrv_filter_(bs|child) where obvious Max Reitz
2020-07-08 18:24   ` Andrey Shinkevich
2020-07-09  8:59     ` Max Reitz
2020-07-09  9:11       ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 13/47] block: Use CAFs in block status functions Max Reitz
2020-07-08 19:13   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 14/47] stream: Deal with filters Max Reitz
2020-07-09 14:52   ` Andrey Shinkevich
2020-07-09 15:27     ` Andrey Shinkevich
2020-07-10 15:24     ` Max Reitz
2020-07-10 17:41       ` Andrey Shinkevich
2020-07-16 14:59         ` Max Reitz
2020-08-07 10:29           ` Vladimir Sementsov-Ogievskiy
2020-08-10  8:12             ` Max Reitz
2020-08-10 11:04               ` Vladimir Sementsov-Ogievskiy
2020-08-14 15:18                 ` Andrey Shinkevich
2020-08-18 20:45                 ` Andrey Shinkevich
2020-08-19 12:39                 ` Max Reitz
2020-08-19 13:18                   ` Vladimir Sementsov-Ogievskiy
2020-07-09 15:13   ` Andrey Shinkevich
2020-07-10 15:27     ` Max Reitz
2020-08-18 14:28   ` Kevin Wolf
2020-08-19 14:47     ` Max Reitz
2020-08-19 15:16       ` Kevin Wolf
2020-08-20  8:31         ` Max Reitz
2020-08-20  9:22           ` Max Reitz
2020-08-20 10:49             ` Vladimir Sementsov-Ogievskiy
2020-08-20 11:43               ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 15/47] block: Use CAFs when working with backing chains Max Reitz
2020-07-10 15:28   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 16/47] block: Use bdrv_cow_child() in bdrv_co_truncate() Max Reitz
2020-07-10 15:54   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 17/47] block: Re-evaluate backing file handling in reopen Max Reitz
2020-07-10 19:42   ` Andrey Shinkevich
2020-07-16 15:04     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 18/47] block: Flush all children in generic code Max Reitz
2020-07-14 12:52   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 19/47] vmdk: Drop vmdk_co_flush() Max Reitz
2020-07-14 14:52   ` Andrey Shinkevich
2020-07-16 15:08     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 20/47] block: Iterate over children in refresh_limits Max Reitz
2020-07-14 18:37   ` Andrey Shinkevich
2020-07-16 15:14     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 21/47] block: Use CAFs in bdrv_refresh_filename() Max Reitz
2020-07-15 12:52   ` Andrey Shinkevich
2020-07-15 12:58     ` Andrey Shinkevich
2020-07-16 15:21     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 22/47] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
2020-07-15 13:39   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 23/47] block/snapshot: Fix fallback Max Reitz
2020-07-15 21:22   ` Andrey Shinkevich
2020-07-15 22:18     ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 24/47] block: Use CAFs for debug breakpoints Max Reitz
2020-07-15 21:43   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 25/47] block: Def. impl.s for get_allocated_file_size Max Reitz
2020-07-15 22:56   ` Andrey Shinkevich
2020-08-19 10:57   ` Kevin Wolf
2020-08-19 15:53     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 26/47] block: Improve get_allocated_file_size's default Max Reitz
2020-07-20 15:12   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 27/47] blkverify: Use bdrv_sum_allocated_file_size() Max Reitz
2020-07-20 15:10   ` Andrey Shinkevich
2020-08-19 10:46   ` Kevin Wolf
2020-08-19 15:50     ` Max Reitz
2020-06-25 15:21 ` [PATCH v7 28/47] block/null: Implement bdrv_get_allocated_file_size Max Reitz
2020-07-20 15:10   ` Andrey Shinkevich
2020-07-24  8:58     ` Max Reitz
2020-07-24  9:49       ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 29/47] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
2020-07-20 16:08   ` Andrey Shinkevich
2020-07-24  9:23     ` Max Reitz
2020-07-24 10:37       ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 30/47] block: Report data child for query-blockstats Max Reitz
2020-07-21 11:48   ` Andrey Shinkevich
2020-06-25 15:21 ` [PATCH v7 31/47] block: Use child access functions for QAPI queries Max Reitz
2020-07-21 12:30   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 32/47] block-copy: Use CAF to find sync=top base Max Reitz
2020-07-21 12:42   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 33/47] mirror: Deal with filters Max Reitz
2020-07-22 18:31   ` Andrey Shinkevich
2020-07-24  9:49     ` Max Reitz
2020-07-24 10:27       ` Andrey Shinkevich
2020-08-19 16:50   ` Kevin Wolf
2020-08-20 10:28     ` Max Reitz
2020-06-25 15:22 ` [PATCH v7 34/47] backup: " Max Reitz
2020-07-23 15:51   ` Andrey Shinkevich
2020-07-24  9:55     ` Max Reitz
2020-06-25 15:22 ` [PATCH v7 35/47] commit: " Max Reitz
2020-07-23 17:15   ` Andrey Shinkevich
2020-07-24 10:36     ` Andrey Shinkevich
2020-08-19 17:58   ` Kevin Wolf
2020-08-20 11:27     ` Max Reitz
2020-08-20 13:47       ` Kevin Wolf
2020-06-25 15:22 ` [PATCH v7 36/47] nbd: Use CAF when looking for dirty bitmap Max Reitz
2020-07-23 17:21   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 37/47] qemu-img: Use child access functions Max Reitz
2020-07-24 15:51   ` Andrey Shinkevich
2020-08-21 15:29   ` Kevin Wolf
2020-08-24 12:42     ` Max Reitz
2020-06-25 15:22 ` [PATCH v7 38/47] block: Drop backing_bs() Max Reitz
2020-07-24 15:55   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 39/47] blockdev: Fix active commit choice Max Reitz
2020-08-21 15:50   ` Kevin Wolf
2020-08-24 13:18     ` Max Reitz
2020-08-24 14:07       ` Kevin Wolf
2020-08-24 14:41         ` Max Reitz
2020-08-24 15:06           ` Kevin Wolf
2020-06-25 15:22 ` [PATCH v7 40/47] block: Inline bdrv_co_block_status_from_*() Max Reitz
2020-07-24 18:00   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 41/47] block: Leave BDS.backing_file constant Max Reitz
2020-07-27 12:27   ` Andrey Shinkevich
2020-07-28 14:10     ` Max Reitz
2020-08-24 13:14   ` Kevin Wolf
2020-08-24 14:29     ` Max Reitz
2020-06-25 15:22 ` [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed Max Reitz
2020-07-27 13:28   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit Max Reitz
2020-07-27 13:35   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 44/47] iotests: Add filter commit test cases Max Reitz
2020-07-27 17:45   ` Andrey Shinkevich
2020-07-28 14:00     ` Max Reitz
2020-06-25 15:22 ` [PATCH v7 45/47] iotests: Add filter mirror " Max Reitz
2020-08-02 11:05   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 46/47] iotests: Add test for commit in sub directory Max Reitz
2020-08-02 12:13   ` Andrey Shinkevich
2020-06-25 15:22 ` [PATCH v7 47/47] iotests: Test committing to overridden backing Max Reitz
2020-08-02 11:43   ` Andrey Shinkevich
2020-07-08 17:20 ` [PATCH v7 00/47] block: Deal with filters Andrey Shinkevich
2020-07-08 17:32   ` Eric Blake
2020-07-08 19:46     ` Andrey Shinkevich
2020-07-08 20:37       ` Eric Blake
2020-07-09  8:19         ` Max Reitz
2020-07-08 20:47   ` Eric Blake
2020-07-09  8:20     ` Max Reitz
2020-07-09  9:04       ` Andrey Shinkevich
2020-08-24 15:15 ` Kevin Wolf
