* [PATCH v4 00/18] Make image fleecing more usable
@ 2022-02-16 19:45 Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 01/18] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
                   ` (17 more replies)
  0 siblings, 18 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:45 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

v4: Switch to a new fleecing scheme that is more native to QEMU's block layer;
    see new patches 10-12 for details.

01,02: add Hanna's r-b
03: add const, add forgotten bdrv_release_dirty_bitmap()
04: rewrite bitmap parameter parsing
05: add Hanna's r-b
06: tiny wording fixes
07: new
08: fix comments, improve interface of new functions
09: fix grammar, add Hanna's and Nikita's r-bs
10-12: new fleecing scheme
tests: updated to use new fleecing scheme

===

This series brings several improvements to the fleecing scheme:

1. support a bitmap in the copy-before-write filter

2. introduce the snapshot-access API and filter, to make a new fleecing
   scheme. See the "block: copy-before-write: realize snapshot-access API"
   commit message for details.

3. support a "push backup with fleecing" scheme, where the backup job is a
   client of the common fleecing scheme. That helps when writes to the final
   backup target are slow and we don't want guest writes to hang waiting
   for copy-before-write operations to the final target.
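
As a rough illustration of item 1, here is a toy Python model of a
copy-before-write filter that consults a bitmap. It is not QEMU code: the
class name CbwFilter, the 4-byte cluster size and the use of a Python set
as the "bitmap" are all assumptions made for this sketch.

```python
# Toy model of copy-before-write with a bitmap; illustrative only, not QEMU code.
CLUSTER = 4  # hypothetical cluster size in bytes


class CbwFilter:
    def __init__(self, source, bitmap=None):
        self.source = source        # bytearray standing in for the active disk
        self.target = {}            # cluster index -> saved old contents
        nb_clusters = (len(source) + CLUSTER - 1) // CLUSTER
        # Without a bitmap every cluster is dirty, i.e. must be preserved.
        self.dirty = set(range(nb_clusters)) if bitmap is None else set(bitmap)

    def write(self, offset, data):
        if not data:
            return
        first = offset // CLUSTER
        last = (offset + len(data) - 1) // CLUSTER
        for c in range(first, last + 1):
            if c in self.dirty:
                # Save the old contents before the guest overwrites them.
                start = c * CLUSTER
                self.target[c] = bytes(self.source[start:start + CLUSTER])
                self.dirty.discard(c)
        self.source[offset:offset + len(data)] = data


disk = bytearray(b"aaaabbbbcccc")
f = CbwFilter(disk, bitmap={0, 2})  # only clusters 0 and 2 must be preserved
f.write(2, b"XXXX")                 # the write touches clusters 0 and 1
print(f.target)                     # {0: b'aaaa'}
print(bytes(disk))                  # b'aaXXXXbbcccc'
```

Cluster 1 is overwritten without a copy because it is not set in the
bitmap; cluster 0 is copied once and then cleared, so later writes to it
do not copy again.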

Vladimir Sementsov-Ogievskiy (18):
  block/block-copy: move copy_bitmap initialization to
    block_copy_state_new()
  block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  block/block-copy: block_copy_state_new(): add bitmap parameter
  block/copy-before-write: add bitmap open parameter
  block/block-copy: add block_copy_reset()
  block: intoduce reqlist
  block/reqlist: reqlist_find_conflict(): use ranges_overlap()
  block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  block/reqlist: add reqlist_wait_all()
  block/io: introduce block driver snapshot-access API
  block: introduce snapshot-access filter
  block: copy-before-write: realize snapshot-access API
  iotests/image-fleecing: add test-case for fleecing format node
  iotests.py: add qemu_io_pipe_and_status()
  iotests/image-fleecing: add test case with bitmap
  block: blk_root(): return non-const pointer
  qapi: backup: add immutable-source parameter
  iotests/image-fleecing: test push backup with fleecing

 qapi/block-core.json                        |  25 +-
 include/block/block-copy.h                  |   2 +
 include/block/block_int.h                   |  28 +++
 include/block/dirty-bitmap.h                |   4 +-
 include/block/reqlist.h                     |  75 ++++++
 include/qemu/hbitmap.h                      |  12 +
 include/sysemu/block-backend.h              |   2 +-
 block/backup.c                              |  61 ++++-
 block/block-backend.c                       |   2 +-
 block/block-copy.c                          | 150 +++++------
 block/copy-before-write.c                   | 265 +++++++++++++++++++-
 block/dirty-bitmap.c                        |  15 +-
 block/io.c                                  |  69 +++++
 block/monitor/bitmap-qmp-cmds.c             |   5 +-
 block/replication.c                         |   2 +-
 block/reqlist.c                             |  85 +++++++
 block/snapshot-access.c                     | 132 ++++++++++
 blockdev.c                                  |   1 +
 util/hbitmap.c                              |  33 +++
 MAINTAINERS                                 |   5 +-
 block/meson.build                           |   2 +
 tests/qemu-iotests/iotests.py               |   4 +
 tests/qemu-iotests/tests/image-fleecing     | 175 ++++++++++---
 tests/qemu-iotests/tests/image-fleecing.out | 223 +++++++++++++++-
 24 files changed, 1227 insertions(+), 150 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/reqlist.c
 create mode 100644 block/snapshot-access.c

-- 
2.31.1




* [PATCH v4 01/18] block/block-copy: move copy_bitmap initialization to block_copy_state_new()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 02/18] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

We are going to complicate bitmap initialization in a later commit.
And in the future, the backup job will be able to work without a filter
(when the source is immutable), so we'll need the same bitmap
initialization in the copy-before-write filter and in the backup job.
So it's reasonable to do it in block-copy.

Note that for now cbw_open() is the only caller of
block_copy_state_new().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 block/block-copy.c        | 1 +
 block/copy-before-write.c | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ce116318b5..abda7a80bd 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -402,6 +402,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
+    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
 
     /*
      * If source is in backing chain of target assume that target is going to be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index c30a5ff8de..5bdaf0a9d9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,7 +149,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
-    BdrvDirtyBitmap *copy_bitmap;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -177,9 +176,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    copy_bitmap = block_copy_dirty_bitmap(s->bcs);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
-
     return 0;
 }
 
-- 
2.31.1




* [PATCH v4 02/18] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 01/18] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

That simplifies failure handling in existing code and in upcoming new
users of bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/dirty-bitmap.h    | 2 +-
 block/dirty-bitmap.c            | 9 +++++++--
 block/monitor/bitmap-qmp-cmds.c | 5 +----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 40950ae3d5..f95d350b70 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,7 +77,7 @@ void bdrv_dirty_bitmap_set_persistence(BdrvDirtyBitmap *bitmap,
                                        bool persistent);
 void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, bool busy);
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip);
 bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 0ef46163e3..94a0276833 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -880,11 +880,14 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
  * Ensures permissions on bitmaps are reasonable; use for public API.
  *
  * @backup: If provided, make a copy of dest here prior to merge.
+ *
+ * Returns true on success, false on failure. In case of failure bitmaps are
+ * untouched.
  */
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp)
 {
-    bool ret;
+    bool ret = false;
 
     bdrv_dirty_bitmaps_lock(dest->bs);
     if (src->bs != dest->bs) {
@@ -912,6 +915,8 @@ out:
     if (src->bs != dest->bs) {
         bdrv_dirty_bitmaps_unlock(src->bs);
     }
+
+    return ret;
 }
 
 /**
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 9f11deec64..83970b22fa 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -259,7 +259,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
     BlockDriverState *bs;
     BdrvDirtyBitmap *dst, *src, *anon;
     BlockDirtyBitmapMergeSourceList *lst;
-    Error *local_err = NULL;
 
     dst = block_dirty_bitmap_lookup(node, target, &bs, errp);
     if (!dst) {
@@ -297,9 +296,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
             abort();
         }
 
-        bdrv_merge_dirty_bitmap(anon, src, NULL, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
             dst = NULL;
             goto out;
         }
-- 
2.31.1




* [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 01/18] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 02/18] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:01   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

This will be used in the following commit to bring "incremental" mode
to the copy-before-write filter.
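
The initialization rule this patch introduces (merge the caller's bitmap
if one is given, otherwise mark everything dirty) can be sketched in
Python as below. This is only a model: init_copy_bitmap is a hypothetical
name, the bitmap is a set of cluster indexes, and the ValueError merely
stands in for a failed bdrv_merge_dirty_bitmap().

```python
def init_copy_bitmap(size, cluster_size, user_bitmap=None):
    """Return the initial set of dirty cluster indexes for a copy bitmap."""
    nb_clusters = (size + cluster_size - 1) // cluster_size
    if user_bitmap is None:
        # No bitmap given: everything has to be copied.
        return set(range(nb_clusters))
    if any(c < 0 or c >= nb_clusters for c in user_bitmap):
        # Stand-in for a merge failure; nothing is modified in that case.
        raise ValueError("bitmap does not fit the device")
    # Merge into a fresh, private bitmap: the caller's object is not kept.
    return set(user_bitmap)


full = init_copy_bitmap(10, 4)
user = {1}
incremental = init_copy_bitmap(10, 4, user)
user.add(2)                      # later changes do not affect the copy bitmap
print(sorted(full))              # [0, 1, 2]
print(sorted(incremental))       # [1]
```

Returning a private copy mirrors the fact that the passed bitmap is only
used to initialize the internal copy bitmap.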

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h |  1 +
 block/block-copy.c         | 14 +++++++++++++-
 block/copy-before-write.c  |  2 +-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 99370fa38b..b80ad02299 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     const BdrvDirtyBitmap *bitmap,
                                      Error **errp);
 
 /* Function should be called prior any actual copy request */
diff --git a/block/block-copy.c b/block/block-copy.c
index abda7a80bd..8aa6ee6a5c 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -384,8 +384,10 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     const BdrvDirtyBitmap *bitmap,
                                      Error **errp)
 {
+    ERRP_GUARD();
     BlockCopyState *s;
     int64_t cluster_size;
     BdrvDirtyBitmap *copy_bitmap;
@@ -402,7 +404,17 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+    if (bitmap) {
+        if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
+            error_prepend(errp, "Failed to merge bitmap '%s' to internal "
+                          "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
+            bdrv_release_dirty_bitmap(copy_bitmap);
+            return NULL;
+        }
+    } else {
+        bdrv_set_dirty_bitmap(copy_bitmap, 0,
+                              bdrv_dirty_bitmap_size(copy_bitmap));
+    }
 
     /*
      * If source is in backing chain of target assume that target is going to be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 5bdaf0a9d9..799223e3fb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:07   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 05/18] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

This brings "incremental" mode to the copy-before-write filter: the user can
specify a bitmap so that the filter will copy only "dirty" areas.
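
For illustration, a blockdev-add command using the new option might be
built as in the Python sketch below. The option names come from the QAPI
change in this patch; the node and bitmap names ("cbw0", "disk0",
"fleecing-target0", "bitmap0") are hypothetical and would have to exist
in a real setup.

```python
import json

# Hypothetical node and bitmap names; only the option keys follow the schema.
cbw_add = {
    "execute": "blockdev-add",
    "arguments": {
        "driver": "copy-before-write",
        "node-name": "cbw0",
        "file": "disk0",                # node whose writes are intercepted
        "target": "fleecing-target0",   # where old data is copied
        # Optional: restrict copy-before-write to the dirty regions of
        # this bitmap.
        "bitmap": {"node": "disk0", "name": "bitmap0"},
    },
}

print(json.dumps(cbw_add, indent=2))
```

Leaving out the "bitmap" key gives the previous behaviour, where every
region is subject to copy-before-write.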

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json      | 10 +++++++-
 block/copy-before-write.c | 51 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9a5a3641d0..3bab597506 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4171,11 +4171,19 @@
 #
 # @target: The target for copy-before-write operations.
 #
+# @bitmap: If specified, copy-before-write filter will do
+#          copy-before-write operations only for dirty regions of the
+#          bitmap. Bitmap size must be equal to length of file and
+#          target child of the filter. Note also, that bitmap is used
+#          only to initialize internal bitmap of the process, so further
+#          modifications (or removing) of specified bitmap doesn't
+#          influence the filter.
+#
 # Since: 6.2
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { 'target': 'BlockdevRef' } }
+  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
 ##
 # @BlockdevOptions:
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 799223e3fb..91a2288b66 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -34,6 +34,8 @@
 
 #include "block/copy-before-write.h"
 
+#include "qapi/qapi-visit-block-core.h"
+
 typedef struct BDRVCopyBeforeWriteState {
     BlockCopyState *bcs;
     BdrvChild *target;
@@ -145,10 +147,53 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
     }
 }
 
+static bool cbw_parse_bitmap_option(QDict *options, BdrvDirtyBitmap **bitmap,
+                                    Error **errp)
+{
+    QDict *bitmap_qdict = NULL;
+    BlockDirtyBitmap *bmp_param = NULL;
+    Visitor *v = NULL;
+    bool ret = false;
+
+    *bitmap = NULL;
+
+    qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
+    if (!qdict_size(bitmap_qdict)) {
+        ret = true;
+        goto out;
+    }
+
+    v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
+    if (!v) {
+        goto out;
+    }
+
+    visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
+    if (!bmp_param) {
+        goto out;
+    }
+
+    *bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, NULL,
+                                        errp);
+    if (!*bitmap) {
+        goto out;
+    }
+
+    ret = true;
+
+out:
+    qapi_free_BlockDirtyBitmap(bmp_param);
+    visit_free(v);
+    qobject_unref(bitmap_qdict);
+
+    return ret;
+}
+
 static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    BdrvDirtyBitmap *bitmap = NULL;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -163,6 +208,10 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    if (!cbw_parse_bitmap_option(options, &bitmap, errp)) {
+        return -EINVAL;
+    }
+
     bs->total_sectors = bs->file->bs->total_sectors;
     bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
             (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
@@ -170,7 +219,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v4 05/18] block/block-copy: add block_copy_reset()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 06/18] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.
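
What the new helper does can be modelled in a few lines of Python. This
is a sketch under simplifying assumptions: CopyState is a hypothetical
stand-in for BlockCopyState, the bitmap is a set of cluster indexes,
ranges are cluster-aligned, and the lock taken by the real function is
omitted.

```python
class CopyState:
    """Toy stand-in for BlockCopyState: dirty clusters plus a progress value."""

    def __init__(self, nb_clusters, cluster_size):
        self.cluster_size = cluster_size
        self.dirty = set(range(nb_clusters))   # all clusters start dirty
        self.in_flight_bytes = 0
        self.remaining = nb_clusters * cluster_size

    def reset(self, offset, nbytes):
        # Assumption of this sketch: the range is cluster-aligned.
        assert offset % self.cluster_size == 0
        assert nbytes % self.cluster_size == 0
        first = offset // self.cluster_size
        count = nbytes // self.cluster_size
        self.dirty -= set(range(first, first + count))
        # Remaining work = bytes still dirty + bytes currently being copied.
        self.remaining = (len(self.dirty) * self.cluster_size
                          + self.in_flight_bytes)


s = CopyState(nb_clusters=4, cluster_size=64)
s.reset(64, 128)            # drop clusters 1 and 2 from the copy bitmap
print(sorted(s.dirty))      # [0, 3]
print(s.remaining)          # 128
```

Unlike block_copy_reset_unallocated(), the reset here is unconditional:
the caller decides which range no longer needs copying.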

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/block-copy.h |  1 +
 block/block-copy.c         | 21 +++++++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index b80ad02299..68bbd344b2 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -35,6 +35,7 @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
                                      int64_t offset, int64_t *count);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 8aa6ee6a5c..0834e29b6e 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -692,6 +692,18 @@ static int block_copy_is_cluster_allocated(BlockCopyState *s, int64_t offset,
     }
 }
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes)
+{
+    QEMU_LOCK_GUARD(&s->lock);
+
+    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+    if (s->progress) {
+        progress_set_remaining(s->progress,
+                               bdrv_get_dirty_count(s->copy_bitmap) +
+                               s->in_flight_bytes);
+    }
+}
+
 /*
  * Reset bits in copy_bitmap starting at offset if they represent unallocated
  * data in the image. May reset subsequent contiguous bits.
@@ -712,14 +724,7 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
     bytes = clusters * s->cluster_size;
 
     if (!ret) {
-        qemu_co_mutex_lock(&s->lock);
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-        if (s->progress) {
-            progress_set_remaining(s->progress,
-                                   bdrv_get_dirty_count(s->copy_bitmap) +
-                                   s->in_flight_bytes);
-        }
-        qemu_co_mutex_unlock(&s->lock);
+        block_copy_reset(s, offset, bytes);
     }
 
     *count = bytes;
-- 
2.31.1




* [PATCH v4 06/18] block: intoduce reqlist
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 05/18] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:08   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap() Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Split the intersecting-requests functionality out of block-copy so it can be
reused in the copy-before-write filter.

Note: while here, fix a tiny typo in MAINTAINERS.
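
The conflict search at the core of the new API is a plain interval
overlap test. A minimal Python model follows; the dataclass and
find_conflict() are hypothetical stand-ins for BlockReq and
reqlist_find_conflict(), and coroutine waiting is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class BlockReq:
    offset: int
    bytes: int   # length of the request, named as in the C struct


def find_conflict(reqs, offset, nbytes):
    """Return the first request overlapping [offset, offset + nbytes)."""
    for r in reqs:
        # Two half-open ranges overlap iff each starts before the other ends.
        if offset + nbytes > r.offset and offset < r.offset + r.bytes:
            return r
    return None


reqs = [BlockReq(0, 4096), BlockReq(8192, 4096)]
print(find_conflict(reqs, 4096, 4096))   # None: fits exactly in the gap
print(find_conflict(reqs, 4095, 2))      # BlockReq(offset=0, bytes=4096)
print(find_conflict(reqs, 10000, 1))     # BlockReq(offset=8192, bytes=4096)
```

Adjacent ranges do not conflict, which is why a request for the gap
between the two existing ones finds nothing.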

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h |  67 +++++++++++++++++++++++
 block/block-copy.c      | 116 +++++++++++++---------------------------
 block/reqlist.c         |  76 ++++++++++++++++++++++++++
 MAINTAINERS             |   4 +-
 block/meson.build       |   1 +
 5 files changed, 184 insertions(+), 80 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/reqlist.c

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
new file mode 100644
index 0000000000..0fa1eef259
--- /dev/null
+++ b/include/block/reqlist.h
@@ -0,0 +1,67 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REQLIST_H
+#define REQLIST_H
+
+#include "qemu/coroutine.h"
+
+/*
+ * The API is not thread-safe and shouldn't be. The struct is public to be part
+ * of other structures and protected by third-party locks, see
+ * block/block-copy.c for example.
+ */
+
+typedef struct BlockReq {
+    int64_t offset;
+    int64_t bytes;
+
+    CoQueue wait_queue; /* coroutines blocked on this req */
+    QLIST_ENTRY(BlockReq) list;
+} BlockReq;
+
+typedef QLIST_HEAD(, BlockReq) BlockReqList;
+
+/*
+ * Initialize new request and add it to the list. Caller must be sure that
+ * there are no conflicting requests in the list.
+ */
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes);
+/* Search for request in the list intersecting with @offset/@bytes area. */
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes);
+
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ *
+ * @lock is passed to qemu_co_queue_wait()
+ * False return value proves that lock was released at no point.
+ */
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
+
+/*
+ * Shrink request and wake all waiting coroutines (maybe some of them are not
+ * intersecting with shrunk request).
+ */
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
+
+/*
+ * Remove request and wake all waiting coroutines. Do not release any memory.
+ */
+void coroutine_fn reqlist_remove_req(BlockReq *req);
+
+#endif /* REQLIST_H */
diff --git a/block/block-copy.c b/block/block-copy.c
index 0834e29b6e..ef948dccec 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 #include "sysemu/block-backend.h"
 #include "qemu/units.h"
 #include "qemu/coroutine.h"
@@ -83,7 +84,6 @@ typedef struct BlockCopyTask {
      */
     BlockCopyState *s;
     BlockCopyCallState *call_state;
-    int64_t offset;
     /*
      * @method can also be set again in the while loop of
      * block_copy_dirty_clusters(), but it is never accessed concurrently
@@ -94,21 +94,17 @@ typedef struct BlockCopyTask {
     BlockCopyMethod method;
 
     /*
-     * Fields whose state changes throughout the execution
-     * Protected by lock in BlockCopyState.
+     * Generally, req is protected by lock in BlockCopyState. Still req.offset
+     * is only set on task creation, so may be read concurrently after creation.
+     * req.bytes is changed at most once, and need only protecting the case of
+     * parallel read while updating @bytes value in block_copy_task_shrink().
      */
-    CoQueue wait_queue; /* coroutines blocked on this task */
-    /*
-     * Only protect the case of parallel read while updating @bytes
-     * value in block_copy_task_shrink().
-     */
-    int64_t bytes;
-    QLIST_ENTRY(BlockCopyTask) list;
+    BlockReq req;
 } BlockCopyTask;
 
 static int64_t task_end(BlockCopyTask *task)
 {
-    return task->offset + task->bytes;
+    return task->req.offset + task->req.bytes;
 }
 
 typedef struct BlockCopyState {
@@ -136,7 +132,7 @@ typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
-    QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
+    BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
      * skip_unallocated:
@@ -160,42 +156,6 @@ typedef struct BlockCopyState {
     RateLimit rate_limit;
 } BlockCopyState;
 
-/* Called with lock held */
-static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-                                            int64_t offset, int64_t bytes)
-{
-    BlockCopyTask *t;
-
-    QLIST_FOREACH(t, &s->tasks, list) {
-        if (offset + bytes > t->offset && offset < t->offset + t->bytes) {
-            return t;
-        }
-    }
-
-    return NULL;
-}
-
-/*
- * If there are no intersecting tasks return false. Otherwise, wait for the
- * first found intersecting tasks to finish and return true.
- *
- * Called with lock held. May temporary release the lock.
- * Return value of 0 proves that lock was NOT released.
- */
-static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
-                                             int64_t bytes)
-{
-    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
-
-    if (!task) {
-        return false;
-    }
-
-    qemu_co_queue_wait(&task->wait_queue, &s->lock);
-
-    return true;
-}
-
 /* Called with lock held */
 static int64_t block_copy_chunk_size(BlockCopyState *s)
 {
@@ -239,7 +199,7 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
     bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
 
     /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!reqlist_find_conflict(&s->reqs, offset, bytes));
 
     bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
     s->in_flight_bytes += bytes;
@@ -249,12 +209,9 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
         .task.func = block_copy_task_entry,
         .s = s,
         .call_state = call_state,
-        .offset = offset,
-        .bytes = bytes,
         .method = s->method,
     };
-    qemu_co_queue_init(&task->wait_queue);
-    QLIST_INSERT_HEAD(&s->tasks, task, list);
+    reqlist_init_req(&s->reqs, &task->req, offset, bytes);
 
     return task;
 }
@@ -270,34 +227,34 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
                                                 int64_t new_bytes)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    if (new_bytes == task->bytes) {
+    if (new_bytes == task->req.bytes) {
         return;
     }
 
-    assert(new_bytes > 0 && new_bytes < task->bytes);
+    assert(new_bytes > 0 && new_bytes < task->req.bytes);
 
-    task->s->in_flight_bytes -= task->bytes - new_bytes;
+    task->s->in_flight_bytes -= task->req.bytes - new_bytes;
     bdrv_set_dirty_bitmap(task->s->copy_bitmap,
-                          task->offset + new_bytes, task->bytes - new_bytes);
+                          task->req.offset + new_bytes,
+                          task->req.bytes - new_bytes);
 
-    task->bytes = new_bytes;
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_shrink_req(&task->req, new_bytes);
 }
 
 static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    task->s->in_flight_bytes -= task->bytes;
+    task->s->in_flight_bytes -= task->req.bytes;
     if (ret < 0) {
-        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
+        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->req.offset,
+                              task->req.bytes);
     }
-    QLIST_REMOVE(task, list);
     if (task->s->progress) {
         progress_set_remaining(task->s->progress,
                                bdrv_get_dirty_count(task->s->copy_bitmap) +
                                task->s->in_flight_bytes);
     }
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_remove_req(&task->req);
 }
 
 void block_copy_state_free(BlockCopyState *s)
@@ -450,7 +407,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 
     ratelimit_init(&s->rate_limit);
     qemu_co_mutex_init(&s->lock);
-    QLIST_INIT(&s->tasks);
+    QLIST_INIT(&s->reqs);
     QLIST_INIT(&s->calls);
 
     return s;
@@ -483,7 +440,7 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
 
     aio_task_pool_wait_slot(pool);
     if (aio_task_pool_status(pool) < 0) {
-        co_put_to_shres(task->s->mem, task->bytes);
+        co_put_to_shres(task->s->mem, task->req.bytes);
         block_copy_task_end(task, -ECANCELED);
         g_free(task);
         return -ECANCELED;
@@ -596,7 +553,8 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     BlockCopyMethod method = t->method;
     int ret;
 
-    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
+    ret = block_copy_do_copy(s, t->req.offset, t->req.bytes, &method,
+                             &error_is_read);
 
     WITH_QEMU_LOCK_GUARD(&s->lock) {
         if (s->method == t->method) {
@@ -609,10 +567,10 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
                 t->call_state->error_is_read = error_is_read;
             }
         } else if (s->progress) {
-            progress_work_done(s->progress, t->bytes);
+            progress_work_done(s->progress, t->req.bytes);
         }
     }
-    co_put_to_shres(s->mem, t->bytes);
+    co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
     return ret;
@@ -771,22 +729,22 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             trace_block_copy_skip_range(s, offset, bytes);
             break;
         }
-        if (task->offset > offset) {
-            trace_block_copy_skip_range(s, offset, task->offset - offset);
+        if (task->req.offset > offset) {
+            trace_block_copy_skip_range(s, offset, task->req.offset - offset);
         }
 
         found_dirty = true;
 
-        ret = block_copy_block_status(s, task->offset, task->bytes,
+        ret = block_copy_block_status(s, task->req.offset, task->req.bytes,
                                       &status_bytes);
         assert(ret >= 0); /* never fail */
-        if (status_bytes < task->bytes) {
+        if (status_bytes < task->req.bytes) {
             block_copy_task_shrink(task, status_bytes);
         }
         if (qatomic_read(&s->skip_unallocated) &&
             !(ret & BDRV_BLOCK_ALLOCATED)) {
             block_copy_task_end(task, 0);
-            trace_block_copy_skip_range(s, task->offset, task->bytes);
+            trace_block_copy_skip_range(s, task->req.offset, task->req.bytes);
             offset = task_end(task);
             bytes = end - offset;
             g_free(task);
@@ -807,11 +765,11 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             }
         }
 
-        ratelimit_calculate_delay(&s->rate_limit, task->bytes);
+        ratelimit_calculate_delay(&s->rate_limit, task->req.bytes);
 
-        trace_block_copy_process(s, task->offset);
+        trace_block_copy_process(s, task->req.offset);
 
-        co_get_from_shres(s->mem, task->bytes);
+        co_get_from_shres(s->mem, task->req.bytes);
 
         offset = task_end(task);
         bytes = end - offset;
@@ -879,8 +837,8 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                  * Check that there is no task we still need to
                  * wait to complete
                  */
-                ret = block_copy_wait_one(s, call_state->offset,
-                                          call_state->bytes);
+                ret = reqlist_wait_one(&s->reqs, call_state->offset,
+                                       call_state->bytes, &s->lock);
                 if (ret == 0) {
                     /*
                      * No pending tasks, but check again the bitmap in this
@@ -888,7 +846,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                      * between this and the critical section in
                      * block_copy_dirty_clusters().
                      *
-                     * block_copy_wait_one return value 0 also means that it
+                     * reqlist_wait_one return value 0 also means that it
                      * didn't release the lock. So, we are still in the same
                      * critical section, not interrupted by any concurrent
                      * access to state.
diff --git a/block/reqlist.c b/block/reqlist.c
new file mode 100644
index 0000000000..5e320ba649
--- /dev/null
+++ b/block/reqlist.c
@@ -0,0 +1,76 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/reqlist.h"
+
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes)
+{
+    assert(!reqlist_find_conflict(reqs, offset, bytes));
+
+    *req = (BlockReq) {
+        .offset = offset,
+        .bytes = bytes,
+    };
+    qemu_co_queue_init(&req->wait_queue);
+    QLIST_INSERT_HEAD(reqs, req, list);
+}
+
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes)
+{
+    BlockReq *r;
+
+    QLIST_FOREACH(r, reqs, list) {
+        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
+            return r;
+        }
+    }
+
+    return NULL;
+}
+
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    BlockReq *r = reqlist_find_conflict(reqs, offset, bytes);
+
+    if (!r) {
+        return false;
+    }
+
+    qemu_co_queue_wait(&r->wait_queue, lock);
+
+    return true;
+}
+
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
+{
+    if (new_bytes == req->bytes) {
+        return;
+    }
+
+    assert(new_bytes > 0 && new_bytes < req->bytes);
+
+    req->bytes = new_bytes;
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
+
+void coroutine_fn reqlist_remove_req(BlockReq *req)
+{
+    QLIST_REMOVE(req, list);
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 4b3ae2ab08..7a5292b814 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2504,7 +2504,9 @@ F: block/stream.c
 F: block/mirror.c
 F: qapi/job.json
 F: block/block-copy.c
-F: include/block/block-copy.c
+F: include/block/block-copy.h
+F: block/reqlist.c
+F: include/block/reqlist.h
 F: block/copy-before-write.h
 F: block/copy-before-write.c
 F: include/block/aio_task.h
diff --git a/block/meson.build b/block/meson.build
index 90dc9983e5..e2f0fe34b4 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -32,6 +32,7 @@ block_ss.add(files(
   'qcow2.c',
   'quorum.c',
   'raw-format.c',
+  'reqlist.c',
   'snapshot.c',
   'throttle-groups.c',
   'throttle.c',
-- 
2.31.1




* [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 06/18] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:08   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Let's reuse the convenient ranges_overlap() helper from qemu/range.h
instead of open-coding the overlap check.
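
For illustration only, here is a minimal sketch of why the two checks agree.
toy_ranges_overlap() is a hypothetical stand-in written for this note, not
the helper from qemu/range.h; it assumes both lengths are positive and that
offset + length does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The open-coded check that reqlist_find_conflict() used before this patch. */
static bool open_coded_overlap(int64_t offset, int64_t bytes,
                               int64_t r_offset, int64_t r_bytes)
{
    return offset + bytes > r_offset && offset < r_offset + r_bytes;
}

/*
 * Hypothetical stand-in for an overlap helper, expressed through the last
 * byte of each range. Assumes len1 > 0 and len2 > 0.
 */
static bool toy_ranges_overlap(uint64_t first1, uint64_t len1,
                               uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    /* The ranges are disjoint only if one ends before the other starts. */
    return !(last2 < first1 || last1 < first2);
}
```

Adjacent ranges such as [0, 10) and [10, 15) do not overlap under either
form, which is what a conflict search over in-flight requests wants.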

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/reqlist.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/reqlist.c b/block/reqlist.c
index 5e320ba649..09fecbd48c 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -13,6 +13,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/range.h"
 
 #include "block/reqlist.h"
 
@@ -35,7 +36,7 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
     BlockReq *r;
 
     QLIST_FOREACH(r, reqs, list) {
-        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
+        if (ranges_overlap(offset, bytes, r->offset, r->bytes)) {
             return r;
         }
     }
-- 
2.31.1




* [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:20   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 09/18] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Add a convenient function, similar to bdrv_block_status(), to get the
status of a dirty bitmap.
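
As a rough sketch of the intended contract (return whether the bitmap is
dirty at @start, and report in @pnum how long that same state continues),
here is a toy version over a plain bool array. toy_next() and toy_status()
are hypothetical names made up for this note; they are not part of HBitmap
and ignore granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy search: index of the first element equal to @value in
 * [start, start + count), or -1 if there is none.
 */
static int64_t toy_next(const bool *bits, int64_t start, int64_t count,
                        bool value)
{
    for (int64_t i = start; i < start + count; i++) {
        if (bits[i] == value) {
            return i;
        }
    }
    return -1;
}

/* Toy status query over a bool array, one element per bit. */
static bool toy_status(const bool *bits, int64_t start, int64_t count,
                       int64_t *pnum)
{
    int64_t next_dirty = toy_next(bits, start, count, true);
    int64_t next_zero;

    if (next_dirty == -1) {
        *pnum = count;              /* whole range is clean */
        return false;
    }
    if (next_dirty > start) {
        *pnum = next_dirty - start; /* clean run up to the first dirty bit */
        return false;
    }

    next_zero = toy_next(bits, start, count, false);
    if (next_zero == -1) {
        *pnum = count;              /* whole range is dirty */
        return true;
    }
    *pnum = next_zero - start;      /* dirty run up to the first clean bit */
    return true;
}
```

A caller can walk a region by repeatedly querying at the current position
and advancing by *pnum, the same way bdrv_block_status() is typically used.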

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h       | 12 ++++++++++++
 block/dirty-bitmap.c         |  6 ++++++
 util/hbitmap.c               | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index f95d350b70..6528336c4c 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
         int64_t start, int64_t end, int64_t max_dirty_count,
         int64_t *dirty_start, int64_t *dirty_count);
+bool bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, int64_t *count);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
                                                   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 5e71b6d6f7..5bd986aa44 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -340,6 +340,18 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
                              int64_t max_dirty_count,
                              int64_t *dirty_start, int64_t *dirty_count);
 
+/*
+ * hbitmap_status:
+ * @hb: The HBitmap to operate on
+ * @start: The bit to start from
+ * @count: Number of bits to process
+ * @pnum: Out-parameter. How many bits have the same value starting from @start
+ *
+ * Returns true if bitmap is dirty at @start, false otherwise.
+ */
+bool hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
+                    int64_t *pnum);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 94a0276833..08d56845ad 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
                                    dirty_start, dirty_count);
 }
 
+bool bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, int64_t *count)
+{
+    return hbitmap_status(bitmap->bitmap, offset, bytes, count);
+}
+
 /**
  * bdrv_merge_dirty_bitmap: merge src into dest.
  * Ensures permissions on bitmaps are reasonable; use for public API.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 305b894a63..dd0501d9a7 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -301,6 +301,39 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
     return true;
 }
 
+bool hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
+                    int64_t *pnum)
+{
+    int64_t next_dirty, next_zero;
+
+    assert(start >= 0);
+    assert(count > 0);
+    assert(start + count <= hb->orig_size);
+
+    next_dirty = hbitmap_next_dirty(hb, start, count);
+    if (next_dirty == -1) {
+        *pnum = count;
+        return false;
+    }
+
+    if (next_dirty > start) {
+        *pnum = next_dirty - start;
+        return false;
+    }
+
+    assert(next_dirty == start);
+
+    next_zero = hbitmap_next_zero(hb, start, count);
+    if (next_zero == -1) {
+        *pnum = count;
+        return true;
+    }
+
+    assert(next_zero > start);
+    *pnum = next_zero - start;
+    return true;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
     return hb->count == 0;
-- 
2.31.1




* [PATCH v4 09/18] block/reqlist: add reqlist_wait_all()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 10/18] block/io: introduce block driver snapshot-access API Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Add a function to wait for all intersecting requests.
To be used in a later commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
---
 include/block/reqlist.h | 8 ++++++++
 block/reqlist.c         | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index 0fa1eef259..5253497bae 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -53,6 +53,14 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
 bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
                                    int64_t bytes, CoMutex *lock);
 
+/*
+ * Wait for all intersecting requests. It just calls reqlist_wait_one() in a
+ * loop, caller is responsible to stop producing new requests in this region
+ * in parallel, otherwise reqlist_wait_all() may never return.
+ */
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
+
 /*
  * Shrink request and wake all waiting coroutines (maybe some of them are not
  * intersecting with shrunk request).
diff --git a/block/reqlist.c b/block/reqlist.c
index 09fecbd48c..08cb57cfa4 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -58,6 +58,14 @@ bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
     return true;
 }
 
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    while (reqlist_wait_one(reqs, offset, bytes, lock)) {
+        /* continue */
+    }
+}
+
 void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
 {
     if (new_bytes == req->bytes) {
-- 
2.31.1




* [PATCH v4 10/18] block/io: introduce block driver snapshot-access API
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 09/18] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:24   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 11/18] block: introduce snapshot-access filter Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Add new block driver handlers and corresponding generic wrappers.
They will be used to allow the copy-before-write filter to provide a
rich fleecing interface in a later commit.

In the future this approach may be used to allow reading qcow2 internal
snapshots, for example to export them through NBD.
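
The generic wrappers all follow the same dispatch pattern: fail with
-ENOMEDIUM if the node has no driver, fail with -ENOTSUP if the driver does
not implement the handler, otherwise call the handler with the in-flight
counter raised. A self-contained toy sketch of that pattern follows; ToyState,
ToyDriver and the toy_* functions are hypothetical names for this note and
are not QEMU types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV /* fallback for non-Linux hosts; illustration only */
#endif

typedef struct ToyState ToyState;

/* Toy driver table with a single optional snapshot-read handler. */
typedef struct ToyDriver {
    int (*preadv_snapshot)(ToyState *s, int64_t offset, int64_t bytes);
} ToyDriver;

struct ToyState {
    const ToyDriver *drv; /* NULL models a node without a driver */
    int in_flight;        /* models the in-flight request counter */
};

/* Generic wrapper: validate, then dispatch to the driver handler. */
static int toy_preadv_snapshot(ToyState *s, int64_t offset, int64_t bytes)
{
    const ToyDriver *drv = s->drv;
    int ret;

    if (!drv) {
        return -ENOMEDIUM;
    }
    if (!drv->preadv_snapshot) {
        return -ENOTSUP;
    }

    s->in_flight++;
    ret = drv->preadv_snapshot(s, offset, bytes);
    s->in_flight--;

    return ret;
}

/* Example handler: succeeds only if called with the counter raised. */
static int toy_read_ok(ToyState *s, int64_t offset, int64_t bytes)
{
    (void)offset;
    (void)bytes;
    return s->in_flight == 1 ? 0 : -EIO;
}
```

Note that nothing here serializes against normal I/O; as the new comment in
block_int.h says, that responsibility stays with the driver that implements
the handlers.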

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block_int.h | 27 +++++++++++++++
 block/io.c                | 69 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 27008cfb22..c43315ae6e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -376,6 +376,24 @@ struct BlockDriver {
      */
     void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
 
+    /*
+     * Snapshot-access API.
+     *
+     * Block-driver may provide snapshot-access API: special functions to access
+     * some internal "snapshot". The functions are similar to the normal
+     * read/block_status/discard handlers, but don't have any specific handling
+     * in generic block-layer: no serializing, no alignment, no tracked
+     * requests. So, block-driver that realizes these APIs is fully responsible
+     * for synchronization between snapshot-access API and normal IO requests.
+     */
+    int coroutine_fn (*bdrv_co_preadv_snapshot)(BlockDriverState *bs,
+        int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
+    int coroutine_fn (*bdrv_co_snapshot_block_status)(BlockDriverState *bs,
+        bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+        int64_t *map, BlockDriverState **file);
+    int coroutine_fn (*bdrv_co_pdiscard_snapshot)(BlockDriverState *bs,
+        int64_t offset, int64_t bytes);
+
     /*
      * Invalidate any cached meta-data.
      */
@@ -1078,6 +1096,15 @@ extern BlockDriver bdrv_file;
 extern BlockDriver bdrv_raw;
 extern BlockDriver bdrv_qcow2;
 
+int coroutine_fn bdrv_co_preadv_snapshot(BdrvChild *child,
+    int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
+int coroutine_fn bdrv_co_snapshot_block_status(BlockDriverState *bs,
+    bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+    int64_t *map, BlockDriverState **file);
+int coroutine_fn bdrv_co_pdiscard_snapshot(BlockDriverState *bs,
+    int64_t offset, int64_t bytes);
+
+
 int coroutine_fn bdrv_co_preadv(BdrvChild *child,
     int64_t offset, int64_t bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
diff --git a/block/io.c b/block/io.c
index 4e4cb556c5..0bcf09a491 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3587,3 +3587,72 @@ void bdrv_cancel_in_flight(BlockDriverState *bs)
         bs->drv->bdrv_cancel_in_flight(bs);
     }
 }
+
+int coroutine_fn
+bdrv_co_preadv_snapshot(BdrvChild *child, int64_t offset, int64_t bytes,
+                        QEMUIOVector *qiov, size_t qiov_offset)
+{
+    BlockDriverState *bs = child->bs;
+    BlockDriver *drv = bs->drv;
+    int ret;
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (!drv->bdrv_co_preadv_snapshot) {
+        return -ENOTSUP;
+    }
+
+    bdrv_inc_in_flight(bs);
+    ret = drv->bdrv_co_preadv_snapshot(bs, offset, bytes, qiov, qiov_offset);
+    bdrv_dec_in_flight(bs);
+
+    return ret;
+}
+
+int coroutine_fn
+bdrv_co_snapshot_block_status(BlockDriverState *bs,
+                              bool want_zero, int64_t offset, int64_t bytes,
+                              int64_t *pnum, int64_t *map,
+                              BlockDriverState **file)
+{
+    BlockDriver *drv = bs->drv;
+    int ret;
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (!drv->bdrv_co_snapshot_block_status) {
+        return -ENOTSUP;
+    }
+
+    bdrv_inc_in_flight(bs);
+    ret = drv->bdrv_co_snapshot_block_status(bs, want_zero, offset, bytes,
+                                             pnum, map, file);
+    bdrv_dec_in_flight(bs);
+
+    return ret;
+}
+
+int coroutine_fn
+bdrv_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    BlockDriver *drv = bs->drv;
+    int ret;
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (!drv->bdrv_co_pdiscard_snapshot) {
+        return -ENOTSUP;
+    }
+
+    bdrv_inc_in_flight(bs);
+    ret = drv->bdrv_co_pdiscard_snapshot(bs, offset, bytes);
+    bdrv_dec_in_flight(bs);
+
+    return ret;
+}
-- 
2.31.1




* [PATCH v4 11/18] block: introduce snapshot-access filter
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 10/18] block/io: introduce block driver snapshot-access API Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:29   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

The filter simply uses the snapshot-access API of the underlying block
node.

In later patches we want to use it like this:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.img]

This way, the NBD client will be able to read the snapshotted state of the
active disk while the guest continues writing to it. This is known as
"fleecing", and currently uses another scheme based on a temporary qcow2
image whose backing file is the active disk. The new scheme comes with
benefits; see the next commit.

The other possible application is exporting internal snapshots of
qcow2, like this:

[guest]          [NBD export]
   |                  |
   | root             | root
   v       file       v
[qcow2]<---------[snapshot-access]

For this, we'll need to implement snapshot-access API handlers in the
qcow2 driver, and improve the snapshot-access filter (and API) to make it
possible to select a snapshot by name.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json    |   4 +-
 block/snapshot-access.c | 132 ++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS             |   1 +
 block/meson.build       |   1 +
 4 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 block/snapshot-access.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3bab597506..a904755e98 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2914,13 +2914,14 @@
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
+# @snapshot-access: Since 7.0
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-            'file', 'ftp', 'ftps', 'gluster',
+            'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
             {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             'http', 'https', 'iscsi',
@@ -4267,6 +4268,7 @@
       'rbd':        'BlockdevOptionsRbd',
       'replication': { 'type': 'BlockdevOptionsReplication',
                        'if': 'CONFIG_REPLICATION' },
+      'snapshot-access': 'BlockdevOptionsGenericFormat',
       'ssh':        'BlockdevOptionsSsh',
       'throttle':   'BlockdevOptionsThrottle',
       'vdi':        'BlockdevOptionsGenericFormat',
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
new file mode 100644
index 0000000000..77b87c1946
--- /dev/null
+++ b/block/snapshot-access.c
@@ -0,0 +1,132 @@
+/*
+ * snapshot_access block driver
+ *
+ * Copyright (c) 2022 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "block/block_int.h"
+
+static coroutine_fn int
+snapshot_access_co_preadv_part(BlockDriverState *bs,
+                               int64_t offset, int64_t bytes,
+                               QEMUIOVector *qiov, size_t qiov_offset,
+                               BdrvRequestFlags flags)
+{
+    if (flags) {
+        return -ENOTSUP;
+    }
+
+    return bdrv_co_preadv_snapshot(bs->file, offset, bytes, qiov, qiov_offset);
+}
+
+static int coroutine_fn
+snapshot_access_co_block_status(BlockDriverState *bs,
+                                bool want_zero, int64_t offset,
+                                int64_t bytes, int64_t *pnum,
+                                int64_t *map, BlockDriverState **file)
+{
+    return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+                                         bytes, pnum, map, file);
+}
+
+static int coroutine_fn snapshot_access_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int64_t bytes)
+{
+    return bdrv_co_pdiscard_snapshot(bs->file->bs, offset, bytes);
+}
+
+static int coroutine_fn
+snapshot_access_co_pwrite_zeroes(BlockDriverState *bs,
+                                 int64_t offset, int64_t bytes,
+                                 BdrvRequestFlags flags)
+{
+    return -ENOTSUP;
+}
+
+static coroutine_fn int
+snapshot_access_co_pwritev_part(BlockDriverState *bs,
+                                int64_t offset, int64_t bytes,
+                                QEMUIOVector *qiov, size_t qiov_offset,
+                                BdrvRequestFlags flags)
+{
+    return -ENOTSUP;
+}
+
+
+static void snapshot_access_refresh_filename(BlockDriverState *bs)
+{
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->file->bs->filename);
+}
+
+static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
+                                Error **errp)
+{
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    bs->total_sectors = bs->file->bs->total_sectors;
+
+    return 0;
+}
+
+static void snapshot_access_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                BdrvChildRole role,
+                                BlockReopenQueue *reopen_queue,
+                                uint64_t perm, uint64_t shared,
+                                uint64_t *nperm, uint64_t *nshared)
+{
+    /*
+     * Currently, we don't need any permissions. If bs->file provides
+     * snapshot-access API, we can use it.
+     */
+    *nperm = 0;
+    *nshared = BLK_PERM_ALL;
+}
+
+BlockDriver bdrv_snapshot_access_drv = {
+    .format_name = "snapshot-access",
+
+    .bdrv_open                  = snapshot_access_open,
+
+    .bdrv_co_preadv_part        = snapshot_access_co_preadv_part,
+    .bdrv_co_pwritev_part       = snapshot_access_co_pwritev_part,
+    .bdrv_co_pwrite_zeroes      = snapshot_access_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = snapshot_access_co_pdiscard,
+    .bdrv_co_block_status       = snapshot_access_co_block_status,
+
+    .bdrv_refresh_filename      = snapshot_access_refresh_filename,
+
+    .bdrv_child_perm            = snapshot_access_child_perm,
+};
+
+static void snapshot_access_init(void)
+{
+    bdrv_register(&bdrv_snapshot_access_drv);
+}
+
+block_init(snapshot_access_init);
diff --git a/MAINTAINERS b/MAINTAINERS
index 7a5292b814..b16fcca98a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2509,6 +2509,7 @@ F: block/reqlist.c
 F: include/block/reqlist.h
 F: block/copy-before-write.h
 F: block/copy-before-write.c
+F: block/snapshot-access.c
 F: include/block/aio_task.h
 F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index e2f0fe34b4..85f2b03216 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -34,6 +34,7 @@ block_ss.add(files(
   'raw-format.c',
   'reqlist.c',
   'snapshot.c',
+  'snapshot-access.c',
   'throttle-groups.c',
   'throttle.c',
   'vhdx-endian.c',
-- 
2.31.1




* [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 11/18] block: introduce snapshot-access filter Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:46   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

The current image fleecing scheme looks like this:

[guest]                    [NBD export]
  |                              |
  |root                          | root
  v                              v
[copy-before-write] -----> [temp.qcow2]
  |                 target  |
  |file                     |backing
  v                         |
[active disk] <-------------+

 - On guest writes the copy-before-write filter copies old data from the
   active disk to temp.qcow2. So the fleecing client (NBD export) reads
   changed regions from the temp.qcow2 image and unchanged regions from
   the active disk through the backing link.

This patch makes possible new image fleecing scheme:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[x-snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.img]

 - copy-before-write does CBW operations and also provides
   snapshot-access API. The API may be accessed through
   x-snapshot-access driver.
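
As a rough illustration of the graph above, the blockdev-add arguments
could be built as in the sketch below. The helper new_scheme_nodes() and
the node names passed to it are assumptions made for this example only;
the driver names and option keys follow the iotest added later in this
series, which spells the access driver 'snapshot-access'.

```python
def new_scheme_nodes(src_node, tmp_node, tmp_path):
    """Return blockdev-add argument dicts in the order they would be added."""
    return [
        # Temporary image: a plain file node, no backing link is needed.
        {'driver': 'file', 'node-name': tmp_node, 'filename': tmp_path},
        # The filter sits above the active disk and copies old data to target.
        {'driver': 'copy-before-write', 'node-name': 'fl-cbw',
         'file': src_node, 'target': tmp_node},
        # Snapshot view on top of the filter; this is what gets exported.
        {'driver': 'snapshot-access', 'node-name': 'fl-access',
         'file': 'fl-cbw'},
    ]


for node in new_scheme_nodes('src', 'tmp', 'fleece.img'):
    print(node['node-name'], node['driver'])
```

The order matters only in that each node must exist before another node
references it by name.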

Benefits of the new scheme:

1. Access control: if a remote client tries to read data that is not
   covered by the original dirty bitmap used on copy-before-write open,
   the client gets -EACCES.

2. Discard support: if a remote client does DISCARD, then in addition to
   discarding data in temp.img this informs the block-copy process not
   to copy these clusters. The next read from the discarded area will
   return -EACCES. This is significant: once the fleecing user has read
   data that was not yet copied to temp.img and discarded it, we can
   avoid copying it on a later guest write.

3. Synchronisation between client reads and block-copy writes is more
   efficient. In the old scheme we just rely on the BDRV_REQ_SERIALISING
   flag used for writes to temp.qcow2. The new scheme is less blocking:
     - fleecing reads are never blocked: if the data region is untouched
       or in-flight, we just read from active-disk, otherwise we read
       from temp.img
     - writes to temp.img are not blocked by fleecing reads
     - still, guest writes are of course blocked by in-flight fleecing
       reads that currently read from active-disk; this is the minimum
       necessary blocking

4. The temporary image may be of any format, as we don't rely on the
   backing feature.

5. Permission relations are simplified. With the old scheme we have to
   share write permission on the target child of copy-before-write,
   otherwise the backing link conflicts with the copy-before-write file
   child write permissions. With the new scheme we don't have a backing
   link, and the copy-before-write node may have unshared access to the
   temporary node. (Not implemented in this commit, will be in future.)

6. Having control over fleecing reads, we'll be able to implement
   alternative behavior on failed copy-before-write operations.
   Currently we just break the guest request (that's the historical
   behavior of backup). But in some scenarios that is bad behavior: it
   is better to drop the backup as failed without breaking the guest
   request. With the new scheme we can simply unset some bits in a
   bitmap on CBW failure, and further fleecing reads will fail with
   -EACCES, or something like this. (Not implemented in this commit,
   will be in future.) An additional application for this is
   implementing a timeout for CBW operations.
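
The interplay of points 1-3 can be sketched with a toy per-cluster model.
This is not the QEMU code: the class SnapshotModel and its method names
are invented for illustration, whole clusters stand in for byte ranges,
and waiting is reduced to returning a count of in-flight reads.

```python
import errno


class SnapshotModel:
    """Toy per-cluster model of the snapshot-access logic (not QEMU code)."""

    def __init__(self, access):
        self.access = set(access)  # clusters the fleecing client may read
        self.done = set()          # clusters already copied to temp.img
        self.frozen = []           # in-flight fleecing reads from active disk

    def read_lock(self, cluster):
        """Pick where to read from; raise PermissionError for -EACCES."""
        if cluster not in self.access:
            raise PermissionError(errno.EACCES, 'not covered by access bitmap')
        if cluster in self.done:
            return 'temp.img'      # old data was copied; nothing to lock
        self.frozen.append(cluster)
        return 'active-disk'       # guest writes to this cluster must wait

    def read_unlock(self, cluster, source):
        if source == 'active-disk':
            self.frozen.remove(cluster)

    def guest_write(self, cluster):
        """Copy-before-write: return how many in-flight reads to wait for."""
        self.done.add(cluster)
        return self.frozen.count(cluster)

    def discard(self, cluster):
        """Client is finished with the cluster: later reads get -EACCES."""
        self.access.discard(cluster)


m = SnapshotModel(access={0, 1})
src = m.read_lock(0)        # untouched cluster: read from the active disk
waiting = m.guest_write(0)  # the guest write has to wait for that read
m.read_unlock(0, src)
```

In this model a read never waits, and only a guest write overlapping an
in-flight read from the active disk does, which is the blocking described
in point 3.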

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-before-write.c | 212 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 211 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 91a2288b66..a8c88f64eb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -33,12 +33,37 @@
 #include "block/block-copy.h"
 
 #include "block/copy-before-write.h"
+#include "block/reqlist.h"
 
 #include "qapi/qapi-visit-block-core.h"
 
 typedef struct BDRVCopyBeforeWriteState {
     BlockCopyState *bcs;
     BdrvChild *target;
+
+    /*
+     * @lock: protects access to @access_bitmap, @done_bitmap and
+     * @frozen_read_reqs
+     */
+    CoMutex lock;
+
+    /*
+     * @access_bitmap: represents areas allowed for reading by fleecing user.
+     * Reading from non-dirty areas leads to -EACCES.
+     */
+    BdrvDirtyBitmap *access_bitmap;
+
+    /*
+     * @done_bitmap: represents areas that were successfully copied to @target
+     * by copy-before-write operations.
+     */
+    BdrvDirtyBitmap *done_bitmap;
+
+    /*
+     * @frozen_read_reqs: current read requests for fleecing user in bs->file
+     * node. These areas must not be rewritten by guest.
+     */
+    BlockReqList frozen_read_reqs;
 } BDRVCopyBeforeWriteState;
 
 static coroutine_fn int cbw_co_preadv(
@@ -48,10 +73,20 @@ static coroutine_fn int cbw_co_preadv(
     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
 
+/*
+ * Do copy-before-write operation.
+ *
+ * On failure guest request must be failed too.
+ *
+ * On success, we also wait for all in-flight fleecing read requests in source
+ * node, and it's guaranteed that after cbw_do_copy_before_write() successful
+ * return there are no such requests and they will never appear.
+ */
 static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
         uint64_t offset, uint64_t bytes, BdrvRequestFlags flags)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    int ret;
     uint64_t off, end;
     int64_t cluster_size = block_copy_cluster_size(s->bcs);
 
@@ -62,7 +97,17 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
     off = QEMU_ALIGN_DOWN(offset, cluster_size);
     end = QEMU_ALIGN_UP(offset + bytes, cluster_size);
 
-    return block_copy(s->bcs, off, end - off, true);
+    ret = block_copy(s->bcs, off, end - off, true);
+    if (ret < 0) {
+        return ret;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+        reqlist_wait_all(&s->frozen_read_reqs, off, end - off, &s->lock);
+    }
+
+    return 0;
 }
 
 static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
@@ -110,6 +155,142 @@ static int coroutine_fn cbw_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(bs->file->bs);
 }
 
+/*
+ * If @offset is not accessible, return NULL.
+ *
+ * Otherwise, set @pnum to the number of bytes accessible from @file (@file is
+ * set to bs->file or to s->target). Return a newly allocated BlockReq object
+ * that should then be passed to cbw_snapshot_read_unlock().
+ *
+ * It's guaranteed that guest writes will not interfere in the region until
+ * cbw_snapshot_read_unlock() is called.
+ */
+static BlockReq *cbw_snapshot_read_lock(BlockDriverState *bs,
+                                        int64_t offset, int64_t bytes,
+                                        int64_t *pnum, BdrvChild **file)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+    BlockReq *req = g_new(BlockReq, 1);
+    bool done;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
+        g_free(req);
+        return NULL;
+    }
+
+    done = bdrv_dirty_bitmap_status(s->done_bitmap, offset, bytes, pnum);
+    if (done) {
+        /*
+         * Special invalid BlockReq, that is handled in
+         * cbw_snapshot_read_unlock(). We don't need to lock something to read
+         * from s->target.
+         */
+        *req = (BlockReq) {.offset = -1, .bytes = -1};
+        *file = s->target;
+    } else {
+        reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
+        *file = bs->file;
+    }
+
+    return req;
+}
+
+static void cbw_snapshot_read_unlock(BlockDriverState *bs, BlockReq *req)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    if (req->offset == -1 && req->bytes == -1) {
+        g_free(req);
+        return;
+    }
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    reqlist_remove_req(req);
+    g_free(req);
+}
+
+static coroutine_fn int
+cbw_co_preadv_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes,
+                       QEMUIOVector *qiov, size_t qiov_offset)
+{
+    BlockReq *req;
+    BdrvChild *file;
+    int ret;
+
+    /* TODO: upgrade to async loop using AioTask */
+    while (bytes) {
+        int64_t cur_bytes;
+
+        req = cbw_snapshot_read_lock(bs, offset, bytes, &cur_bytes, &file);
+        if (!req) {
+            return -EACCES;
+        }
+
+        ret = bdrv_co_preadv_part(file, offset, cur_bytes,
+                                  qiov, qiov_offset, 0);
+        cbw_snapshot_read_unlock(bs, req);
+        if (ret < 0) {
+            return ret;
+        }
+
+        bytes -= cur_bytes;
+        offset += cur_bytes;
+        qiov_offset += cur_bytes;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn
+cbw_co_snapshot_block_status(BlockDriverState *bs,
+                             bool want_zero, int64_t offset, int64_t bytes,
+                             int64_t *pnum, int64_t *map,
+                             BlockDriverState **file)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+    BlockReq *req;
+    int ret;
+    int64_t cur_bytes;
+    BdrvChild *child;
+
+    req = cbw_snapshot_read_lock(bs, offset, bytes, &cur_bytes, &child);
+    if (!req) {
+        return -EACCES;
+    }
+
+    ret = bdrv_block_status(child->bs, offset, cur_bytes, pnum, map, file);
+    if (child == s->target) {
+        /*
+         * We refer to s->target only for areas that we've written to it.
+         * And we can not report unallocated blocks in s->target: this will
+         * break generic block-status-above logic, that will go to
+         * copy-before-write filtered child in this case.
+         */
+        assert(ret & BDRV_BLOCK_ALLOCATED);
+    }
+
+    cbw_snapshot_read_unlock(bs, req);
+
+    return ret;
+}
+
+static int coroutine_fn cbw_co_pdiscard_snapshot(BlockDriverState *bs,
+                                                 int64_t offset, int64_t bytes)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+    }
+
+    block_copy_reset(s->bcs, offset, bytes);
+
+    return bdrv_co_pdiscard(s->target, offset, bytes);
+}
+
 static void cbw_refresh_filename(BlockDriverState *bs)
 {
     pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
@@ -194,6 +375,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
     BdrvDirtyBitmap *bitmap = NULL;
+    int64_t cluster_size;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -225,6 +407,27 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    cluster_size = block_copy_cluster_size(s->bcs);
+
+    s->done_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
+    if (!s->done_bitmap) {
+        return -EINVAL;
+    }
+    bdrv_disable_dirty_bitmap(s->done_bitmap);
+
+    /* s->access_bitmap starts equal to bcs bitmap */
+    s->access_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
+    if (!s->access_bitmap) {
+        return -EINVAL;
+    }
+    bdrv_disable_dirty_bitmap(s->access_bitmap);
+    bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
+                                     block_copy_dirty_bitmap(s->bcs), NULL,
+                                     true);
+
+    qemu_co_mutex_init(&s->lock);
+    QLIST_INIT(&s->frozen_read_reqs);
+
     return 0;
 }
 
@@ -232,6 +435,9 @@ static void cbw_close(BlockDriverState *bs)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
 
+    bdrv_release_dirty_bitmap(s->access_bitmap);
+    bdrv_release_dirty_bitmap(s->done_bitmap);
+
     block_copy_state_free(s->bcs);
     s->bcs = NULL;
 }
@@ -249,6 +455,10 @@ BlockDriver bdrv_cbw_filter = {
     .bdrv_co_pdiscard           = cbw_co_pdiscard,
     .bdrv_co_flush              = cbw_co_flush,
 
+    .bdrv_co_preadv_snapshot       = cbw_co_preadv_snapshot,
+    .bdrv_co_pdiscard_snapshot     = cbw_co_pdiscard_snapshot,
+    .bdrv_co_snapshot_block_status = cbw_co_snapshot_block_status,
+
     .bdrv_refresh_filename      = cbw_refresh_filename,
 
     .bdrv_child_perm            = cbw_child_perm,
-- 
2.31.1




* [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:48   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 64 ++++++++++++-----
 tests/qemu-iotests/tests/image-fleecing.out | 76 ++++++++++++++++++++-
 2 files changed, 120 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index a58b5a1781..909fc0a7ad 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -49,12 +49,17 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xdc', '32M',       '32k'), # Left-end of partial-right [2]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
-def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
+            fleece_img_path, nbd_sock_path, vm):
     log('--- Setting up images ---')
     log('')
 
     assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
-    assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
+    if use_snapshot_access_filter:
+        assert use_cbw
+        assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
+    else:
+        assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
     for p in patterns:
         qemu_io('-f', iotests.imgfmt,
@@ -81,16 +86,23 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('')
 
 
-    # create tmp_node backed by src_node
-    log(vm.qmp('blockdev-add', {
-        'driver': 'qcow2',
-        'node-name': tmp_node,
-        'file': {
+    if use_snapshot_access_filter:
+        log(vm.qmp('blockdev-add', {
+            'node-name': tmp_node,
             'driver': 'file',
             'filename': fleece_img_path,
-        },
-        'backing': src_node,
-    }))
+        }))
+    else:
+        # create tmp_node backed by src_node
+        log(vm.qmp('blockdev-add', {
+            'driver': 'qcow2',
+            'node-name': tmp_node,
+            'file': {
+                'driver': 'file',
+                'filename': fleece_img_path,
+            },
+            'backing': src_node,
+        }))
 
     # Establish CBW from source to fleecing node
     if use_cbw:
@@ -102,6 +114,13 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
         }))
 
         log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
+
+        if use_snapshot_access_filter:
+            log(vm.qmp('blockdev-add', {
+                'driver': 'snapshot-access',
+                'node-name': 'fl-access',
+                'file': 'fl-cbw',
+            }))
     else:
         log(vm.qmp('blockdev-backup',
                    job_id='fleecing',
@@ -109,16 +128,18 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
                    target=tmp_node,
                    sync='none'))
 
+    export_node = 'fl-access' if use_snapshot_access_filter else tmp_node
+
     log('')
     log('--- Setting up NBD Export ---')
     log('')
 
-    nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
+    nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
     log(vm.qmp('nbd-server-start',
                {'addr': {'type': 'unix',
                          'data': {'path': nbd_sock_path}}}))
 
-    log(vm.qmp('nbd-server-add', device=tmp_node))
+    log(vm.qmp('nbd-server-add', device=export_node))
 
     log('')
     log('--- Sanity Check ---')
@@ -151,7 +172,11 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('--- Cleanup ---')
     log('')
 
+    log(vm.qmp('nbd-server-stop'))
+
     if use_cbw:
+        if use_snapshot_access_filter:
+            log(vm.qmp('blockdev-del', node_name='fl-access'))
         log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
         log(vm.qmp('blockdev-del', node_name='fl-cbw'))
     else:
@@ -160,7 +185,6 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
         assert e is not None
         log(e, filters=[iotests.filter_qmp_event])
 
-    log(vm.qmp('nbd-server-stop'))
     log(vm.qmp('blockdev-del', node_name=tmp_node))
     vm.shutdown()
 
@@ -177,17 +201,21 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('Done')
 
 
-def test(use_cbw):
+def test(use_cbw, use_snapshot_access_filter):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
          iotests.FilePath('nbd.sock',
                           base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
-        do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm)
+        do_test(use_cbw, use_snapshot_access_filter, base_img_path,
+                fleece_img_path, nbd_sock_path, vm)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
-test(False)
+test(False, False)
 
-log('=== Test filter based fleecing ===\n')
-test(True)
+log('=== Test cbw-filter based fleecing ===\n')
+test(True, False)
+
+log('=== Test fleecing-format based fleecing ===\n')
+test(True, True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index e96d122a8b..da0af93388 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -52,8 +52,8 @@ read -P0 0x3fe0000 64k
 --- Cleanup ---
 
 {"return": {}}
-{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
+{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
 
 --- Confirming writes ---
@@ -67,7 +67,7 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
-=== Test filter based fleecing ===
+=== Test cbw-filter based fleecing ===
 
 --- Setting up images ---
 
@@ -137,3 +137,75 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
+=== Test fleecing-format based fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
-- 
2.31.1




* [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status()
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:52   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Add a helper that returns both output and status, to be used in the
following commit.
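
The "pipe and status" pattern can be sketched with the standard library
alone. The function tool_pipe_and_status() below is a hypothetical
stand-in, not the iotests helper; it only shows the shape of the return
value, (output, exit status), that the caller is expected to unpack.

```python
import subprocess
import sys


def tool_pipe_and_status(args):
    """Run a command; return (combined stdout/stderr text, exit status)."""
    proc = subprocess.run(args, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True, check=False)
    return proc.stdout, proc.returncode


# Usage: print the output only when the command failed.
out, ret = tool_pipe_and_status([sys.executable, '-c', 'print("hello")'])
if ret != 0:
    print(out)
```

Returning the status alongside the output lets a test log an expected
failure instead of asserting that every command succeeds.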

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6ba65eb1ff..23bc6f686f 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -278,6 +278,10 @@ def qemu_io(*args):
     '''Run qemu-io and return the stdout data'''
     return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
 
+def qemu_io_pipe_and_status(*args):
+    args = qemu_io_args + list(args)
+    return qemu_tool_pipe_and_status('qemu-io', args)
+
 def qemu_io_log(*args):
     result = qemu_io(*args)
     log(result, filters=[filter_testfiles, filter_qemu_io])
-- 
2.31.1




* [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 12:58   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 16/18] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Note that reads of zero areas (not dirty in the bitmap) fail; that is
correct.
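
A minimal sketch of how the filter options change when a bitmap is used,
mirroring the dictionary built in the test below. The function name
cbw_options() is an assumption for this example; the 'bitmap' key with
its 'node' and 'name' members is taken from the diff.

```python
def cbw_options(src_node, tmp_node, bitmap_name=None):
    """Build blockdev-add arguments for the copy-before-write filter."""
    opts = {
        'driver': 'copy-before-write',
        'node-name': 'fl-cbw',
        'file': src_node,
        'target': tmp_node,
    }
    if bitmap_name is not None:
        # Restrict copy-before-write (and snapshot reads) to dirty areas.
        opts['bitmap'] = {'node': src_node, 'name': bitmap_name}
    return opts


print(cbw_options('src', 'tmp', bitmap_name='bitmap0'))
```

With the bitmap set, only areas dirty in it are readable through the
snapshot-access node, which is why the reads of zero areas fail in the
reference output.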

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 32 ++++++--
 tests/qemu-iotests/tests/image-fleecing.out | 84 +++++++++++++++++++++
 2 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index 909fc0a7ad..33995612be 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -23,7 +23,7 @@
 # Creator/Owner: John Snow <jsnow@redhat.com>
 
 import iotests
-from iotests import log, qemu_img, qemu_io, qemu_io_silent
+from iotests import log, qemu_img, qemu_io, qemu_io_silent, qemu_io_pipe_and_status
 
 iotests.script_initialize(
     supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', 'vhdx', 'raw'],
@@ -50,11 +50,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
 def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
-            fleece_img_path, nbd_sock_path, vm):
+            fleece_img_path, nbd_sock_path, vm,
+            bitmap=False):
     log('--- Setting up images ---')
     log('')
 
     assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+    if bitmap:
+        assert qemu_img('bitmap', '--add', base_img_path, 'bitmap0') == 0
+
     if use_snapshot_access_filter:
         assert use_cbw
         assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
@@ -106,12 +110,17 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
 
     # Establish CBW from source to fleecing node
     if use_cbw:
-        log(vm.qmp('blockdev-add', {
+        fl_cbw = {
             'driver': 'copy-before-write',
             'node-name': 'fl-cbw',
             'file': src_node,
             'target': tmp_node
-        }))
+        }
+
+        if bitmap:
+            fl_cbw['bitmap'] = {'node': src_node, 'name': 'bitmap0'}
+
+        log(vm.qmp('blockdev-add', fl_cbw))
 
         log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
 
@@ -148,7 +157,9 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+        if ret != 0:
+            print(out)
 
     log('')
     log('--- Testing COW ---')
@@ -166,7 +177,9 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+        if ret != 0:
+            print(out)
 
     log('')
     log('--- Cleanup ---')
@@ -201,14 +214,14 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     log('Done')
 
 
-def test(use_cbw, use_snapshot_access_filter):
+def test(use_cbw, use_snapshot_access_filter, bitmap=False):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
          iotests.FilePath('nbd.sock',
                           base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
         do_test(use_cbw, use_snapshot_access_filter, base_img_path,
-                fleece_img_path, nbd_sock_path, vm)
+                fleece_img_path, nbd_sock_path, vm, bitmap=bitmap)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
@@ -219,3 +232,6 @@ test(True, False)
 
 log('=== Test fleecing-format based fleecing ===\n')
 test(True, True)
+
+log('=== Test fleecing-format based fleecing with bitmap ===\n')
+test(True, True, bitmap=True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index da0af93388..62e1c1fe42 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -190,6 +190,90 @@ read -P0 0x00f8000 32k
 read -P0 0x2010000 32k
 read -P0 0x3fe0000 64k
 
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
+=== Test fleecing-format based fleecing with bitmap ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read failed: Invalid argument
+
+read -P0 0x2010000 32k
+read failed: Invalid argument
+
+read -P0 0x3fe0000 64k
+read failed: Invalid argument
+
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read failed: Invalid argument
+
+read -P0 0x2010000 32k
+read failed: Invalid argument
+
+read -P0 0x3fe0000 64k
+read failed: Invalid argument
+
+
 --- Cleanup ---
 
 {"return": {}}
-- 
2.31.1




* [PATCH v4 16/18] block: blk_root(): return non-const pointer
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 17/18] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
  2022-02-16 19:46 ` [PATCH v4 18/18] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

In the following patch we'll want to pass blk children to block-copy.
Const pointers are not enough. So, return a non-const pointer from
blk_root().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/sysemu/block-backend.h | 2 +-
 block/block-backend.c          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index e5e1524f06..904d70f49c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -277,7 +277,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    int64_t bytes, BdrvRequestFlags read_flags,
                                    BdrvRequestFlags write_flags);
 
-const BdrvChild *blk_root(BlockBackend *blk);
+BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 4ff6b4d785..97913acfcd 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2464,7 +2464,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                               bytes, read_flags, write_flags);
 }
 
-const BdrvChild *blk_root(BlockBackend *blk)
+BdrvChild *blk_root(BlockBackend *blk)
 {
     return blk->root;
 }
-- 
2.31.1




* [PATCH v4 17/18] qapi: backup: add immutable-source parameter
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 16/18] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  2022-02-24 13:05   ` Hanna Reitz
  2022-02-16 19:46 ` [PATCH v4 18/18] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
  17 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

We are working towards an internal-backup with fleecing scheme, in
which a backup job copies from the fleecing block driver node (the
target of the copy-before-write filter) to the final backup target.
This job doesn't need its own filter: the fleecing block driver node
is a kind of snapshot, so it is immutable from the reader's point of
view.

Let's add a backup parameter that skips inserting the filter and
instead unshares writes on the source. This way the backup job becomes
a simple copying process.
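
As a rough illustration of the proposed interface (a sketch, not part
of the patch), the QMP arguments for such a backup could be assembled
as below. The helper name backup_args is hypothetical; the argument
names come from the QAPI change in this patch, and the node and job
names follow the iotest in patch 18. The check mirrors the rule that
immutable-source and filter-node-name must not be set together.

```python
import json


def backup_args(device, target, immutable_source=False,
                filter_node_name=None):
    """Build arguments for a blockdev-backup command (hypothetical helper)."""
    if immutable_source and filter_node_name is not None:
        # Same restriction the patch enforces in backup_job_create().
        raise ValueError('immutable-source and filter-node-name should not '
                         'be set simultaneously')
    args = {'device': device, 'target': target, 'sync': 'full',
            'job-id': 'push-backup'}
    if immutable_source:
        args['immutable-source'] = True
    if filter_node_name is not None:
        args['filter-node-name'] = filter_node_name
    return args


if __name__ == '__main__':
    cmd = {'execute': 'blockdev-backup',
           'arguments': backup_args('fl-access', 'target',
                                    immutable_source=True)}
    print(json.dumps(cmd, indent=2))
```

With immutable_source left at its default the helper produces the
ordinary filter-based request, which matches the documented default
of false.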

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json      | 11 ++++++-
 include/block/block_int.h |  1 +
 block/backup.c            | 61 +++++++++++++++++++++++++++++++++++----
 block/replication.c       |  2 +-
 blockdev.c                |  1 +
 5 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index a904755e98..30d44683bf 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1436,6 +1436,15 @@
 #                    above node specified by @drive. If this option is not given,
 #                    a node name is autogenerated. (Since: 4.2)
 #
+# @immutable-source: If true, assume source is immutable, and don't insert filter
+#                    as no copy-before-write operations are needed. It will
+#                    fail if there are existing writers on source node.
+#                    Any attempt to add writer to source node during backup will
+#                    also fail. @filter-node-name must not be set.
+#                    If false, insert copy-before-write filter above source node
+#                    (see also @filter-node-name parameter).
+#                    Default is false. (Since 6.2)
+#
 # @x-perf: Performance options. (Since 6.0)
 #
 # Features:
@@ -1455,7 +1464,7 @@
             '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-            '*filter-node-name': 'str',
+            '*filter-node-name': 'str', '*immutable-source': 'bool',
             '*x-perf': { 'type': 'BackupPerf',
                          'features': [ 'unstable' ] } } }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c43315ae6e..0270af29ae 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1348,6 +1348,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             BitmapSyncMode bitmap_mode,
                             bool compress,
                             const char *filter_node_name,
+                            bool immutable_source,
                             BackupPerf *perf,
                             BlockdevOnError on_source_error,
                             BlockdevOnError on_target_error,
diff --git a/block/backup.c b/block/backup.c
index 21d5983779..104f8fd835 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -34,6 +34,14 @@ typedef struct BackupBlockJob {
     BlockDriverState *cbw;
     BlockDriverState *source_bs;
     BlockDriverState *target_bs;
+    BlockBackend *source_blk;
+    BlockBackend *target_blk;
+    /*
+     * Note that if backup runs with filter (immutable-source parameter is
+     * false), @cbw is set but @source_blk and @target_blk are NULL.
+     * Otherwise if backup runs without filter (immutable-source parameter is
+     * true), @cbw is NULL but @source_blk and @target_blk are set.
+     */
 
     BdrvDirtyBitmap *sync_bitmap;
 
@@ -102,7 +110,17 @@ static void backup_clean(Job *job)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
     block_job_remove_all_bdrv(&s->common);
-    bdrv_cbw_drop(s->cbw);
+    if (s->cbw) {
+        assert(!s->source_blk && !s->target_blk);
+        bdrv_cbw_drop(s->cbw);
+    } else {
+        block_copy_state_free(s->bcs);
+        s->bcs = NULL;
+        blk_unref(s->source_blk);
+        s->source_blk = NULL;
+        blk_unref(s->target_blk);
+        s->target_blk = NULL;
+    }
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -357,6 +375,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BitmapSyncMode bitmap_mode,
                   bool compress,
                   const char *filter_node_name,
+                  bool immutable_source,
                   BackupPerf *perf,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
@@ -369,6 +388,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t cluster_size;
     BlockDriverState *cbw = NULL;
     BlockCopyState *bcs = NULL;
+    BlockBackend *source_blk = NULL, *target_blk = NULL;
 
     assert(bs);
     assert(target);
@@ -377,6 +397,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
     assert(sync_bitmap || sync_mode != MIRROR_SYNC_MODE_BITMAP);
 
+    if (immutable_source && filter_node_name) {
+        error_setg(errp, "immutable-source and filter-node-name should not "
+                   "be set simultaneously");
+        return NULL;
+    }
+
     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
         return NULL;
@@ -451,9 +477,30 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
-    cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
-    if (!cbw) {
-        goto error;
+    if (immutable_source) {
+        source_blk = blk_new_with_bs(bs, BLK_PERM_CONSISTENT_READ,
+                                        BLK_PERM_WRITE_UNCHANGED |
+                                        BLK_PERM_CONSISTENT_READ, errp);
+        if (!source_blk) {
+            goto error;
+        }
+
+        target_blk  = blk_new_with_bs(target, BLK_PERM_WRITE,
+                                      BLK_PERM_CONSISTENT_READ, errp);
+        if (!target_blk) {
+            goto error;
+        }
+
+        bcs = block_copy_state_new(blk_root(source_blk), blk_root(target_blk),
+                                   NULL, errp);
+        if (!bcs) {
+            goto error;
+        }
+    } else {
+        cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+        if (!cbw) {
+            goto error;
+        }
     }
 
     cluster_size = block_copy_cluster_size(bcs);
@@ -465,7 +512,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     /* job->len is fixed, so we can't allow resize */
-    job = block_job_create(job_id, &backup_job_driver, txn, cbw,
+    job = block_job_create(job_id, &backup_job_driver, txn, cbw ?: bs,
                            0, BLK_PERM_ALL,
                            speed, creation_flags, cb, opaque, errp);
     if (!job) {
@@ -475,6 +522,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->cbw = cbw;
     job->source_bs = bs;
     job->target_bs = target;
+    job->source_blk = source_blk;
+    job->target_blk = target_blk;
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->sync_mode = sync_mode;
@@ -502,6 +551,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     if (cbw) {
         bdrv_cbw_drop(cbw);
     }
+    blk_unref(source_blk);
+    blk_unref(target_blk);
 
     return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..c6c4d3af85 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -590,7 +590,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
                                 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
-                                &perf,
+                                false, &perf,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 42e098b458..6997eccb4d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2878,6 +2878,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
                             backup->sync, bmap, backup->bitmap_mode,
                             backup->compress,
                             backup->filter_node_name,
+                            backup->immutable_source,
                             &perf,
                             backup->on_source_error,
                             backup->on_target_error,
-- 
2.31.1




* [PATCH v4 18/18] iotests/image-fleecing: test push backup with fleecing
  2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2022-02-16 19:46 ` [PATCH v4 17/18] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-16 19:46 ` Vladimir Sementsov-Ogievskiy
  17 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-16 19:46 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, hreitz, kwolf, vsementsov, jsnow, nikita.lapshin

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 121 ++++++++++++++------
 tests/qemu-iotests/tests/image-fleecing.out |  63 ++++++++++
 2 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index 33995612be..903cd50be9 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -49,9 +49,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xdc', '32M',       '32k'), # Left-end of partial-right [2]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
-def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
-            fleece_img_path, nbd_sock_path, vm,
+def do_test(vm, use_cbw, use_snapshot_access_filter, base_img_path,
+            fleece_img_path, nbd_sock_path=None,
+            target_img_path=None,
             bitmap=False):
+    push_backup = target_img_path is not None
+    assert (nbd_sock_path is not None) != push_backup
+    if push_backup:
+        assert use_cbw
+
     log('--- Setting up images ---')
     log('')
 
@@ -65,6 +71,9 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     else:
         assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
+    if push_backup:
+        assert qemu_img('create', '-f', 'qcow2', target_img_path, '64M') == 0
+
     for p in patterns:
         qemu_io('-f', iotests.imgfmt,
                 '-c', 'write -P%s %s %s' % p, base_img_path)
@@ -139,27 +148,45 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
 
     export_node = 'fl-access' if use_snapshot_access_filter else tmp_node
 
-    log('')
-    log('--- Setting up NBD Export ---')
-    log('')
+    if push_backup:
+        log('')
+        log('--- Starting actual backup ---')
+        log('')
 
-    nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
-    log(vm.qmp('nbd-server-start',
-               {'addr': {'type': 'unix',
-                         'data': {'path': nbd_sock_path}}}))
+        log(vm.qmp('blockdev-add', **{
+            'driver': iotests.imgfmt,
+            'node-name': 'target',
+            'file': {
+                'driver': 'file',
+                'filename': target_img_path
+            }
+        }))
+        log(vm.qmp('blockdev-backup', device=export_node,
+                   sync='full', target='target',
+                   immutable_source=True,
+                   job_id='push-backup', speed=1))
+    else:
+        log('')
+        log('--- Setting up NBD Export ---')
+        log('')
 
-    log(vm.qmp('nbd-server-add', device=export_node))
+        nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
+        log(vm.qmp('nbd-server-start',
+                   {'addr': { 'type': 'unix',
+                              'data': { 'path': nbd_sock_path } } }))
 
-    log('')
-    log('--- Sanity Check ---')
-    log('')
+        log(vm.qmp('nbd-server-add', device=export_node))
 
-    for p in patterns + zeroes:
-        cmd = 'read -P%s %s %s' % p
-        log(cmd)
-        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
-        if ret != 0:
-            print(out)
+        log('')
+        log('--- Sanity Check ---')
+        log('')
+
+        for p in patterns + zeroes:
+            cmd = 'read -P%s %s %s' % p
+            log(cmd)
+            out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+            if ret != 0:
+                print(out)
 
     log('')
     log('--- Testing COW ---')
@@ -170,6 +197,20 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
         log(cmd)
         log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
+    if push_backup:
+        # Check that previous operations were done during backup, not after
+        result = vm.qmp('query-block-jobs')
+        if len(result['return']) != 1:
+            log('Backup finished too fast, COW is not tested')
+
+        result = vm.qmp('block-job-set-speed', device='push-backup', speed=0)
+        assert result == {'return': {}}
+
+        log(vm.event_wait(name='BLOCK_JOB_COMPLETED',
+                          match={'data': {'device': 'push-backup'}}),
+                          filters=[iotests.filter_qmp_event])
+        log(vm.qmp('blockdev-del', node_name='target'))
+
     log('')
     log('--- Verifying Data ---')
     log('')
@@ -177,15 +218,19 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
-        if ret != 0:
-            print(out)
+        if push_backup:
+            assert qemu_io_silent('-r', '-c', cmd, target_img_path) == 0
+        else:
+            out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+            if ret != 0:
+                print(out)
 
     log('')
     log('--- Cleanup ---')
     log('')
 
-    log(vm.qmp('nbd-server-stop'))
+    if not push_backup:
+        log(vm.qmp('nbd-server-stop'))
 
     if use_cbw:
         if use_snapshot_access_filter:
@@ -214,24 +259,36 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
     log('Done')
 
 
-def test(use_cbw, use_snapshot_access_filter, bitmap=False):
+def test(use_cbw, use_snapshot_access_filter,
+         nbd_sock_path=None, target_img_path=None, bitmap=False):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
-         iotests.FilePath('nbd.sock',
-                          base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
-        do_test(use_cbw, use_snapshot_access_filter, base_img_path,
-                fleece_img_path, nbd_sock_path, vm, bitmap=bitmap)
+        do_test(vm, use_cbw, use_snapshot_access_filter, base_img_path,
+                fleece_img_path, nbd_sock_path, target_img_path,
+                bitmap=bitmap)
+
+def test_pull(use_cbw, use_snapshot_access_filter, bitmap=False):
+    with iotests.FilePath('nbd.sock',
+                          base_dir=iotests.sock_dir) as nbd_sock_path:
+        test(use_cbw, use_snapshot_access_filter, nbd_sock_path, None, bitmap=bitmap)
+
+def test_push():
+    with iotests.FilePath('target.img') as target_img_path:
+        test(True, True, None, target_img_path)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
-test(False, False)
+test_pull(False, False)
 
 log('=== Test cbw-filter based fleecing ===\n')
-test(True, False)
+test_pull(True, False)
 
 log('=== Test fleecing-format based fleecing ===\n')
-test(True, True)
+test_pull(True, True)
 
 log('=== Test fleecing-format based fleecing with bitmap ===\n')
-test(True, True, bitmap=True)
+test_pull(True, True, bitmap=True)
+
+log('=== Test push backup with fleecing ===\n')
+test_push()
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index 62e1c1fe42..acfc89ff0e 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -293,3 +293,66 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
+=== Test push backup with fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Starting actual backup ---
+
+{"return": {}}
+{"return": {}}
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+{"data": {"device": "push-backup", "len": 67108864, "offset": 67108864, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"return": {}}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
-- 
2.31.1




* Re: [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter
  2022-02-16 19:46 ` [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:01   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> This will be used in the following commit to bring "incremental" mode
> to copy-before-write filter.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block-copy.h |  1 +
>   block/block-copy.c         | 14 +++++++++++++-
>   block/copy-before-write.c  |  2 +-
>   3 files changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter
  2022-02-16 19:46 ` [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:07   ` Hanna Reitz
  2022-02-24 13:27     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> This brings "incremental" mode to copy-before-write filter: user can
> specify bitmap so that filter will copy only "dirty" areas.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json      | 10 +++++++-
>   block/copy-before-write.c | 51 ++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 9a5a3641d0..3bab597506 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4171,11 +4171,19 @@
>   #
>   # @target: The target for copy-before-write operations.
>   #
> +# @bitmap: If specified, copy-before-write filter will do
> +#          copy-before-write operations only for dirty regions of the
> +#          bitmap. Bitmap size must be equal to length of file and
> +#          target child of the filter. Note also, that bitmap is used
> +#          only to initialize internal bitmap of the process, so further
> +#          modifications (or removing) of specified bitmap doesn't
> +#          influence the filter.

Sorry, missed this last time: There should be a “since: 7.0” here.
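
For illustration only, a sketch of how the blockdev-add options for a
copy-before-write node with the new optional bitmap might be put
together. The helper cbw_options and all node and bitmap names are
hypothetical; the 'node' and 'name' keys are taken from the
bmp_param->node and bmp_param->name accesses in the patch below.

```python
def cbw_options(node_name, file_node, target_node, bitmap=None):
    """Build blockdev-add options for a copy-before-write node (sketch).

    bitmap, if given, is a (node, name) pair identifying a dirty bitmap.
    """
    opts = {'driver': 'copy-before-write', 'node-name': node_name,
            'file': file_node, 'target': target_node}
    if bitmap is not None:
        bitmap_node, bitmap_name = bitmap
        opts['bitmap'] = {'node': bitmap_node, 'name': bitmap_name}
    return opts


if __name__ == '__main__':
    # All names below are made up for the example.
    print(cbw_options('cbw0', 'disk0', 'temp0', bitmap=('disk0', 'bitmap0')))
```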

> +#
>   # Since: 6.2
>   ##
>   { 'struct': 'BlockdevOptionsCbw',
>     'base': 'BlockdevOptionsGenericFormat',
> -  'data': { 'target': 'BlockdevRef' } }
> +  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>   
>   ##
>   # @BlockdevOptions:
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 799223e3fb..91a2288b66 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c
> @@ -34,6 +34,8 @@
>   
>   #include "block/copy-before-write.h"
>   
> +#include "qapi/qapi-visit-block-core.h"
> +
>   typedef struct BDRVCopyBeforeWriteState {
>       BlockCopyState *bcs;
>       BdrvChild *target;
> @@ -145,10 +147,53 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
>       }
>   }
>   
> +static bool cbw_parse_bitmap_option(QDict *options, BdrvDirtyBitmap **bitmap,
> +                                    Error **errp)
> +{
> +    QDict *bitmap_qdict = NULL;
> +    BlockDirtyBitmap *bmp_param = NULL;
> +    Visitor *v = NULL;
> +    bool ret = false;
> +
> +    *bitmap = NULL;
> +
> +    qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
> +    if (!qdict_size(bitmap_qdict)) {
> +        ret = true;
> +        goto out;
> +    }
> +
> +    v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
> +    if (!v) {
> +        goto out;
> +    }
> +
> +    visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
> +    if (!bmp_param) {
> +        goto out;
> +    }
> +
> +    *bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, NULL,
> +                                        errp);
> +    if (!*bitmap) {
> +        goto out;
> +    }
> +
> +    ret = true;
> +
> +out:
> +    qapi_free_BlockDirtyBitmap(bmp_param);
> +    visit_free(v);
> +    qobject_unref(bitmap_qdict);
> +
> +    return ret;
> +}
> +
>   static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
>       BDRVCopyBeforeWriteState *s = bs->opaque;
> +    BdrvDirtyBitmap *bitmap = NULL;
>   
>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> @@ -163,6 +208,10 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>           return -EINVAL;
>       }
>   
> +    if (!cbw_parse_bitmap_option(options, &bitmap, errp)) {
> +        return -EINVAL;

Hm...  Just to get a second opinion on this: We don’t need to close 
s->target here, because the failure paths of bdrv_open_inherit() and 
bdrv_new_open_driver_opts() both call bdrv_unref(), which will call 
bdrv_close(), which will close all children including s->target, right?

> +    }
> +
>       bs->total_sectors = bs->file->bs->total_sectors;
>       bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>               (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
> @@ -170,7 +219,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>               ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>                bs->file->bs->supported_zero_flags);
>   
> -    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
> +    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
>       if (!s->bcs) {
>           error_prepend(errp, "Cannot create block-copy-state: ");
>           return -EINVAL;




* Re: [PATCH v4 06/18] block: intoduce reqlist
  2022-02-16 19:46 ` [PATCH v4 06/18] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:08   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Split intersecting-requests functionality out of block-copy to be
> reused in copy-before-write filter.
>
> Note: while being here, fix tiny typo in MAINTAINERS.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/reqlist.h |  67 +++++++++++++++++++++++
>   block/block-copy.c      | 116 +++++++++++++---------------------------
>   block/reqlist.c         |  76 ++++++++++++++++++++++++++
>   MAINTAINERS             |   4 +-
>   block/meson.build       |   1 +
>   5 files changed, 184 insertions(+), 80 deletions(-)
>   create mode 100644 include/block/reqlist.h
>   create mode 100644 block/reqlist.c

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap()
  2022-02-16 19:46 ` [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap() Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:08   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Let's reuse convenient helper.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/reqlist.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>
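
For readers unfamiliar with the helper, a minimal Python sketch of the
overlap test that a function like ranges_overlap() performs on
(offset, length) pairs. It is an illustration that assumes non-empty
ranges; it is not QEMU's C code.

```python
def ranges_overlap(first1, len1, first2, len2):
    """Return True if [first1, first1+len1) and [first2, first2+len2) intersect.

    Assumes both lengths are positive (illustrative sketch only).
    """
    last1 = first1 + len1 - 1
    last2 = first2 + len2 - 1
    # The ranges are disjoint only if one ends before the other starts.
    return not (last2 < first1 or last1 < first2)
```

Adjacent ranges such as (0, 10) and (10, 5) do not overlap under this
definition, which is the behaviour a request-conflict check wants.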




* Re: [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2022-02-16 19:46 ` [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:20   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Add a convenient function similar with bdrv_block_status() to get
> status of dirty bitmap.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/dirty-bitmap.h |  2 ++
>   include/qemu/hbitmap.h       | 12 ++++++++++++
>   block/dirty-bitmap.c         |  6 ++++++
>   util/hbitmap.c               | 33 +++++++++++++++++++++++++++++++++
>   4 files changed, 53 insertions(+)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>
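
A toy model of what a block-status-like query on a dirty bitmap could
look like. This sketch assumes the query reports whether the position
at the given offset is dirty and how far that same state continues
within the requested window. The function name, the list-of-booleans
representation and the return shape are hypothetical, since the actual
signature is not shown in this message.

```python
def bitmap_status(bits, offset, count):
    """Return (is_dirty, run_length) for the run starting at offset.

    bits is a list of booleans, one per granule (hypothetical
    representation). The run is limited to [offset, offset + count).
    """
    end = min(offset + count, len(bits))
    if offset < 0 or offset >= end:
        raise ValueError('empty or out-of-range window')
    is_dirty = bits[offset]
    pos = offset
    # Advance while the dirty state stays the same as at the start.
    while pos < end and bits[pos] == is_dirty:
        pos += 1
    return is_dirty, pos - offset
```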




* Re: [PATCH v4 10/18] block/io: introduce block driver snapshot-access API
  2022-02-16 19:46 ` [PATCH v4 10/18] block/io: introduce block driver snapshot-access API Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:24   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Add new block driver handlers and corresponding generic wrappers.
> It will be used to allow copy-before-write filter to provide
> reach fleecing interface in further commit.
>
> In future this approach may be used to allow reading qcow2 interanal

(s/interanal/internal/)

> snaphots, for example to export them through NBD.

Ooh, that’s indeed quite nice.

Raises the question of how users are to select a specific snapshot in a 
qcow2 file, but your next patch answers that question: The snapshot 
access driver is to receive a runtime option for this, and the API is to 
be extended to allow for selecting a specific snapshot.  Sounds good!

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block_int.h | 27 +++++++++++++++
>   block/io.c                | 69 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 96 insertions(+)

Yes, really nice.  Thanks.

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v4 11/18] block: introduce snapshot-access filter
  2022-02-16 19:46 ` [PATCH v4 11/18] block: introduce snapshot-access filter Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:29   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> The filter simply utilizes snapshot-access API of underlying block

Nit picking: Well, it isn’t really a filter.  I understand where you’re 
coming from, but by definition it isn’t a filter driver.

> node.
>
> In further patches we want to use it like this:
>
> [guest]                   [NBD export]
>     |                            |
>     | root                       | root
>     v                 file       v
> [copy-before-write]<------[snapshot-access]
>     |           |
>     | file      | target
>     v           v
> [active-disk] [temp.img]
>
> This way, NBD client will be able to read snapshotted state of active
> disk, when active disk is continued to be written by guest. This is
> known as "fleecing", and currently uses another scheme based on qcow2
> temporary image which backing file is active-disk. New scheme comes
> with benefits - see next commit.
>
> The other possible application is exporting internal snapshots of
> qcow2, like this:
>
> [guest]          [NBD export]
>     |                  |
>     | root             | root
>     v       file       v
> [qcow2]<---------[snapshot-access]
>
> For this, we'll need to implement snapshot-access API handlers in
> qcow2 driver, and improve snapshot-access filter (and API) to make it
> possibele to select snapshot by name.

s/possibele/possible/

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json    |   4 +-
>   block/snapshot-access.c | 132 ++++++++++++++++++++++++++++++++++++++++
>   MAINTAINERS             |   1 +
>   block/meson.build       |   1 +
>   4 files changed, 137 insertions(+), 1 deletion(-)
>   create mode 100644 block/snapshot-access.c

Again, I like this very much, not least because it provides a clean way 
to solve the long-standing question of how to nicely export qcow2 snapshots.

[...]

> diff --git a/block/snapshot-access.c b/block/snapshot-access.c
> new file mode 100644
> index 0000000000..77b87c1946
> --- /dev/null
> +++ b/block/snapshot-access.c

[...]

> +static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
> +                                Error **errp)
> +{
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
> +                               false, errp);
> +    if (!bs->file) {
> +        return -EINVAL;
> +    }
> +
> +    bs->total_sectors = bs->file->bs->total_sectors;

(qcow2) snapshots can have a size that differs from the image’s current 
(active layer) size.  I think we should account for that here (I guess 
I’d be fine with a FIXME, too, but introducing FIXMEs is never exactly 
great).

> +
> +    return 0;
> +}
> +




* Re: [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API
  2022-02-16 19:46 ` [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:46   ` Hanna Reitz
  2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Current scheme of image fleecing looks like this:
>
> [guest]                    [NBD export]
>    |                              |
>    |root                          | root
>    v                              v
> [copy-before-write] -----> [temp.qcow2]
>    |                 target  |
>    |file                     |backing
>    v                         |
> [active disk] <-------------+
>
>   - On guest writes copy-before-write filter copies old data from active
>     disk to temp.qcow2. So fleecing client (NBD export) when reads
>     changed regions from temp.qcow2 image and unchanged from active disk
>     through backing link.
>
> This patch makes possible new image fleecing scheme:
>
> [guest]                   [NBD export]
>     |                            |
>     | root                       | root
>     v                 file       v
> [copy-before-write]<------[x-snapshot-access]
>     |           |
>     | file      | target
>     v           v
> [active-disk] [temp.img]
>
>   - copy-before-write does CBW operations and also provides
>     snapshot-access API. The API may be accessed through
>     x-snapshot-access driver.

The “x-” prefix seems like a relic from an earlier version.

(I agree with what I assume is your opinion now, that we don’t need an 
x- prefix.  I can’t imagine why we’d need to change the snapshot-access 
interface in an incompatible way.)

> Benefits of new scheme:
>
> 1. Access control: if remote client try to read data that not covered
>     by original dirty bitmap used on copy-before-write open, client gets
>     -EACCES.
>
> 2. Discard support: if remote client do DISCARD, this additionally to
>     discarding data in temp.img informs block-copy process to not copy
>     these clusters. Next read from discarded area will return -EACCES.
>     This is significant thing: when fleecing user reads data that was
>     not yet copied to temp.img, we can avoid copying it on further guest
>     write.
>
> 3. Synchronisation between client reads and block-copy write is more
>     efficient. In old scheme we just rely on BDRV_REQ_SERIALISING flag
>     used for writes to temp.qcow2. New scheme is less blocking:
>       - fleecing reads are never blocked: if data region is untouched or
>         in-flight, we just read from active-disk, otherwise we read from
>         temp.img
>       - writes to temp.img are not blocked by fleecing reads
>       - still, guest writes of-course are blocked by in-flight fleecing
>         reads, that currently read from active-disk - it's the minimum
>         necessary blocking
>
> 4. Temporary image may be of any format, as we don't rely on backing
>     feature.
>
> 5. Permission relation are simplified. With old scheme we have to share
>     write permission on target child of copy-before-write, otherwise
>     backing link conflicts with copy-before-write file child write
>     permissions. With new scheme we don't have backing link, and
>     copy-before-write node may have unshared access to temporary node.
>     (Not realized in this commit, will be in future).
>
> 6. Having control on fleecing reads we'll be able to implement
>     alternative behavior on failed copy-before-write operations.
>     Currently we just break guest request (that's a historical behavior
>     of backup). But in some scenarios it's a bad behavior: better
>     is to drop the backup as failed but don't break guest request.
>     With new scheme we can simply unset some bits in a bitmap on CBW
>     failure and further fleecing reads will -EACCES, or something like
>     this. (Not implemented in this commit, will be in future)
>     Additional application for this is implementing timeout for CBW
>     operations.
>
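
To make sure I read benefits 1 to 3 correctly, here is a toy model of the 
read routing they describe.  This is only a sketch based on the commit 
message, not QEMU code: clusters are plain integers, the two bitmaps are 
Python sets, and the function names are made up for illustration.

```python
import errno


def route_snapshot_read(cluster, access, done):
    """Return the node that would serve a fleecing read of `cluster`.

    access: clusters the client may read (starts equal to the CBW bitmap)
    done:   clusters whose old data has already been copied to temp.img
    """
    if cluster not in access:
        # Benefit 1: reads outside the original bitmap are refused.
        raise OSError(errno.EACCES, "cluster is not readable via the snapshot")
    # Benefit 3: untouched (or in-flight) clusters still hold the snapshot
    # data on the active disk; copied clusters are read from temp.img.
    return 'temp.img' if cluster in done else 'active-disk'


def snapshot_discard(cluster, access):
    """Benefit 2: after a discard, further reads of the cluster fail.

    The real filter would also drop the data in temp.img and tell
    block-copy to skip the cluster; neither is modelled here.
    """
    access.discard(cluster)
```

With access = {0, 1, 2} and done = {1}, cluster 0 is routed to the active 
disk, cluster 1 to temp.img, and cluster 5 is refused with EACCES.
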
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/copy-before-write.c | 212 +++++++++++++++++++++++++++++++++++++-
>   1 file changed, 211 insertions(+), 1 deletion(-)
>
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 91a2288b66..a8c88f64eb 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c

[...]

> +static int coroutine_fn
> +cbw_co_snapshot_block_status(BlockDriverState *bs,
> +                             bool want_zero, int64_t offset, int64_t bytes,
> +                             int64_t *pnum, int64_t *map,
> +                             BlockDriverState **file)
> +{
> +    BDRVCopyBeforeWriteState *s = bs->opaque;
> +    BlockReq *req;
> +    int ret;
> +    int64_t cur_bytes;
> +    BdrvChild *child;
> +
> +    req = cbw_snapshot_read_lock(bs, offset, bytes, &cur_bytes, &child);
> +    if (!req) {
> +        return -EACCES;
> +    }
> +
> +    ret = bdrv_block_status(bs, offset, cur_bytes, pnum, map, file);

This looks like an infinite recursion.  Shouldn’t this be s/bs/child->bs/?

> +    if (child == s->target) {
> +        /*
> +         * We refer to s->target only for areas that we've written to it.
> +         * And we can not report unallocated blocks in s->target: this will
> +         * break generic block-status-above logic, that will go to
> +         * copy-before-write filtered child in this case.
> +         */
> +        assert(ret & BDRV_BLOCK_ALLOCATED);
> +    }
> +
> +    cbw_snapshot_read_unlock(bs, req);
> +
> +    return ret;
> +}

[...]

> @@ -225,6 +407,27 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>           return -EINVAL;
>       }
>   
> +    cluster_size = block_copy_cluster_size(s->bcs);
> +
> +    s->done_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> +    if (!s->done_bitmap) {
> +        return -EINVAL;

Hmm, similarly to my question on patch 4, I assume cbw_close() will free 
s->bcs (and also s->done_bitmap in the error case below)?

> +    }
> +    bdrv_disable_dirty_bitmap(s->done_bitmap);
> +
> +    /* s->access_bitmap starts equal to bcs bitmap */
> +    s->access_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> +    if (!s->access_bitmap) {
> +        return -EINVAL;
> +    }
> +    bdrv_disable_dirty_bitmap(s->access_bitmap);
> +    bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
> +                                     block_copy_dirty_bitmap(s->bcs), NULL,
> +                                     true);
> +
> +    qemu_co_mutex_init(&s->lock);
> +    QLIST_INIT(&s->frozen_read_reqs);
> +
>       return 0;
>   }




* Re: [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node
  2022-02-16 19:46 ` [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:48   ` Hanna Reitz
  0 siblings, 0 replies; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/tests/image-fleecing     | 64 ++++++++++++-----
>   tests/qemu-iotests/tests/image-fleecing.out | 76 ++++++++++++++++++++-
>   2 files changed, 120 insertions(+), 20 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status()
  2022-02-16 19:46 ` [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:52   ` Hanna Reitz
  2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Add helper that returns both status and output, to be used in the
> following commit
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/iotests.py | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 6ba65eb1ff..23bc6f686f 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -278,6 +278,10 @@ def qemu_io(*args):
>       '''Run qemu-io and return the stdout data'''
>       return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
>   
> +def qemu_io_pipe_and_status(*args):
> +    args = qemu_io_args + list(args)
> +    return qemu_tool_pipe_and_status('qemu-io', args)

Shouldn’t we use qemu_io_wrap_args() here, like above?  The next patch 
adds a caller that passes `'-f', 'raw'` to it, which kind of implies to 
me that qemu_io_wrap_args() would be better.
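
To illustrate what I mean, a standalone sketch.  The two argument lists 
are stand-ins (the real ones in iotests.py are built from the test 
environment), the body of qemu_io_wrap_args() is my reading of the 
existing helper rather than a copy, and plain_concat() and 'IMAGE' are 
made-up names for the comparison:

```python
# Stand-in defaults; iotests.py derives the real lists from the environment.
qemu_io_args = ['qemu-io', '--cache', 'writeback', '-f', 'qcow2']
qemu_io_args_no_fmt = ['qemu-io', '--cache', 'writeback']


def qemu_io_wrap_args(args):
    """Prepend defaults, leaving out the default format if the caller set one."""
    if '-f' in args or '--image-opts' in args:
        return qemu_io_args_no_fmt + list(args)
    return qemu_io_args + list(args)


def plain_concat(args):
    """What the helper in this patch does: always prepend the full defaults."""
    return qemu_io_args + list(args)


caller_args = ('-f', 'raw', '-c', 'read 0 64k', 'IMAGE')

# plain_concat() passes '-f' twice (default format plus 'raw'),
# qemu_io_wrap_args() passes it once.
print(plain_concat(caller_args))
print(qemu_io_wrap_args(caller_args))
```
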

> +
>   def qemu_io_log(*args):
>       result = qemu_io(*args)
>       log(result, filters=[filter_testfiles, filter_qemu_io])




* Re: [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap
  2022-02-16 19:46 ` [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
@ 2022-02-24 12:58   ` Hanna Reitz
  2022-02-24 14:07     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 12:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> Note that reads zero areas (not dirty in the bitmap) fails, that's
> correct.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   tests/qemu-iotests/tests/image-fleecing     | 32 ++++++--
>   tests/qemu-iotests/tests/image-fleecing.out | 84 +++++++++++++++++++++
>   2 files changed, 108 insertions(+), 8 deletions(-)

Looks good, just one general usage question:

> diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
> index 909fc0a7ad..33995612be 100755
> --- a/tests/qemu-iotests/tests/image-fleecing
> +++ b/tests/qemu-iotests/tests/image-fleecing
> @@ -23,7 +23,7 @@
>   # Creator/Owner: John Snow <jsnow@redhat.com>
>   
>   import iotests
> -from iotests import log, qemu_img, qemu_io, qemu_io_silent
> +from iotests import log, qemu_img, qemu_io, qemu_io_silent, qemu_io_pipe_and_status
>   
>   iotests.script_initialize(
>       supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', 'vhdx', 'raw'],
> @@ -50,11 +50,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
>                ('0xcd', '0x3ff0000', '64k')] # patterns[3]
>   
>   def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
> -            fleece_img_path, nbd_sock_path, vm):
> +            fleece_img_path, nbd_sock_path, vm,
> +            bitmap=False):
>       log('--- Setting up images ---')
>       log('')
>   
>       assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
> +    if bitmap:
> +        assert qemu_img('bitmap', '--add', base_img_path, 'bitmap0') == 0
> +
>       if use_snapshot_access_filter:
>           assert use_cbw
>           assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
> @@ -106,12 +110,17 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
>   
>       # Establish CBW from source to fleecing node
>       if use_cbw:
> -        log(vm.qmp('blockdev-add', {
> +        fl_cbw = {
>               'driver': 'copy-before-write',
>               'node-name': 'fl-cbw',
>               'file': src_node,
>               'target': tmp_node
> -        }))
> +        }
> +
> +        if bitmap:
> +            fl_cbw['bitmap'] = {'node': src_node, 'name': 'bitmap0'}
> +
> +        log(vm.qmp('blockdev-add', fl_cbw))
>   
>           log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))

This makes me wonder how exactly the @bitmap parameter is to be used.  
In this case here, we use an active bitmap that tracks all writes, so it 
looks like a case of trying to copy the changes since some previous 
checkpoint (as a point-in-time state).  But if there are any writes 
between the blockdev-add and the qom-set, then they will not be included 
in the CBW bitmap.  Is that fine?  Or is it perhaps even intentional?

(Is the idea that one would use a transaction to disable the current 
bitmap (say “A”), and add a new one (say “B”) at the same time, then use 
bitmap A for the CBW filter, delete it after the backup, and then use B 
for the subsequent backup?)
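
Spelled out, the scheme I have in mind would look roughly like the 
following.  This is only a sketch of my guess, not a statement of the 
intended usage; the node and bitmap names are placeholders, and the 
helper functions are made up to build the QMP payloads:

```python
import json


def checkpoint_transaction(node, old_bitmap, new_bitmap):
    """Build a QMP transaction that freezes bitmap A and starts bitmap B."""
    return {
        'execute': 'transaction',
        'arguments': {
            'actions': [
                # Stop tracking in A, so it describes a fixed point in time.
                {'type': 'block-dirty-bitmap-disable',
                 'data': {'node': node, 'name': old_bitmap}},
                # Start B at the same moment, for the subsequent backup.
                {'type': 'block-dirty-bitmap-add',
                 'data': {'node': node, 'name': new_bitmap}},
            ]
        }
    }


def cbw_options(src_node, tmp_node, bitmap):
    """Build blockdev-add options for a CBW filter that uses the frozen A."""
    return {
        'driver': 'copy-before-write',
        'node-name': 'fl-cbw',
        'file': src_node,
        'target': tmp_node,
        'bitmap': {'node': src_node, 'name': bitmap},
    }


print(json.dumps(checkpoint_transaction('src-node', 'A', 'B'), indent=2))
print(json.dumps(cbw_options('src-node', 'tmp-node', 'A'), indent=2))
```

Even then, guest writes between the transaction and the qom-set would 
not be reflected in A, which is the gap I am asking about above.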




* Re: [PATCH v4 17/18] qapi: backup: add immutable-source parameter
  2022-02-16 19:46 ` [PATCH v4 17/18] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
@ 2022-02-24 13:05   ` Hanna Reitz
  2022-02-24 14:14     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Hanna Reitz @ 2022-02-24 13:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru,
	jsnow, nikita.lapshin, stefanha, eblake

On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
> We are on the way to implement internal-backup with fleecing scheme,
> which includes backup job copying from fleecing block driver node
> (which is target of copy-before-write filter) to final target of
> backup. This job doesn't need own filter, as fleecing block driver node
> is a kind of snapshot, it's immutable from reader point of view.
>
> Let's add a parameter for backup to not insert filter but instead
> unshare writes on source. This way backup job becomes a simple copying
> process.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json      | 11 ++++++-
>   include/block/block_int.h |  1 +
>   block/backup.c            | 61 +++++++++++++++++++++++++++++++++++----
>   block/replication.c       |  2 +-
>   blockdev.c                |  1 +
>   5 files changed, 69 insertions(+), 7 deletions(-)

I’m not really technically opposed to this, but I wonder what the actual 
benefit of this is.  It sounds like the only benefit is that we don’t 
need a filter driver, but what’s the problem with such a filter driver?

(And if we just want to copy data off of an immutable node, I personally 
would go for the mirror job instead, but it isn’t like I could give good 
technical reasons for that personal bias.)




* Re: [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter
  2022-02-24 12:07   ` Hanna Reitz
@ 2022-02-24 13:27     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-24 13:27 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, kwolf, jsnow, nikita.lapshin

24.02.2022 15:07, Hanna Reitz wrote:
> On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
>> This brings "incremental" mode to copy-before-write filter: user can
>> specify bitmap so that filter will copy only "dirty" areas.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json      | 10 +++++++-
>>   block/copy-before-write.c | 51 ++++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 59 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 9a5a3641d0..3bab597506 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -4171,11 +4171,19 @@
>>   #
>>   # @target: The target for copy-before-write operations.
>>   #
>> +# @bitmap: If specified, copy-before-write filter will do
>> +#          copy-before-write operations only for dirty regions of the
>> +#          bitmap. Bitmap size must be equal to length of file and
>> +#          target child of the filter. Note also, that bitmap is used
>> +#          only to initialize internal bitmap of the process, so further
>> +#          modifications (or removing) of specified bitmap doesn't
>> +#          influence the filter.
> 
> Sorry, missed this last time: There should be a “since: 7.0” here.
> 
>> +#
>>   # Since: 6.2
>>   ##
>>   { 'struct': 'BlockdevOptionsCbw',
>>     'base': 'BlockdevOptionsGenericFormat',
>> -  'data': { 'target': 'BlockdevRef' } }
>> +  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>>   ##
>>   # @BlockdevOptions:
>> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
>> index 799223e3fb..91a2288b66 100644
>> --- a/block/copy-before-write.c
>> +++ b/block/copy-before-write.c
>> @@ -34,6 +34,8 @@
>>   #include "block/copy-before-write.h"
>> +#include "qapi/qapi-visit-block-core.h"
>> +
>>   typedef struct BDRVCopyBeforeWriteState {
>>       BlockCopyState *bcs;
>>       BdrvChild *target;
>> @@ -145,10 +147,53 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
>>       }
>>   }
>> +static bool cbw_parse_bitmap_option(QDict *options, BdrvDirtyBitmap **bitmap,
>> +                                    Error **errp)
>> +{
>> +    QDict *bitmap_qdict = NULL;
>> +    BlockDirtyBitmap *bmp_param = NULL;
>> +    Visitor *v = NULL;
>> +    bool ret = false;
>> +
>> +    *bitmap = NULL;
>> +
>> +    qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
>> +    if (!qdict_size(bitmap_qdict)) {
>> +        ret = true;
>> +        goto out;
>> +    }
>> +
>> +    v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
>> +    if (!v) {
>> +        goto out;
>> +    }
>> +
>> +    visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
>> +    if (!bmp_param) {
>> +        goto out;
>> +    }
>> +
>> +    *bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, NULL,
>> +                                        errp);
>> +    if (!*bitmap) {
>> +        goto out;
>> +    }
>> +
>> +    ret = true;
>> +
>> +out:
>> +    qapi_free_BlockDirtyBitmap(bmp_param);
>> +    visit_free(v);
>> +    qobject_unref(bitmap_qdict);
>> +
>> +    return ret;
>> +}
>> +
>>   static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>                       Error **errp)
>>   {
>>       BDRVCopyBeforeWriteState *s = bs->opaque;
>> +    BdrvDirtyBitmap *bitmap = NULL;
>>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
>> @@ -163,6 +208,10 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>           return -EINVAL;
>>       }
>> +    if (!cbw_parse_bitmap_option(options, &bitmap, errp)) {
>> +        return -EINVAL;
> 
> Hm...  Just to get a second opinion on this: We don’t need to close s->target here, because the failure paths of bdrv_open_inherit() and bdrv_new_open_driver_opts() both call bdrv_unref(), which will call bdrv_close(), which will close all children including s->target, right?

I think I just followed the existing error path in cbw_open() for block_copy_state_new() failure. But I think you are right, and bdrv_close() should take care of all of bs's children.

> 
>> +    }
>> +
>>       bs->total_sectors = bs->file->bs->total_sectors;
>>       bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>>               (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
>> @@ -170,7 +219,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>               ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>                bs->file->bs->supported_zero_flags);
>> -    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
>> +    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
>>       if (!s->bcs) {
>>           error_prepend(errp, "Cannot create block-copy-state: ");
>>           return -EINVAL;
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API
  2022-02-24 12:46   ` Hanna Reitz
@ 2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-24 13:42 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, kwolf, jsnow, nikita.lapshin

24.02.2022 15:46, Hanna Reitz wrote:
> On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
>> Current scheme of image fleecing looks like this:
>>
>> [guest]                    [NBD export]
>>    |                              |
>>    |root                          | root
>>    v                              v
>> [copy-before-write] -----> [temp.qcow2]
>>    |                 target  |
>>    |file                     |backing
>>    v                         |
>> [active disk] <-------------+
>>
>>   - On guest writes copy-before-write filter copies old data from active
>>     disk to temp.qcow2. So fleecing client (NBD export) when reads
>>     changed regions from temp.qcow2 image and unchanged from active disk
>>     through backing link.
>>
>> This patch makes possible new image fleecing scheme:
>>
>> [guest]                   [NBD export]
>>     |                            |
>>     | root                       | root
>>     v                 file       v
>> [copy-before-write]<------[x-snapshot-access]
>>     |           |
>>     | file      | target
>>     v           v
>> [active-disk] [temp.img]
>>
>>   - copy-before-write does CBW operations and also provides
>>     snapshot-access API. The API may be accessed through
>>     x-snapshot-access driver.
> 
> The “x-” prefix seems like a relic from an earlier version.
> 
> (I agree with what I assume is your opinion now, that we don’t need an x- prefix.  I can’t imagine why we’d need to change the snapshot-access interface in an incompatible way.)
> 
>> Benefits of new scheme:
>>
>> 1. Access control: if remote client try to read data that not covered
>>     by original dirty bitmap used on copy-before-write open, client gets
>>     -EACCES.
>>
>> 2. Discard support: if remote client do DISCARD, this additionally to
>>     discarding data in temp.img informs block-copy process to not copy
>>     these clusters. Next read from discarded area will return -EACCES.
>>     This is significant thing: when fleecing user reads data that was
>>     not yet copied to temp.img, we can avoid copying it on further guest
>>     write.
>>
>> 3. Synchronisation between client reads and block-copy write is more
>>     efficient. In old scheme we just rely on BDRV_REQ_SERIALISING flag
>>     used for writes to temp.qcow2. New scheme is less blocking:
>>       - fleecing reads are never blocked: if data region is untouched or
>>         in-flight, we just read from active-disk, otherwise we read from
>>         temp.img
>>       - writes to temp.img are not blocked by fleecing reads
>>       - still, guest writes of-course are blocked by in-flight fleecing
>>         reads, that currently read from active-disk - it's the minimum
>>         necessary blocking
>>
>> 4. Temporary image may be of any format, as we don't rely on backing
>>     feature.
>>
>> 5. Permission relation are simplified. With old scheme we have to share
>>     write permission on target child of copy-before-write, otherwise
>>     backing link conflicts with copy-before-write file child write
>>     permissions. With new scheme we don't have backing link, and
>>     copy-before-write node may have unshared access to temporary node.
>>     (Not realized in this commit, will be in future).
>>
>> 6. Having control on fleecing reads we'll be able to implement
>>     alternative behavior on failed copy-before-write operations.
>>     Currently we just break guest request (that's a historical behavior
>>     of backup). But in some scenarios it's a bad behavior: better
>>     is to drop the backup as failed but don't break guest request.
>>     With new scheme we can simply unset some bits in a bitmap on CBW
>>     failure and further fleecing reads will -EACCES, or something like
>>     this. (Not implemented in this commit, will be in future)
>>     Additional application for this is implementing timeout for CBW
>>     operations.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/copy-before-write.c | 212 +++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 211 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
>> index 91a2288b66..a8c88f64eb 100644
>> --- a/block/copy-before-write.c
>> +++ b/block/copy-before-write.c
> 
> [...]
> 
>> +static int coroutine_fn
>> +cbw_co_snapshot_block_status(BlockDriverState *bs,
>> +                             bool want_zero, int64_t offset, int64_t bytes,
>> +                             int64_t *pnum, int64_t *map,
>> +                             BlockDriverState **file)
>> +{
>> +    BDRVCopyBeforeWriteState *s = bs->opaque;
>> +    BlockReq *req;
>> +    int ret;
>> +    int64_t cur_bytes;
>> +    BdrvChild *child;
>> +
>> +    req = cbw_snapshot_read_lock(bs, offset, bytes, &cur_bytes, &child);
>> +    if (!req) {
>> +        return -EACCES;
>> +    }
>> +
>> +    ret = bdrv_block_status(bs, offset, cur_bytes, pnum, map, file);
> 
> This looks like an infinite recursion.  Shouldn’t this be s/bs/child->bs/?

Oh, yes, right

> 
>> +    if (child == s->target) {
>> +        /*
>> +         * We refer to s->target only for areas that we've written to it.
>> +         * And we can not report unallocated blocks in s->target: this will
>> +         * break generic block-status-above logic, that will go to
>> +         * copy-before-write filtered child in this case.
>> +         */
>> +        assert(ret & BDRV_BLOCK_ALLOCATED);
>> +    }
>> +
>> +    cbw_snapshot_read_unlock(bs, req);
>> +
>> +    return ret;
>> +}
> 
> [...]
> 
>> @@ -225,6 +407,27 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>           return -EINVAL;
>>       }
>> +    cluster_size = block_copy_cluster_size(s->bcs);
>> +
>> +    s->done_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
>> +    if (!s->done_bitmap) {
>> +        return -EINVAL;
> 
> Hmm, similarly to my question on patch 4, I assume cbw_close() will free s->bcs (and also s->done_bitmap in the error case below)?

Honestly, I don't remember whether I really thought about it. But I think it should work as you describe.

Interesting that in qcow2 we have code at the end of qcow2_do_open(), on the "fail:" path, that mostly duplicates what we have in qcow2_close(). It seems that could be simplified.

> 
>> +    }
>> +    bdrv_disable_dirty_bitmap(s->done_bitmap);
>> +
>> +    /* s->access_bitmap starts equal to bcs bitmap */
>> +    s->access_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
>> +    if (!s->access_bitmap) {
>> +        return -EINVAL;
>> +    }
>> +    bdrv_disable_dirty_bitmap(s->access_bitmap);
>> +    bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
>> +                                     block_copy_dirty_bitmap(s->bcs), NULL,
>> +                                     true);
>> +
>> +    qemu_co_mutex_init(&s->lock);
>> +    QLIST_INIT(&s->frozen_read_reqs);
>> +
>>       return 0;
>>   }
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status()
  2022-02-24 12:52   ` Hanna Reitz
@ 2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-24 13:42 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, kwolf, jsnow, nikita.lapshin

24.02.2022 15:52, Hanna Reitz wrote:
> On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
>> Add helper that returns both status and output, to be used in the
>> following commit
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 6ba65eb1ff..23bc6f686f 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -278,6 +278,10 @@ def qemu_io(*args):
>>       '''Run qemu-io and return the stdout data'''
>>       return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
>> +def qemu_io_pipe_and_status(*args):
>> +    args = qemu_io_args + list(args)
>> +    return qemu_tool_pipe_and_status('qemu-io', args)
> 
> Shouldn’t we use qemu_io_wrap_args() here, like above?  The next patch adds a caller that passes `'-f', 'raw'` to it, which kind of implies to me that qemu_io_wrap_args() would be better.

Will do

> 
>> +
>>   def qemu_io_log(*args):
>>       result = qemu_io(*args)
>>       log(result, filters=[filter_testfiles, filter_qemu_io])
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap
  2022-02-24 12:58   ` Hanna Reitz
@ 2022-02-24 14:07     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-24 14:07 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, kwolf, jsnow, nikita.lapshin

24.02.2022 15:58, Hanna Reitz wrote:
> On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
>> Note that reads zero areas (not dirty in the bitmap) fails, that's
>> correct.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/tests/image-fleecing     | 32 ++++++--
>>   tests/qemu-iotests/tests/image-fleecing.out | 84 +++++++++++++++++++++
>>   2 files changed, 108 insertions(+), 8 deletions(-)
> 
> Looks good, just one general usage question:
> 
>> diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
>> index 909fc0a7ad..33995612be 100755
>> --- a/tests/qemu-iotests/tests/image-fleecing
>> +++ b/tests/qemu-iotests/tests/image-fleecing
>> @@ -23,7 +23,7 @@
>>   # Creator/Owner: John Snow <jsnow@redhat.com>
>>   import iotests
>> -from iotests import log, qemu_img, qemu_io, qemu_io_silent
>> +from iotests import log, qemu_img, qemu_io, qemu_io_silent, qemu_io_pipe_and_status
>>   iotests.script_initialize(
>>       supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', 'vhdx', 'raw'],
>> @@ -50,11 +50,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
>>                ('0xcd', '0x3ff0000', '64k')] # patterns[3]
>>   def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
>> -            fleece_img_path, nbd_sock_path, vm):
>> +            fleece_img_path, nbd_sock_path, vm,
>> +            bitmap=False):
>>       log('--- Setting up images ---')
>>       log('')
>>       assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
>> +    if bitmap:
>> +        assert qemu_img('bitmap', '--add', base_img_path, 'bitmap0') == 0
>> +
>>       if use_snapshot_access_filter:
>>           assert use_cbw
>>           assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
>> @@ -106,12 +110,17 @@ def do_test(use_cbw, use_snapshot_access_filter, base_img_path,
>>       # Establish CBW from source to fleecing node
>>       if use_cbw:
>> -        log(vm.qmp('blockdev-add', {
>> +        fl_cbw = {
>>               'driver': 'copy-before-write',
>>               'node-name': 'fl-cbw',
>>               'file': src_node,
>>               'target': tmp_node
>> -        }))
>> +        }
>> +
>> +        if bitmap:
>> +            fl_cbw['bitmap'] = {'node': src_node, 'name': 'bitmap0'}
>> +
>> +        log(vm.qmp('blockdev-add', fl_cbw))
>>           log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
> 
> This makes me wonder how exactly the @bitmap parameter is to be used. In this case here, we use an active bitmap that tracks all writes, so it looks like a case of trying to copy the changes since some previous checkpoint (as a point-in-time state).  But if there are any writes between the blockdev-add and the qom-set, then they will not be included in the CBW bitmap.  Is that fine?  Or is it perhaps even intentional?
> 
> (Is the idea that one would use a transaction to disable the current bitmap (say “A”), and add a new one (say “B”) at the same time, then use bitmap A for the CBW filter, delete it after the backup, and then use B for the subsequent backup?)
> 

Hmm, good question. If we do it this way, we break the point-in-time semantics of the backup: we copy the disk in its state at the moment of qom-set, but use an outdated copy of the bitmap.
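A toy sketch of the problem, not QEMU code: it assumes (as discussed here) that the filter takes a private copy of the bitmap at blockdev-add time, and models a bitmap as a set of dirty cluster indexes. The function name and the cluster numbers are made up for illustration.

```python
# Toy model, not QEMU code: a "bitmap" is a set of dirty cluster indexes.

def fleecing_with_copied_bitmap(writes_before_add, writes_in_gap):
    """Return clusters changed since the checkpoint that the filter misses."""
    # bitmap0 is active and tracks every guest write.
    active_bitmap = set(writes_before_add)

    # blockdev-add: the filter takes a private copy of the bitmap.
    cbw_bitmap = set(active_bitmap)

    # Guest writes landing before qom-set reach only the active bitmap.
    active_bitmap |= set(writes_in_gap)

    # qom-set: the filter goes live, so this is the real point in time.
    # Anything dirty now but absent from the copy is skipped by the backup.
    return active_bitmap - cbw_bitmap


missed = fleecing_with_copied_bitmap(writes_before_add=[1, 2], writes_in_gap=[7])
print(sorted(missed))  # clusters the backup would wrongly skip
```

With no writes in the gap the returned set is empty, which is why the issue only shows up as a race.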

A good solution would be to do blockdev-add and qom-set in one transaction. But it is more feasible to add transaction support to my proposed blockdev-replace, which should replace qom-set in this scenario.

And supporting blockdev-add in a transaction is not simple either.

With the usual backup we simply do blockdev-backup and all the needed bitmap manipulations in one transaction. With the filter, the actual start of the backup is qom-set (or blockdev-replace), not blockdev-add. But we can't pass a bitmap parameter to qom-set or blockdev-replace.
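For comparison, a rough sketch of what "everything in one transaction" can look like for the usual backup, following the A/B bitmap idea from the question above. The helper name, node names and job id are hypothetical, and the exact argument combination (sync=bitmap with bitmap-mode=never on a bitmap disabled in the same transaction) is an assumption that should be checked against the QAPI schema of the QEMU version in use.

```python
import json


def backup_transaction(node, target, old_bitmap, new_bitmap, job_id):
    """Build a QMP 'transaction' command (hypothetical helper, not iotests API)."""
    return {
        'execute': 'transaction',
        'arguments': {
            'actions': [
                # Freeze bitmap A: it now describes exactly this checkpoint.
                {'type': 'block-dirty-bitmap-disable',
                 'data': {'node': node, 'name': old_bitmap}},
                # Start bitmap B for the next incremental backup.
                {'type': 'block-dirty-bitmap-add',
                 'data': {'node': node, 'name': new_bitmap}},
                # Start the backup at the same point in time, driven by A.
                {'type': 'blockdev-backup',
                 'data': {'job-id': job_id, 'device': node, 'target': target,
                          'sync': 'bitmap', 'bitmap': old_bitmap,
                          'bitmap-mode': 'never'}},
            ]
        }
    }


cmd = backup_transaction('src-node', 'target-node', 'A', 'B', 'backup0')
print(json.dumps(cmd, indent=2))
```

The point is only that all three actions take effect atomically, which is exactly what the filter-based scheme lacks between blockdev-add and qom-set.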

We could probably support blockdev-reopen in a transaction and change the bitmap on reopen. But that seems wrong to me: we should not use reopen in a scenario where we have just created this temporary node with all the arguments we want.

Keeping in mind my recent series that introduces a kind of transaction for bdrv_close, maybe the best and most native way really is to support blockdev-add and blockdev-del in transactions.


The only alternative I see is not to copy the user-given bitmap, but to use exactly what the user gives. This way, we do the following:

1. The user gives the active bitmap A to cbw_open, so the bitmap continues to track dirtiness.
2. The user starts a new dirty bitmap B.
3. On filter insertion, we have a good bitmap with all the needed dirty bits.
4. After filter insertion, the user stops tracking in bitmap A (disables it).

This way, we won't lose any data. The drawback is that we may copy some extra data that was not actually dirty at point [3] (because we disable bitmap A after it, not in a transaction). Likewise, bitmap B, which will be used for the next incremental backup, will probably contain some extra dirty bits. That's not bad, but it's not an ideal architecture.
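The four steps can be modelled with the same kind of toy sketch (not QEMU code; bitmaps are sets of dirty cluster indexes, and the function name and cluster numbers are invented for illustration). It assumes the filter uses bitmap A directly instead of a copy.

```python
# Toy model, not QEMU code: bitmaps are sets of dirty cluster indexes.

def fleecing_with_shared_bitmap(writes_before_add, writes_in_gap,
                                writes_before_disable):
    """Return (lost, extra_in_a, extra_in_b) for the four-step scheme."""
    # Step 1: active bitmap A is handed to the filter and keeps tracking.
    bitmap_a = set(writes_before_add)
    # Step 2: bitmap B starts tracking for the next incremental backup.
    bitmap_b = set()

    # Writes between steps 2 and 3 dirty both bitmaps.
    bitmap_a |= set(writes_in_gap)
    bitmap_b |= set(writes_in_gap)

    # Step 3: filter insertion, the real point in time.
    needed = set(bitmap_a)

    # Writes between steps 3 and 4 still dirty both bitmaps.
    bitmap_a |= set(writes_before_disable)
    bitmap_b |= set(writes_before_disable)

    # Step 4: A is disabled. Compare against the ideal bitmaps.
    lost = needed - bitmap_a                       # must stay empty
    extra_in_a = bitmap_a - needed                 # copied although clean at [3]
    extra_in_b = bitmap_b - set(writes_before_disable)  # already in this backup
    return lost, extra_in_a, extra_in_b


lost, extra_a, extra_b = fleecing_with_shared_bitmap([1, 2], [7], [9])
print(sorted(lost), sorted(extra_a), sorted(extra_b))
```

Nothing is ever removed from A before step 4, so `lost` is always empty; the cost is confined to the extra bits in A and B.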

-- 
Best regards,
Vladimir



* Re: [PATCH v4 17/18] qapi: backup: add immutable-source parameter
  2022-02-24 13:05   ` Hanna Reitz
@ 2022-02-24 14:14     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-02-24 14:14 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, fam, stefanha,
	eblake, kwolf, jsnow, nikita.lapshin

24.02.2022 16:05, Hanna Reitz wrote:
> On 16.02.22 20:46, Vladimir Sementsov-Ogievskiy wrote:
>> We are on the way to implement internal-backup with fleecing scheme,
>> which includes backup job copying from fleecing block driver node
>> (which is target of copy-before-write filter) to final target of
>> backup. This job doesn't need own filter, as fleecing block driver node
>> is a kind of snapshot, it's immutable from reader point of view.
>>
>> Let's add a parameter for backup to not insert filter but instead
>> unshare writes on source. This way backup job becomes a simple copying
>> process.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json      | 11 ++++++-
>>   include/block/block_int.h |  1 +
>>   block/backup.c            | 61 +++++++++++++++++++++++++++++++++++----
>>   block/replication.c       |  2 +-
>>   blockdev.c                |  1 +
>>   5 files changed, 69 insertions(+), 7 deletions(-)
> 
> I’m not really technically opposed to this, but I wonder what the actual benefit of this is.  It sounds like the only benefit is that we don’t need a filter driver, but what’s the problem with such a filter driver?

Hmm. Yes, that's the only benefit: fewer extra components -> more stability.

But now I doubt it is really worth an extra parameter. More parameters that actually change nothing for the user -> less stability :)

OK, I think I should at least postpone it for now; this series is too fat even without this patch.

The only possible problem: will a permission conflict happen in the next test without this patch? If it does, there should be a way to solve it without user interaction. I'll check and try to avoid this new parameter.

> 
> (And if we just want to copy data off of an immutable node, I personally would go for the mirror job instead, but it isn’t like I could give good technical reasons for that personal bias.)
> 

I still hope that in some good, distant future mirror will work through block/block-copy like backup, and it will make no difference which one is used to copy from an immutable source.


Thanks a lot for reviewing!

-- 
Best regards,
Vladimir




Thread overview: 36+ messages
2022-02-16 19:45 [PATCH v4 00/18] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 01/18] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 02/18] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 03/18] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
2022-02-24 12:01   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 04/18] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
2022-02-24 12:07   ` Hanna Reitz
2022-02-24 13:27     ` Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 05/18] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 06/18] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
2022-02-24 12:08   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 07/18] block/reqlist: reqlist_find_conflict(): use ranges_overlap() Vladimir Sementsov-Ogievskiy
2022-02-24 12:08   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 08/18] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
2022-02-24 12:20   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 09/18] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 10/18] block/io: introduce block driver snapshot-access API Vladimir Sementsov-Ogievskiy
2022-02-24 12:24   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 11/18] block: introduce snapshot-access filter Vladimir Sementsov-Ogievskiy
2022-02-24 12:29   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 12/18] block: copy-before-write: realize snapshot-access API Vladimir Sementsov-Ogievskiy
2022-02-24 12:46   ` Hanna Reitz
2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 13/18] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
2022-02-24 12:48   ` Hanna Reitz
2022-02-16 19:46 ` [PATCH v4 14/18] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
2022-02-24 12:52   ` Hanna Reitz
2022-02-24 13:42     ` Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 15/18] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
2022-02-24 12:58   ` Hanna Reitz
2022-02-24 14:07     ` Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 16/18] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 17/18] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
2022-02-24 13:05   ` Hanna Reitz
2022-02-24 14:14     ` Vladimir Sementsov-Ogievskiy
2022-02-16 19:46 ` [PATCH v4 18/18] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
