* [PATCH v3 00/19] Make image fleecing more usable
@ 2021-12-22 17:39 Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
                   ` (18 more replies)
  0 siblings, 19 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:39 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

v3: Rebased on master; small QAPI doc improvements suggested by Markus

There are several improvements to the fleecing scheme:

1. Support a bitmap in the copy-before-write filter.

2. Introduce a fleecing block driver, which opens the door to many
   image fleecing improvements.
   See the "block: introduce fleecing block driver" commit message for
   details.

3. Support a "push backup with fleecing" scheme, in which the backup job
   is a client of the common fleecing scheme. This helps when writes to
   the final backup target are slow and we don't want guest writes to
   hang waiting for copy-before-write operations to the final target.
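The copy-before-write principle that fleecing relies on can be illustrated with a toy, single-threaded, in-memory model. This is not QEMU code: ToyCbw, toy_cbw_write(), toy_cbw_read_snapshot() and the fixed cluster geometry are hypothetical, and the real filter works asynchronously on block nodes via block-copy. It only shows the invariant: before a guest write lands on a not-yet-copied cluster, the old data is copied to the target, so a reader of the point-in-time view always sees the original contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy geometry: NCLUSTERS clusters of CLUSTER bytes each. */
#define NCLUSTERS 8
#define CLUSTER 4

typedef struct {
    uint8_t source[NCLUSTERS][CLUSTER]; /* live image the guest writes to */
    uint8_t target[NCLUSTERS][CLUSTER]; /* fleecing target (old data) */
    bool dirty[NCLUSTERS];              /* true: old data not yet copied */
} ToyCbw;

/* Start fleecing without a user bitmap: every cluster still needs copying. */
static void toy_cbw_init(ToyCbw *s)
{
    memset(s->target, 0, sizeof(s->target));
    for (int i = 0; i < NCLUSTERS; i++) {
        s->dirty[i] = true;
    }
}

/* Guest write: copy the old cluster to the target first, then overwrite. */
static void toy_cbw_write(ToyCbw *s, int cluster, const uint8_t *data)
{
    assert(cluster >= 0 && cluster < NCLUSTERS);
    if (s->dirty[cluster]) {
        memcpy(s->target[cluster], s->source[cluster], CLUSTER);
        s->dirty[cluster] = false;
    }
    memcpy(s->source[cluster], data, CLUSTER);
}

/* Point-in-time view: copied clusters come from the target, others from source. */
static const uint8_t *toy_cbw_read_snapshot(const ToyCbw *s, int cluster)
{
    assert(cluster >= 0 && cluster < NCLUSTERS);
    return s->dirty[cluster] ? s->source[cluster] : s->target[cluster];
}
```

A second write to the same cluster does not touch the target again, because the cluster is no longer dirty; that is what keeps the snapshot stable.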

Vladimir Sementsov-Ogievskiy (19):
  block/block-copy: move copy_bitmap initialization to
    block_copy_state_new()
  block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  block/block-copy: block_copy_state_new(): add bitmap parameter
  block/copy-before-write: add bitmap open parameter
  block/block-copy: add block_copy_reset()
  block: intoduce reqlist
  block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  block/reqlist: add reqlist_wait_all()
  block: introduce FleecingState class
  block: introduce fleecing block driver
  block/copy-before-write: support fleecing block driver
  block/block-copy: add write-unchanged mode
  block/copy-before-write: use write-unchanged in fleecing mode
  iotests/image-fleecing: add test-case for fleecing format node
  iotests.py: add qemu_io_pipe_and_status()
  iotests/image-fleecing: add test case with bitmap
  block: blk_root(): return non-const pointer
  qapi: backup: add immutable-source parameter
  iotests/image-fleecing: test push backup with fleecing

 qapi/block-core.json                        |  58 ++++-
 block/fleecing.h                            | 151 +++++++++++
 include/block/block-copy.h                  |   4 +-
 include/block/block_int.h                   |   1 +
 include/block/dirty-bitmap.h                |   4 +-
 include/block/reqlist.h                     |  75 ++++++
 include/qemu/hbitmap.h                      |  11 +
 include/sysemu/block-backend.h              |   2 +-
 block/backup.c                              |  61 ++++-
 block/block-backend.c                       |   2 +-
 block/block-copy.c                          | 157 +++++-------
 block/copy-before-write.c                   |  70 +++++-
 block/dirty-bitmap.c                        |  15 +-
 block/fleecing-drv.c                        | 261 ++++++++++++++++++++
 block/fleecing.c                            | 182 ++++++++++++++
 block/monitor/bitmap-qmp-cmds.c             |   5 +-
 block/replication.c                         |   2 +-
 block/reqlist.c                             |  84 +++++++
 blockdev.c                                  |   1 +
 util/hbitmap.c                              |  36 +++
 MAINTAINERS                                 |   7 +-
 block/meson.build                           |   3 +
 tests/qemu-iotests/iotests.py               |   4 +
 tests/qemu-iotests/tests/image-fleecing     | 178 ++++++++++---
 tests/qemu-iotests/tests/image-fleecing.out | 223 ++++++++++++++++-
 25 files changed, 1441 insertions(+), 156 deletions(-)
 create mode 100644 block/fleecing.h
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/fleecing-drv.c
 create mode 100644 block/fleecing.c
 create mode 100644 block/reqlist.c

-- 
2.31.1




* [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new()
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 16:54   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

We are going to complicate bitmap initialization in a further commit.
Also, in the future the backup job will be able to work without the
filter (when the source is immutable), so we'll need the same bitmap
initialization in both the copy-before-write filter and the backup job.
So it's reasonable to do it in block-copy.

Note that for now cbw_open() is the only caller of
block_copy_state_new().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/block-copy.c        | 1 +
 block/copy-before-write.c | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ce116318b5..abda7a80bd 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -402,6 +402,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
+    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
 
     /*
      * If source is in backing chain of target assume that target is going to be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index c30a5ff8de..5bdaf0a9d9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,7 +149,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
-    BdrvDirtyBitmap *copy_bitmap;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -177,9 +176,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    copy_bitmap = block_copy_dirty_bitmap(s->bcs);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
-
     return 0;
 }
 
-- 
2.31.1




* [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 16:55   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

This simplifies failure handling in existing code and in upcoming new
users of bdrv_merge_dirty_bitmap().
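The benefit of the boolean return value can be sketched outside QEMU. In this toy model, ToyBitmap, toy_merge() and toy_caller() are hypothetical, and a plain char buffer stands in for QEMU's Error object. toy_merge() follows the contract documented in the hunk below (true on success; false on failure with the destination left untouched), so the caller can branch on the result directly instead of declaring a local error and propagating it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy bitmap, one byte per bit; not QEMU's BdrvDirtyBitmap. */
typedef struct {
    size_t nbits;
    uint8_t bits[64];
} ToyBitmap;

/*
 * OR @src into @dest. Returns true on success. On failure returns false,
 * writes a message into @err and leaves @dest untouched.
 */
static bool toy_merge(ToyBitmap *dest, const ToyBitmap *src,
                      char *err, size_t errlen)
{
    if (dest->nbits != src->nbits) {
        snprintf(err, errlen, "Bitmap sizes differ: %zu vs %zu",
                 dest->nbits, src->nbits);
        return false;
    }
    assert(dest->nbits <= sizeof(dest->bits));
    for (size_t i = 0; i < dest->nbits; i++) {
        dest->bits[i] |= src->bits[i];
    }
    return true;
}

/* Caller: the return value replaces the local_err + error_propagate() pattern. */
static int toy_caller(ToyBitmap *dest, const ToyBitmap *src)
{
    char err[80];

    if (!toy_merge(dest, src, err, sizeof(err))) {
        fprintf(stderr, "%s\n", err);
        return -1;
    }
    return 0;
}
```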

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h    | 2 +-
 block/dirty-bitmap.c            | 9 +++++++--
 block/monitor/bitmap-qmp-cmds.c | 5 +----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 40950ae3d5..f95d350b70 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,7 +77,7 @@ void bdrv_dirty_bitmap_set_persistence(BdrvDirtyBitmap *bitmap,
                                        bool persistent);
 void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, bool busy);
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip);
 bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 0ef46163e3..94a0276833 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -880,11 +880,14 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
  * Ensures permissions on bitmaps are reasonable; use for public API.
  *
  * @backup: If provided, make a copy of dest here prior to merge.
+ *
+ * Returns true on success, false on failure. In case of failure bitmaps are
+ * untouched.
  */
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp)
 {
-    bool ret;
+    bool ret = false;
 
     bdrv_dirty_bitmaps_lock(dest->bs);
     if (src->bs != dest->bs) {
@@ -912,6 +915,8 @@ out:
     if (src->bs != dest->bs) {
         bdrv_dirty_bitmaps_unlock(src->bs);
     }
+
+    return ret;
 }
 
 /**
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 9f11deec64..83970b22fa 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -259,7 +259,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
     BlockDriverState *bs;
     BdrvDirtyBitmap *dst, *src, *anon;
     BlockDirtyBitmapMergeSourceList *lst;
-    Error *local_err = NULL;
 
     dst = block_dirty_bitmap_lookup(node, target, &bs, errp);
     if (!dst) {
@@ -297,9 +296,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
             abort();
         }
 
-        bdrv_merge_dirty_bitmap(anon, src, NULL, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
             dst = NULL;
             goto out;
         }
-- 
2.31.1




* [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 16:58   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

This will be used in the following commit to bring "incremental" mode
to the copy-before-write filter.
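The branch this patch adds to block_copy_state_new() can be modelled with a boolean array. ToyCopyBitmap, toy_init_copy_bitmap() and toy_dirty_count() are hypothetical names for this sketch only. With a user bitmap, the freshly created (all-clean) copy bitmap is OR-merged with it, so only the user's dirty clusters get copied; with a NULL bitmap, the whole disk is marked dirty, which is the previous full-copy behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_CLUSTERS 16

/* Hypothetical stand-in for the internal copy bitmap (one bool per cluster). */
typedef struct {
    bool dirty[TOY_CLUSTERS];
} ToyCopyBitmap;

static void toy_init_copy_bitmap(ToyCopyBitmap *copy, const bool *user_bitmap)
{
    assert(copy != NULL);
    memset(copy->dirty, 0, sizeof(copy->dirty)); /* newly created: clean */
    for (size_t i = 0; i < TOY_CLUSTERS; i++) {
        if (user_bitmap) {
            copy->dirty[i] = copy->dirty[i] || user_bitmap[i]; /* merge */
        } else {
            copy->dirty[i] = true;                              /* full copy */
        }
    }
}

static size_t toy_dirty_count(const ToyCopyBitmap *copy)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_CLUSTERS; i++) {
        n += copy->dirty[i];
    }
    return n;
}
```

Because the user bitmap is only read during initialization, later changes to it do not affect the copy bitmap, which matches what the QAPI documentation in the next patch says.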

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h |  2 +-
 block/block-copy.c         | 14 ++++++++++++--
 block/copy-before-write.c  |  2 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 99370fa38b..8da4cec1b6 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                     Error **errp);
+                                     BdrvDirtyBitmap *bitmap, Error **errp);
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
diff --git a/block/block-copy.c b/block/block-copy.c
index abda7a80bd..f6345e3a4c 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -384,8 +384,9 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                     Error **errp)
+                                     BdrvDirtyBitmap *bitmap, Error **errp)
 {
+    ERRP_GUARD();
     BlockCopyState *s;
     int64_t cluster_size;
     BdrvDirtyBitmap *copy_bitmap;
@@ -402,7 +403,16 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+    if (bitmap) {
+        if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
+            error_prepend(errp, "Failed to merge bitmap '%s' to internal "
+                          "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
+            return NULL;
+        }
+    } else {
+        bdrv_set_dirty_bitmap(copy_bitmap, 0,
+                              bdrv_dirty_bitmap_size(copy_bitmap));
+    }
 
     /*
      * If source is in backing chain of target assume that target is going to be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 5bdaf0a9d9..799223e3fb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 17:47   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 05/19] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

This brings "incremental" mode to the copy-before-write filter: the user
can specify a bitmap so that the filter copies only "dirty" areas.
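The option validation in the cbw_open() hunk follows an "all or nothing" rule for the flattened "bitmap.node" and "bitmap.name" keys. The sketch below mirrors that order of checks using a hypothetical flat key/value array (ToyOpt) in place of QEMU's QDict; toy_opt_get() and toy_parse_bitmap_opts() are illustrative names, not QEMU functions. It does not perform the subsequent bitmap lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat option list standing in for QEMU's QDict. */
typedef struct {
    const char *key;
    const char *value;
} ToyOpt;

static const char *toy_opt_get(const ToyOpt *opts, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i].key, key) == 0) {
            return opts[i].value;
        }
    }
    return NULL;
}

/*
 * The bitmap is optional, but if either key is present both are required.
 * Returns 0 (with *node and *name both NULL when no bitmap was given) or
 * -EINVAL with *errmsg set.
 */
static int toy_parse_bitmap_opts(const ToyOpt *opts, size_t n,
                                 const char **node, const char **name,
                                 const char **errmsg)
{
    assert(node && name && errmsg);
    *node = toy_opt_get(opts, n, "bitmap.node");
    *name = toy_opt_get(opts, n, "bitmap.name");
    *errmsg = NULL;

    if (!*node && !*name) {
        return 0; /* no bitmap: copy everything */
    }
    if (!*node) {
        *errmsg = "bitmap.node is not specified";
        return -EINVAL;
    }
    if (!*name) {
        *errmsg = "bitmap.name is not specified";
        return -EINVAL;
    }
    return 0;
}
```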

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json      | 10 +++++++++-
 block/copy-before-write.c | 30 +++++++++++++++++++++++++++++-
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d3dd9cb48..6904daeacf 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4167,11 +4167,19 @@
 #
 # @target: The target for copy-before-write operations.
 #
+# @bitmap: If specified, the copy-before-write filter will do
+#          copy-before-write operations only for dirty regions of the
+#          bitmap. The bitmap size must equal the length of the file and
+#          target children of the filter. Note also that the bitmap is
+#          used only to initialize the internal bitmap of the process, so
+#          further modifications (or removal) of the specified bitmap do
+#          not influence the filter.
+#
 # Since: 6.2
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { 'target': 'BlockdevRef' } }
+  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
 ##
 # @BlockdevOptions:
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 799223e3fb..4cd90d22df 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,6 +149,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    BdrvDirtyBitmap *bitmap = NULL;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -163,6 +164,33 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    if (qdict_haskey(options, "bitmap.node") ||
+        qdict_haskey(options, "bitmap.name"))
+    {
+        const char *bitmap_node, *bitmap_name;
+
+        if (!qdict_haskey(options, "bitmap.node")) {
+            error_setg(errp, "bitmap.node is not specified");
+            return -EINVAL;
+        }
+
+        if (!qdict_haskey(options, "bitmap.name")) {
+            error_setg(errp, "bitmap.name is not specified");
+            return -EINVAL;
+        }
+
+        bitmap_node = qdict_get_str(options, "bitmap.node");
+        bitmap_name = qdict_get_str(options, "bitmap.name");
+        qdict_del(options, "bitmap.node");
+        qdict_del(options, "bitmap.name");
+
+        bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
+                                           errp);
+        if (!bitmap) {
+            return -EINVAL;
+        }
+    }
+
     bs->total_sectors = bs->file->bs->total_sectors;
     bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
             (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
@@ -170,7 +198,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v3 05/19] block/block-copy: add block_copy_reset()
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 17:51   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 06/19] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.
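What the split-out block_copy_reset() does can be shown on a miniature state. ToyCopyState and toy_copy_reset() are hypothetical; the real function works on a BdrvDirtyBitmap and runs under s->lock, which this single-threaded sketch omits. The point is the progress bookkeeping: after clearing a range, the remaining work is recomputed as the still-dirty bytes plus the bytes already in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLUSTER_SIZE 65536
#define TOY_NCLUSTERS 8

/* Hypothetical miniature of BlockCopyState. */
typedef struct {
    bool dirty[TOY_NCLUSTERS];
    int64_t in_flight_bytes;
    int64_t progress_remaining;
} ToyCopyState;

static int64_t toy_dirty_bytes(const ToyCopyState *s)
{
    int64_t n = 0;
    for (int i = 0; i < TOY_NCLUSTERS; i++) {
        n += s->dirty[i] ? TOY_CLUSTER_SIZE : 0;
    }
    return n;
}

/* Clear a cluster-aligned range, then recompute the remaining work. */
static void toy_copy_reset(ToyCopyState *s, int64_t offset, int64_t bytes)
{
    assert(offset % TOY_CLUSTER_SIZE == 0 && bytes % TOY_CLUSTER_SIZE == 0);
    assert(offset >= 0 &&
           offset + bytes <= (int64_t)TOY_NCLUSTERS * TOY_CLUSTER_SIZE);

    for (int64_t c = offset / TOY_CLUSTER_SIZE;
         c < (offset + bytes) / TOY_CLUSTER_SIZE; c++) {
        s->dirty[c] = false;
    }
    s->progress_remaining = toy_dirty_bytes(s) + s->in_flight_bytes;
}
```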

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h |  1 +
 block/block-copy.c         | 21 +++++++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8da4cec1b6..a11e1620f6 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -34,6 +34,7 @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
                                      int64_t offset, int64_t *count);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index f6345e3a4c..6228cf01d4 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -690,6 +690,18 @@ static int block_copy_is_cluster_allocated(BlockCopyState *s, int64_t offset,
     }
 }
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes)
+{
+    QEMU_LOCK_GUARD(&s->lock);
+
+    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+    if (s->progress) {
+        progress_set_remaining(s->progress,
+                               bdrv_get_dirty_count(s->copy_bitmap) +
+                               s->in_flight_bytes);
+    }
+}
+
 /*
  * Reset bits in copy_bitmap starting at offset if they represent unallocated
  * data in the image. May reset subsequent contiguous bits.
@@ -710,14 +722,7 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
     bytes = clusters * s->cluster_size;
 
     if (!ret) {
-        qemu_co_mutex_lock(&s->lock);
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-        if (s->progress) {
-            progress_set_remaining(s->progress,
-                                   bdrv_get_dirty_count(s->copy_bitmap) +
-                                   s->in_flight_bytes);
-        }
-        qemu_co_mutex_unlock(&s->lock);
+        block_copy_reset(s, offset, bytes);
     }
 
     *count = bytes;
-- 
2.31.1




* [PATCH v3 06/19] block: intoduce reqlist
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 05/19] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-14 18:20   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Split intersecting-requests functionality out of block-copy to be
reused in the copy-before-write filter.

Note: while here, fix a tiny typo in MAINTAINERS.
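The core of the new reqlist is the half-open range intersection test that block-copy previously kept in find_conflicting_task(). The sketch below reproduces that test on a hypothetical, simplified request type (ToyReq, a singly linked list without any CoQueue), so it shows only conflict detection, not the coroutine waiting and waking done by reqlist_wait_one(), reqlist_shrink_req() and reqlist_remove_req(). As with reqlist_init_req(), the caller must guarantee there is no conflict when adding a request; the sketch asserts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request: no wait queue, singly linked. */
typedef struct ToyReq {
    int64_t offset;
    int64_t bytes;
    struct ToyReq *next;
} ToyReq;

/* [offset, offset+bytes) and [r->offset, r->offset+r->bytes) intersect
 * iff each range starts before the other one ends. */
static ToyReq *toy_find_conflict(ToyReq *head, int64_t offset, int64_t bytes)
{
    for (ToyReq *r = head; r; r = r->next) {
        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
            return r;
        }
    }
    return NULL;
}

/* Insert at the head, like QLIST_INSERT_HEAD(); caller ensures no conflict. */
static void toy_init_req(ToyReq **head, ToyReq *req,
                         int64_t offset, int64_t bytes)
{
    assert(!toy_find_conflict(*head, offset, bytes));
    req->offset = offset;
    req->bytes = bytes;
    req->next = *head;
    *head = req;
}
```

Adjacent requests such as [0, 4096) and [4096, 8192) do not conflict, because the end offset is exclusive.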

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h |  67 +++++++++++++++++++++++
 block/block-copy.c      | 116 +++++++++++++---------------------------
 block/reqlist.c         |  76 ++++++++++++++++++++++++++
 MAINTAINERS             |   4 +-
 block/meson.build       |   1 +
 5 files changed, 184 insertions(+), 80 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/reqlist.c

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
new file mode 100644
index 0000000000..b904d80216
--- /dev/null
+++ b/include/block/reqlist.h
@@ -0,0 +1,67 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REQLIST_H
+#define REQLIST_H
+
+#include "qemu/coroutine.h"
+
+/*
+ * The API is not thread-safe and shouldn't be. The struct is public to be part
+ * of other structures and protected by third-party locks, see
+ * block/block-copy.c for example.
+ */
+
+typedef struct BlockReq {
+    int64_t offset;
+    int64_t bytes;
+
+    CoQueue wait_queue; /* coroutines blocked on this req */
+    QLIST_ENTRY(BlockReq) list;
+} BlockReq;
+
+typedef QLIST_HEAD(, BlockReq) BlockReqList;
+
+/*
+ * Initialize new request and add it to the list. Caller should be sure that
+ * there are no conflicting requests in the list.
+ */
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes);
+/* Search for request in the list intersecting with @offset/@bytes area. */
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes);
+
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ *
+ * @lock is passed to qemu_co_queue_wait()
+ * False return value proves that lock was NOT released.
+ */
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
+
+/*
+ * Shrink request and wake all waiting coroutines (some of them may no longer
+ * intersect the shrunk request).
+ */
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
+
+/*
+ * Remove request and wake all waiting coroutines. Do not release any memory.
+ */
+void coroutine_fn reqlist_remove_req(BlockReq *req);
+
+#endif /* REQLIST_H */
diff --git a/block/block-copy.c b/block/block-copy.c
index 6228cf01d4..f70f1ad993 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 #include "sysemu/block-backend.h"
 #include "qemu/units.h"
 #include "qemu/coroutine.h"
@@ -83,7 +84,6 @@ typedef struct BlockCopyTask {
      */
     BlockCopyState *s;
     BlockCopyCallState *call_state;
-    int64_t offset;
     /*
      * @method can also be set again in the while loop of
      * block_copy_dirty_clusters(), but it is never accessed concurrently
@@ -94,21 +94,17 @@ typedef struct BlockCopyTask {
     BlockCopyMethod method;
 
     /*
-     * Fields whose state changes throughout the execution
-     * Protected by lock in BlockCopyState.
+     * Generally, req is protected by lock in BlockCopyState. Still, req.offset
+     * is only set on task creation, so it may be read concurrently afterwards.
+     * req.bytes is changed at most once, and only needs protecting against
+     * parallel reads while block_copy_task_shrink() updates its value.
      */
-    CoQueue wait_queue; /* coroutines blocked on this task */
-    /*
-     * Only protect the case of parallel read while updating @bytes
-     * value in block_copy_task_shrink().
-     */
-    int64_t bytes;
-    QLIST_ENTRY(BlockCopyTask) list;
+    BlockReq req;
 } BlockCopyTask;
 
 static int64_t task_end(BlockCopyTask *task)
 {
-    return task->offset + task->bytes;
+    return task->req.offset + task->req.bytes;
 }
 
 typedef struct BlockCopyState {
@@ -136,7 +132,7 @@ typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
-    QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
+    BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
      * skip_unallocated:
@@ -160,42 +156,6 @@ typedef struct BlockCopyState {
     RateLimit rate_limit;
 } BlockCopyState;
 
-/* Called with lock held */
-static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-                                            int64_t offset, int64_t bytes)
-{
-    BlockCopyTask *t;
-
-    QLIST_FOREACH(t, &s->tasks, list) {
-        if (offset + bytes > t->offset && offset < t->offset + t->bytes) {
-            return t;
-        }
-    }
-
-    return NULL;
-}
-
-/*
- * If there are no intersecting tasks return false. Otherwise, wait for the
- * first found intersecting tasks to finish and return true.
- *
- * Called with lock held. May temporary release the lock.
- * Return value of 0 proves that lock was NOT released.
- */
-static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
-                                             int64_t bytes)
-{
-    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
-
-    if (!task) {
-        return false;
-    }
-
-    qemu_co_queue_wait(&task->wait_queue, &s->lock);
-
-    return true;
-}
-
 /* Called with lock held */
 static int64_t block_copy_chunk_size(BlockCopyState *s)
 {
@@ -239,7 +199,7 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
     bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
 
     /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!reqlist_find_conflict(&s->reqs, offset, bytes));
 
     bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
     s->in_flight_bytes += bytes;
@@ -249,12 +209,9 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
         .task.func = block_copy_task_entry,
         .s = s,
         .call_state = call_state,
-        .offset = offset,
-        .bytes = bytes,
         .method = s->method,
     };
-    qemu_co_queue_init(&task->wait_queue);
-    QLIST_INSERT_HEAD(&s->tasks, task, list);
+    reqlist_init_req(&s->reqs, &task->req, offset, bytes);
 
     return task;
 }
@@ -270,34 +227,34 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
                                                 int64_t new_bytes)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    if (new_bytes == task->bytes) {
+    if (new_bytes == task->req.bytes) {
         return;
     }
 
-    assert(new_bytes > 0 && new_bytes < task->bytes);
+    assert(new_bytes > 0 && new_bytes < task->req.bytes);
 
-    task->s->in_flight_bytes -= task->bytes - new_bytes;
+    task->s->in_flight_bytes -= task->req.bytes - new_bytes;
     bdrv_set_dirty_bitmap(task->s->copy_bitmap,
-                          task->offset + new_bytes, task->bytes - new_bytes);
+                          task->req.offset + new_bytes,
+                          task->req.bytes - new_bytes);
 
-    task->bytes = new_bytes;
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_shrink_req(&task->req, new_bytes);
 }
 
 static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    task->s->in_flight_bytes -= task->bytes;
+    task->s->in_flight_bytes -= task->req.bytes;
     if (ret < 0) {
-        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
+        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->req.offset,
+                              task->req.bytes);
     }
-    QLIST_REMOVE(task, list);
     if (task->s->progress) {
         progress_set_remaining(task->s->progress,
                                bdrv_get_dirty_count(task->s->copy_bitmap) +
                                task->s->in_flight_bytes);
     }
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_remove_req(&task->req);
 }
 
 void block_copy_state_free(BlockCopyState *s)
@@ -448,7 +405,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 
     ratelimit_init(&s->rate_limit);
     qemu_co_mutex_init(&s->lock);
-    QLIST_INIT(&s->tasks);
+    QLIST_INIT(&s->reqs);
     QLIST_INIT(&s->calls);
 
     return s;
@@ -481,7 +438,7 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
 
     aio_task_pool_wait_slot(pool);
     if (aio_task_pool_status(pool) < 0) {
-        co_put_to_shres(task->s->mem, task->bytes);
+        co_put_to_shres(task->s->mem, task->req.bytes);
         block_copy_task_end(task, -ECANCELED);
         g_free(task);
         return -ECANCELED;
@@ -594,7 +551,8 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     BlockCopyMethod method = t->method;
     int ret;
 
-    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
+    ret = block_copy_do_copy(s, t->req.offset, t->req.bytes, &method,
+                             &error_is_read);
 
     WITH_QEMU_LOCK_GUARD(&s->lock) {
         if (s->method == t->method) {
@@ -607,10 +565,10 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
                 t->call_state->error_is_read = error_is_read;
             }
         } else if (s->progress) {
-            progress_work_done(s->progress, t->bytes);
+            progress_work_done(s->progress, t->req.bytes);
         }
     }
-    co_put_to_shres(s->mem, t->bytes);
+    co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
     return ret;
@@ -769,22 +727,22 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             trace_block_copy_skip_range(s, offset, bytes);
             break;
         }
-        if (task->offset > offset) {
-            trace_block_copy_skip_range(s, offset, task->offset - offset);
+        if (task->req.offset > offset) {
+            trace_block_copy_skip_range(s, offset, task->req.offset - offset);
         }
 
         found_dirty = true;
 
-        ret = block_copy_block_status(s, task->offset, task->bytes,
+        ret = block_copy_block_status(s, task->req.offset, task->req.bytes,
                                       &status_bytes);
         assert(ret >= 0); /* never fail */
-        if (status_bytes < task->bytes) {
+        if (status_bytes < task->req.bytes) {
             block_copy_task_shrink(task, status_bytes);
         }
         if (qatomic_read(&s->skip_unallocated) &&
             !(ret & BDRV_BLOCK_ALLOCATED)) {
             block_copy_task_end(task, 0);
-            trace_block_copy_skip_range(s, task->offset, task->bytes);
+            trace_block_copy_skip_range(s, task->req.offset, task->req.bytes);
             offset = task_end(task);
             bytes = end - offset;
             g_free(task);
@@ -805,11 +763,11 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             }
         }
 
-        ratelimit_calculate_delay(&s->rate_limit, task->bytes);
+        ratelimit_calculate_delay(&s->rate_limit, task->req.bytes);
 
-        trace_block_copy_process(s, task->offset);
+        trace_block_copy_process(s, task->req.offset);
 
-        co_get_from_shres(s->mem, task->bytes);
+        co_get_from_shres(s->mem, task->req.bytes);
 
         offset = task_end(task);
         bytes = end - offset;
@@ -877,8 +835,8 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                  * Check that there is no task we still need to
                  * wait to complete
                  */
-                ret = block_copy_wait_one(s, call_state->offset,
-                                          call_state->bytes);
+                ret = reqlist_wait_one(&s->reqs, call_state->offset,
+                                       call_state->bytes, &s->lock);
                 if (ret == 0) {
                     /*
                      * No pending tasks, but check again the bitmap in this
@@ -886,7 +844,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                      * between this and the critical section in
                      * block_copy_dirty_clusters().
                      *
-                     * block_copy_wait_one return value 0 also means that it
+                     * reqlist_wait_one return value 0 also means that it
                      * didn't release the lock. So, we are still in the same
                      * critical section, not interrupted by any concurrent
                      * access to state.
diff --git a/block/reqlist.c b/block/reqlist.c
new file mode 100644
index 0000000000..5e320ba649
--- /dev/null
+++ b/block/reqlist.c
@@ -0,0 +1,76 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/reqlist.h"
+
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes)
+{
+    assert(!reqlist_find_conflict(reqs, offset, bytes));
+
+    *req = (BlockReq) {
+        .offset = offset,
+        .bytes = bytes,
+    };
+    qemu_co_queue_init(&req->wait_queue);
+    QLIST_INSERT_HEAD(reqs, req, list);
+}
+
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes)
+{
+    BlockReq *r;
+
+    QLIST_FOREACH(r, reqs, list) {
+        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
+            return r;
+        }
+    }
+
+    return NULL;
+}
+
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    BlockReq *r = reqlist_find_conflict(reqs, offset, bytes);
+
+    if (!r) {
+        return false;
+    }
+
+    qemu_co_queue_wait(&r->wait_queue, lock);
+
+    return true;
+}
+
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
+{
+    if (new_bytes == req->bytes) {
+        return;
+    }
+
+    assert(new_bytes > 0 && new_bytes < req->bytes);
+
+    req->bytes = new_bytes;
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
+
+void coroutine_fn reqlist_remove_req(BlockReq *req)
+{
+    QLIST_REMOVE(req, list);
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 5dcefc0d01..7f24ee4b92 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2418,7 +2418,9 @@ F: block/stream.c
 F: block/mirror.c
 F: qapi/job.json
 F: block/block-copy.c
-F: include/block/block-copy.c
+F: include/block/block-copy.h
+F: block/reqlist.c
+F: include/block/reqlist.h
 F: block/copy-before-write.h
 F: block/copy-before-write.c
 F: include/block/aio_task.h
diff --git a/block/meson.build b/block/meson.build
index deb73ca389..5065cf33ba 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -32,6 +32,7 @@ block_ss.add(files(
   'qcow2.c',
   'quorum.c',
   'raw-format.c',
+  'reqlist.c',
   'snapshot.c',
   'throttle-groups.c',
   'throttle.c',
-- 
2.31.1




* [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 06/19] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-17 10:06   ` Nikta Lapshin
  2022-01-18 13:31   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Add a convenience function, similar to bdrv_block_status(), to get the
status of a dirty bitmap.
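
Below is a minimal standalone sketch of the intended semantics. It is illustrative only. It models the bitmap as a plain array of bools (one entry per unit, true meaning dirty) instead of a real HBitmap, and toy_bitmap_status() is a hypothetical name that is not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of hbitmap_status(): @bits has one entry per unit,
 * true meaning dirty. Report the status at @offset and the length of the
 * run of units sharing that status, clipped to [offset, offset + bytes).
 */
static void toy_bitmap_status(const bool *bits, int64_t size,
                              int64_t offset, int64_t bytes,
                              bool *is_dirty, int64_t *pnum)
{
    int64_t end = offset + bytes;
    int64_t i = offset;

    assert(offset >= 0 && bytes > 0 && end <= size);

    *is_dirty = bits[offset];
    /* Extend the run while the status stays the same */
    while (i < end && bits[i] == *is_dirty) {
        i++;
    }
    *pnum = i - offset;
}
```

This is the contract hbitmap_status() is meant to follow: a run that starts at a dirty position is reported with *is_dirty set to true, and *pnum never exceeds the requested length.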

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h       | 11 +++++++++++
 block/dirty-bitmap.c         |  6 ++++++
 util/hbitmap.c               | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index f95d350b70..2ae7dc3d1d 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
         int64_t start, int64_t end, int64_t max_dirty_count,
         int64_t *dirty_start, int64_t *dirty_count);
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, bool *is_dirty, int64_t *count);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
                                                   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 5e71b6d6f7..845fda12db 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
                              int64_t max_dirty_count,
                              int64_t *dirty_start, int64_t *dirty_count);
 
+/*
+ * hbitmap_status:
+ * @hb: The HBitmap to operate on
+ * @offset: the offset to start from
+ * @bytes: length of the requested area
+ * @is_dirty: whether the bitmap is dirty at @offset
+ * @pnum: length of the area starting at @offset with the same dirty status
+ */
+void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
+                    bool *is_dirty, int64_t *pnum);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 94a0276833..e4a836749a 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
                                    dirty_start, dirty_count);
 }
 
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, bool *is_dirty, int64_t *count)
+{
+    hbitmap_status(bitmap->bitmap, offset, bytes, is_dirty, count);
+}
+
 /**
  * bdrv_merge_dirty_bitmap: merge src into dest.
  * Ensures permissions on bitmaps are reasonable; use for public API.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 305b894a63..ae8d0eb4d2 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -301,6 +301,42 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
     return true;
 }
 
+void hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
+                    bool *is_dirty, int64_t *pnum)
+{
+    int64_t next_dirty, next_zero;
+
+    assert(start >= 0);
+    assert(count > 0);
+    assert(start + count <= hb->orig_size);
+
+    next_dirty = hbitmap_next_dirty(hb, start, count);
+    if (next_dirty == -1) {
+        *pnum = count;
+        *is_dirty = false;
+        return;
+    }
+
+    if (next_dirty > start) {
+        *pnum = next_dirty - start;
+        *is_dirty = false;
+        return;
+    }
+
+    assert(next_dirty == start);
+
+    next_zero = hbitmap_next_zero(hb, start, count);
+    if (next_zero == -1) {
+        *pnum = count;
+        *is_dirty = true;
+        return;
+    }
+
+    assert(next_zero > start);
+    *pnum = next_zero - start;
+    *is_dirty = true;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
     return hb->count == 0;
-- 
2.31.1




* [PATCH v3 08/19] block/reqlist: add reqlist_wait_all()
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-17 12:34   ` Nikta Lapshin
  2022-01-18 13:44   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 09/19] block: introduce FleecingState class Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Add a function to wait for all intersecting requests.
It will be used in a subsequent commit.
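
The loop can be modelled outside of QEMU. In the single-threaded sketch below, which is hypothetical and not the QEMU API, "waiting" for a conflicting request is simulated by completing it (marking it inactive in a fixed-size array). The intersection test is the same half-open interval check that reqlist_find_conflict() uses: [offset, offset + bytes) overlaps [r->offset, r->offset + r->bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REQS 8

typedef struct ToyReq {
    int64_t offset;
    int64_t bytes;
    bool active;
} ToyReq;

/* Same half-open overlap check as reqlist_find_conflict() */
static ToyReq *toy_find_conflict(ToyReq *reqs, int64_t offset, int64_t bytes)
{
    for (size_t i = 0; i < TOY_MAX_REQS; i++) {
        ToyReq *r = &reqs[i];
        if (r->active && offset + bytes > r->offset &&
            offset < r->offset + r->bytes) {
            return r;
        }
    }
    return NULL;
}

/*
 * Model of reqlist_wait_one(): return false if nothing conflicts,
 * otherwise "wait" for one conflicting request (here: complete it).
 */
static bool toy_wait_one(ToyReq *reqs, int64_t offset, int64_t bytes)
{
    ToyReq *r = toy_find_conflict(reqs, offset, bytes);

    if (!r) {
        return false;
    }
    r->active = false; /* stands in for qemu_co_queue_wait() + wake-up */
    return true;
}

/* Model of reqlist_wait_all(): repeat until no request intersects */
static int toy_wait_all(ToyReq *reqs, int64_t offset, int64_t bytes)
{
    int waits = 0;

    while (toy_wait_one(reqs, offset, bytes)) {
        waits++;
    }
    return waits;
}
```

In the real reqlist_wait_all(), each wait releases @lock while sleeping, so new intersecting requests could be added in the meantime. That is why the caller must stop producing them, otherwise the loop may never terminate.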

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h | 8 ++++++++
 block/reqlist.c         | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index b904d80216..4695623bb3 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -53,6 +53,14 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
 bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
                                    int64_t bytes, CoMutex *lock);
 
+/*
+ * Wait for all intersecting requests. It just calls reqlist_wait_one() in a
+ * loop; the caller is responsible for not producing new requests in this
+ * region in parallel, otherwise reqlist_wait_all() may never return.
+ */
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
+
 /*
  * Shrink request and wake all waiting coroutines (may be some of them are not
  * intersecting with shrunk request).
diff --git a/block/reqlist.c b/block/reqlist.c
index 5e320ba649..52a362a1d8 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -57,6 +57,14 @@ bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
     return true;
 }
 
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    while (reqlist_wait_one(reqs, offset, bytes, lock)) {
+        /* continue */
+    }
+}
+
 void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
 {
     if (new_bytes == req->bytes) {
-- 
2.31.1




* [PATCH v3 09/19] block: introduce FleecingState class
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-18 16:37   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 10/19] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

FleecingState represents the state shared between the copy-before-write
filter and the upcoming fleecing block driver.
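
The read-routing decision that FleecingState makes can be modelled with two per-cluster flags. The sketch below is hypothetical. Plain bool arrays stand in for the access and done BdrvDirtyBitmaps, ToyFleecing and toy_read_source() are made-up names, and locking and request freezing are omitted. Clusters not allowed by the access bitmap fail with -EACCES, clusters already copied ("done") are read from the target, and the rest are read from the source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_CLUSTERS 4

typedef enum { READ_FROM_SOURCE, READ_FROM_TARGET } ToyReadSource;

typedef struct ToyFleecing {
    bool access[TOY_CLUSTERS]; /* may the fleecing user read this cluster? */
    bool done[TOY_CLUSTERS];   /* already copied to the CBW target? */
} ToyFleecing;

/*
 * Decide where to read clusters [first, first + count) from, following the
 * shape of fleecing_read_lock(): the whole range must be accessible, and
 * *pnum gets the length of the leading run served by a single node.
 */
static int toy_read_source(const ToyFleecing *f, int first, int count,
                           ToyReadSource *src, int *pnum)
{
    int i;

    assert(first >= 0 && count > 0 && first + count <= TOY_CLUSTERS);

    /* The whole range must be readable, like the access_bitmap check */
    for (i = first; i < first + count; i++) {
        if (!f->access[i]) {
            return -EACCES;
        }
    }

    /* Route by the "done" status of the first cluster */
    *src = f->done[first] ? READ_FROM_TARGET : READ_FROM_SOURCE;

    /* *pnum: length of the leading run with the same "done" status */
    i = first;
    while (i < first + count && f->done[i] == f->done[first]) {
        i++;
    }
    *pnum = i - first;
    return 0;
}
```

In the real code, when the data comes from the source node, the range is also added to frozen_read_reqs, so that fleecing_mark_done_and_wait_readers() waits for the read to finish before the guest may overwrite that area.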

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/fleecing.h  | 135 ++++++++++++++++++++++++++++++++++
 block/fleecing.c  | 182 ++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS       |   2 +
 block/meson.build |   1 +
 4 files changed, 320 insertions(+)
 create mode 100644 block/fleecing.h
 create mode 100644 block/fleecing.c

diff --git a/block/fleecing.h b/block/fleecing.h
new file mode 100644
index 0000000000..fb7b2f86c4
--- /dev/null
+++ b/block/fleecing.h
@@ -0,0 +1,135 @@
+/*
+ * FleecingState
+ *
+ * The common state of image fleecing, shared between copy-before-write filter
+ * and fleecing block driver.
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * Fleecing scheme looks as follows:
+ *
+ * [guest blk]                   [nbd export]
+ *    |                              |
+ *    |root                          |
+ *    v                              v
+ * [copy-before-write]--target-->[fleecing drv]
+ *    |                          /   |
+ *    |file                     /    |file
+ *    v                        /     v
+ * [active disk]<--source-----/  [temp disk]
+ *
+ * Note that "active disk" is also called just "source" and "temp disk" is also
+ * called "target".
+ *
+ * What happens here:
+ *
+ * The copy-before-write filter performs copy-before-write operations: on a
+ * guest write, the old data must be copied to the target child before it is
+ * overwritten. Note that this data is written through the fleecing driver,
+ * which leaves room for implementing a kind of cache there in the future.
+ *
+ * The fleecing user is the NBD export: it reads from the fleecing node, which
+ * guarantees it a snapshot view. The fleecing user may also perform discard
+ * operations.
+ *
+ * FleecingState is responsible for most of the fleecing logic:
+ *
+ * 1. Fleecing read. Handle reads by the fleecing user: decide whether to read
+ * from the source node or from the copy-before-write target node. In the
+ * former case we need to synchronize with guest writes. See
+ * fleecing_read_lock() and fleecing_read_unlock().
+ *
+ * 2. Guest write synchronization (part of [1] actually). See
+ * fleecing_mark_done_and_wait_readers()
+ *
+ * 3. Fleecing discard. Used by the fleecing user when it has already copied
+ * the corresponding area. The fleecing user may discard an area that is no
+ * longer needed, which should result in:
+ *   - discarding data to free disk space
+ *   - clearing bits in the copy-bitmap of block-copy, to avoid extra
+ *     copy-before-write operations
+ *   - clearing bits in the access-bitmap of FleecingState, to prevent further
+ *     (invalid) access
+ *
+ * Still, FleecingState doesn't own any block children, so all actual I/O
+ * operations (reads, writes and discards) are done by the copy-before-write
+ * filter and the fleecing block driver.
+ */
+
+#ifndef FLEECING_H
+#define FLEECING_H
+
+#include "block/block_int.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+typedef struct FleecingState FleecingState;
+
+/*
+ * Create FleecingState.
+ *
+ * @bcs: link to block-copy owned by copy-before-write filter.
+ *
+ * @fleecing_node: should be a fleecing block driver node. Used to create the
+ * internal bitmaps in it.
+ */
+FleecingState *fleecing_new(BlockCopyState *bcs,
+                            BlockDriverState *fleecing_node,
+                            Error **errp);
+
+/* Free the state. Doesn't free block-copy state (@bcs) */
+void fleecing_free(FleecingState *s);
+
+/*
+ * Convenience function for those who want to do a fleecing read.
+ *
+ * If the requested region starts in a "done" area, i.e. the data is already
+ * copied to the copy-before-write target node, @req is set to NULL and @pnum
+ * to the number of bytes the user may read from the target. Still, the user
+ * is responsible for handling concurrent discards on the target.
+ *
+ * If the requested region starts in a "not done" area, i.e. we have to read
+ * from the source node directly, then @pnum bytes of the source node are
+ * frozen and guaranteed not to be rewritten until fleecing_read_unlock().
+ *
+ * Returns 0 on success and -EACCES on an attempt to read an area that is not
+ * dirty in access_bitmap.
+ */
+int fleecing_read_lock(FleecingState *f, int64_t offset,
+                       int64_t bytes, const BlockReq **req, int64_t *pnum);
+/* Called as closing pair for fleecing_read_lock() */
+void fleecing_read_unlock(FleecingState *f, const BlockReq *req);
+
+/*
+ * Called when the fleecing user doesn't need the region anymore (for example,
+ * the region was successfully read and backed up somewhere).
+ * This prevents unneeded copy-before-write operations in this area later.
+ * Any subsequent fleecing read from this area will fail with -EACCES.
+ */
+void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
+
+/*
+ * Called by copy-before-write filter after successful copy-before-write
+ * operation to synchronize with parallel fleecing reads.
+ */
+void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
+                                         int64_t bytes);
+
+#endif /* FLEECING_H */
diff --git a/block/fleecing.c b/block/fleecing.c
new file mode 100644
index 0000000000..f75d11b892
--- /dev/null
+++ b/block/fleecing.c
@@ -0,0 +1,182 @@
+/*
+ * FleecingState
+ *
+ * The common state of image fleecing, shared between copy-before-write filter
+ * and fleecing block driver.
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/coroutines.h"
+#include "block/qdict.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+#include "block/fleecing.h"
+
+/*
+ * @bcs: link to the block-copy state owned by the copy-before-write filter,
+ * which performs copy-before-write operations in the context of the fleecing
+ * scheme. FleecingState doesn't own the block-copy state and doesn't free it.
+ *
+ * @lock: protects access to @access_bitmap, @done_bitmap and @frozen_read_reqs
+ *
+ * @access_bitmap: represents areas the fleecing user is allowed to read.
+ * Reading from non-dirty areas fails with -EACCES. Among other things, a
+ * discard operation clears the corresponding bits in this bitmap.
+ *
+ * @done_bitmap: represents areas that were successfully copied by
+ * copy-before-write operations. So, for dirty areas the fleecing user should
+ * read from the target node, and for clear areas from the source node.
+ *
+ * @frozen_read_reqs: current read requests of the fleecing user from the
+ * source node. The corresponding areas must not be rewritten by the guest.
+ */
+typedef struct FleecingState {
+    BlockCopyState *bcs;
+
+    CoMutex lock;
+
+    BdrvDirtyBitmap *access_bitmap;
+    BdrvDirtyBitmap *done_bitmap;
+
+    BlockReqList frozen_read_reqs;
+} FleecingState;
+
+FleecingState *fleecing_new(BlockCopyState *bcs,
+                            BlockDriverState *fleecing_node,
+                            Error **errp)
+{
+    BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(bcs),
+                    *done_bitmap, *access_bitmap;
+    int64_t cluster_size = block_copy_cluster_size(bcs);
+    FleecingState *s;
+
+    /* done_bitmap starts empty */
+    done_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
+                                           errp);
+    if (!done_bitmap) {
+        return NULL;
+    }
+    bdrv_disable_dirty_bitmap(done_bitmap);
+
+    /* access_bitmap starts equal to bcs_bitmap */
+    access_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
+                                             errp);
+    if (!access_bitmap) {
+        return NULL;
+    }
+    bdrv_disable_dirty_bitmap(access_bitmap);
+    if (!bdrv_dirty_bitmap_merge_internal(access_bitmap, bcs_bitmap,
+                                          NULL, true))
+    {
+        return NULL;
+    }
+
+    s = g_new(FleecingState, 1);
+    *s = (FleecingState) {
+        .bcs = bcs,
+        .done_bitmap = done_bitmap,
+        .access_bitmap = access_bitmap,
+    };
+    qemu_co_mutex_init(&s->lock);
+    QLIST_INIT(&s->frozen_read_reqs);
+
+    return s;
+}
+
+void fleecing_free(FleecingState *s)
+{
+    if (!s) {
+        return;
+    }
+
+    bdrv_release_dirty_bitmap(s->access_bitmap);
+    bdrv_release_dirty_bitmap(s->done_bitmap);
+    g_free(s);
+}
+
+static BlockReq *add_read_req(FleecingState *s, uint64_t offset, uint64_t bytes)
+{
+    BlockReq *req = g_new(BlockReq, 1);
+
+    reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
+
+    return req;
+}
+
+static void drop_read_req(BlockReq *req)
+{
+    reqlist_remove_req(req);
+    g_free(req);
+}
+
+int fleecing_read_lock(FleecingState *s, int64_t offset,
+                       int64_t bytes, const BlockReq **req,
+                       int64_t *pnum)
+{
+    bool done;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
+        return -EACCES;
+    }
+
+    bdrv_dirty_bitmap_status(s->done_bitmap, offset, bytes, &done, pnum);
+
+    /* Data already copied to target: nothing to freeze in source */
+    *req = done ? NULL : add_read_req(s, offset, *pnum);
+
+    return 0;
+}
+
+void fleecing_read_unlock(FleecingState *s, const BlockReq *req)
+{
+    QEMU_LOCK_GUARD(&s->lock);
+
+    drop_read_req((BlockReq *)req);
+}
+
+void fleecing_discard(FleecingState *s, int64_t offset, int64_t bytes)
+{
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+    }
+
+    block_copy_reset(s->bcs, offset, bytes);
+}
+
+void fleecing_mark_done_and_wait_readers(FleecingState *s, int64_t offset,
+                                         int64_t bytes)
+{
+    assert(QEMU_IS_ALIGNED(offset, block_copy_cluster_size(s->bcs)));
+    assert(QEMU_IS_ALIGNED(bytes, block_copy_cluster_size(s->bcs)));
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_set_dirty_bitmap(s->done_bitmap, offset, bytes);
+        reqlist_wait_all(&s->frozen_read_reqs, offset, bytes, &s->lock);
+    }
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 7f24ee4b92..78ea04e292 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2423,6 +2423,8 @@ F: block/reqlist.c
 F: include/block/reqlist.h
 F: block/copy-before-write.h
 F: block/copy-before-write.c
+F: block/fleecing.h
+F: block/fleecing.c
 F: include/block/aio_task.h
 F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index 5065cf33ba..d30da90a01 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -18,6 +18,7 @@ block_ss.add(files(
   'crypto.c',
   'dirty-bitmap.c',
   'filter-compress.c',
+  'fleecing.c',
   'io.c',
   'mirror.c',
   'nbd.c',
-- 
2.31.1




* [PATCH v3 10/19] block: introduce fleecing block driver
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 09/19] block: introduce FleecingState class Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2022-01-20 16:11   ` Hanna Reitz
  2021-12-22 17:40 ` [PATCH v3 11/19] block/copy-before-write: support " Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Introduce a new driver that works in tandem with copy-before-write to
improve fleecing.

Without the fleecing driver, the old fleecing scheme looks as follows:

[guest]
  |
  |root
  v
[copy-before-write] -----> [temp.qcow2] <--- [nbd export]
  |                 target  |
  |file                     |backing
  v                         |
[active disk] <-------------+

With the fleecing driver, the new scheme is:

[guest]
  |
  |root
  v
[copy-before-write] -----> [fleecing] <--- [nbd export]
  |                 target  |    |
  |file                     |    |file
  v                         |    v
[active disk]<--source------+  [temp.img]

Benefits of the new scheme:

1. Access control: if a remote client tries to read data that is not
   covered by the original dirty bitmap used when opening
   copy-before-write, the client gets -EACCES.

2. Discard support: if a remote client does a DISCARD, then in addition
   to discarding the data in temp.img, this informs the block-copy process
   not to copy these clusters. The next read from a discarded area returns
   -EACCES. This is significant: when the fleecing user has read data that
   was not yet copied to temp.img, we can avoid copying it on a subsequent
   guest write.

3. Synchronisation between client reads and block-copy writes is more
   efficient: an intersecting block-copy write is not blocked during a
   client read.

4. We don't rely on the backing feature: the active disk no longer has to
   be the backing of the temp image, so we avoid some permission-related
   difficulties, and the temp image is no longer required to support
   backing; it may be a simple raw image.
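
Benefits 1 and 2 can be illustrated with a toy per-cluster model. This is a hypothetical sketch, not QEMU code. copy[] stands in for the block-copy copy_bitmap, access[] and done[] for the FleecingState bitmaps, and "copying" only increments a counter. A guest write to a cluster still marked in copy[] triggers a copy-before-write. A client discard clears both copy[] and access[], so a later guest write no longer copies that cluster and a later client read gets -EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_CLUSTERS 4

typedef struct ToyScheme {
    bool copy[TOY_CLUSTERS];   /* still needs copy-before-write */
    bool access[TOY_CLUSTERS]; /* client may read it */
    bool done[TOY_CLUSTERS];   /* old data already in temp.img */
    int copies;                /* number of copy-before-write operations */
} ToyScheme;

/* Guest write: copy the old data to temp.img first if still needed */
static void toy_guest_write(ToyScheme *s, int cluster)
{
    if (s->copy[cluster]) {
        s->copies++;             /* stands in for the actual data copy */
        s->copy[cluster] = false;
        s->done[cluster] = true;
    }
    /* ...then the guest data would be written to the active disk */
}

/* Client discard: the area is no longer needed by the fleecing user */
static void toy_client_discard(ToyScheme *s, int cluster)
{
    s->access[cluster] = false;
    s->copy[cluster] = false;    /* plays the role of block_copy_reset() */
}

/* Client read: 0 on success, -EACCES if the cluster is not accessible */
static int toy_client_read(const ToyScheme *s, int cluster)
{
    return s->access[cluster] ? 0 : -EACCES;
}
```

The model tracks only the bookkeeping; in the real scheme the copy goes through block-copy into the fleecing node, and the discard is also forwarded to temp.img.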

Note that nothing calls fleecing_drv_activate() yet, so the new driver is
not actually usable. That is the job of the following patch: support the
fleecing block driver in the copy-before-write filter driver.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json |  37 +++++-
 block/fleecing.h     |  16 +++
 block/fleecing-drv.c | 261 +++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS          |   1 +
 block/meson.build    |   1 +
 5 files changed, 315 insertions(+), 1 deletion(-)
 create mode 100644 block/fleecing-drv.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6904daeacf..b47351dbac 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2917,13 +2917,14 @@
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
+# @fleecing: Since 7.0
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-            'file', 'ftp', 'ftps', 'gluster',
+            'file', 'fleecing', 'ftp', 'ftps', 'gluster',
             {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             'http', 'https', 'iscsi',
@@ -4181,6 +4182,39 @@
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
+##
+# @BlockdevOptionsFleecing:
+#
+# Driver that works in tandem with the copy-before-write filter to build a
+# fleecing scheme like this:
+#
+#    [guest]
+#      |
+#      |root
+#      v
+#    [copy-before-write] -----> [fleecing] <--- [nbd export]
+#      |                 target  |    |
+#      |file                     |    |file
+#      v                         |    v
+#    [active disk]<--source------+  [temp.img]
+#
+# The scheme works like this: on write, the fleecing driver saves data to its
+# ``file`` child and remembers that this data is in the ``file`` child. On
+# read, fleecing reads from the ``file`` child if the data is already stored
+# there, and otherwise it reads from the ``source`` child.
+# At the same time, before each guest write, ``copy-before-write`` copies the
+# corresponding old data from ``active disk`` to the ``fleecing`` node.
+# This way, the ``fleecing`` node looks like a kind of snapshot for an
+# external reader such as an NBD export.
+#
+# @source: node name of the source node of the fleecing scheme
+#
+# Since: 7.0
+##
+{ 'struct': 'BlockdevOptionsFleecing',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'source': 'str' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -4237,6 +4271,7 @@
       'copy-on-read':'BlockdevOptionsCor',
       'dmg':        'BlockdevOptionsGenericFormat',
       'file':       'BlockdevOptionsFile',
+      'fleecing':   'BlockdevOptionsFleecing',
       'ftp':        'BlockdevOptionsCurlFtp',
       'ftps':       'BlockdevOptionsCurlFtps',
       'gluster':    'BlockdevOptionsGluster',
diff --git a/block/fleecing.h b/block/fleecing.h
index fb7b2f86c4..75ad2f8b19 100644
--- a/block/fleecing.h
+++ b/block/fleecing.h
@@ -80,6 +80,9 @@
 #include "block/block-copy.h"
 #include "block/reqlist.h"
 
+
+/* fleecing.c */
+
 typedef struct FleecingState FleecingState;
 
 /*
@@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
 void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
                                          int64_t bytes);
 
+
+/* fleecing-drv.c */
+
+/* Returns true if @bs->drv is the fleecing block driver */
+bool is_fleecing_drv(BlockDriverState *bs);
+
+/*
+ * Normally FleecingState is created by the copy-before-write filter, which
+ * then calls fleecing_drv_activate() to share the FleecingState with the
+ * fleecing block driver.
+ */
+void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
+
 #endif /* FLEECING_H */
diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
new file mode 100644
index 0000000000..202208bb03
--- /dev/null
+++ b/block/fleecing-drv.c
@@ -0,0 +1,261 @@
+/*
+ * fleecing block driver
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/coroutines.h"
+#include "block/qdict.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+#include "block/copy-before-write.h"
+#include "block/fleecing.h"
+
+typedef struct BDRVFleecingState {
+    FleecingState *fleecing;
+    BdrvChild *source;
+} BDRVFleecingState;
+
+static coroutine_fn int fleecing_co_preadv_part(
+        BlockDriverState *bs, int64_t offset, int64_t bytes,
+        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const BlockReq *req;
+    int ret;
+
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /* TODO: upgrade to async loop using AioTask */
+    while (bytes) {
+        int64_t cur_bytes;
+
+        ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (req) {
+            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+            fleecing_read_unlock(s->fleecing, req);
+        } else {
+            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+        }
+        if (ret < 0) {
+            return ret;
+        }
+
+        bytes -= cur_bytes;
+        offset += cur_bytes;
+        qiov_offset += cur_bytes;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_block_status(BlockDriverState *bs,
+                                                 bool want_zero, int64_t offset,
+                                                 int64_t bytes, int64_t *pnum,
+                                                 int64_t *map,
+                                                 BlockDriverState **file)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const BlockReq *req = NULL;
+    int ret;
+    int64_t cur_bytes;
+
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *pnum = cur_bytes;
+    *map = offset;
+
+    if (req) {
+        *file = s->source->bs;
+        fleecing_read_unlock(s->fleecing, req);
+    } else {
+        *file = bs->file->bs;
+    }
+
+    return ret;
+}
+
+static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int64_t bytes)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    fleecing_discard(s->fleecing, offset, bytes);
+
+    bdrv_co_pdiscard(bs->file, offset, bytes);
+
+    /*
+     * Ignore the bdrv_co_pdiscard() result: fleecing_discard() succeeded,
+     * which means that the next read from this area will fail with -EACCES.
+     * So it is more correct to report success now.
+     */
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int64_t bytes, BdrvRequestFlags flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /*
+     * TODO: implement a cache, giving the fleecing user a chance to read and
+     * discard this data before it is actually written to the temporary image.
+     */
+    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
+                                            int64_t offset,
+                                            int64_t bytes,
+                                            QEMUIOVector *qiov,
+                                            BdrvRequestFlags flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    if (!s->fleecing) {
+        /* fleecing_drv_activate() was not called */
+        return -EINVAL;
+    }
+
+    /*
+     * TODO: implement a cache, giving the fleecing user a chance to read and
+     * discard this data before it is actually written to the temporary image.
+     */
+    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+
+static void fleecing_refresh_filename(BlockDriverState *bs)
+{
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->file->bs->filename);
+}
+
+static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
+                         Error **errp)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
+                               BDRV_CHILD_DATA, false, errp);
+    if (!s->source) {
+        return -EINVAL;
+    }
+
+    bs->total_sectors = bs->file->bs->total_sectors;
+
+    return 0;
+}
+
+static void fleecing_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                BdrvChildRole role,
+                                BlockReopenQueue *reopen_queue,
+                                uint64_t perm, uint64_t shared,
+                                uint64_t *nperm, uint64_t *nshared)
+{
+    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
+
+    if (role & BDRV_CHILD_PRIMARY) {
+        *nshared &= BLK_PERM_CONSISTENT_READ;
+    } else {
+        *nperm &= BLK_PERM_CONSISTENT_READ;
+
+        /*
+         * The copy-before-write filter is responsible for the source child and
+         * needs write access to it.
+         */
+        *nshared |= BLK_PERM_WRITE;
+    }
+}
+
+BlockDriver bdrv_fleecing_drv = {
+    .format_name = "fleecing",
+    .instance_size = sizeof(BDRVFleecingState),
+
+    .bdrv_open                  = fleecing_open,
+
+    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
+    .bdrv_co_pwritev            = fleecing_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
+    .bdrv_co_block_status       = fleecing_co_block_status,
+
+    .bdrv_refresh_filename      = fleecing_refresh_filename,
+
+    .bdrv_child_perm            = fleecing_child_perm,
+};
+
+bool is_fleecing_drv(BlockDriverState *bs)
+{
+    return bs && bs->drv == &bdrv_fleecing_drv;
+}
+
+void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    assert(is_fleecing_drv(bs));
+
+    s->fleecing = fleecing;
+}
+
+static void fleecing_init(void)
+{
+    bdrv_register(&bdrv_fleecing_drv);
+}
+
+block_init(fleecing_init);
diff --git a/MAINTAINERS b/MAINTAINERS
index 78ea04e292..42dc979052 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2425,6 +2425,7 @@ F: block/copy-before-write.h
 F: block/copy-before-write.c
 F: block/fleecing.h
 F: block/fleecing.c
+F: block/fleecing-drv.c
 F: include/block/aio_task.h
 F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index d30da90a01..b493580fbe 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -19,6 +19,7 @@ block_ss.add(files(
   'dirty-bitmap.c',
   'filter-compress.c',
   'fleecing.c',
+  'fleecing-drv.c',
   'io.c',
   'mirror.c',
   'nbd.c',
-- 
2.31.1
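A rough way to see why the ``fleecing`` node behaves like a snapshot is a
toy Python model of the routing described in the BlockdevOptionsFleecing
documentation and in fleecing_co_preadv_part(). It is only a sketch: one
list element stands for one cluster, and the class and method names are
hypothetical, not QEMU APIs. It also assumes that clusters dropped by
fleecing_discard() are no longer copied.

```python
import errno


class FleecingModel:
    """Toy model of a fleecing node; one list element == one cluster."""

    def __init__(self, source):
        self.source = source              # active disk, written by the guest
        self.temp = [None] * len(source)  # ``file`` child (temp.img)
        self.stored = set()               # clusters already saved to ``file``
        self.discarded = set()            # clusters dropped by the reader

    def fleecing_write(self, idx, data):
        # Save data to ``file`` and remember that it lives there now.
        self.temp[idx] = data
        self.stored.add(idx)

    def fleecing_read(self, idx):
        if idx in self.discarded:
            # Mirrors the comment in fleecing_co_pdiscard(): reads of a
            # discarded area fail with EACCES.
            raise OSError(errno.EACCES, "area was discarded")
        if idx in self.stored:
            return self.temp[idx]     # old data saved before a guest write
        return self.source[idx]       # not modified yet: read the source

    def fleecing_discard(self, idx):
        self.discarded.add(idx)
        self.stored.discard(idx)
        self.temp[idx] = None

    def cbw_guest_write(self, idx, data):
        # copy-before-write: save the old data once, then let the write in.
        # Assumption: discarded clusters are no longer copied.
        if idx not in self.stored and idx not in self.discarded:
            self.fleecing_write(idx, self.source[idx])
        self.source[idx] = data


disk = ["a0", "b0", "c0"]
m = FleecingModel(disk)
m.cbw_guest_write(1, "b1")
m.cbw_guest_write(1, "b2")
print(disk)                                    # ['a0', 'b2', 'c0']
print([m.fleecing_read(i) for i in range(3)])  # ['a0', 'b0', 'c0']
```

The second guest write to cluster 1 is not copied again, so the external
reader keeps seeing the original data while the guest sees its new data.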




* [PATCH v3 11/19] block/copy-before-write: support fleecing block driver
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

The last step to make the new fleecing scheme work (see block/fleecing.h
for a description) is to update the copy-before-write filter.

If we detect that the unfiltered target child is a fleecing block driver
node, we:
 - initialize the shared FleecingState
 - activate the fleecing block driver with it
 - synchronize guest writes with the help of the
   fleecing_mark_done_and_wait_readers() function
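
The resulting copy-before-write path can be sketched in Python as
follows. This is a model of cbw_do_copy_before_write() under simplifying
assumptions: block_copy and mark_done_and_wait are stand-in callables, and
the alignment helpers mirror QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP() for
non-negative values only.

```python
def align_down(n, m):
    # QEMU_ALIGN_DOWN(n, m) for non-negative n
    return n // m * m


def align_up(n, m):
    # QEMU_ALIGN_UP(n, m) == QEMU_ALIGN_DOWN(n + m - 1, m)
    return align_down(n + m - 1, m)


def cbw_do_copy_before_write(offset, nbytes, cluster_size,
                             block_copy, mark_done_and_wait=None):
    """Control-flow sketch; the two callables are hypothetical stand-ins."""
    off = align_down(offset, cluster_size)
    end = align_up(offset + nbytes, cluster_size)

    ret = block_copy(off, end - off)
    if ret < 0:
        return ret      # the caller fails the guest write instead

    if mark_done_and_wait is not None:   # only present in fleecing mode
        mark_done_and_wait(off, end - off)
    return 0


calls = []
ret = cbw_do_copy_before_write(
    0xF000, 0x2000, 0x10000,
    block_copy=lambda o, n: calls.append(("copy", o, n)) or 0,
    mark_done_and_wait=lambda o, n: calls.append(("done", o, n)))
print(ret, calls)  # 0 [('copy', 0, 131072), ('done', 0, 131072)]
```

Calling the mark-done hook only after a successful copy presumably ensures
that fleecing readers still reading old data from ``source`` are waited
for before the guest write overwrites it.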

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-before-write.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4cd90d22df..2e39159a7e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -33,10 +33,13 @@
 #include "block/block-copy.h"
 
 #include "block/copy-before-write.h"
+#include "block/fleecing.h"
 
 typedef struct BDRVCopyBeforeWriteState {
     BlockCopyState *bcs;
     BdrvChild *target;
+
+    FleecingState *fleecing;
 } BDRVCopyBeforeWriteState;
 
 static coroutine_fn int cbw_co_preadv(
@@ -50,6 +53,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
         uint64_t offset, uint64_t bytes, BdrvRequestFlags flags)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    int ret;
     uint64_t off, end;
     int64_t cluster_size = block_copy_cluster_size(s->bcs);
 
@@ -60,7 +64,16 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
     off = QEMU_ALIGN_DOWN(offset, cluster_size);
     end = QEMU_ALIGN_UP(offset + bytes, cluster_size);
 
-    return block_copy(s->bcs, off, end - off, true);
+    ret = block_copy(s->bcs, off, end - off, true);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->fleecing) {
+        fleecing_mark_done_and_wait_readers(s->fleecing, off, end - off);
+    }
+
+    return 0;
 }
 
 static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
@@ -150,6 +163,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
     BdrvDirtyBitmap *bitmap = NULL;
+    BlockDriverState *unfiltered_target;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -163,6 +177,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
     if (!s->target) {
         return -EINVAL;
     }
+    unfiltered_target = bdrv_skip_filters(s->target->bs);
 
     if (qdict_haskey(options, "bitmap.node") ||
         qdict_haskey(options, "bitmap.name"))
@@ -204,6 +219,14 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    if (is_fleecing_drv(unfiltered_target)) {
+        s->fleecing = fleecing_new(s->bcs, unfiltered_target, errp);
+        if (!s->fleecing) {
+            return -EINVAL;
+        }
+        fleecing_drv_activate(unfiltered_target, s->fleecing);
+    }
+
     return 0;
 }
 
@@ -211,6 +234,8 @@ static void cbw_close(BlockDriverState *bs)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
 
+    fleecing_free(s->fleecing);
+    s->fleecing = NULL;
     block_copy_state_free(s->bcs);
     s->bcs = NULL;
 }
-- 
2.31.1




* [PATCH v3 12/19] block/block-copy: add write-unchanged mode
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

We are going to implement push backup with the fleecing scheme. This
means that the backup job will be a fleecing user and therefore will not
need a separate copy-before-write filter. Instead, it will treat the
source as a constant, unchanged drive, so backup will want to unshare
writes on the source in this case. But we still want to do
copy-before-write operations, and these operations may be considered
write-unchanged, since they do not change what a reader of the fleecing
node sees. Add the corresponding option to block-copy now, to be used in
the following commit.
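
The flag handling in block_copy_state_new() and block_copy_set_copy_opts()
can be illustrated with a small Python sketch. The bit values below are
made up for illustration (they are not QEMU's BDRV_REQ_* values); the
point is that without BDRV_REQ_WRITE_UNCHANGED in the preserved mask, a
later block_copy_set_copy_opts() call would silently clear it.

```python
from enum import IntFlag


class ReqFlags(IntFlag):
    # Illustrative bits only; the real BDRV_REQ_* values live in QEMU.
    SERIALISING = 1
    WRITE_UNCHANGED = 2
    WRITE_COMPRESSED = 4


NONE = ReqFlags(0)


def state_new_write_flags(is_fleecing, write_unchanged):
    # Mirrors the .write_flags initializer in block_copy_state_new()
    return ((ReqFlags.SERIALISING if is_fleecing else NONE) |
            (ReqFlags.WRITE_UNCHANGED if write_unchanged else NONE))


def set_copy_opts(write_flags, compress):
    # Mirrors block_copy_set_copy_opts(): keep the bits chosen at creation
    # time and recompute only the compression bit.
    kept = write_flags & (ReqFlags.SERIALISING | ReqFlags.WRITE_UNCHANGED)
    return kept | (ReqFlags.WRITE_COMPRESSED if compress else NONE)


flags = state_new_write_flags(is_fleecing=False, write_unchanged=True)
flags = set_copy_opts(flags, compress=True)
flags = set_copy_opts(flags, compress=False)
print(flags == ReqFlags.WRITE_UNCHANGED)  # True
```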

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h | 3 ++-
 block/block-copy.c         | 9 ++++++---
 block/copy-before-write.c  | 2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index a11e1620f6..a66f81d314 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,8 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                     BdrvDirtyBitmap *bitmap, Error **errp);
+                                     BdrvDirtyBitmap *bitmap,
+                                     bool write_unchanged, Error **errp);
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
diff --git a/block/block-copy.c b/block/block-copy.c
index f70f1ad993..e2cf67e335 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -280,7 +280,8 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
                               bool compress)
 {
     /* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
-    s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
+    s->write_flags = (s->write_flags &
+                      (BDRV_REQ_SERIALISING | BDRV_REQ_WRITE_UNCHANGED)) |
         (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
 
     if (s->max_transfer < s->cluster_size) {
@@ -341,7 +342,8 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                     BdrvDirtyBitmap *bitmap, Error **errp)
+                                     BdrvDirtyBitmap *bitmap,
+                                     bool write_unchanged, Error **errp)
 {
     ERRP_GUARD();
     BlockCopyState *s;
@@ -394,7 +396,8 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         .copy_bitmap = copy_bitmap,
         .cluster_size = cluster_size,
         .len = bdrv_dirty_bitmap_size(copy_bitmap),
-        .write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0),
+        .write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
+            (write_unchanged ? BDRV_REQ_WRITE_UNCHANGED : 0),
         .mem = shres_create(BLOCK_COPY_MAX_MEM),
         .max_transfer = QEMU_ALIGN_DOWN(
                                     block_copy_max_transfer(source, target),
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2e39159a7e..f95c54dbdf 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -213,7 +213,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, false, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v3 13/19] block/copy-before-write: use write-unchanged in fleecing mode
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

As announced in the previous commit, we need to use write-unchanged
operations for fleecing, so that the fleecing client may unshare writes
if needed.
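
The effect of taking BLK_PERM_WRITE_UNCHANGED on the target child can be
shown with a simplified Python model of permission checking. Assumptions:
the bit values are illustrative, the compatibility rule is reduced to
"each parent's permissions must be shared by all other parents", and the
fleecing client is assumed to take read access while unsharing WRITE but
still sharing WRITE_UNCHANGED.

```python
# Illustrative permission bits (not QEMU's actual BLK_PERM_* values).
CONSISTENT_READ = 0x1
WRITE = 0x2
WRITE_UNCHANGED = 0x4
RESIZE = 0x8
ALL = CONSISTENT_READ | WRITE | WRITE_UNCHANGED | RESIZE


def cbw_target_perms(fleecing):
    # Mirrors cbw_child_perm() for the target child after this patch.
    nperm = WRITE_UNCHANGED if fleecing else WRITE
    nshared = ALL & ~RESIZE
    return nperm, nshared


def compatible(parents):
    """parents: list of (perm, shared) pairs taken on one node.

    Simplified rule: each parent's perm must be shared by every other parent.
    """
    for i, (perm, _) in enumerate(parents):
        for j, (_, shared) in enumerate(parents):
            if i != j and perm & ~shared:
                return False
    return True


# Assumed fleecing client (e.g. backup with immutable_source=true): reads,
# unshares WRITE, but still shares WRITE_UNCHANGED.
client = (CONSISTENT_READ, ALL & ~WRITE)

print(compatible([cbw_target_perms(True), client]))   # True
print(compatible([cbw_target_perms(False), client]))  # False
```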

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-before-write.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index f95c54dbdf..ca0d7fa5ff 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -133,6 +133,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
                            uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
     if (!(role & BDRV_CHILD_FILTERED)) {
         /*
          * Target child
@@ -143,7 +145,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
          * only upfront.
          */
         *nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
-        *nperm = BLK_PERM_WRITE;
+        *nperm = s->fleecing ? BLK_PERM_WRITE_UNCHANGED : BLK_PERM_WRITE;
     } else {
         /* Source child */
         bdrv_default_perms(bs, c, role, reopen_queue,
@@ -213,7 +215,14 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, false, errp);
+    /*
+     * For the fleecing scheme, set write_unchanged=true: our copy-before-write
+     * operations are actually write-unchanged. Likewise, we take the
+     * write-unchanged permission instead of write, which is important for
+     * backup with immutable_source=true to work as a fleecing client.
+     */
+    s->bcs = block_copy_state_new(bs->file, s->target, bitmap,
+                                  is_fleecing_drv(unfiltered_target), errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.31.1




* [PATCH v3 14/19] iotests/image-fleecing: add test-case for fleecing format node
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 67 ++++++++++++------
 tests/qemu-iotests/tests/image-fleecing.out | 76 ++++++++++++++++++++-
 2 files changed, 122 insertions(+), 21 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index a58b5a1781..2544782c28 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -49,12 +49,17 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xdc', '32M',       '32k'), # Left-end of partial-right [2]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
-def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(use_cbw, use_fleecing_filter, base_img_path,
+            fleece_img_path, nbd_sock_path, vm):
     log('--- Setting up images ---')
     log('')
 
     assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
-    assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
+    if use_fleecing_filter:
+        assert use_cbw
+        assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
+    else:
+        assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
     for p in patterns:
         qemu_io('-f', iotests.imgfmt,
@@ -81,24 +86,39 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('')
 
 
-    # create tmp_node backed by src_node
-    log(vm.qmp('blockdev-add', {
-        'driver': 'qcow2',
-        'node-name': tmp_node,
-        'file': {
+    if use_fleecing_filter:
+        log(vm.qmp('blockdev-add', {
+            'node-name': tmp_node,
             'driver': 'file',
             'filename': fleece_img_path,
-        },
-        'backing': src_node,
-    }))
+        }))
+    else:
+        # create tmp_node backed by src_node
+        log(vm.qmp('blockdev-add', {
+            'driver': 'qcow2',
+            'node-name': tmp_node,
+            'file': {
+                'driver': 'file',
+                'filename': fleece_img_path,
+            },
+            'backing': src_node,
+        }))
 
     # Establish CBW from source to fleecing node
     if use_cbw:
+        if use_fleecing_filter:
+            log(vm.qmp('blockdev-add', {
+                'driver': 'fleecing',
+                'node-name': 'fl-fleecing',
+                'file': tmp_node,
+                'source': src_node,
+            }))
+
         log(vm.qmp('blockdev-add', {
             'driver': 'copy-before-write',
             'node-name': 'fl-cbw',
             'file': src_node,
-            'target': tmp_node
+            'target': 'fl-fleecing' if use_fleecing_filter else tmp_node
         }))
 
         log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
@@ -109,16 +129,18 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
                    target=tmp_node,
                    sync='none'))
 
+    export_node = 'fl-fleecing' if use_fleecing_filter else tmp_node
+
     log('')
     log('--- Setting up NBD Export ---')
     log('')
 
-    nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
+    nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
     log(vm.qmp('nbd-server-start',
                {'addr': {'type': 'unix',
                          'data': {'path': nbd_sock_path}}}))
 
-    log(vm.qmp('nbd-server-add', device=tmp_node))
+    log(vm.qmp('nbd-server-add', device=export_node))
 
     log('')
     log('--- Sanity Check ---')
@@ -151,16 +173,19 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('--- Cleanup ---')
     log('')
 
+    log(vm.qmp('nbd-server-stop'))
+
     if use_cbw:
         log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
         log(vm.qmp('blockdev-del', node_name='fl-cbw'))
+        if use_fleecing_filter:
+            log(vm.qmp('blockdev-del', node_name='fl-fleecing'))
     else:
         log(vm.qmp('block-job-cancel', device='fleecing'))
         e = vm.event_wait('BLOCK_JOB_CANCELLED')
         assert e is not None
         log(e, filters=[iotests.filter_qmp_event])
 
-    log(vm.qmp('nbd-server-stop'))
     log(vm.qmp('blockdev-del', node_name=tmp_node))
     vm.shutdown()
 
@@ -177,17 +202,21 @@ def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
     log('Done')
 
 
-def test(use_cbw):
+def test(use_cbw, use_fleecing_filter):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
          iotests.FilePath('nbd.sock',
                           base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
-        do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm)
+        do_test(use_cbw, use_fleecing_filter, base_img_path,
+                fleece_img_path, nbd_sock_path, vm)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
-test(False)
+test(False, False)
 
-log('=== Test filter based fleecing ===\n')
-test(True)
+log('=== Test cbw-filter based fleecing ===\n')
+test(True, False)
+
+log('=== Test fleecing-format based fleecing ===\n')
+test(True, True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index e96d122a8b..da0af93388 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -52,8 +52,8 @@ read -P0 0x3fe0000 64k
 --- Cleanup ---
 
 {"return": {}}
-{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
+{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
 
 --- Confirming writes ---
@@ -67,7 +67,7 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
-=== Test filter based fleecing ===
+=== Test cbw-filter based fleecing ===
 
 --- Setting up images ---
 
@@ -137,3 +137,75 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
+=== Test fleecing-format based fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
-- 
2.31.1
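For reference, the QMP sequence exercised by the new fleecing-format case
can be collected into plain data. This is a hypothetical refactoring
sketch, not part of the patch: the function names are invented, and the
argument dictionaries are taken from the test above.

```python
def fleecing_setup_cmds(qom_path, src_node, tmp_node,
                        fleece_img_path, nbd_sock_path):
    """QMP (command, arguments) pairs for the fleecing-format graph."""
    return [
        # temp.img is a raw file node; no qcow2 backing link is needed
        ('blockdev-add', {'node-name': tmp_node, 'driver': 'file',
                          'filename': fleece_img_path}),
        # the fleecing node reads not-yet-copied areas from the source
        ('blockdev-add', {'driver': 'fleecing', 'node-name': 'fl-fleecing',
                          'file': tmp_node, 'source': src_node}),
        # copy-before-write targets the fleecing node, not tmp_node
        ('blockdev-add', {'driver': 'copy-before-write',
                          'node-name': 'fl-cbw',
                          'file': src_node, 'target': 'fl-fleecing'}),
        ('qom-set', {'path': qom_path, 'property': 'drive',
                     'value': 'fl-cbw'}),
        ('nbd-server-start', {'addr': {'type': 'unix',
                                       'data': {'path': nbd_sock_path}}}),
        # the export is the fleecing node itself
        ('nbd-server-add', {'device': 'fl-fleecing'}),
    ]


def fleecing_cleanup_cmds(qom_path, src_node, tmp_node):
    """Teardown: stop the export first, then remove nodes top-down."""
    return [
        ('nbd-server-stop', {}),
        ('qom-set', {'path': qom_path, 'property': 'drive',
                     'value': src_node}),
        ('blockdev-del', {'node-name': 'fl-cbw'}),
        ('blockdev-del', {'node-name': 'fl-fleecing'}),
        ('blockdev-del', {'node-name': tmp_node}),
    ]
```

Inside do_test() such a list could be replayed with
`for cmd, args in cmds: log(vm.qmp(cmd, args))`. The patch moves
nbd-server-stop ahead of the node deletions; presumably this is because
the NBD export is a user of fl-fleecing, and blockdev-del refuses to
delete a node that is still in use.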




* [PATCH v3 15/19] iotests.py: add qemu_io_pipe_and_status()
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Add a helper that returns both the output and the exit status, to be
used in the following commit.
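
Returning the status lets callers check reads that are expected to fail,
whereas qemu_io() keeps only element [0], the output. A standalone sketch
of the same (output, status) pattern with plain subprocess follows;
tool_pipe_and_status is a hypothetical name, and the real iotests helper
has its own options.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def tool_pipe_and_status(tool: str, args: Sequence[str]) -> Tuple[str, int]:
    """Run *tool* and return (stdout and stderr combined, exit status)."""
    proc = subprocess.run([tool, *args],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True,
                          check=False)
    return proc.stdout, proc.returncode


out, status = tool_pipe_and_status(
    sys.executable, ['-c', 'print("read failed"); raise SystemExit(1)'])
print(repr(out), status)  # 'read failed\n' 1
```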

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 83bfedb902..b5e6216517 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -238,6 +238,10 @@ def qemu_io(*args):
     args = qemu_io_args + list(args)
     return qemu_tool_pipe_and_status('qemu-io', args)[0]
 
+def qemu_io_pipe_and_status(*args):
+    args = qemu_io_args + list(args)
+    return qemu_tool_pipe_and_status('qemu-io', args)
+
 def qemu_io_log(*args):
     result = qemu_io(*args)
     log(result, filters=[filter_testfiles, filter_qemu_io])
-- 
2.31.1




* [PATCH v3 16/19] iotests/image-fleecing: add test case with bitmap
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 15/19] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 17/19] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Note that reads of the zero areas (which are not dirty in the bitmap)
fail; that is the expected behavior.
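The behavior checked by this test can be modeled roughly as a point-in-time view that serves only the regions marked dirty in the bitmap and rejects everything else with EINVAL. That matches the `read failed: Invalid argument` lines in the expected output. This is a hypothetical model, not QEMU code, and several details are assumptions:

- `BitmapRestrictedView` is an invented name.
- The 64 KiB granularity is assumed.
- A read fails here if any cluster it touches is clean.

```python
import errno

CLUSTER = 64 * 1024  # assumed bitmap granularity


def clusters(off, length):
    """Indexes of the CLUSTER-sized chunks touched by [off, off + length)."""
    return range(off // CLUSTER, (off + length - 1) // CLUSTER + 1)


class BitmapRestrictedView:
    """Hypothetical read-only view serving only bitmap-dirty regions."""

    def __init__(self, size, dirty_ranges):
        self.size = size
        self.dirty = set()
        for off, length in dirty_ranges:
            self.dirty.update(clusters(off, length))

    def read(self, off, length):
        if off + length > self.size or any(
                c not in self.dirty for c in clusters(off, length)):
            raise OSError(errno.EINVAL, 'Invalid argument')
        return length  # bytes "read"


if __name__ == '__main__':
    # Regions written before fleecing (the test's `patterns`) are dirty.
    view = BitmapRestrictedView(64 << 20, [(0, 64 << 10), (1 << 20, 64 << 10),
                                           (32 << 20, 64 << 10),
                                           (0x3ff0000, 64 << 10)])
    # The test's `zeroes` regions were never written, so they are clean.
    for off, length in [(0x00f8000, 32 << 10), (0x2010000, 32 << 10),
                        (0x3fe0000, 64 << 10)]:
        try:
            view.read(off, length)
        except OSError as e:
            print('read failed: %s' % e.strerror)
```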

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 32 ++++++--
 tests/qemu-iotests/tests/image-fleecing.out | 84 +++++++++++++++++++++
 2 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index 2544782c28..279047b19c 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -23,7 +23,7 @@
 # Creator/Owner: John Snow <jsnow@redhat.com>
 
 import iotests
-from iotests import log, qemu_img, qemu_io, qemu_io_silent
+from iotests import log, qemu_img, qemu_io, qemu_io_silent, qemu_io_pipe_and_status
 
 iotests.script_initialize(
     supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', 'vhdx', 'raw'],
@@ -50,11 +50,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
 def do_test(use_cbw, use_fleecing_filter, base_img_path,
-            fleece_img_path, nbd_sock_path, vm):
+            fleece_img_path, nbd_sock_path, vm,
+            bitmap=False):
     log('--- Setting up images ---')
     log('')
 
     assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+    if bitmap:
+        assert qemu_img('bitmap', '--add', base_img_path, 'bitmap0') == 0
+
     if use_fleecing_filter:
         assert use_cbw
         assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
@@ -114,12 +118,17 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
                 'source': src_node,
             }))
 
-        log(vm.qmp('blockdev-add', {
+        fl_cbw = {
             'driver': 'copy-before-write',
             'node-name': 'fl-cbw',
             'file': src_node,
             'target': 'fl-fleecing' if use_fleecing_filter else tmp_node
-        }))
+        }
+
+        if bitmap:
+            fl_cbw['bitmap'] = {'node': src_node, 'name': 'bitmap0'}
+
+        log(vm.qmp('blockdev-add', fl_cbw))
 
         log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
     else:
@@ -149,7 +158,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+        if ret != 0:
+            print(out)
 
     log('')
     log('--- Testing COW ---')
@@ -167,7 +178,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+        if ret != 0:
+            print(out)
 
     log('')
     log('--- Cleanup ---')
@@ -202,14 +215,14 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     log('Done')
 
 
-def test(use_cbw, use_fleecing_filter):
+def test(use_cbw, use_fleecing_filter, bitmap=False):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
          iotests.FilePath('nbd.sock',
                           base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
         do_test(use_cbw, use_fleecing_filter, base_img_path,
-                fleece_img_path, nbd_sock_path, vm)
+                fleece_img_path, nbd_sock_path, vm, bitmap=bitmap)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
@@ -220,3 +233,6 @@ test(True, False)
 
 log('=== Test fleecing-format based fleecing ===\n')
 test(True, True)
+
+log('=== Test fleecing-format based fleecing with bitmap ===\n')
+test(True, True, bitmap=True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index da0af93388..62e1c1fe42 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -190,6 +190,90 @@ read -P0 0x00f8000 32k
 read -P0 0x2010000 32k
 read -P0 0x3fe0000 64k
 
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
+=== Test fleecing-format based fleecing with bitmap ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read failed: Invalid argument
+
+read -P0 0x2010000 32k
+read failed: Invalid argument
+
+read -P0 0x3fe0000 64k
+read failed: Invalid argument
+
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read failed: Invalid argument
+
+read -P0 0x2010000 32k
+read failed: Invalid argument
+
+read -P0 0x3fe0000 64k
+read failed: Invalid argument
+
+
 --- Cleanup ---
 
 {"return": {}}
-- 
2.31.1




* [PATCH v3 17/19] block: blk_root(): return non-const pointer
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 16/19] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 18/19] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 19/19] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
  18 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

In the following patch we'll want to pass blk children to block-copy,
which needs non-const pointers. So, return a non-const pointer from
blk_root().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/sysemu/block-backend.h | 2 +-
 block/block-backend.c          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index e5e1524f06..904d70f49c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -277,7 +277,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                                    int64_t bytes, BdrvRequestFlags read_flags,
                                    BdrvRequestFlags write_flags);
 
-const BdrvChild *blk_root(BlockBackend *blk);
+BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea17..d994a0f096 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2438,7 +2438,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
                               bytes, read_flags, write_flags);
 }
 
-const BdrvChild *blk_root(BlockBackend *blk)
+BdrvChild *blk_root(BlockBackend *blk)
 {
     return blk->root;
 }
-- 
2.31.1




* [PATCH v3 18/19] qapi: backup: add immutable-source parameter
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 17/19] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  2021-12-22 17:40 ` [PATCH v3 19/19] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
  18 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

We are on the way to implementing internal backup with the fleecing
scheme, in which a backup job copies from the fleecing block driver
node (the target of the copy-before-write filter) to the final backup
target. This job doesn't need its own filter: the fleecing block driver
node is a kind of snapshot, so it is immutable from the reader's point
of view.

Let's add a backup parameter that, instead of inserting a filter,
unshares writes on the source. This way the backup job becomes a simple
copying process.
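A toy Python model makes these permission semantics concrete. It is not QEMU's block layer. Each user of a node declares the permissions it takes and the ones it shares, and a new user is refused if either side would be violated. The backup side mirrors the `blk_new_with_bs()` call in the diff: it takes consistent-read and shares only consistent-read and write-unchanged. As a result, both pre-existing and later writers conflict with it. `Node`, `attach()` and `start_backup()` are hypothetical names; the error text is taken from the patch.

```python
# Permission names loosely mirror BLK_PERM_* flags; this is a toy model.
CONSISTENT_READ = 'consistent-read'
WRITE = 'write'
WRITE_UNCHANGED = 'write-unchanged'
ALL_PERMS = frozenset({CONSISTENT_READ, WRITE, WRITE_UNCHANGED})


class Node:
    """Hypothetical block node tracking (perm, shared) pairs of its users."""

    def __init__(self, name):
        self.name = name
        self.users = []

    def attach(self, perm, shared):
        perm, shared = frozenset(perm), frozenset(shared)
        for other_perm, other_shared in self.users:
            # The newcomer may only take what others share, and vice versa.
            if not perm <= other_shared or not other_perm <= shared:
                raise PermissionError('conflicting permissions on ' + self.name)
        handle = (perm, shared)
        self.users.append(handle)
        return handle

    def detach(self, handle):
        self.users.remove(handle)


def start_backup(source, immutable_source=False, filter_node_name=None):
    if immutable_source and filter_node_name:
        raise ValueError('immutable-source and filter-node-name should not '
                         'be set simultaneously')
    if immutable_source:
        # No filter: read the source directly and unshare real writes.
        return source.attach({CONSISTENT_READ},
                             {CONSISTENT_READ, WRITE_UNCHANGED})
    # Otherwise a copy-before-write filter would be inserted (not modeled).
    return None
```

In this model, adding a writer while the job holds its handle raises `PermissionError`. After `detach()` the writer is accepted again, which corresponds to the job finishing.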

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json      | 11 ++++++-
 include/block/block_int.h |  1 +
 block/backup.c            | 61 +++++++++++++++++++++++++++++++++++----
 block/replication.c       |  2 +-
 blockdev.c                |  1 +
 5 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b47351dbac..6bac3b53fb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1436,6 +1436,15 @@
 #                    above node specified by @drive. If this option is not given,
 #                    a node name is autogenerated. (Since: 4.2)
 #
+# @immutable-source: If true, assume the source is immutable and don't insert a
+#                    filter, as no copy-before-write operations are needed. The
+#                    job fails if the source node has existing writers, and any
+#                    attempt to add a writer to the source node during backup
+#                    also fails. @filter-node-name must not be set.
+#                    If false, insert a copy-before-write filter above the source
+#                    node (see also the @filter-node-name parameter).
+#                    Default is false. (Since 6.2)
+#
 # @x-perf: Performance options. (Since 6.0)
 #
 # Features:
@@ -1455,7 +1464,7 @@
             '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-            '*filter-node-name': 'str',
+            '*filter-node-name': 'str', '*immutable-source': 'bool',
             '*x-perf': { 'type': 'BackupPerf',
                          'features': [ 'unstable' ] } } }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f4c75e8ba9..efb85c41de 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1324,6 +1324,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             BitmapSyncMode bitmap_mode,
                             bool compress,
                             const char *filter_node_name,
+                            bool immutable_source,
                             BackupPerf *perf,
                             BlockdevOnError on_source_error,
                             BlockdevOnError on_target_error,
diff --git a/block/backup.c b/block/backup.c
index 21d5983779..9b4b35b21b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -34,6 +34,14 @@ typedef struct BackupBlockJob {
     BlockDriverState *cbw;
     BlockDriverState *source_bs;
     BlockDriverState *target_bs;
+    BlockBackend *source_blk;
+    BlockBackend *target_blk;
+    /*
+     * Note that if backup runs with filter (immutable-source parameter is
+     * false), @cbw is set but @source_blk and @target_blk are NULL.
+     * Otherwise if backup runs without filter (immutable-source parameter is
+     * true), @cbw is NULL but @source_blk and @target_blk are set.
+     */
 
     BdrvDirtyBitmap *sync_bitmap;
 
@@ -102,7 +110,17 @@ static void backup_clean(Job *job)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
     block_job_remove_all_bdrv(&s->common);
-    bdrv_cbw_drop(s->cbw);
+    if (s->cbw) {
+        assert(!s->source_blk && !s->target_blk);
+        bdrv_cbw_drop(s->cbw);
+    } else {
+        block_copy_state_free(s->bcs);
+        s->bcs = NULL;
+        blk_unref(s->source_blk);
+        s->source_blk = NULL;
+        blk_unref(s->target_blk);
+        s->target_blk = NULL;
+    }
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -357,6 +375,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BitmapSyncMode bitmap_mode,
                   bool compress,
                   const char *filter_node_name,
+                  bool immutable_source,
                   BackupPerf *perf,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
@@ -369,6 +388,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t cluster_size;
     BlockDriverState *cbw = NULL;
     BlockCopyState *bcs = NULL;
+    BlockBackend *source_blk = NULL, *target_blk = NULL;
 
     assert(bs);
     assert(target);
@@ -377,6 +397,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
     assert(sync_bitmap || sync_mode != MIRROR_SYNC_MODE_BITMAP);
 
+    if (immutable_source && filter_node_name) {
+        error_setg(errp, "immutable-source and filter-node-name should not "
+                   "be set simultaneously");
+        return NULL;
+    }
+
     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
         return NULL;
@@ -451,9 +477,30 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
-    cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
-    if (!cbw) {
-        goto error;
+    if (immutable_source) {
+        source_blk = blk_new_with_bs(bs, BLK_PERM_CONSISTENT_READ,
+                                     BLK_PERM_WRITE_UNCHANGED |
+                                     BLK_PERM_CONSISTENT_READ, errp);
+        if (!source_blk) {
+            goto error;
+        }
+
+        target_blk = blk_new_with_bs(target, BLK_PERM_WRITE,
+                                     BLK_PERM_CONSISTENT_READ, errp);
+        if (!target_blk) {
+            goto error;
+        }
+
+        bcs = block_copy_state_new(blk_root(source_blk), blk_root(target_blk),
+                                   NULL, false, errp);
+        if (!bcs) {
+            goto error;
+        }
+    } else {
+        cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+        if (!cbw) {
+            goto error;
+        }
     }
 
     cluster_size = block_copy_cluster_size(bcs);
@@ -465,7 +512,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     /* job->len is fixed, so we can't allow resize */
-    job = block_job_create(job_id, &backup_job_driver, txn, cbw,
+    job = block_job_create(job_id, &backup_job_driver, txn, cbw ?: bs,
                            0, BLK_PERM_ALL,
                            speed, creation_flags, cb, opaque, errp);
     if (!job) {
@@ -475,6 +522,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->cbw = cbw;
     job->source_bs = bs;
     job->target_bs = target;
+    job->source_blk = source_blk;
+    job->target_blk = target_blk;
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->sync_mode = sync_mode;
@@ -502,6 +551,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     if (cbw) {
         bdrv_cbw_drop(cbw);
     }
+    blk_unref(source_blk);
+    blk_unref(target_blk);
 
     return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..c6c4d3af85 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -590,7 +590,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
                                 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
-                                &perf,
+                                false, &perf,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 0eb2823b1b..b6effd481d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2900,6 +2900,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
                             backup->sync, bmap, backup->bitmap_mode,
                             backup->compress,
                             backup->filter_node_name,
+                            backup->immutable_source,
                             &perf,
                             backup->on_source_error,
                             backup->on_target_error,
-- 
2.31.1




* [PATCH v3 19/19] iotests/image-fleecing: test push backup with fleecing
  2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (17 preceding siblings ...)
  2021-12-22 17:40 ` [PATCH v3 18/19] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
@ 2021-12-22 17:40 ` Vladimir Sementsov-Ogievskiy
  18 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-22 17:40 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, vsementsov, jsnow, nikita.lapshin

Add a test for push backup with fleecing:

 - start fleecing with the copy-before-write filter
 - start a backup job from the temporary fleecing node to the actual
   backup target
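The steps above, as driven by the updated test, map onto a short sequence of QMP commands. The sketch below only builds that sequence as plain dictionaries in QMP wire format, with hyphenated argument names. The test itself writes these names with underscores and relies on its QMP helper to convert them.

`qmp()` and `push_backup_commands()` are hypothetical helpers. The following details come from the test diff:

- the command names
- the node name `target`
- the job ID `push-backup`
- the speeds 1 and 0

Between the last two commands, the test also waits for `BLOCK_JOB_COMPLETED`.

```python
import json


def qmp(command, arguments):
    """Build one QMP command in wire format."""
    return {'execute': command, 'arguments': arguments}


def push_backup_commands(export_node, target_img_path, imgfmt='qcow2'):
    """Hypothetical helper: QMP commands to start and finish a push backup
    that reads from the fleecing node `export_node`."""
    start = [
        # Open the final backup target as node 'target'.
        qmp('blockdev-add', {
            'driver': imgfmt,
            'node-name': 'target',
            'file': {'driver': 'file', 'filename': target_img_path},
        }),
        # Copy from the fleecing node without inserting another filter;
        # speed=1 keeps the job running while guest writes are issued.
        qmp('blockdev-backup', {
            'device': export_node, 'sync': 'full', 'target': 'target',
            'immutable-source': True, 'job-id': 'push-backup', 'speed': 1,
        }),
    ]
    finish = [
        # Lift the throttle (0 = unlimited), then wait for
        # BLOCK_JOB_COMPLETED before removing the target node.
        qmp('block-job-set-speed', {'device': 'push-backup', 'speed': 0}),
        qmp('blockdev-del', {'node-name': 'target'}),
    ]
    return start, finish


if __name__ == '__main__':
    start, finish = push_backup_commands('fl-fleecing', 'target.img')
    for cmd in start + finish:
        print(json.dumps(cmd))
```

Throttling the job at first is what lets the COW writes happen during the backup rather than after it. The test logs a message if the job has already finished by the time it checks.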

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/tests/image-fleecing     | 121 ++++++++++++++------
 tests/qemu-iotests/tests/image-fleecing.out |  63 ++++++++++
 2 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing b/tests/qemu-iotests/tests/image-fleecing
index 279047b19c..c72cfc70f2 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -49,9 +49,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
              ('0xdc', '32M',       '32k'), # Left-end of partial-right [2]
              ('0xcd', '0x3ff0000', '64k')] # patterns[3]
 
-def do_test(use_cbw, use_fleecing_filter, base_img_path,
-            fleece_img_path, nbd_sock_path, vm,
+def do_test(vm, use_cbw, use_fleecing_filter, base_img_path,
+            fleece_img_path, nbd_sock_path=None,
+            target_img_path=None,
             bitmap=False):
+    push_backup = target_img_path is not None
+    assert (nbd_sock_path is not None) != push_backup
+    if push_backup:
+        assert use_cbw
+
     log('--- Setting up images ---')
     log('')
 
@@ -65,6 +71,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     else:
         assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
+    if push_backup:
+        assert qemu_img('create', '-f', 'qcow2', target_img_path, '64M') == 0
+
     for p in patterns:
         qemu_io('-f', iotests.imgfmt,
                 '-c', 'write -P%s %s %s' % p, base_img_path)
@@ -140,27 +149,45 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 
     export_node = 'fl-fleecing' if use_fleecing_filter else tmp_node
 
-    log('')
-    log('--- Setting up NBD Export ---')
-    log('')
+    if push_backup:
+        log('')
+        log('--- Starting actual backup ---')
+        log('')
 
-    nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
-    log(vm.qmp('nbd-server-start',
-               {'addr': {'type': 'unix',
-                         'data': {'path': nbd_sock_path}}}))
+        log(vm.qmp('blockdev-add', **{
+            'driver': iotests.imgfmt,
+            'node-name': 'target',
+            'file': {
+                'driver': 'file',
+                'filename': target_img_path
+            }
+        }))
+        log(vm.qmp('blockdev-backup', device=export_node,
+                   sync='full', target='target',
+                   immutable_source=True,
+                   job_id='push-backup', speed=1))
+    else:
+        log('')
+        log('--- Setting up NBD Export ---')
+        log('')
 
-    log(vm.qmp('nbd-server-add', device=export_node))
+        nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
+        log(vm.qmp('nbd-server-start',
+                   {'addr': { 'type': 'unix',
+                              'data': { 'path': nbd_sock_path } } }))
 
-    log('')
-    log('--- Sanity Check ---')
-    log('')
+        log(vm.qmp('nbd-server-add', device=export_node))
 
-    for p in patterns + zeroes:
-        cmd = 'read -P%s %s %s' % p
-        log(cmd)
-        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
-        if ret != 0:
-            print(out)
+        log('')
+        log('--- Sanity Check ---')
+        log('')
+
+        for p in patterns + zeroes:
+            cmd = 'read -P%s %s %s' % p
+            log(cmd)
+            out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+            if ret != 0:
+                print(out)
 
     log('')
     log('--- Testing COW ---')
@@ -171,6 +198,20 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
         log(cmd)
         log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
+    if push_backup:
+        # Check that previous operations were done during backup, not after
+        result = vm.qmp('query-block-jobs')
+        if len(result['return']) != 1:
+            log('Backup finished too fast, COW is not tested')
+
+        result = vm.qmp('block-job-set-speed', device='push-backup', speed=0)
+        assert result == {'return': {}}
+
+        log(vm.event_wait(name='BLOCK_JOB_COMPLETED',
+                          match={'data': {'device': 'push-backup'}}),
+            filters=[iotests.filter_qmp_event])
+        log(vm.qmp('blockdev-del', node_name='target'))
+
     log('')
     log('--- Verifying Data ---')
     log('')
@@ -178,15 +219,19 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     for p in patterns + zeroes:
         cmd = 'read -P%s %s %s' % p
         log(cmd)
-        out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
-        if ret != 0:
-            print(out)
+        if push_backup:
+            assert qemu_io_silent('-r', '-c', cmd, target_img_path) == 0
+        else:
+            out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, nbd_uri)
+            if ret != 0:
+                print(out)
 
     log('')
     log('--- Cleanup ---')
     log('')
 
-    log(vm.qmp('nbd-server-stop'))
+    if not push_backup:
+        log(vm.qmp('nbd-server-stop'))
 
     if use_cbw:
         log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
@@ -215,24 +260,36 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
     log('Done')
 
 
-def test(use_cbw, use_fleecing_filter, bitmap=False):
+def test(use_cbw, use_fleecing_filter,
+         nbd_sock_path=None, target_img_path=None, bitmap=False):
     with iotests.FilePath('base.img') as base_img_path, \
          iotests.FilePath('fleece.img') as fleece_img_path, \
-         iotests.FilePath('nbd.sock',
-                          base_dir=iotests.sock_dir) as nbd_sock_path, \
          iotests.VM() as vm:
-        do_test(use_cbw, use_fleecing_filter, base_img_path,
-                fleece_img_path, nbd_sock_path, vm, bitmap=bitmap)
+        do_test(vm, use_cbw, use_fleecing_filter, base_img_path,
+                fleece_img_path, nbd_sock_path, target_img_path,
+                bitmap=bitmap)
+
+def test_pull(use_cbw, use_fleecing_filter, bitmap=False):
+    with iotests.FilePath('nbd.sock',
+                          base_dir=iotests.sock_dir) as nbd_sock_path:
+        test(use_cbw, use_fleecing_filter, nbd_sock_path, None, bitmap=bitmap)
+
+def test_push():
+    with iotests.FilePath('target.img') as target_img_path:
+        test(True, True, None, target_img_path)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
-test(False, False)
+test_pull(False, False)
 
 log('=== Test cbw-filter based fleecing ===\n')
-test(True, False)
+test_pull(True, False)
 
 log('=== Test fleecing-format based fleecing ===\n')
-test(True, True)
+test_pull(True, True)
 
 log('=== Test fleecing-format based fleecing with bitmap ===\n')
-test(True, True, bitmap=True)
+test_pull(True, True, bitmap=True)
+
+log('=== Test push backup with fleecing ===\n')
+test_push()
diff --git a/tests/qemu-iotests/tests/image-fleecing.out b/tests/qemu-iotests/tests/image-fleecing.out
index 62e1c1fe42..acfc89ff0e 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -293,3 +293,66 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff0000 64k
 
 Done
+=== Test push backup with fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Starting actual backup ---
+
+{"return": {}}
+{"return": {}}
+
+--- Testing COW ---
+
+write -P0xab 0 64k
+{"return": ""}
+write -P0xad 0x00f8000 64k
+{"return": ""}
+write -P0x1d 0x2008000 64k
+{"return": ""}
+write -P0xea 0x3fe0000 64k
+{"return": ""}
+{"data": {"device": "push-backup", "len": 67108864, "offset": 67108864, "speed": 0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"return": {}}
+
+--- Verifying Data ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff0000 64k
+read -P0 0x00f8000 32k
+read -P0 0x2010000 32k
+read -P0 0x3fe0000 64k
+
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe0000 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff0000 64k
+
+Done
-- 
2.31.1




* Re: [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new()
  2021-12-22 17:40 ` [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
@ 2022-01-14 16:54   ` Hanna Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 16:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> We are going to complicate bitmap initialization in a further
> commit. And in the future, the backup job will be able to work without
> a filter (when the source is immutable), so we'll need the same bitmap
> initialization in both the copy-before-write filter and the backup job.
> So, it's reasonable to do it in block-copy.
>
> Note that for now cbw_open() is the only caller of
> block_copy_state_new().
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/block-copy.c        | 1 +
>   block/copy-before-write.c | 4 ----
>   2 files changed, 1 insertion(+), 4 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  2021-12-22 17:40 ` [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
@ 2022-01-14 16:55   ` Hanna Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 16:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> That simplifies failure handling in existing code and in upcoming new
> usages of bdrv_merge_dirty_bitmap().
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/dirty-bitmap.h    | 2 +-
>   block/dirty-bitmap.c            | 9 +++++++--
>   block/monitor/bitmap-qmp-cmds.c | 5 +----
>   3 files changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter
  2021-12-22 17:40 ` [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
@ 2022-01-14 16:58   ` Hanna Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 16:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> This will be used in the following commit to bring "incremental" mode
> to the copy-before-write filter.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block-copy.h |  2 +-
>   block/block-copy.c         | 14 ++++++++++++--
>   block/copy-before-write.c  |  2 +-
>   3 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/include/block/block-copy.h b/include/block/block-copy.h
> index 99370fa38b..8da4cec1b6 100644
> --- a/include/block/block-copy.h
> +++ b/include/block/block-copy.h
> @@ -25,7 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
>   typedef struct BlockCopyCallState BlockCopyCallState;
>   
>   BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
> -                                     Error **errp);
> +                                     BdrvDirtyBitmap *bitmap, Error **errp);
>   
>   /* Function should be called prior any actual copy request */
>   void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
> diff --git a/block/block-copy.c b/block/block-copy.c
> index abda7a80bd..f6345e3a4c 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -384,8 +384,9 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
>   }
>   
>   BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
> -                                     Error **errp)
> +                                     BdrvDirtyBitmap *bitmap, Error **errp)

Could be `const` to signal that we won’t be using this bitmap for the 
BCS, but given our inconsistent usage of `const`, it isn’t anything 
that’d be important.

>   {
> +    ERRP_GUARD();
>       BlockCopyState *s;
>       int64_t cluster_size;
>       BdrvDirtyBitmap *copy_bitmap;
> @@ -402,7 +403,16 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
>           return NULL;
>       }
>       bdrv_disable_dirty_bitmap(copy_bitmap);
> -    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
> +    if (bitmap) {
> +        if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
> +            error_prepend(errp, "Failed to merge bitmap '%s' to internal "
> +                          "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
> +            return NULL;

Should we free `copy_bitmap` here?

(Apart from this, looks good to me!)
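Here is a minimal sketch of the cleanup asked about above. When merging the caller's bitmap fails, the freshly created internal bitmap is released before returning NULL. In QEMU that would presumably be a bdrv_release_dirty_bitmap() call. Everything below is a hypothetical stand-in, not QEMU code: `Bitmap` plays BdrvDirtyBitmap, `State` plays BlockCopyState, and a size mismatch plays the role of a merge failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BdrvDirtyBitmap: one bool per cluster. */
typedef struct Bitmap {
    int64_t size;
    bool *bits;
} Bitmap;

/* Hypothetical stand-in for BlockCopyState. */
typedef struct State {
    Bitmap *copy_bitmap;
} State;

static int live_bitmaps; /* bitmaps allocated but not yet freed */

static Bitmap *bitmap_new(int64_t size)
{
    Bitmap *bm = malloc(sizeof(*bm));

    bm->size = size;
    bm->bits = calloc((size_t)size, sizeof(bool));
    live_bitmaps++;
    return bm;
}

static void bitmap_free(Bitmap *bm)
{
    free(bm->bits);
    free(bm);
    live_bitmaps--;
}

/* OR @src into @dst; fails if the sizes differ. */
static bool bitmap_merge(Bitmap *dst, const Bitmap *src)
{
    if (dst->size != src->size) {
        return false;
    }
    for (int64_t i = 0; i < dst->size; i++) {
        dst->bits[i] = dst->bits[i] || src->bits[i];
    }
    return true;
}

static State *state_new(int64_t size, const Bitmap *bitmap)
{
    Bitmap *copy_bitmap = bitmap_new(size);
    State *s;

    if (bitmap) {
        if (!bitmap_merge(copy_bitmap, bitmap)) {
            bitmap_free(copy_bitmap); /* without this, the error path leaks */
            return NULL;
        }
    } else {
        /* No bitmap given: mark everything for copying. */
        for (int64_t i = 0; i < size; i++) {
            copy_bitmap->bits[i] = true;
        }
    }

    s = malloc(sizeof(*s));
    s->copy_bitmap = copy_bitmap;
    return s;
}

static void state_free(State *s)
{
    bitmap_free(s->copy_bitmap);
    free(s);
}
```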

> +        }
> +    } else {
> +        bdrv_set_dirty_bitmap(copy_bitmap, 0,
> +                              bdrv_dirty_bitmap_size(copy_bitmap));
> +    }
>   
>       /*
>        * If source is in backing chain of target assume that target is going to be
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 5bdaf0a9d9..799223e3fb 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c
> @@ -170,7 +170,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>               ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>                bs->file->bs->supported_zero_flags);
>   
> -    s->bcs = block_copy_state_new(bs->file, s->target, errp);
> +    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
>       if (!s->bcs) {
>           error_prepend(errp, "Cannot create block-copy-state: ");
>           return -EINVAL;




* Re: [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter
  2021-12-22 17:40 ` [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
@ 2022-01-14 17:47   ` Hanna Reitz
  2022-01-17 11:36     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 17:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> This brings "incremental" mode to copy-before-write filter: user can
> specify bitmap so that filter will copy only "dirty" areas.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json      | 10 +++++++++-
>   block/copy-before-write.c | 30 +++++++++++++++++++++++++++++-
>   2 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 1d3dd9cb48..6904daeacf 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4167,11 +4167,19 @@
>   #
>   # @target: The target for copy-before-write operations.
>   #
> +# @bitmap: If specified, copy-before-write filter will do
> +#          copy-before-write operations only for dirty regions of the
> +#          bitmap. Bitmap size must be equal to length of file and
> +#          target child of the filter. Note also, that bitmap is used
> +#          only to initialize internal bitmap of the process, so further
> +#          modifications (or removing) of specified bitmap doesn't
> +#          influence the filter.
> +#
>   # Since: 6.2
>   ##
>   { 'struct': 'BlockdevOptionsCbw',
>     'base': 'BlockdevOptionsGenericFormat',
> -  'data': { 'target': 'BlockdevRef' } }
> +  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>   
>   ##
>   # @BlockdevOptions:
> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
> index 799223e3fb..4cd90d22df 100644
> --- a/block/copy-before-write.c
> +++ b/block/copy-before-write.c
> @@ -149,6 +149,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
>       BDRVCopyBeforeWriteState *s = bs->opaque;
> +    BdrvDirtyBitmap *bitmap = NULL;
>   
>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> @@ -163,6 +164,33 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>           return -EINVAL;
>       }
>   
> +    if (qdict_haskey(options, "bitmap.node") ||
> +        qdict_haskey(options, "bitmap.name"))
> +    {
> +        const char *bitmap_node, *bitmap_name;
> +
> +        if (!qdict_haskey(options, "bitmap.node")) {
> +            error_setg(errp, "bitmap.node is not specified");
> +            return -EINVAL;
> +        }
> +
> +        if (!qdict_haskey(options, "bitmap.name")) {
> +            error_setg(errp, "bitmap.name is not specified");
> +            return -EINVAL;
> +        }
> +
> +        bitmap_node = qdict_get_str(options, "bitmap.node");
> +        bitmap_name = qdict_get_str(options, "bitmap.name");
> +        qdict_del(options, "bitmap.node");
> +        qdict_del(options, "bitmap.name");

I’m not really a fan of this manual parsing, but I can see nothing 
technically wrong with it.

Still, what do you think of using an input visitor, like:

QDict *bitmap_qdict;

qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
if (qdict_size(bitmap_qdict) > 0) {
     BlockDirtyBitmap *bmp_param;
     Visitor *v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
     visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
     visit_free(v);
     qobject_unref(bitmap_qdict);

     bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, ...);
     qapi_free_BlockDirtyBitmap(bmp_param);
}

(+ error handling, which is why perhaps the first block should be put 
into a separate function cbw_get_bitmap_param() to simplify error handling)
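Below is a standalone sketch of the "parse it in one place" idea. It uses a hypothetical flat key/value array instead of a QDict. `KV`, `BitmapParam` and `get_bitmap_param()` are made up for illustration. The sketch only mimics three things: collecting the "bitmap." sub-keys together, requiring both members, and rejecting unknown ones. That is roughly what the sub-dict extraction plus a strict input visitor are expected to provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat option list, standing in for a QDict with "a.b" keys. */
typedef struct KV {
    const char *key;
    const char *value;
} KV;

/* Hypothetical counterpart of the QAPI BlockDirtyBitmap struct. */
typedef struct BitmapParam {
    const char *node;
    const char *name;
} BitmapParam;

/*
 * Collect every "bitmap.*" key and validate the result in one place.
 * Returns 0 if no bitmap option was given, 1 if @out was filled,
 * -1 on error with *errmsg set.
 */
static int get_bitmap_param(const KV *opts, size_t n, BitmapParam *out,
                            const char **errmsg)
{
    static const char prefix[] = "bitmap.";
    size_t found = 0;

    out->node = NULL;
    out->name = NULL;
    for (size_t i = 0; i < n; i++) {
        const char *sub;

        if (strncmp(opts[i].key, prefix, sizeof(prefix) - 1) != 0) {
            continue; /* not a bitmap option */
        }
        sub = opts[i].key + sizeof(prefix) - 1;
        found++;
        if (strcmp(sub, "node") == 0) {
            out->node = opts[i].value;
        } else if (strcmp(sub, "name") == 0) {
            out->name = opts[i].value;
        } else {
            *errmsg = "unexpected bitmap parameter";
            return -1;
        }
    }

    if (found == 0) {
        return 0;
    }
    if (!out->node) {
        *errmsg = "bitmap.node is not specified";
        return -1;
    }
    if (!out->name) {
        *errmsg = "bitmap.name is not specified";
        return -1;
    }
    return 1;
}
```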

> +
> +        bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
> +                                           errp);
> +        if (!bitmap) {
> +            return -EINVAL;
> +        }
> +    }
> +
>       bs->total_sectors = bs->file->bs->total_sectors;
>       bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>               (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
> @@ -170,7 +198,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>               ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>                bs->file->bs->supported_zero_flags);
>   
> -    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
> +    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
>       if (!s->bcs) {
>           error_prepend(errp, "Cannot create block-copy-state: ");
>           return -EINVAL;
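The "incremental" mode in the commit message can be modeled roughly as below, together with the QAPI note that the bitmap only initializes an internal one. This is a standalone toy, not the filter's code. `Cbw`, `cbw_model_open()` and `cbw_model_write()` are hypothetical. The model stores one byte of data per cluster and has no real I/O or concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NCLUSTERS 8

/* Hypothetical model: one byte of data per cluster. */
typedef struct Cbw {
    char *source;              /* guest-visible disk */
    char *target;              /* copy-before-write target */
    bool copy[NCLUSTERS];      /* internal bitmap: clusters still to copy */
} Cbw;

/* Open: snapshot the user's bitmap, or copy everything if none is given. */
static void cbw_model_open(Cbw *s, char *source, char *target,
                           const bool *user_bitmap)
{
    s->source = source;
    s->target = target;
    for (int i = 0; i < NCLUSTERS; i++) {
        s->copy[i] = user_bitmap ? user_bitmap[i] : true;
    }
}

/* Guest write: copy old data out first, but only for clusters still marked. */
static void cbw_model_write(Cbw *s, int cluster, char data)
{
    if (s->copy[cluster]) {
        s->target[cluster] = s->source[cluster];
        s->copy[cluster] = false;   /* each cluster is copied at most once */
    }
    s->source[cluster] = data;
}
```

Because the bitmap is copied at open time, later changes to the user's bitmap (or its removal) do not affect which clusters get copied.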




* Re: [PATCH v3 05/19] block/block-copy: add block_copy_reset()
  2021-12-22 17:40 ` [PATCH v3 05/19] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
@ 2022-01-14 17:51   ` Hanna Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 17:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> Split block_copy_reset() out of block_copy_reset_unallocated() to be
> used separately later.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block-copy.h |  1 +
>   block/block-copy.c         | 21 +++++++++++++--------
>   2 files changed, 14 insertions(+), 8 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>




* Re: [PATCH v3 06/19] block: intoduce reqlist
  2021-12-22 17:40 ` [PATCH v3 06/19] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
@ 2022-01-14 18:20   ` Hanna Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-14 18:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> Split intersecting-requests functionality out of block-copy to be
> reused in copy-before-write filter.
>
> Note: while being here, fix tiny typo in MAINTAINERS.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/reqlist.h |  67 +++++++++++++++++++++++
>   block/block-copy.c      | 116 +++++++++++++---------------------------
>   block/reqlist.c         |  76 ++++++++++++++++++++++++++
>   MAINTAINERS             |   4 +-
>   block/meson.build       |   1 +
>   5 files changed, 184 insertions(+), 80 deletions(-)
>   create mode 100644 include/block/reqlist.h
>   create mode 100644 block/reqlist.c

Looks good to me, this split makes sense.

I have only minor comments below (about half of them on pre-existing code).

> diff --git a/include/block/reqlist.h b/include/block/reqlist.h
> new file mode 100644
> index 0000000000..b904d80216
> --- /dev/null
> +++ b/include/block/reqlist.h
> @@ -0,0 +1,67 @@
> +/*
> + * reqlist API
> + *
> + * Copyright (C) 2013 Proxmox Server Solutions
> + * Copyright (c) 2021 Virtuozzo International GmbH.
> + *
> + * Authors:
> + *  Dietmar Maurer (dietmar@proxmox.com)
> + *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef REQLIST_H
> +#define REQLIST_H
> +
> +#include "qemu/coroutine.h"
> +
> +/*
> + * The API is not thread-safe and shouldn't be. The struct is public to be part
> + * of other structures and protected by third-party locks, see
> + * block/block-copy.c for example.
> + */
> +
> +typedef struct BlockReq {
> +    int64_t offset;
> +    int64_t bytes;
> +
> +    CoQueue wait_queue; /* coroutines blocked on this req */
> +    QLIST_ENTRY(BlockReq) list;
> +} BlockReq;
> +
> +typedef QLIST_HEAD(, BlockReq) BlockReqList;
> +
> +/*
> + * Initialize new request and add it to the list. Caller should be sure that

I’d say s/should/must/, because that is guarded by an assertion.

> + * there are no conflicting requests in the list.
> + */
> +void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
> +                      int64_t bytes);
> +/* Search for request in the list intersecting with @offset/@bytes area. */
> +BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
> +                                int64_t bytes);
> +
> +/*
> + * If there are no intersecting requests return false. Otherwise, wait for the
> + * first found intersecting request to finish and return true.
> + *
> + * @lock is passed to qemu_co_queue_wait()
> + * False return value proves that lock was NOT released.

I’d say “was released at no point” instead, because when I first read
this, I understood it to mean only that the lock is held when this
function returns, and so I thought it implied that when `true` is
returned, the lock is released (and remains released).

> + */
> +bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
> +                                   int64_t bytes, CoMutex *lock);
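The lock semantics discussed above can be illustrated with a toy, single-threaded model of reqlist_wait_one(). It uses a hypothetical `MockLock` that counts how often it is dropped. As with qemu_co_queue_wait(), the model assumes that waiting drops the lock and re-takes it on wakeup. The "wait" itself is simulated by marking the conflicting request finished. In both cases the lock is held again on return; a `false` result just means it was never dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CoMutex that records how often it is dropped. */
typedef struct MockLock {
    bool held;
    int releases;
} MockLock;

/* Hypothetical in-flight request; finished requests have active == false. */
typedef struct Req {
    int64_t offset;
    int64_t bytes;
    bool active;
} Req;

/* Same half-open intersection test as reqlist_find_conflict(). */
static Req *find_conflict(Req *reqs, size_t n, int64_t offset, int64_t bytes)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].active && offset + bytes > reqs[i].offset &&
            offset < reqs[i].offset + reqs[i].bytes) {
            return &reqs[i];
        }
    }
    return NULL;
}

static bool wait_one(Req *reqs, size_t n, int64_t offset, int64_t bytes,
                     MockLock *lock)
{
    Req *r;

    assert(lock->held);
    r = find_conflict(reqs, n, offset, bytes);
    if (!r) {
        return false;          /* lock was released at no point */
    }

    lock->held = false;        /* waiting drops the lock...            */
    lock->releases++;
    r->active = false;         /* ...the conflicting request finishes... */
    lock->held = true;         /* ...and the lock is re-taken on wakeup. */
    return true;
}
```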
> +
> +/*
> + * Shrink request and wake all waiting coroutines (may be some of them are not

s/may be/maybe/

> + * intersecting with shrunk request).
> + */
> +void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
> +
> +/*
> + * Remove request and wake all waiting coroutines. Do not release any memory.
> + */
> +void coroutine_fn reqlist_remove_req(BlockReq *req);
> +
> +#endif /* REQLIST_H */

> diff --git a/block/reqlist.c b/block/reqlist.c
> new file mode 100644
> index 0000000000..5e320ba649
> --- /dev/null
> +++ b/block/reqlist.c
> @@ -0,0 +1,76 @@

[...]

> +BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
> +                                int64_t bytes)
> +{
> +    BlockReq *r;
> +
> +    QLIST_FOREACH(r, reqs, list) {
> +        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {

(Late, I know, the old code was exactly this, but:) Why not use 
ranges_overlap()?

> +            return r;
> +        }
> +    }
> +
> +    return NULL;
> +}
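On the ranges_overlap() question, the sketch below checks that the open-coded half-open test agrees with an inclusive-end comparison in the style of ranges_overlap() from include/qemu/range.h, for positive lengths. `inclusive_overlap()` is a local reimplementation written for this comparison, not the QEMU function itself. With a zero length, `first + len - 1` would wrap around, so that form is only meaningful for non-empty ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The open-coded test from reqlist_find_conflict(): half-open [off, off+len). */
static bool open_coded_overlap(int64_t off1, int64_t len1,
                               int64_t off2, int64_t len2)
{
    return off1 + len1 > off2 && off1 < off2 + len2;
}

/*
 * Inclusive-end form, in the style of ranges_overlap(): compare the last
 * byte of each range (first + len - 1). Requires len1 > 0 and len2 > 0.
 */
static bool inclusive_overlap(uint64_t first1, uint64_t len1,
                              uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}
```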




* Re: [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2021-12-22 17:40 ` [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
@ 2022-01-17 10:06   ` Nikta Lapshin
  2022-01-17 12:02     ` Vladimir Sementsov-Ogievskiy
  2022-01-18 13:31   ` Hanna Reitz
  1 sibling, 1 reply; 38+ messages in thread
From: Nikta Lapshin @ 2022-01-17 10:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, jsnow

On 12/22/21 20:40, Vladimir Sementsov-Ogievskiy wrote:

> Add a convenient function similar with bdrv_block_status() to get
> status of dirty bitmap.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/dirty-bitmap.h |  2 ++
>   include/qemu/hbitmap.h       | 11 +++++++++++
>   block/dirty-bitmap.c         |  6 ++++++
>   util/hbitmap.c               | 36 ++++++++++++++++++++++++++++++++++++
>   4 files changed, 55 insertions(+)
>
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index f95d350b70..2ae7dc3d1d 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
>   bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
>           int64_t start, int64_t end, int64_t max_dirty_count,
>           int64_t *dirty_start, int64_t *dirty_count);
> +void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
> +                              int64_t bytes, bool *is_dirty, int64_t *count);
>   BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>                                                     Error **errp);
>   
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index 5e71b6d6f7..845fda12db 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
>                                int64_t max_dirty_count,
>                                int64_t *dirty_start, int64_t *dirty_count);
>   
> +/*
> + * bdrv_dirty_bitmap_status:
> + * @hb: The HBitmap to operate on
> + * @start: the offset to start from
> + * @end: end of requested area
> + * @is_dirty: is bitmap dirty at @offset
> + * @pnum: how many bits has same value starting from @offset
> + */
> +void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
> +                    bool *is_dirty, int64_t *pnum);
> +

I think the description should be changed: the function has no `start`
or `end` arguments.

>   /**
>    * hbitmap_iter_next:
>    * @hbi: HBitmapIter to operate on.
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 94a0276833..e4a836749a 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
>                                      dirty_start, dirty_count);
>   }
>   
> +void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
> +                              int64_t bytes, bool *is_dirty, int64_t *count)
> +{
> +    hbitmap_status(bitmap->bitmap, offset, bytes, is_dirty, count);
> +}
> +
>   /**
>    * bdrv_merge_dirty_bitmap: merge src into dest.
>    * Ensures permissions on bitmaps are reasonable; use for public API.
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 305b894a63..ae8d0eb4d2 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -301,6 +301,42 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
>       return true;
>   }
>   
> +void hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
> +                    bool *is_dirty, int64_t *pnum)
> +{
> +    int64_t next_dirty, next_zero;
> +
> +    assert(start >= 0);
> +    assert(count > 0);
> +    assert(start + count <= hb->orig_size);
> +
> +    next_dirty = hbitmap_next_dirty(hb, start, count);
> +    if (next_dirty == -1) {
> +        *pnum = count;
> +        *is_dirty = false;
> +        return;
> +    }
> +
> +    if (next_dirty > start) {
> +        *pnum = next_dirty - start;
> +        *is_dirty = false;
> +        return;
> +    }
> +
> +    assert(next_dirty == start);
> +
> +    next_zero = hbitmap_next_zero(hb, start, count);
> +    if (next_zero == -1) {
> +        *pnum = count;
> +        *is_dirty = true;
> +        return;
> +    }
> +
> +    assert(next_zero > start);
> +    *pnum = next_zero - start;
> +    *is_dirty = false;
> +}
> +

This function determines whether this bitmap is dirty and also counts the first bits.
I don't think that this is a problem, but maybe it should be split?

>   bool hbitmap_empty(const HBitmap *hb)
>   {
>       return hb->count == 0;

With the description corrected:
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>


* Re: [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter
  2022-01-14 17:47   ` Hanna Reitz
@ 2022-01-17 11:36     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-17 11:36 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, kwolf,
	jsnow, nikita.lapshin

14.01.2022 20:47, Hanna Reitz wrote:
> On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
>> This brings "incremental" mode to copy-before-write filter: user can
>> specify bitmap so that filter will copy only "dirty" areas.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json      | 10 +++++++++-
>>   block/copy-before-write.c | 30 +++++++++++++++++++++++++++++-
>>   2 files changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 1d3dd9cb48..6904daeacf 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -4167,11 +4167,19 @@
>>   #
>>   # @target: The target for copy-before-write operations.
>>   #
>> +# @bitmap: If specified, copy-before-write filter will do
>> +#          copy-before-write operations only for dirty regions of the
>> +#          bitmap. Bitmap size must be equal to length of file and
>> +#          target child of the filter. Note also, that bitmap is used
>> +#          only to initialize internal bitmap of the process, so further
>> +#          modifications (or removing) of specified bitmap doesn't
>> +#          influence the filter.
>> +#
>>   # Since: 6.2
>>   ##
>>   { 'struct': 'BlockdevOptionsCbw',
>>     'base': 'BlockdevOptionsGenericFormat',
>> -  'data': { 'target': 'BlockdevRef' } }
>> +  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>>   ##
>>   # @BlockdevOptions:
>> diff --git a/block/copy-before-write.c b/block/copy-before-write.c
>> index 799223e3fb..4cd90d22df 100644
>> --- a/block/copy-before-write.c
>> +++ b/block/copy-before-write.c
>> @@ -149,6 +149,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>                       Error **errp)
>>   {
>>       BDRVCopyBeforeWriteState *s = bs->opaque;
>> +    BdrvDirtyBitmap *bitmap = NULL;
>>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
>> @@ -163,6 +164,33 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>           return -EINVAL;
>>       }
>> +    if (qdict_haskey(options, "bitmap.node") ||
>> +        qdict_haskey(options, "bitmap.name"))
>> +    {
>> +        const char *bitmap_node, *bitmap_name;
>> +
>> +        if (!qdict_haskey(options, "bitmap.node")) {
>> +            error_setg(errp, "bitmap.node is not specified");
>> +            return -EINVAL;
>> +        }
>> +
>> +        if (!qdict_haskey(options, "bitmap.name")) {
>> +            error_setg(errp, "bitmap.name is not specified");
>> +            return -EINVAL;
>> +        }
>> +
>> +        bitmap_node = qdict_get_str(options, "bitmap.node");
>> +        bitmap_name = qdict_get_str(options, "bitmap.name");
>> +        qdict_del(options, "bitmap.node");
>> +        qdict_del(options, "bitmap.name");
> 
> I’m not really a fan of this manual parsing, but I can see nothing technically wrong with it.
> 
> Still, what do you think of using an input visitor, like:
> 
> QDict *bitmap_qdict;
> 
> qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
> if (qdict_size(bitmap_qdict) > 0) {
>      BlockDirtyBitmap *bmp_param;
>      Visitor *v = qobject_input_visitor_new_flat_confused(bitmap_qdict, errp);
>      visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
>      visit_free(v);
>      qobject_unref(bitmap_qdict);
> 
>      bitmap = block_dirty_bitmap_lookup(bmp_param->node, bmp_param->name, ...);
>      qapi_free_BlockDirtyBitmap(bmp_param);
> }
> 
> (+ error handling, which is why perhaps the first block should be put into a separate function cbw_get_bitmap_param() to simplify error handling)
> 

Will try. Hmm. At some point we should start generating _marshal_ wrappers and handle _open() implementations like we do for QMP commands.

>> +
>> +        bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
>> +                                           errp);
>> +        if (!bitmap) {
>> +            return -EINVAL;
>> +        }
>> +    }
>> +
>>       bs->total_sectors = bs->file->bs->total_sectors;
>>       bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>>               (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
>> @@ -170,7 +198,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>>               ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>                bs->file->bs->supported_zero_flags);
>> -    s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
>> +    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
>>       if (!s->bcs) {
>>           error_prepend(errp, "Cannot create block-copy-state: ");
>>           return -EINVAL;
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2022-01-17 10:06   ` Nikta Lapshin
@ 2022-01-17 12:02     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-17 12:02 UTC (permalink / raw)
  To: Nikta Lapshin, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, jsnow

17.01.2022 13:06, Nikta Lapshin wrote:
> On 12/22/21 20:40, Vladimir Sementsov-Ogievskiy wrote:
> 
>> Add a convenient function similar with bdrv_block_status() to get
>> status of dirty bitmap.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/dirty-bitmap.h |  2 ++
>>   include/qemu/hbitmap.h       | 11 +++++++++++
>>   block/dirty-bitmap.c         |  6 ++++++
>>   util/hbitmap.c               | 36 ++++++++++++++++++++++++++++++++++++
>>   4 files changed, 55 insertions(+)
>>
>> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
>> index f95d350b70..2ae7dc3d1d 100644
>> --- a/include/block/dirty-bitmap.h
>> +++ b/include/block/dirty-bitmap.h
>> @@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
>>   bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
>>           int64_t start, int64_t end, int64_t max_dirty_count,
>>           int64_t *dirty_start, int64_t *dirty_count);
>> +void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
>> +                              int64_t bytes, bool *is_dirty, int64_t *count);
>>   BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>>                                                     Error **errp);
>>   
>> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
>> index 5e71b6d6f7..845fda12db 100644
>> --- a/include/qemu/hbitmap.h
>> +++ b/include/qemu/hbitmap.h
>> @@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
>>                                int64_t max_dirty_count,
>>                                int64_t *dirty_start, int64_t *dirty_count);
>>   
>> +/*
>> + * bdrv_dirty_bitmap_status:
>> + * @hb: The HBitmap to operate on
>> + * @start: the offset to start from
>> + * @end: end of requested area
>> + * @is_dirty: is bitmap dirty at @offset
>> + * @pnum: how many bits has same value starting from @offset
>> + */
>> +void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
>> +                    bool *is_dirty, int64_t *pnum);
>> +
> 
> I think the description should be changed: the function has no `start`
> or `end` arguments.
> 
>>   /**
>>    * hbitmap_iter_next:
>>    * @hbi: HBitmapIter to operate on.
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 94a0276833..e4a836749a 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
>>                                      dirty_start, dirty_count);
>>   }
>>   
>> +void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
>> +                              int64_t bytes, bool *is_dirty, int64_t *count)
>> +{
>> +    hbitmap_status(bitmap->bitmap, offset, bytes, is_dirty, count);
>> +}
>> +
>>   /**
>>    * bdrv_merge_dirty_bitmap: merge src into dest.
>>    * Ensures permissions on bitmaps are reasonable; use for public API.
>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>> index 305b894a63..ae8d0eb4d2 100644
>> --- a/util/hbitmap.c
>> +++ b/util/hbitmap.c
>> @@ -301,6 +301,42 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
>>       return true;
>>   }
>>   
>> +void hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
>> +                    bool *is_dirty, int64_t *pnum)
>> +{
>> +    int64_t next_dirty, next_zero;
>> +
>> +    assert(start >= 0);
>> +    assert(count > 0);
>> +    assert(start + count <= hb->orig_size);
>> +
>> +    next_dirty = hbitmap_next_dirty(hb, start, count);
>> +    if (next_dirty == -1) {
>> +        *pnum = count;
>> +        *is_dirty = false;
>> +        return;
>> +    }
>> +
>> +    if (next_dirty > start) {
>> +        *pnum = next_dirty - start;
>> +        *is_dirty = false;
>> +        return;
>> +    }
>> +
>> +    assert(next_dirty == start);
>> +
>> +    next_zero = hbitmap_next_zero(hb, start, count);
>> +    if (next_zero == -1) {
>> +        *pnum = count;
>> +        *is_dirty = true;
>> +        return;
>> +    }
>> +
>> +    assert(next_zero > start);
>> +    *pnum = next_zero - start;
>> +    *is_dirty = false;
>> +}
>> +
> 
> This function determines whether this bitmap is dirty and also counts the first bits.

Not exactly.

The idea was to have one function that works like block_status:
it returns the status of the bit at the offset and counts how many following bits have the same status.

> I don't think that this is a problem, but maybe it should be split?

No, I need it as one function, for further commits.
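Here is a standalone sketch of the block_status-like contract described above. It works over a hypothetical flat `bool` array rather than an HBitmap. `FlatBitmap`, `next_with_value()` and `flat_status()` are made up, and their linear scan stands in for hbitmap_next_dirty()/hbitmap_next_zero(). Under this contract, the last branch (where the run starting at @start is dirty) must report `*is_dirty = true`. The quoted v3 hunk sets it to `false` there, which looks worth double-checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat bitmap: one bool per bit, standing in for HBitmap. */
typedef struct FlatBitmap {
    const bool *bits;
    int64_t size;
} FlatBitmap;

/* First index in [start, start + count) whose bit equals @val, or -1. */
static int64_t next_with_value(const FlatBitmap *hb, int64_t start,
                               int64_t count, bool val)
{
    for (int64_t i = start; i < start + count; i++) {
        if (hb->bits[i] == val) {
            return i;
        }
    }
    return -1;
}

/*
 * *is_dirty is the value of the bit at @start; *pnum is how many
 * consecutive bits (at most @count) share that value.
 */
static void flat_status(const FlatBitmap *hb, int64_t start, int64_t count,
                        bool *is_dirty, int64_t *pnum)
{
    int64_t next_dirty, next_zero;

    assert(start >= 0 && count > 0 && start + count <= hb->size);

    next_dirty = next_with_value(hb, start, count, true);
    if (next_dirty == -1) {          /* whole range clean */
        *pnum = count;
        *is_dirty = false;
        return;
    }
    if (next_dirty > start) {        /* clean run up to the first dirty bit */
        *pnum = next_dirty - start;
        *is_dirty = false;
        return;
    }

    next_zero = next_with_value(hb, start, count, false);
    if (next_zero == -1) {           /* whole range dirty */
        *pnum = count;
        *is_dirty = true;
        return;
    }
    *pnum = next_zero - start;       /* dirty run up to the first clean bit */
    *is_dirty = true;
}
```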

> 
>>   bool hbitmap_empty(const HBitmap *hb)
>>   {
>>       return hb->count == 0;
> 
> With the description corrected:
> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
> 

thanks!

-- 
Best regards,
Vladimir



* Re: [PATCH v3 08/19] block/reqlist: add reqlist_wait_all()
  2021-12-22 17:40 ` [PATCH v3 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
@ 2022-01-17 12:34   ` Nikta Lapshin
  2022-01-18 13:44   ` Hanna Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Nikta Lapshin @ 2022-01-17 12:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, hreitz,
	kwolf, jsnow

On 12/22/21 20:40, Vladimir Sementsov-Ogievskiy wrote:

> Add function to wait for all intersecting requests.
> To be used in the further commit.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/reqlist.h | 8 ++++++++
>   block/reqlist.c         | 8 ++++++++
>   2 files changed, 16 insertions(+)
>
> diff --git a/include/block/reqlist.h b/include/block/reqlist.h
> index b904d80216..4695623bb3 100644
> --- a/include/block/reqlist.h
> +++ b/include/block/reqlist.h
> @@ -53,6 +53,14 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
>   bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
>                                      int64_t bytes, CoMutex *lock);
>   
> +/*
> + * Wait for all intersecting requests. It just calls reqlist_wait_one() in a
> + * loops, caller is responsible to stop producing new requests in this region
> + * in parallel, otherwise reqlist_wait_all() may never return.
> + */
> +void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
> +                                   int64_t bytes, CoMutex *lock);
> +
>   /*
>    * Shrink request and wake all waiting coroutines (may be some of them are not
>    * intersecting with shrunk request).
> diff --git a/block/reqlist.c b/block/reqlist.c
> index 5e320ba649..52a362a1d8 100644
> --- a/block/reqlist.c
> +++ b/block/reqlist.c
> @@ -57,6 +57,14 @@ bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
>       return true;
>   }
>   
> +void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
> +                                   int64_t bytes, CoMutex *lock)
> +{
> +    while (reqlist_wait_one(reqs, offset, bytes, lock)) {
> +        /* continue */
> +    }
> +}
> +
>   void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
>   {
>       if (new_bytes == req->bytes) {


Reviewed-by: Nikita Lapshin<nikita.lapshin@virtuozzo.com>
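The reqlist_wait_all() loop and the caveat in its comment can be modeled with the toy below. It uses hypothetical names (`Req`, `wait_one()`, `wait_all()`) and simulates each wait by simply completing the first intersecting request. The loop terminates only because nothing adds new intersecting requests during the waits. That is exactly the responsibility the comment places on the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight request; finished requests have active == false. */
typedef struct Req {
    int64_t offset;
    int64_t bytes;
    bool active;
} Req;

/*
 * Simulated wait_one(): if an active request intersects
 * [offset, offset + bytes), "wait" for it by marking it finished and
 * return true; otherwise return false.
 */
static bool wait_one(Req *reqs, size_t n, int64_t offset, int64_t bytes)
{
    for (size_t i = 0; i < n; i++) {
        Req *r = &reqs[i];

        if (r->active && offset + bytes > r->offset &&
            offset < r->offset + r->bytes) {
            r->active = false;
            return true;
        }
    }
    return false;
}

/* Same shape as reqlist_wait_all(); returns how many waits happened. */
static int wait_all(Req *reqs, size_t n, int64_t offset, int64_t bytes)
{
    int waits = 0;

    while (wait_one(reqs, n, offset, bytes)) {
        waits++;
    }
    return waits;
}
```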


* Re: [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2021-12-22 17:40 ` [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
  2022-01-17 10:06   ` Nikta Lapshin
@ 2022-01-18 13:31   ` Hanna Reitz
  2022-01-26 10:56     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 38+ messages in thread
From: Hanna Reitz @ 2022-01-18 13:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> Add a convenient function similar with bdrv_block_status() to get
> status of dirty bitmap.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/dirty-bitmap.h |  2 ++
>   include/qemu/hbitmap.h       | 11 +++++++++++
>   block/dirty-bitmap.c         |  6 ++++++
>   util/hbitmap.c               | 36 ++++++++++++++++++++++++++++++++++++
>   4 files changed, 55 insertions(+)

[...]

> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index 5e71b6d6f7..845fda12db 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
>                                int64_t max_dirty_count,
>                                int64_t *dirty_start, int64_t *dirty_count);
>   
> +/*
> + * bdrv_dirty_bitmap_status:
> + * @hb: The HBitmap to operate on
> + * @start: the offset to start from
> + * @end: end of requested area
> + * @is_dirty: is bitmap dirty at @offset
> + * @pnum: how many bits has same value starting from @offset
> + */
> +void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,

In addition to the comment not fitting the parameter names, I also don’t 
find it ideal that the parameter names here don’t match the ones in the 
function’s definition.

I don’t have a preference between `start` or `offset` (although most 
other bitmap functions seem to prefer `start`), but I do prefer `count` 
over `bytes`, because...  Well, it’s a bit count, not a byte count, 
right?  (And from the bitmap user’s perspective, those bits might stand 
for any arbitrary unit.)

Apart from that, looks nice to me.  I am wondering a bit why this 
function doesn’t simply return the dirty bit status (like, well, the 
block-status functions do it), but I presume you simply found this 
interface to be better suited for its callers.

> +                    bool *is_dirty, int64_t *pnum);
> +
>   /**
>    * hbitmap_iter_next:
>    * @hbi: HBitmapIter to operate on.




* Re: [PATCH v3 08/19] block/reqlist: add reqlist_wait_all()
  2021-12-22 17:40 ` [PATCH v3 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
  2022-01-17 12:34   ` Nikta Lapshin
@ 2022-01-18 13:44   ` Hanna Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Hanna Reitz @ 2022-01-18 13:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> Add function to wait for all intersecting requests.
> To be used in the further commit.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/reqlist.h | 8 ++++++++
>   block/reqlist.c         | 8 ++++++++
>   2 files changed, 16 insertions(+)
>
> diff --git a/include/block/reqlist.h b/include/block/reqlist.h
> index b904d80216..4695623bb3 100644
> --- a/include/block/reqlist.h
> +++ b/include/block/reqlist.h
> @@ -53,6 +53,14 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
>   bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
>                                      int64_t bytes, CoMutex *lock);
>   
> +/*
> + * Wait for all intersecting requests. It just calls reqlist_wait_one() in a
> + * loops, caller is responsible to stop producing new requests in this region

s/loops/loop/

Reviewed-by: Hanna Reitz <hreitz@redhat.com>

> + * in parallel, otherwise reqlist_wait_all() may never return.
> + */
> +void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
> +                                   int64_t bytes, CoMutex *lock);
> +
>   /*
>    * Shrink request and wake all waiting coroutines (may be some of them are not
>    * intersecting with shrunk request).
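The termination caveat in this comment is easiest to see in a toy model. This is a sketch only. It has no coroutines and models "waiting" for a request as completing it on the spot. `ToyReq`, `toy_find_conflict()`, `toy_wait_one()` and `toy_wait_all()` are hypothetical names standing in for `BlockReq`, `reqlist_find_conflict()`, `reqlist_wait_one()` and `reqlist_wait_all()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A toy in-flight request: a byte range plus an in-flight flag */
typedef struct ToyReq {
    int64_t offset;
    int64_t bytes;
    bool in_flight;
} ToyReq;

static bool ranges_intersect(int64_t o1, int64_t b1, int64_t o2, int64_t b2)
{
    return o1 < o2 + b2 && o2 < o1 + b1;
}

/* Return the first in-flight request intersecting [offset, offset + bytes) */
static ToyReq *toy_find_conflict(ToyReq *reqs, int n, int64_t offset,
                                 int64_t bytes)
{
    for (int i = 0; i < n; i++) {
        if (reqs[i].in_flight &&
            ranges_intersect(reqs[i].offset, reqs[i].bytes, offset, bytes)) {
            return &reqs[i];
        }
    }
    return NULL;
}

/* Like reqlist_wait_one(): return true if there was a request to wait for */
static bool toy_wait_one(ToyReq *reqs, int n, int64_t offset, int64_t bytes)
{
    ToyReq *r = toy_find_conflict(reqs, n, offset, bytes);

    if (!r) {
        return false;
    }
    r->in_flight = false; /* stands in for yielding until @r completes */
    return true;
}

/*
 * Like reqlist_wait_all(): loop until no intersecting request remains.
 * This only terminates because nothing adds new intersecting requests
 * meanwhile; in the real code, that is the caller's responsibility.
 */
static void toy_wait_all(ToyReq *reqs, int n, int64_t offset, int64_t bytes)
{
    assert(bytes > 0);
    while (toy_wait_one(reqs, n, offset, bytes)) {
        /* continue */
    }
}
```

In the real code, each iteration yields. If another coroutine keeps adding intersecting requests while we wait, `reqlist_wait_one()` keeps returning true and the loop never ends.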




* Re: [PATCH v3 09/19] block: introduce FleecingState class
  2021-12-22 17:40 ` [PATCH v3 09/19] block: introduce FleecingState class Vladimir Sementsov-Ogievskiy
@ 2022-01-18 16:37   ` Hanna Reitz
  2022-01-18 18:35     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Hanna Reitz @ 2022-01-18 16:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> FleecingState represents state shared between copy-before-write filter
> and upcoming fleecing block driver.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/fleecing.h  | 135 ++++++++++++++++++++++++++++++++++
>   block/fleecing.c  | 182 ++++++++++++++++++++++++++++++++++++++++++++++
>   MAINTAINERS       |   2 +
>   block/meson.build |   1 +
>   4 files changed, 320 insertions(+)
>   create mode 100644 block/fleecing.h
>   create mode 100644 block/fleecing.c
>
> diff --git a/block/fleecing.h b/block/fleecing.h
> new file mode 100644
> index 0000000000..fb7b2f86c4
> --- /dev/null
> +++ b/block/fleecing.h
> @@ -0,0 +1,135 @@
> +/*
> + * FleecingState
> + *
> + * The common state of image fleecing, shared between copy-before-write filter
> + * and fleecing block driver.

 From this documentation, it’s unclear to me who owns the FleecingState 
object.  I would have assumed it’s the fleecing node, and if it is, I 
wonder why we even have this external interface instead of considering 
FleecingState a helper object for the fleecing block driver (or rather 
the block driver’s opaque state, which it basically is, as far as I can 
see from peeking into the next patch), and putting both into a single 
file with no external interface except for 
fleecing_mark_done_and_wait_readers().

> + *
> + * Copyright (c) 2021 Virtuozzo International GmbH.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + *
> + *
> + * Fleecing scheme looks as follows:
> + *
> + * [guest blk]                   [nbd export]
> + *    |                              |
> + *    |root                          |
> + *    v                              v
> + * [copy-before-write]--target-->[fleecing drv]
> + *    |                          /   |
> + *    |file                     /    |file
> + *    v                        /     v
> + * [active disk]<--source-----/  [temp disk]
> + *
> + * Note that "active disk" is also called just "source" and "temp disk" is also
> + * called "target".
> + *
> + * What happens here:
> + *
> + * copy-before-write filter performs copy-before-write operations: on guest
> + * write we should copy old data to target child before rewriting. Note that we
> + * write this data through fleecing driver: it saves a possibility to implement
> + * a kind of cache in fleecing driver in future.

I don’t understand why this explanation is the first one given (and the 
only one given explicitly as a reason) for why we want a fleecing block 
driver.

(1) If we implement caching later, I have a feeling that we’ll want new 
options for this.  So a management layer that wants caching will need to 
be updated at that point anyway (to specify these new options), so I 
don’t understand how adding a fleecing block driver now would make it 
easier later on to introduce caching.

(1b) It’s actually entirely possible that we will not want to use the 
fleecing driver for caching, because we decide that caching is much more 
useful as its own dedicated block driver.

(2) There are much better arguments below.  This FleecingState you 
introduce here makes it clear why we need a fleecing block driver; it 
helps with synchronization, and it provides the “I’m done with this bit, 
I don’t care about it anymore” discard interface.

> + *
> + * Fleecing user is nbd export: it can read from fleecing node, which guarantees
> + * a snapshot-view for fleecing user. Fleecing user may also do discard
> + * operations.
> + *
> + * FleecingState is responsible for most of the fleecing logic:
> + *
> + * 1. Fleecing read. Handle reads of fleecing user: we should decide where from
> + * to read, from source node or from copy-before-write target node. In former
> + * case we need to synchronize with guest writes. See fleecing_read_lock() and
> + * fleecing_read_unlock() functionality.
> + *
> + * 2. Guest write synchronization (part of [1] actually). See
> + * fleecing_mark_done_and_wait_readers()
> + *
> + * 3. Fleecing discard. Used by fleecing user when corresponding area is already
> + * copied. Fleecing user may discard the area which is not needed anymore, that
> + * should result in:
> + *   - discarding data to free disk space
> + *   - clear bits in copy-bitmap of block-copy, to avoid extra copy-before-write
> + *     operations
> + *   - clear bits in access-bitmap of FleecingState, to avoid further wrong
> + *     access
> + *
> + * Still, FleecingState doesn't own any block children, so all real io
> + * operations (reads, writes and discards) are done by copy-before-write filter
> + * and fleecing block driver.

I find this a bit confusing, because for me, it raised the question of 
“why would it own block children?”, which led to me wanting to know even
more where the place of FleecingState is.  This sentence makes it really 
sound as if FleecingState is its own independent object floating around 
somewhere, not owned by anything, and that feels very wrong.

(If FleecingState were owned by the fleecing driver, i.e. if it 
basically were just the fleecing driver’s opaque data itself, then the 
question of what the FleecingState is, and whether it could own block 
children wouldn’t even come up.)

> + */
> +
> +#ifndef FLEECING_H
> +#define FLEECING_H
> +
> +#include "block/block_int.h"
> +#include "block/block-copy.h"
> +#include "block/reqlist.h"
> +
> +typedef struct FleecingState FleecingState;
> +
> +/*
> + * Create FleecingState.
> + *
> + * @bcs: link to block-copy owned by copy-before-write filter.

s/block-copy/block-copy state/

> + *
> + * @fleecing_node: should be fleecing block driver node. Used to create some

I think the “should be” should be dropped.  This must be a fleecing 
block driver, right?

(But then again, I really don’t understand why the FleecingState is 
separate from BDRVFleecingState in the first place.)

> + * bitmaps in it.
> + */
> +FleecingState *fleecing_new(BlockCopyState *bcs,
> +                            BlockDriverState *fleecing_node,
> +                            Error **errp);
> +
> +/* Free the state. Doesn't free block-copy state (@bcs) */
> +void fleecing_free(FleecingState *s);
> +
> +/*
> + * Convenient function for thous who want to do fleecing read.

s/thous/those/

I kind of miss a quick summary here of what this function is for, i.e. to
(1) find out where to read the data from, and
(2) if it’s to be read from the source, to block the affected area from 
writes until the read is done.

But I don’t know how to phrase that concisely, so I’m also OK with not 
having such a summary.
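One way to see what such a summary would need to cover is a single-threaded toy model of the state. This is a sketch with stated assumptions:

- It uses one flag per cluster instead of dirty bitmaps.
- It has no `CoMutex`.
- `pnum` is counted in clusters rather than bytes.
- All `toy_*` names are hypothetical.

The model shows (1) choosing between target and source, and (2) freezing source clusters until unlock. It also shows the -EACCES rule after discard, and why a copy-before-write into a frozen cluster has to wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_CLUSTERS 8

typedef struct ToyFleecing {
    bool access[TOY_CLUSTERS]; /* set: fleecing user may read this cluster */
    bool done[TOY_CLUSTERS];   /* set: old data already copied to target */
    int frozen[TOY_CLUSTERS];  /* number of ongoing reads from source */
} ToyFleecing;

enum { READ_FROM_TARGET, READ_FROM_SOURCE };

/*
 * Mirrors fleecing_read_lock(): -EACCES if any cluster in the range is not
 * accessible; otherwise report where to read from and how many clusters
 * (*pnum) share that answer. Reads from source freeze those clusters.
 */
static int toy_read_lock(ToyFleecing *f, int start, int count,
                         int *where, int *pnum)
{
    int i;

    assert(count > 0 && start + count <= TOY_CLUSTERS);
    for (i = start; i < start + count; i++) {
        if (!f->access[i]) {
            return -EACCES;
        }
    }
    for (i = start; i < start + count && f->done[i] == f->done[start]; i++) {
        /* find the run of clusters with the same "done" state */
    }
    *pnum = i - start;
    *where = f->done[start] ? READ_FROM_TARGET : READ_FROM_SOURCE;
    if (*where == READ_FROM_SOURCE) {
        for (i = start; i < start + *pnum; i++) {
            f->frozen[i]++;
        }
    }
    return 0;
}

/* Only called for READ_FROM_SOURCE results (the *req != NULL case) */
static void toy_read_unlock(ToyFleecing *f, int start, int pnum)
{
    for (int i = start; i < start + pnum; i++) {
        assert(f->frozen[i] > 0);
        f->frozen[i]--;
    }
}

/* Mirrors fleecing_discard(): later reads of this range fail with -EACCES */
static void toy_discard(ToyFleecing *f, int start, int count)
{
    for (int i = start; i < start + count; i++) {
        f->access[i] = false;
    }
}

/*
 * Mirrors fleecing_mark_done_and_wait_readers(): new reads of the range now
 * go to the target. Returns whether the guest write may proceed right away,
 * i.e. no ongoing source read overlaps; the real code waits instead.
 */
static bool toy_mark_done(ToyFleecing *f, int start, int count)
{
    bool may_write = true;

    for (int i = start; i < start + count; i++) {
        f->done[i] = true;
        if (f->frozen[i]) {
            may_write = false;
        }
    }
    return may_write;
}
```

Note that only the "read from source" answer creates anything that has to be released. The "read from target" answer leaves nothing to unlock.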

> + *
> + * If requested region starts in "done" area, i.e. data is already copied to
> + * copy-before-write target node, req is set to NULL, pnum is set to available
> + * bytes to read from target. User is free to read @pnum bytes from target.
> + * Still, user is responsible for concurrent discards on target.
> + *
> + * If requests region starts in "not done" area, i.e. we have to read from

s/requests/requested/

> + * source node directly, than @pnum bytes of source node are frozen and

s/than/then/

> + * guaranteed not be rewritten until user calls cbw_snapshot_read_unlock().

s/guaranteed not be rewritten/guaranteed not to be rewritten/

(or perhaps also s/rewritten/modified/, but it’s probably just me to 
whom “rewritten” sounds a bit like “the same data is written again”)

> + *
> + * Returns 0 on success and -EACCES when try to read non-dirty area of
> + * access_bitmap.
> + */

This description doesn’t sufficiently describe the @req parameter. It 
only says that `*req == NULL` will be returned if the data is to be read 
from the target node, but other than that it doesn’t say whether *req is 
a pure “out” or “inout” parameter.  It doesn’t say whether the caller 
has to pre-fill it (and fleecing_read_lock() will set it to NULL if the 
caller should read from the target), or whether fleecing_read_lock() 
will always set it (depending on whether to read from the source 
(non-NULL) or the target (NULL)).

Since it’s the latter (fleecing_read_lock() will allocate a BlockReq and 
return it), we also need to explain what we expect the user to do with 
this, namely absolutely nothing except pass it again to 
fleecing_read_unlock().

> +int fleecing_read_lock(FleecingState *f, int64_t offset,
> +                       int64_t bytes, const BlockReq **req, int64_t *pnum);
> +/* Called as closing pair for fleecing_read_lock() */

It isn’t quite clear from this summary whether this function should also 
be called if fleecing_read_lock() returned success, but *req == NULL.  
It shouldn’t, but given this description, I’d do it.

> +void fleecing_read_unlock(FleecingState *f, const BlockReq *req);
> +
> +/*
> + * Called when fleecing user doesn't need the region anymore (for example the
> + * region is successfully read and backed up somewhere).
> + * This prevents extra copy-before-write operations in this area in future.
> + * Next fleecing read from this area will fail with -EACCES.
> + */
> +void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
> +
> +/*
> + * Called by copy-before-write filter after successful copy-before-write
> + * operation to synchronize with parallel fleecing reads.
> + */
> +void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
> +                                         int64_t bytes);
> +
> +#endif /* FLEECING_H */
> diff --git a/block/fleecing.c b/block/fleecing.c
> new file mode 100644
> index 0000000000..f75d11b892
> --- /dev/null
> +++ b/block/fleecing.c
> @@ -0,0 +1,182 @@
> +/*
> + * FleecingState
> + *
> + * The common state of image fleecing, shared between copy-before-write filter
> + * and fleecing block driver.
> + *
> + * Copyright (c) 2021 Virtuozzo International GmbH.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "sysemu/block-backend.h"
> +#include "qemu/cutils.h"
> +#include "qapi/error.h"
> +#include "block/block_int.h"
> +#include "block/coroutines.h"
> +#include "block/qdict.h"
> +#include "block/block-copy.h"
> +#include "block/reqlist.h"
> +
> +#include "block/fleecing.h"
> +
> +/*
> + * @bcs: link to block-copy state owned by copy-before-write filter which
> + * performs copy-before-write operations in context of fleecing scheme.
> + * FleecingState doesn't own the block-copy state and don't free it on cleanup.

s/don't/doesn't/

> + *
> + * @lock: protects access to @access_bitmap, @done_bitmap and @frozen_read_reqs
> + *
> + * @access_bitmap: represents areas allowed for reading by fleecing user.
> + * Reading from non-dirty areas leads to -EACCES. Discard operation among other

Since this is not really about dirty or not, I’d prefer “clear 
(non-dirty)” instead of just “non-dirty”.

> + * things clears corresponding bits in this bitmaps.

It isn’t quite clear whether (A) the discard operation does various 
things, and one of them is to reset the corresponding area in the 
access_bitmap; or (B) the discard operation is only one of various ways 
to reset areas in the access_bitmap.  (It’s (A), and so I’d just say 
“fleecing_discard() clears areas in this bitmap.”)

> + *
> + * @done_bitmap: represents areas that was successfully copied by

s/that was/that were/

> + * copy-before-write operations. So, for dirty areas fleecing user should read
> + * from target node and for clear areas - from source node.

I’d prefer “from target node, and for clear areas from source node”.

> + *
> + * @frozen_read_reqs: current read requests for fleecing user in source node.

Hmm, perhaps “ongoing” would be clearer than “current”?

> + * corresponding areas must not be rewritten by guest.

Not necessarily just the guest, so something like “Writing to these 
areas must wait until the respective requests are settled.” would be 
more general.

> + */
> +typedef struct FleecingState {
> +    BlockCopyState *bcs;
> +
> +    CoMutex lock;
> +
> +    BdrvDirtyBitmap *access_bitmap;
> +    BdrvDirtyBitmap *done_bitmap;
> +
> +    BlockReqList frozen_read_reqs;
> +} FleecingState;
> +
> +FleecingState *fleecing_new(BlockCopyState *bcs,
> +                            BlockDriverState *fleecing_node,
> +                            Error **errp)
> +{
> +    BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(bcs),
> +                    *done_bitmap, *access_bitmap;

I don’t really understand why you didn’t just start a new declaration 
here, putting “BdrvDirtyBitmap” at the beginning of the line again.

> +    int64_t cluster_size = block_copy_cluster_size(bcs);
> +    FleecingState *s;
> +
> +    /* done_bitmap starts empty */
> +    done_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
> +                                           errp);
> +    if (!done_bitmap) {
> +        return NULL;
> +    }
> +    bdrv_disable_dirty_bitmap(done_bitmap);
> +
> +    /* access_bitmap starts equal to bcs_bitmap */
> +    access_bitmap = bdrv_create_dirty_bitmap(fleecing_node, cluster_size, NULL,
> +                                             errp);
> +    if (!access_bitmap) {
> +        return NULL;
> +    }
> +    bdrv_disable_dirty_bitmap(access_bitmap);
> +    if (!bdrv_dirty_bitmap_merge_internal(access_bitmap, bcs_bitmap,
> +                                          NULL, true))
> +    {
> +        return NULL;
> +    }

This function lacks a proper on-error clean-up path to free the dirty 
bitmaps.
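A common shape for this is a single `goto fail` label that releases whatever has been created so far. The sketch below uses hypothetical stand-ins for the real QEMU functions:

- `toy_create_bitmap()` for `bdrv_create_dirty_bitmap()`
- `toy_merge()` for `bdrv_dirty_bitmap_merge_internal()`
- `toy_release_bitmap()` for `bdrv_release_dirty_bitmap()`

A counter makes a leak observable. The sketch also uses one declaration per variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ToyBitmap {
    int dummy;
} ToyBitmap;

typedef struct ToyState {
    ToyBitmap *done_bitmap;
    ToyBitmap *access_bitmap;
} ToyState;

static int toy_live_bitmaps;  /* bitmaps created and not yet released */
static int toy_fail_at = -1;  /* index of the step that should fail, -1: none */
static int toy_step;          /* steps performed so far */

static bool toy_step_fails(void)
{
    return toy_step++ == toy_fail_at;
}

static ToyBitmap *toy_create_bitmap(void)
{
    ToyBitmap *bm;

    if (toy_step_fails()) {
        return NULL;
    }
    bm = calloc(1, sizeof(*bm));
    if (bm) {
        toy_live_bitmaps++;
    }
    return bm;
}

static bool toy_merge(ToyBitmap *dst)
{
    (void)dst; /* a real merge would copy bits into @dst */
    return !toy_step_fails();
}

/* This toy version ignores NULL, which keeps the error path short */
static void toy_release_bitmap(ToyBitmap *bm)
{
    if (bm) {
        assert(toy_live_bitmaps > 0);
        toy_live_bitmaps--;
        free(bm);
    }
}

static ToyState *toy_fleecing_new(void)
{
    ToyBitmap *done_bitmap = NULL;
    ToyBitmap *access_bitmap = NULL;
    ToyState *s;

    done_bitmap = toy_create_bitmap();
    if (!done_bitmap) {
        goto fail;
    }

    access_bitmap = toy_create_bitmap();
    if (!access_bitmap) {
        goto fail;
    }

    if (!toy_merge(access_bitmap)) {
        goto fail;
    }

    s = malloc(sizeof(*s));
    if (!s) {
        goto fail;
    }
    s->done_bitmap = done_bitmap;
    s->access_bitmap = access_bitmap;
    return s;

fail:
    /* release whatever was created before the failing step */
    toy_release_bitmap(access_bitmap);
    toy_release_bitmap(done_bitmap);
    return NULL;
}

static void toy_fleecing_free(ToyState *s)
{
    if (!s) {
        return;
    }
    toy_release_bitmap(s->access_bitmap);
    toy_release_bitmap(s->done_bitmap);
    free(s);
}
```

With the patch as posted, a failure in the second allocation or in the merge would leave one or two bitmaps behind. If the real release function does not accept NULL, separate labels per step would do the same job.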

> +
> +    s = g_new(FleecingState, 1);
> +    *s = (FleecingState) {
> +        .bcs = bcs,
> +        .done_bitmap = done_bitmap,
> +        .access_bitmap = access_bitmap,
> +    };
> +    qemu_co_mutex_init(&s->lock);
> +    QLIST_INIT(&s->frozen_read_reqs);
> +
> +    return s;
> +}
> +
> +void fleecing_free(FleecingState *s)
> +{
> +    if (!s) {
> +        return;
> +    }
> +
> +    bdrv_release_dirty_bitmap(s->access_bitmap);
> +    bdrv_release_dirty_bitmap(s->done_bitmap);
> +    g_free(s);
> +}
> +
> +static BlockReq *add_read_req(FleecingState *s, uint64_t offset, uint64_t bytes)
> +{
> +    BlockReq *req = g_new(BlockReq, 1);
> +
> +    reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
> +
> +    return req;
> +}
> +
> +static void drop_read_req(BlockReq *req)
> +{
> +    reqlist_remove_req(req);
> +    g_free(req);
> +}
> +
> +int fleecing_read_lock(FleecingState *s, int64_t offset,
> +                       int64_t bytes, const BlockReq **req,
> +                       int64_t *pnum)
> +{
> +    bool done;
> +
> +    QEMU_LOCK_GUARD(&s->lock);
> +
> +    if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
> +        return -EACCES;
> +    }
> +
> +    bdrv_dirty_bitmap_status(s->done_bitmap, offset, bytes, &done, pnum);
> +    if (!done) {
> +        *req = add_read_req(s, offset, *pnum);
> +    }
> +
> +    return 0;
> +}
> +
> +void fleecing_read_unlock(FleecingState *s, const BlockReq *req)
> +{
> +    QEMU_LOCK_GUARD(&s->lock);
> +
> +    drop_read_req((BlockReq *)req);

In my opinion, any cast removing a `const` must be accompanied by an 
explanatory comment.

As I understand it, this function takes a `const BlockReq *` pointer 
because `fleecing_read_lock()` returns a `const BlockReq *` object, 
because we don’t want the user to modify that request, but still want 
them to be able to easily pass the object they’ve received to 
`fleecing_read_unlock()`, right?

The problem is of course that we need a mutable BlockReq object here, 
because QLIST_REMOVE() will modify it.  Taking a `const BlockReq *` is 
just not correct.

Perhaps instead of returning a `const BlockReq *` pointer to the caller, 
we should just return a `void *` opaque pointer that they are to pass to 
this function again, that should ensure they don’t touch it just the 
same, and we wouldn’t need this cast here.
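Here is a minimal sketch of the `void *` alternative. It assumes a hand-rolled list in place of QLIST and uses hypothetical `toy_read_lock()`/`toy_read_unlock()` names. The caller only ever holds an opaque handle. The unlock side gets a mutable request back through the implicit `void *` conversion, so no `const` cast is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyReq {
    int64_t offset;
    int64_t bytes;
    struct ToyReq *next;
    struct ToyReq **pprev; /* address of the pointer that points to us */
} ToyReq;

typedef struct ToyReqList {
    ToyReq *head;
} ToyReqList;

/* Returns an opaque handle; the caller must only pass it back to unlock */
static void *toy_read_lock(ToyReqList *list, int64_t offset, int64_t bytes)
{
    ToyReq *req = malloc(sizeof(*req));

    if (!req) {
        return NULL;
    }
    req->offset = offset;
    req->bytes = bytes;
    req->next = list->head;
    req->pprev = &list->head;
    if (list->head) {
        list->head->pprev = &req->next;
    }
    list->head = req;
    return req;
}

/* void * converts to ToyReq * implicitly, so no qualifier is cast away */
static void toy_read_unlock(void *handle)
{
    ToyReq *req = handle;

    assert(req);
    /* unlink from the list, then free: both need a non-const pointer */
    *req->pprev = req->next;
    if (req->next) {
        req->next->pprev = req->pprev;
    }
    free(req);
}
```

The trade-off is that the handle loses type checking. An incomplete struct type, declared in the header and defined only in the .c file, would keep type checking while still hiding the contents.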

> +}
> +
> +void fleecing_discard(FleecingState *s, int64_t offset, int64_t bytes)
> +{
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
> +    }
> +
> +    block_copy_reset(s->bcs, offset, bytes);
> +}
> +
> +void fleecing_mark_done_and_wait_readers(FleecingState *s, int64_t offset,
> +                                         int64_t bytes)
> +{
> +    assert(QEMU_IS_ALIGNED(offset, block_copy_cluster_size(s->bcs)));
> +    assert(QEMU_IS_ALIGNED(bytes, block_copy_cluster_size(s->bcs)));
> +
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        bdrv_set_dirty_bitmap(s->done_bitmap, offset, bytes);
> +        reqlist_wait_all(&s->frozen_read_reqs, offset, bytes, &s->lock);
> +    }
> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7f24ee4b92..78ea04e292 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2423,6 +2423,8 @@ F: block/reqlist.c
>   F: include/block/reqlist.h
>   F: block/copy-before-write.h
>   F: block/copy-before-write.c
> +F: block/fleecing.h
> +F: block/fleecing.c
>   F: include/block/aio_task.h
>   F: block/aio_task.c
>   F: util/qemu-co-shared-resource.c
> diff --git a/block/meson.build b/block/meson.build
> index 5065cf33ba..d30da90a01 100644
> --- a/block/meson.build
> +++ b/block/meson.build
> @@ -18,6 +18,7 @@ block_ss.add(files(
>     'crypto.c',
>     'dirty-bitmap.c',
>     'filter-compress.c',
> +  'fleecing.c',
>     'io.c',
>     'mirror.c',
>     'nbd.c',




* Re: [PATCH v3 09/19] block: introduce FleecingState class
  2022-01-18 16:37   ` Hanna Reitz
@ 2022-01-18 18:35     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-18 18:35 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, kwolf,
	jsnow, nikita.lapshin

18.01.2022 19:37, Hanna Reitz wrote:
> On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
>> FleecingState represents state shared between copy-before-write filter
>> and upcoming fleecing block driver.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/fleecing.h  | 135 ++++++++++++++++++++++++++++++++++
>>   block/fleecing.c  | 182 ++++++++++++++++++++++++++++++++++++++++++++++
>>   MAINTAINERS       |   2 +
>>   block/meson.build |   1 +
>>   4 files changed, 320 insertions(+)
>>   create mode 100644 block/fleecing.h
>>   create mode 100644 block/fleecing.c
>>
>> diff --git a/block/fleecing.h b/block/fleecing.h
>> new file mode 100644
>> index 0000000000..fb7b2f86c4
>> --- /dev/null
>> +++ b/block/fleecing.h
>> @@ -0,0 +1,135 @@
>> +/*
>> + * FleecingState
>> + *
>> + * The common state of image fleecing, shared between copy-before-write filter
>> + * and fleecing block driver.
> 
>  From this documentation, it’s unclear to me who owns the FleecingState object.  I would have assumed it’s the fleecing node, and if it is, I wonder why we even have this external interface instead of considering FleecingState a helper object for the fleecing block driver (or rather the block driver’s opaque state, which it basically is, as far as I can see from peeking into the next patch), and putting both into a single file with no external interface except for fleecing_mark_done_and_wait_readers().

The FleecingState object is owned by the copy-before-write node. copy-before-write has all the information, and it owns the BlockCopyState object, which is used to create the FleecingState. The copy-before-write node can easily detect that its target is a fleecing filter and initialize the FleecingState in this case.

On the other hand, if we want to create the FleecingState from the fleecing filter (or even merge the state into its driver state), we'll have to search through its parents to find copy-before-write, which may not be trivial. Moreover, at the time of open() we may have no parents yet.


Hmm, but maybe we could just pass bcs to the fleecing node via activate(), like we are going to do with the fleecing state? I'll give it a try.

> 
>> + *
>> + * Copyright (c) 2021 Virtuozzo International GmbH.
>> + *
>> + * Author:
>> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + *
>> + * Fleecing scheme looks as follows:
>> + *
>> + * [guest blk]                   [nbd export]
>> + *    |                              |
>> + *    |root                          |
>> + *    v                              v
>> + * [copy-before-write]--target-->[fleecing drv]
>> + *    |                          /   |
>> + *    |file                     /    |file
>> + *    v                        /     v
>> + * [active disk]<--source-----/  [temp disk]
>> + *
>> + * Note that "active disk" is also called just "source" and "temp disk" is also
>> + * called "target".
>> + *
>> + * What happens here:
>> + *
>> + * copy-before-write filter performs copy-before-write operations: on guest
>> + * write we should copy old data to target child before rewriting. Note that we
>> + * write this data through fleecing driver: it saves a possibility to implement
>> + * a kind of cache in fleecing driver in future.
> 
> I don’t understand why this explanation is the first one given (and the only one given explicitly as a reason) for why we want a fleecing block driver.

Actually, the benefits are listed in the next patch's commit message.

> 
> (1) If we implement caching later, I have a feeling that we’ll want new options for this.  So a management layer that wants caching will need to be updated at that point anyway (to specify these new options), so I don’t understand how adding a fleecing block driver now would make it easier later on to introduce caching.
> 
> (1b) It’s actually entirely possible that we will not want to use the fleecing driver for caching, because we decide that caching is much more useful as its own dedicated block driver.
> 
> (2) There are much better arguments below.  This FleecingState you introduce here makes it clear why we need a fleecing block driver; it helps with synchronization, and it provides the “I’m done with this bit, I don’t care about it anymore” discard interface.
> 
>> + *
>> + * Fleecing user is nbd export: it can read from fleecing node, which guarantees
>> + * a snapshot-view for fleecing user. Fleecing user may also do discard
>> + * operations.
>> + *
>> + * FleecingState is responsible for most of the fleecing logic:
>> + *
>> + * 1. Fleecing read. Handle reads of fleecing user: we should decide where from
>> + * to read, from source node or from copy-before-write target node. In former
>> + * case we need to synchronize with guest writes. See fleecing_read_lock() and
>> + * fleecing_read_unlock() functionality.
>> + *
>> + * 2. Guest write synchronization (part of [1] actually). See
>> + * fleecing_mark_done_and_wait_readers()
>> + *
>> + * 3. Fleecing discard. Used by fleecing user when corresponding area is already
>> + * copied. Fleecing user may discard the area which is not needed anymore, that
>> + * should result in:
>> + *   - discarding data to free disk space
>> + *   - clear bits in copy-bitmap of block-copy, to avoid extra copy-before-write
>> + *     operations
>> + *   - clear bits in access-bitmap of FleecingState, to avoid further wrong
>> + *     access
>> + *
>> + * Still, FleecingState doesn't own any block children, so all real io
>> + * operations (reads, writes and discards) are done by copy-before-write filter
>> + * and fleecing block driver.
> 
> I find this a bit confusing, because for me, it raised the question of “why would it own block childen?”, which led to me wanting to know even more where the place of FleecingState is.  This sentence makes it really sound as if FleecingState is its own independent object floating around somewhere, not owned by anything, and that feels very wrong.

It's owned by the copy-before-write node. Hmm, and it doesn't seem to operate directly on any block children, so this sentence may be removed.



-- 
Best regards,
Vladimir



* Re: [PATCH v3 10/19] block: introduce fleecing block driver
  2021-12-22 17:40 ` [PATCH v3 10/19] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy
@ 2022-01-20 16:11   ` Hanna Reitz
  2022-01-21 10:46     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Hanna Reitz @ 2022-01-20 16:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, wencongyang2, xiechanglong.d, qemu-devel, armbru, jsnow,
	nikita.lapshin, eblake

On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
> Introduce a new driver, that works in pair with copy-before-write to
> improve fleecing.
>
> Without fleecing driver, old fleecing scheme looks as follows:
>
> [guest]
>    |
>    |root
>    v
> [copy-before-write] -----> [temp.qcow2] <--- [nbd export]
>    |                 target  |
>    |file                     |backing
>    v                         |
> [active disk] <-------------+
>
> With fleecing driver, new scheme is:
>
> [guest]
>    |
>    |root
>    v
> [copy-before-write] -----> [fleecing] <--- [nbd export]
>    |                 target  |    |
>    |file                     |    |file
>    v                         |    v
> [active disk]<--source------+  [temp.img]
>
> Benefits of new scheme:
>
> 1. Access control: if remote client try to read data that not covered
>     by original dirty bitmap used on copy-before-write open, client gets
>     -EACCES.
>
> 2. Discard support: if remote client do DISCARD, this additionally to
>     discarding data in temp.img informs block-copy process to not copy
>     these clusters. Next read from discarded area will return -EACCES.
>     This is significant thing: when fleecing user reads data that was
>     not yet copied to temp.img, we can avoid copying it on further guest
>     write.
>
> 3. Synchronisation between client reads and block-copy write is more
>     efficient: it doesn't block intersecting block-copy write during
>     client read.
>
> 4. We don't rely on backing feature: active disk should not be backing
>     of temp image, so we avoid some permission-related difficulties and
>     temp image now is not required to support backing, it may be simple
>     raw image.
>
> Note that now nobody calls fleecing_drv_activate(), so new driver is
> actually unusable. It's a work for the following patch: support
> fleecing block driver in copy-before-write filter driver.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   qapi/block-core.json |  37 +++++-
>   block/fleecing.h     |  16 +++
>   block/fleecing-drv.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>   MAINTAINERS          |   1 +
>   block/meson.build    |   1 +
>   5 files changed, 315 insertions(+), 1 deletion(-)
>   create mode 100644 block/fleecing-drv.c
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 6904daeacf..b47351dbac 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2917,13 +2917,14 @@
>   # @blkreplay: Since 4.2
>   # @compress: Since 5.0
>   # @copy-before-write: Since 6.2
> +# @fleecing: Since 7.0
>   #
>   # Since: 2.9
>   ##
>   { 'enum': 'BlockdevDriver',
>     'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
>               'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
> -            'file', 'ftp', 'ftps', 'gluster',
> +            'file', 'fleecing', 'ftp', 'ftps', 'gluster',
>               {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>               {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>               'http', 'https', 'iscsi',
> @@ -4181,6 +4182,39 @@
>     'base': 'BlockdevOptionsGenericFormat',
>     'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>   
> +##
> +# @BlockdevOptionsFleecing:
> +#
> +# Driver that works in pair with copy-before-write filter to make a fleecing
> +# scheme like this:
> +#
> +#    [guest]
> +#      |
> +#      |root
> +#      v
> +#    [copy-before-write] -----> [fleecing] <--- [nbd export]
> +#      |                 target  |    |
> +#      |file                     |    |file
> +#      v                         |    v
> +#    [active disk]<--source------+  [temp.img]

When generating docs, my sphinx doesn’t like this very much.  I don’t 
know exactly which part of it, but it complains with:

docs/../qapi/block-core.json:4190:Line block ends without a blank line.

(Line 4190 is the “@BlockdevOptionsFleecing:” line, but there is no 
warning if I remove this ASCII art.)

> +#
> +# The scheme works like this: on write, fleecing driver saves data to its
> +# ``file`` child and remember that this data is in ``file`` child. On read
> +# fleecing reads from ``file`` child if data is already stored to it and
> +# otherwise it reads from ``source`` child.

I.e. it’s basically a COW format with the allocation bitmap stored as a 
block dirty bitmap.
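To make the “COW format with an in-memory allocation bitmap” reading concrete, here is a toy C model. It is not QEMU code: the `ToyCow` type, the cluster geometry and both helpers are hypothetical. The copy-before-write path saves the old cluster into `file` and sets a bit; the read path then picks `file` or `source` per cluster based on that bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy geometry: 8 clusters of 4 bytes each. */
#define TOY_CLUSTER_SIZE 4
#define TOY_NB_CLUSTERS  8

typedef struct ToyCow {
    uint8_t source[TOY_NB_CLUSTERS * TOY_CLUSTER_SIZE]; /* active disk */
    uint8_t file[TOY_NB_CLUSTERS * TOY_CLUSTER_SIZE];   /* temp.img */
    bool copied[TOY_NB_CLUSTERS]; /* in-memory "allocation" bitmap */
} ToyCow;

/* Copy-before-write path: save the old cluster to file and mark it. */
static void toy_cow_save(ToyCow *c, int64_t cluster)
{
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    memcpy(c->file + cluster * TOY_CLUSTER_SIZE,
           c->source + cluster * TOY_CLUSTER_SIZE, TOY_CLUSTER_SIZE);
    c->copied[cluster] = true;
}

/* Read path: saved clusters come from file, all others from source. */
static void toy_cow_read(const ToyCow *c, int64_t cluster, uint8_t *buf)
{
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    const uint8_t *from = c->copied[cluster] ? c->file : c->source;
    memcpy(buf, from + cluster * TOY_CLUSTER_SIZE, TOY_CLUSTER_SIZE);
}
```

A generic COW node would stop at this point; whatever makes the fleecing driver fleecing-specific has to come on top of it.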

> +# In the same time, before each guest write, ``copy-before-write`` copies
> +# corresponding old data  from ``active disk`` to ``fleecing`` node.
> +# This way, ``fleecing`` node looks like a kind of snapshot for extenal
> +# reader like NBD export.

So this description sounds like the driver is just a COW driver with an 
in-memory allocation bitmap.  But it’s actually specifically tuned for 
fleecing, because it interacts with the CBW node to prevent conflicts, 
and discard requests result in the respective areas becoming unreadable.

I find that important to mention, because if we don’t, then I’m 
wondering why this isn’t a generic “in-memory-cow” driver, and what 
makes it so useful for fleecing over any other COW driver.
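The two fleecing-specific behaviours named above can be sketched in the same toy style. The names are hypothetical and not the series' API, and the model is per cluster only. A client discard both revokes read access and tells copy-before-write that the cluster no longer needs preserving, so a later guest write skips the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 8

/* Hypothetical state shared by copy-before-write and the fleecing node. */
typedef struct ToyFleecing {
    bool access[TOY_NB_CLUSTERS];    /* NBD client may read this cluster */
    bool need_copy[TOY_NB_CLUSTERS]; /* CBW must save old data before write */
} ToyFleecing;

/* Reads outside the access bitmap fail with -EACCES. */
static int toy_fleecing_read_check(const ToyFleecing *f, int64_t cluster)
{
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    return f->access[cluster] ? 0 : -EACCES;
}

/* Discard from the client: drop access and cancel the pending copy. */
static void toy_fleecing_discard(ToyFleecing *f, int64_t cluster)
{
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    f->access[cluster] = false;
    f->need_copy[cluster] = false;
}

/* Guest write through CBW: returns true if old data had to be copied. */
static bool toy_cbw_guest_write(ToyFleecing *f, int64_t cluster)
{
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    bool copied = f->need_copy[cluster];
    f->need_copy[cluster] = false;
    return copied;
}
```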

(In fact, I’m asking myself all the time whether we can’t pull this 
driver apart into more generic nodes, like one in-memory-cow driver, and 
another driver managing the discard feature, and so on.  Could be done 
e.g. like this:


                 Guest -> copy-before-write --file--> fleecing-lock --file--> disk image
                           ^        |                       ^
                           |      target                    |
           +-- cbw-child --+        |                    backing
           |                        v                       |
 NBD -> fleecing-discard --file--> in-memory-cow -----------+
                                         |
                                        file
                                         |
                                         v
                                     temp.img

I.e. fleecing-discard would handle discards (telling its cbw-child to 
drop those areas from the copy-bitmap, and forwarding discards to the 
in-memory-cow node), the in-memory-cow node would just be a generic 
implementation of COW (could be replaced by any other COW-implementing 
node, like qcow2), and the fleecing-lock driver would prevent areas that 
are still being read from from being written to concurrently.

Problem is, of course, that’s very complicated, I haven’t thought this 
through, and it’s extremely questionable whether we really need this 
modularity.  Most likely not.

I still feel compelled to think about such modularization, because the 
relationship between the CBW and the fleecing driver as laid out in this 
series doesn’t feel quite right to me.  They feel bolted together in a 
way that doesn’t fit in with the general design of the block layer where 
every node is basically self-contained.  I understand CBW and fleecing 
will need some communication, but I don’t (yet) like how in the next 
patch, the CBW driver looks for the fleecing driver and directly 
communicates with it through the FleecingState instead of going through 
the block layer, as we’d normally do when communicating between block nodes.

That’s why I’m trying to pick apart the functionality of the fleecing 
block driver into self-contained “atomic” nodes that perform its 
different functionalities, so that perhaps I can eventually put it back 
together and find out whether we can do better than 
`is_fleecing_drv(unfiltered_target)`.)

> +#
> +# @source: node name of source node of fleecing scheme
> +#
> +# Since: 7.0
> +##
> +{ 'struct': 'BlockdevOptionsFleecing',
> +  'base': 'BlockdevOptionsGenericFormat',
> +  'data': { 'source': 'str' } }
> +
>   ##
>   # @BlockdevOptions:
>   #
> @@ -4237,6 +4271,7 @@
>         'copy-on-read':'BlockdevOptionsCor',
>         'dmg':        'BlockdevOptionsGenericFormat',
>         'file':       'BlockdevOptionsFile',
> +      'fleecing':   'BlockdevOptionsFleecing',
>         'ftp':        'BlockdevOptionsCurlFtp',
>         'ftps':       'BlockdevOptionsCurlFtps',
>         'gluster':    'BlockdevOptionsGluster',
> diff --git a/block/fleecing.h b/block/fleecing.h
> index fb7b2f86c4..75ad2f8b19 100644
> --- a/block/fleecing.h
> +++ b/block/fleecing.h
> @@ -80,6 +80,9 @@
>   #include "block/block-copy.h"
>   #include "block/reqlist.h"
>   
> +
> +/* fleecing.c */
> +
>   typedef struct FleecingState FleecingState;
>   
>   /*
> @@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
>   void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
>                                            int64_t bytes);
>   
> +
> +/* fleecing-drv.c */
> +
> +/* Returns true if @bs->drv is fleecing block driver */
> +bool is_fleecing_drv(BlockDriverState *bs);
> +
> +/*
> + * Normally FleecingState is created by copy-before-write filter. Then
> + * copy-before-write filter calls fleecing_drv_activate() to share FleecingState
> + * with fleecing block driver.
> + */
> +void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
> +
>   #endif /* FLEECING_H */
> diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
> new file mode 100644
> index 0000000000..202208bb03
> --- /dev/null
> +++ b/block/fleecing-drv.c
> @@ -0,0 +1,261 @@
> +/*
> + * fleecing block driver
> + *
> + * Copyright (c) 2021 Virtuozzo International GmbH.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "sysemu/block-backend.h"
> +#include "qemu/cutils.h"
> +#include "qapi/error.h"
> +#include "block/block_int.h"
> +#include "block/coroutines.h"
> +#include "block/qdict.h"
> +#include "block/block-copy.h"
> +#include "block/reqlist.h"
> +
> +#include "block/copy-before-write.h"
> +#include "block/fleecing.h"
> +
> +typedef struct BDRVFleecingState {
> +    FleecingState *fleecing;
> +    BdrvChild *source;
> +} BDRVFleecingState;
> +
> +static coroutine_fn int fleecing_co_preadv_part(
> +        BlockDriverState *bs, int64_t offset, int64_t bytes,
> +        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +    const BlockReq *req;
> +    int ret;
> +
> +    if (!s->fleecing) {
> +        /* fleecing_drv_activate() was not called */
> +        return -EINVAL;

I'd rather treat a missing connection with a CBW driver as if we had an 
empty copy/access bitmap, and so return -EACCES in these places.
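That suggestion could look roughly like the following: a hypothetical helper, not part of the series, that every request path calls first, so a node that was never activated behaves as if its access bitmap were empty. `BDRVFleecingState` is trimmed to the one field the helper needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct FleecingState FleecingState; /* opaque, as in fleecing.h */

/* Trimmed copy of the patch's state: only the field used here. */
typedef struct BDRVFleecingState {
    FleecingState *fleecing;
} BDRVFleecingState;

/*
 * Hypothetical helper: without fleecing_drv_activate() there is no
 * access bitmap, so treat it as empty and deny every request.
 */
static int fleecing_check_active(const BDRVFleecingState *s)
{
    assert(s != NULL);
    return s->fleecing ? 0 : -EACCES;
}
```

Each of the read, block-status, discard and write callbacks would then begin with `ret = fleecing_check_active(s); if (ret < 0) { return ret; }` instead of the open-coded `-EINVAL` checks.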

> +    }
> +
> +    /* TODO: upgrade to async loop using AioTask */
> +    while (bytes) {
> +        int64_t cur_bytes;
> +
> +        ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        if (req) {
> +            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
> +                                      qiov, qiov_offset, flags);
> +            fleecing_read_unlock(s->fleecing, req);
> +        } else {
> +            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
> +                                      qiov, qiov_offset, flags);
> +        }
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        bytes -= cur_bytes;
> +        offset += cur_bytes;
> +        qiov_offset += cur_bytes;
> +    }
> +
> +    return 0;
> +}
> +
> +static int coroutine_fn fleecing_co_block_status(BlockDriverState *bs,
> +                                                 bool want_zero, int64_t offset,
> +                                                 int64_t bytes, int64_t *pnum,
> +                                                 int64_t *map,
> +                                                 BlockDriverState **file)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +    const BlockReq *req = NULL;
> +    int ret;
> +    int64_t cur_bytes;
> +
> +    if (!s->fleecing) {
> +        /* fleecing_drv_activate() was not called */
> +        return -EINVAL;
> +    }
> +
> +    ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    *pnum = cur_bytes;
> +    *map = offset;
> +
> +    if (req) {
> +        *file = s->source->bs;
> +        fleecing_read_unlock(s->fleecing, req);
> +    } else {
> +        *file = bs->file->bs;
> +    }
> +
> +    return ret;

Is ret == 0 the right return value here?
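One possible direction: `.bdrv_co_block_status` callbacks return `BDRV_BLOCK_*` flags rather than 0, and a driver that maps a range 1:1 onto a child can report `BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID` so the generic code goes on to query `*file` itself (block/raw-format.c does this, for example). The toy model below uses stand-in flag constants and a hypothetical `ToyNode`, not QEMU's headers, and is only meant to show the shape of such a return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flags for illustration; the real BDRV_BLOCK_* live in QEMU. */
#define TOY_BLOCK_OFFSET_VALID 0x04
#define TOY_BLOCK_RAW          0x08

typedef struct ToyNode {
    const char *name;
} ToyNode;

/*
 * Toy block-status: the range maps to the same offset in either the
 * source or the temp image, so report RAW | OFFSET_VALID instead of 0
 * and leave the real status query to the chosen child.
 */
static int toy_fleecing_block_status(bool copied, int64_t offset,
                                     int64_t cur_bytes, ToyNode *source,
                                     ToyNode *temp, int64_t *pnum,
                                     int64_t *map, ToyNode **file)
{
    assert(cur_bytes > 0);
    *pnum = cur_bytes;
    *map = offset;
    *file = copied ? temp : source;
    return TOY_BLOCK_RAW | TOY_BLOCK_OFFSET_VALID;
}
```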

> +}
> +
> +static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
> +                                             int64_t offset, int64_t bytes)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +    if (!s->fleecing) {
> +        /* fleecing_drv_activate() was not called */
> +        return -EINVAL;
> +    }
> +
> +    fleecing_discard(s->fleecing, offset, bytes);
> +
> +    bdrv_co_pdiscard(bs->file, offset, bytes);
> +
> +    /*
> +     * Ignore bdrv_co_pdiscard() result: fleecing_discard() succeeded, that
> +     * means that next read from this area will fail with -EACCES. More correct
> +     * to report success now.
> +     */

I don’t know.  I’m asking myself why the caller in turn would care about 
the discard result (usually one doesn’t really care whether discarding 
succeeded or not), and I feel like if they care, they’d like to know 
that discarding the data from storage failed.
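The alternative described here would update the access bitmap first and then simply propagate the storage result. A toy sketch, with a hypothetical function pointer standing in for `bdrv_co_pdiscard()` on the `file` child:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for bdrv_co_pdiscard() on the "file" child. */
typedef int (*ToyStorageDiscard)(int64_t offset, int64_t bytes);

typedef struct ToyDiscardNode {
    bool readable;             /* stand-in for the access bitmap */
    ToyStorageDiscard storage; /* discard on temp.img */
} ToyDiscardNode;

/*
 * Update the access bitmap first (this always takes effect), then return
 * whatever the storage layer reported, so a caller that cares learns that
 * the space in temp.img was not actually freed.
 */
static int toy_fleecing_pdiscard(ToyDiscardNode *n, int64_t offset,
                                 int64_t bytes)
{
    assert(bytes > 0);
    n->readable = false;
    return n->storage(offset, bytes);
}

/* Two storage behaviours for trying the sketch out. */
static int toy_discard_ok(int64_t offset, int64_t bytes)
{
    (void)offset;
    (void)bytes;
    return 0;
}

static int toy_discard_unsupported(int64_t offset, int64_t bytes)
{
    (void)offset;
    (void)bytes;
    return -ENOTSUP;
}
```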

> +    return 0;
> +}
> +
> +static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int64_t bytes, BdrvRequestFlags flags)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +    if (!s->fleecing) {
> +        /* fleecing_drv_activate() was not called */
> +        return -EINVAL;
> +    }
> +
> +    /*
> +     * TODO: implement cache, to have a chance to fleecing user to read and
> +     * discard this data before actual writing to temporary image.
> +     */

Is there a good reason why a cache shouldn’t be implemented as a 
separate block driver?

> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
> +}
> +
> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
> +                                            int64_t offset,
> +                                            int64_t bytes,
> +                                            QEMUIOVector *qiov,
> +                                            BdrvRequestFlags flags)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +    if (!s->fleecing) {
> +        /* fleecing_drv_activate() was not called */
> +        return -EINVAL;
> +    }
> +
> +    /*
> +     * TODO: implement cache, to have a chance to fleecing user to read and
> +     * discard this data before actual writing to temporary image.
> +     */
> +    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
> +}
> +
> +
> +static void fleecing_refresh_filename(BlockDriverState *bs)
> +{
> +    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
> +            bs->file->bs->filename);
> +}
> +
> +static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
> +                         Error **errp)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
> +                               false, errp);
> +    if (!bs->file) {
> +        return -EINVAL;
> +    }
> +
> +    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
> +                               BDRV_CHILD_DATA, false, errp);
> +    if (!s->source) {
> +        return -EINVAL;
> +    }
> +
> +    bs->total_sectors = bs->file->bs->total_sectors;
> +
> +    return 0;
> +}
> +
> +static void fleecing_child_perm(BlockDriverState *bs, BdrvChild *c,
> +                                BdrvChildRole role,
> +                                BlockReopenQueue *reopen_queue,
> +                                uint64_t perm, uint64_t shared,
> +                                uint64_t *nperm, uint64_t *nshared)
> +{
> +    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
> +
> +    if (role & BDRV_CHILD_PRIMARY) {
> +        *nshared &= BLK_PERM_CONSISTENT_READ;
> +    } else {
> +        *nperm &= BLK_PERM_CONSISTENT_READ;
> +
> +        /*
> +         * copy-before-write filter is responsible for source child and need
> +         * write access to it.
> +         */
> +        *nshared |= BLK_PERM_WRITE;
> +    }
> +}
> +
> +BlockDriver bdrv_fleecing_drv = {
> +    .format_name = "fleecing",
> +    .instance_size = sizeof(BDRVFleecingState),
> +
> +    .bdrv_open                  = fleecing_open,
> +
> +    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
> +    .bdrv_co_pwritev            = fleecing_co_pwritev,
> +    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
> +    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
> +    .bdrv_co_block_status       = fleecing_co_block_status,
> +
> +    .bdrv_refresh_filename      = fleecing_refresh_filename,
> +
> +    .bdrv_child_perm            = fleecing_child_perm,
> +};
> +
> +bool is_fleecing_drv(BlockDriverState *bs)
> +{
> +    return bs && bs->drv == &bdrv_fleecing_drv;
> +}

Besides the question whether the FleecingState should be part of CBW or 
the fleecing driver, I don’t like this very much.  As stated above, 
normally we go through the block layer to communicate between nodes, and 
this function for example prevents the possibility of having filters 
between CBW and the fleecing node.

Normally, I would expect a new BlockDriver method that the CBW driver 
would call to communicate with the fleecing driver.  Isn’t 
fleecing_mark_done_and_wait_readers() the only part where the CBW driver 
ever needs to tell the fleecing driver something?

Hm, actually, I wonder why we need fleecing_mark_done_and_wait_readers() 
to be called from CBW – can we not have the fleecing driver call this in 
its write implementations?  (It’s my understanding that the fleecing 
node is to be used read-only from the NBD export, besides discards.)
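A rough illustration of “going through the block layer” instead of `is_fleecing_drv()`: a hypothetical driver callback (`copy_done`, not an existing QEMU method) that CBW invokes on its target, skipping filters until some node implements it. Extra filters between CBW and the fleecing node then stop being a problem. Everything below is a toy model with invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyNode ToyNode;

typedef struct ToyDriver {
    const char *name;
    bool is_filter;
    /* Hypothetical method: CBW finished copying [offset, offset + bytes). */
    void (*copy_done)(ToyNode *bs, int64_t offset, int64_t bytes);
} ToyDriver;

struct ToyNode {
    const ToyDriver *drv;
    ToyNode *file;      /* primary child that filters forward to */
    int64_t done_bytes; /* bookkeeping for the fleecing-like node */
};

static void toy_fleecing_copy_done(ToyNode *bs, int64_t offset,
                                   int64_t bytes)
{
    (void)offset;
    bs->done_bytes += bytes;
}

static const ToyDriver toy_fleecing_drv = { "fleecing", false,
                                            toy_fleecing_copy_done };
static const ToyDriver toy_throttle_drv = { "throttle", true, NULL };
static const ToyDriver toy_raw_drv = { "raw", false, NULL };

/* Walk down through filters until a node implements copy_done. */
static int toy_notify_copy_done(ToyNode *target, int64_t offset,
                                int64_t bytes)
{
    ToyNode *bs = target;

    while (bs) {
        if (bs->drv->copy_done) {
            bs->drv->copy_done(bs, offset, bytes);
            return 0;
        }
        bs = bs->drv->is_filter ? bs->file : NULL;
    }
    return -1; /* nothing in the chain understands the notification */
}
```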

> +
> +void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing)
> +{
> +    BDRVFleecingState *s = bs->opaque;
> +
> +    assert(is_fleecing_drv(bs));
> +
> +    s->fleecing = fleecing;
> +}
> +
> +static void fleecing_init(void)
> +{
> +    bdrv_register(&bdrv_fleecing_drv);
> +}
> +
> +block_init(fleecing_init);
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 78ea04e292..42dc979052 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2425,6 +2425,7 @@ F: block/copy-before-write.h
>   F: block/copy-before-write.c
>   F: block/fleecing.h
>   F: block/fleecing.c
> +F: block/fleecing-drv.c
>   F: include/block/aio_task.h
>   F: block/aio_task.c
>   F: util/qemu-co-shared-resource.c
> diff --git a/block/meson.build b/block/meson.build
> index d30da90a01..b493580fbe 100644
> --- a/block/meson.build
> +++ b/block/meson.build
> @@ -19,6 +19,7 @@ block_ss.add(files(
>     'dirty-bitmap.c',
>     'filter-compress.c',
>     'fleecing.c',
> +  'fleecing-drv.c',
>     'io.c',
>     'mirror.c',
>     'nbd.c',




* Re: [PATCH v3 10/19] block: introduce fleecing block driver
  2022-01-20 16:11   ` Hanna Reitz
@ 2022-01-21 10:46     ` Vladimir Sementsov-Ogievskiy
  2022-01-27 15:28       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-21 10:46 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, kwolf,
	jsnow, nikita.lapshin

20.01.2022 19:11, Hanna Reitz wrote:
> On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
>> Introduce a new driver, that works in pair with copy-before-write to
>> improve fleecing.
>>
>> Without fleecing driver, old fleecing scheme looks as follows:
>>
>> [guest]
>>    |
>>    |root
>>    v
>> [copy-before-write] -----> [temp.qcow2] <--- [nbd export]
>>    |                 target  |
>>    |file                     |backing
>>    v                         |
>> [active disk] <-------------+
>>
>> With fleecing driver, new scheme is:
>>
>> [guest]
>>    |
>>    |root
>>    v
>> [copy-before-write] -----> [fleecing] <--- [nbd export]
>>    |                 target  |    |
>>    |file                     |    |file
>>    v                         |    v
>> [active disk]<--source------+  [temp.img]
>>
>> Benefits of new scheme:
>>
>> 1. Access control: if remote client try to read data that not covered
>>     by original dirty bitmap used on copy-before-write open, client gets
>>     -EACCES.
>>
>> 2. Discard support: if remote client do DISCARD, this additionally to
>>     discarding data in temp.img informs block-copy process to not copy
>>     these clusters. Next read from discarded area will return -EACCES.
>>     This is significant thing: when fleecing user reads data that was
>>     not yet copied to temp.img, we can avoid copying it on further guest
>>     write.
>>
>> 3. Synchronisation between client reads and block-copy write is more
>>     efficient: it doesn't block intersecting block-copy write during
>>     client read.
>>
>> 4. We don't rely on backing feature: active disk should not be backing
>>     of temp image, so we avoid some permission-related difficulties and
>>     temp image now is not required to support backing, it may be simple
>>     raw image.
>>
>> Note that now nobody calls fleecing_drv_activate(), so new driver is
>> actually unusable. It's a work for the following patch: support
>> fleecing block driver in copy-before-write filter driver.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json |  37 +++++-
>>   block/fleecing.h     |  16 +++
>>   block/fleecing-drv.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>>   MAINTAINERS          |   1 +
>>   block/meson.build    |   1 +
>>   5 files changed, 315 insertions(+), 1 deletion(-)
>>   create mode 100644 block/fleecing-drv.c
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 6904daeacf..b47351dbac 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2917,13 +2917,14 @@
>>   # @blkreplay: Since 4.2
>>   # @compress: Since 5.0
>>   # @copy-before-write: Since 6.2
>> +# @fleecing: Since 7.0
>>   #
>>   # Since: 2.9
>>   ##
>>   { 'enum': 'BlockdevDriver',
>>     'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
>>               'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
>> -            'file', 'ftp', 'ftps', 'gluster',
>> +            'file', 'fleecing', 'ftp', 'ftps', 'gluster',
>>               {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>>               {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>>               'http', 'https', 'iscsi',
>> @@ -4181,6 +4182,39 @@
>>     'base': 'BlockdevOptionsGenericFormat',
>>     'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>> +##
>> +# @BlockdevOptionsFleecing:
>> +#
>> +# Driver that works in pair with copy-before-write filter to make a fleecing
>> +# scheme like this:
>> +#
>> +#    [guest]
>> +#      |
>> +#      |root
>> +#      v
>> +#    [copy-before-write] -----> [fleecing] <--- [nbd export]
>> +#      |                 target  |    |
>> +#      |file                     |    |file
>> +#      v                         |    v
>> +#    [active disk]<--source------+  [temp.img]
> 
> When generating docs, my sphinx doesn’t like this very much.  I don’t know exactly what of it, but it complains with:
> 
> docs/../qapi/block-core.json:4190:Line block ends without a blank line.
> 
> (Line 4190 is the “@BlockdevOptionsFleecing:” line, but there is no warning if I remove this ASCII art.)

I usually disable building the docs to save time, but I should enable it at least once to check that I don't break them.

> 
>> +#
>> +# The scheme works like this: on write, fleecing driver saves data to its
>> +# ``file`` child and remember that this data is in ``file`` child. On read
>> +# fleecing reads from ``file`` child if data is already stored to it and
>> +# otherwise it reads from ``source`` child.
> 
> I.e. it’s basically a COW format with the allocation bitmap stored as a block dirty bitmap.
> 
>> +# In the same time, before each guest write, ``copy-before-write`` copies
>> +# corresponding old data  from ``active disk`` to ``fleecing`` node.
>> +# This way, ``fleecing`` node looks like a kind of snapshot for extenal
>> +# reader like NBD export.
> 
> So this description sounds like the driver is just a COW driver with an in-memory allocation bitmap.  But it’s actually specifically tuned for fleecing, because it interacts with the CBW node to prevent conflicts, and discard requests result in the respective areas become unreadable.
> 
> I find that important to mention, because if we don’t, then I’m wondering why this isn’t a generic “in-memory-cow” driver, and what makes it so useful for fleecing over any other COW driver.
> 
> (In fact, I’m asking myself all the time whether we can’t pull this driver apart into more generic nodes, like one in-memory-cow driver, and another driver managing the discard feature, and so on.  Could be done e.g. like this:
> 
> 
>                  Guest -> copy-before-write --file--> fleecing-lock --file--> disk image
> ^        |                  ^
> |      target               |
> +-- cbw-child --+        |               backing
> |           v                  |
> NBD -> fleecing-discard --file--> in-memory-cow -----------+
>                                          |
>          file
>            |
>            v
>        temp.img

Hmm, the ASCII art is broken for me. Let me try to fix it:


                                     ┌──────────────────┐
                                     │       NBD        │
                                     └─┬────────────────┘
                                       │
                                       │ root
                                       ▼
    ┌──────────┐                     ┌──────────────────┐
    │  guest   │     ┌───────────────┤ fleecing-discard │
    └─┬────────┘     │ cbw-child     └─┬────────────────┘
      │              │                 │
      │ root         │                 │ file
      ▼              ▼                 ▼
    ┌──────────────────┐  target     ┌──────────────────┐
    │       CBW        ├────────────►│  in-memory-cow   │
    └─┬────────────────┘             └─┬───────────┬────┘
      │                                │           │
      │ file                           │           │ file
      ▼                                │           ▼
    ┌──────────────────┐     backing   │        ┌─────────────┐
    │  fleecing-lock   │◄──────────────┘        │ temp.img    │
    └─┬────────────────┘                        └─────────────┘
      │
      │ file
      ▼
    ┌──────────────────┐
    │   active-disk    │
    └──────────────────┘

> 
> I.e. fleecing-discard would handle discards (telling its cbw-child to drop those areas from the copy-bitmap, and forwarding discards to the in-memory-cow node)

> , the in-memory-cow node would just be a generic implementation of COW (could be replaced by any other COW-implementing node, like qcow2),

Hmm, but then in-memory-cow would have to own the done_bitmap, and we want to use it for synchronization in upper layers.


> and the fleecing-lock driver would prevent areas that are still being read from from being written to concurrently.

But we want to call fleecing_mark_done_and_wait_readers() right after the copy-before-write operation, so this call should be made in the CBW filter, not in fleecing-lock.

[*] Update after answering the last comment: or maybe we don't.

> 
> Problem is, of course, that’s very complicated, I haven’t thought this through, and it’s extremely questionable whether we really need this modularity.  Most likely not.

Yes, I'm trying to get by with as few filters as possible.

> 
> I still feel compelled to think about such modularization, because the relationship between the CBW and the fleecing driver as laid out in this series doesn’t feel quite right to me.  They feel bolted together in a way that doesn’t fit in with the general design of the block layer where every node is basically self-contained.  I understand CBW and fleecing will need some communication, but I don’t (yet) like how in the next patch, the CBW driver looks for the fleecing driver and directly communicates with it through the FleecingState instead of going through the block layer, as we’d normally do when communicating between block nodes.
> 
> That’s why I’m trying to pick apart the functionality of the fleecing block driver into self-contained “atomic” nodes that perform its different functionalities, so that perhaps I can eventually put it back together and find out whether we can do better than `is_fleecing_drv(unfiltered_target)`.)

A big part of the problem is that we want to somehow bind two filters together, but we can't make each one a child of the other, as that would create a loop. Maybe we should introduce a "non-child" relationship in the graph, one that doesn't participate in permission updates but only in AioContext management?

We could add a parameter to the CBW filter that points directly to the fleecing filter instead of relying on "is_fleecing_drv(unfiltered_target)", but it's just an extra argument which we can detect automatically.
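A toy model of that idea, with a purely hypothetical edge type and invented names: permission updates follow only real child edges, so a fleecing-to-CBW back-link creates no loop there, while AioContext changes follow every edge, so both nodes always end up in the same context.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_EDGES 4

typedef struct ToyNode ToyNode;

typedef struct ToyEdge {
    ToyNode *to;
    bool is_child; /* false: hypothetical "non-child" link */
} ToyEdge;

struct ToyNode {
    int aio_context;  /* stand-in for the node's AioContext */
    int perm_updates; /* how often permissions were recomputed */
    ToyEdge edges[TOY_MAX_EDGES];
    int nb_edges;
};

/* Permissions propagate along child edges only (assumed acyclic). */
static void toy_update_perms(ToyNode *bs)
{
    bs->perm_updates++;
    for (int i = 0; i < bs->nb_edges; i++) {
        if (bs->edges[i].is_child) {
            toy_update_perms(bs->edges[i].to);
        }
    }
}

/* AioContext changes propagate along every edge, child or not. */
static void toy_set_aio_context(ToyNode *bs, int ctx)
{
    if (bs->aio_context == ctx) {
        return; /* already moved: this stops cycles through non-child links */
    }
    bs->aio_context = ctx;
    for (int i = 0; i < bs->nb_edges; i++) {
        toy_set_aio_context(bs->edges[i].to, ctx);
    }
}
```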

> 
>> +#
>> +# @source: node name of source node of fleecing scheme
>> +#
>> +# Since: 7.0
>> +##
>> +{ 'struct': 'BlockdevOptionsFleecing',
>> +  'base': 'BlockdevOptionsGenericFormat',
>> +  'data': { 'source': 'str' } }
>> +
>>   ##
>>   # @BlockdevOptions:
>>   #
>> @@ -4237,6 +4271,7 @@
>>         'copy-on-read':'BlockdevOptionsCor',
>>         'dmg':        'BlockdevOptionsGenericFormat',
>>         'file':       'BlockdevOptionsFile',
>> +      'fleecing':   'BlockdevOptionsFleecing',
>>         'ftp':        'BlockdevOptionsCurlFtp',
>>         'ftps':       'BlockdevOptionsCurlFtps',
>>         'gluster':    'BlockdevOptionsGluster',
>> diff --git a/block/fleecing.h b/block/fleecing.h
>> index fb7b2f86c4..75ad2f8b19 100644
>> --- a/block/fleecing.h
>> +++ b/block/fleecing.h
>> @@ -80,6 +80,9 @@
>>   #include "block/block-copy.h"
>>   #include "block/reqlist.h"
>> +
>> +/* fleecing.c */
>> +
>>   typedef struct FleecingState FleecingState;
>>   /*
>> @@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
>>   void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
>>                                            int64_t bytes);
>> +
>> +/* fleecing-drv.c */
>> +
>> +/* Returns true if @bs->drv is fleecing block driver */
>> +bool is_fleecing_drv(BlockDriverState *bs);
>> +
>> +/*
>> + * Normally FleecingState is created by copy-before-write filter. Then
>> + * copy-before-write filter calls fleecing_drv_activate() to share FleecingState
>> + * with fleecing block driver.
>> + */
>> +void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
>> +
>>   #endif /* FLEECING_H */
>> diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
>> new file mode 100644
>> index 0000000000..202208bb03
>> --- /dev/null
>> +++ b/block/fleecing-drv.c
>> @@ -0,0 +1,261 @@
>> +/*
>> + * fleecing block driver
>> + *
>> + * Copyright (c) 2021 Virtuozzo International GmbH.
>> + *
>> + * Author:
>> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +
>> +#include "sysemu/block-backend.h"
>> +#include "qemu/cutils.h"
>> +#include "qapi/error.h"
>> +#include "block/block_int.h"
>> +#include "block/coroutines.h"
>> +#include "block/qdict.h"
>> +#include "block/block-copy.h"
>> +#include "block/reqlist.h"
>> +
>> +#include "block/copy-before-write.h"
>> +#include "block/fleecing.h"
>> +
>> +typedef struct BDRVFleecingState {
>> +    FleecingState *fleecing;
>> +    BdrvChild *source;
>> +} BDRVFleecingState;
>> +
>> +static coroutine_fn int fleecing_co_preadv_part(
>> +        BlockDriverState *bs, int64_t offset, int64_t bytes,
>> +        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +    const BlockReq *req;
>> +    int ret;
>> +
>> +    if (!s->fleecing) {
>> +        /* fleecing_drv_activate() was not called */
>> +        return -EINVAL;
> 
> I'd rather treat a missing connection with a CBW driver as if we had an empty copy/access bitmap, and so return -EACCES in these places.

OK for me
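Here is a minimal model of what that would mean for the read path. It assumes one access flag and one "already copied" flag per cluster. ToyFleecing, ReadTarget and toy_route_read() are hypothetical names, not QEMU API. The real fleecing_read_lock() also takes a read lock on the source range, which this sketch leaves out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FleecingState: one flag pair per cluster. */
typedef struct ToyFleecing {
    const bool *readable;  /* access bitmap: may the fleecing user read it? */
    const bool *copied;    /* old data already stored in the temp image */
    int64_t nb_clusters;
} ToyFleecing;

typedef enum ReadTarget {
    READ_FROM_SOURCE,      /* old data still lives in the active disk */
    READ_FROM_TEMP,        /* old data was copied to the temp image */
} ReadTarget;

/* Returns 0 and sets *target, or -EACCES if the cluster may not be read. */
static int toy_route_read(const ToyFleecing *f, int64_t cluster,
                          ReadTarget *target)
{
    if (!f) {
        /* Not connected to CBW: behave like an empty access bitmap. */
        return -EACCES;
    }
    assert(cluster >= 0 && cluster < f->nb_clusters);
    if (!f->readable[cluster]) {
        /* Outside the original bitmap, or already discarded. */
        return -EACCES;
    }
    *target = f->copied[cluster] ? READ_FROM_TEMP : READ_FROM_SOURCE;
    return 0;
}
```

With this, a fleecing node that was never activated refuses every read, exactly like one whose access bitmap is empty.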

> 
>> +    }
>> +
>> +    /* TODO: upgrade to async loop using AioTask */
>> +    while (bytes) {
>> +        int64_t cur_bytes;
>> +
>> +        ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +
>> +        if (req) {
>> +            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
>> +                                      qiov, qiov_offset, flags);
>> +            fleecing_read_unlock(s->fleecing, req);
>> +        } else {
>> +            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
>> +                                      qiov, qiov_offset, flags);
>> +        }
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +
>> +        bytes -= cur_bytes;
>> +        offset += cur_bytes;
>> +        qiov_offset += cur_bytes;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn fleecing_co_block_status(BlockDriverState *bs,
>> +                                                 bool want_zero, int64_t offset,
>> +                                                 int64_t bytes, int64_t *pnum,
>> +                                                 int64_t *map,
>> +                                                 BlockDriverState **file)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +    const BlockReq *req = NULL;
>> +    int ret;
>> +    int64_t cur_bytes;
>> +
>> +    if (!s->fleecing) {
>> +        /* fleecing_drv_activate() was not called */
>> +        return -EINVAL;
>> +    }
>> +
>> +    ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    *pnum = cur_bytes;
>> +    *map = offset;
>> +
>> +    if (req) {
>> +        *file = s->source->bs;
>> +        fleecing_read_unlock(s->fleecing, req);
>> +    } else {
>> +        *file = bs->file->bs;
>> +    }
>> +
>> +    return ret;
> 
> Is ret == 0 the right return value here?

Hmm, yes, that looks strange; it should be some combination of BDRV_BLOCK_* flags.
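One option is to answer the way other filter-like drivers do, assuming that fits the fleecing node. QEMU's bdrv_co_block_status_from_file() sets *pnum, *map and *file for the child and returns BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID. That result tells the block layer to look up the real status in *file at *map. In the sketch below, the TOY_* constants, ToyNode and toy_fleecing_block_status() are stand-ins so that it compiles on its own. The flag values are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for BDRV_BLOCK_OFFSET_VALID / BDRV_BLOCK_RAW. */
#define TOY_BLOCK_OFFSET_VALID 0x04
#define TOY_BLOCK_RAW          0x08

typedef struct ToyNode {
    const char *name;
} ToyNode;

/*
 * @copied: whether fleecing_read_lock() found the range already copied
 * (i.e. returned no source request) for this chunk.
 */
static int toy_fleecing_block_status(bool copied, ToyNode *source,
                                     ToyNode *temp, int64_t offset,
                                     int64_t cur_bytes, int64_t *pnum,
                                     int64_t *map, ToyNode **file)
{
    assert(cur_bytes > 0);
    *pnum = cur_bytes;
    *map = offset;                 /* same offset in whichever child we pick */
    *file = copied ? temp : source;
    /* "Ask *file at *map" instead of returning a bare 0. */
    return TOY_BLOCK_RAW | TOY_BLOCK_OFFSET_VALID;
}
```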

> 
>> +}
>> +
>> +static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
>> +                                             int64_t offset, int64_t bytes)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +    if (!s->fleecing) {
>> +        /* fleecing_drv_activate() was not called */
>> +        return -EINVAL;
>> +    }
>> +
>> +    fleecing_discard(s->fleecing, offset, bytes);
>> +
>> +    bdrv_co_pdiscard(bs->file, offset, bytes);
>> +
>> +    /*
>> +     * Ignore the bdrv_co_pdiscard() result: fleecing_discard() succeeded,
>> +     * which means that the next read from this area will fail with -EACCES.
>> +     * It is more correct to report success now.
>> +     */
> 
> I don’t know.  I’m asking myself why the caller in turn would care about the discard result (usually one doesn’t really care whether discarding succeeded or not), and I feel like if they care, they’d like to know that discarding the data from storage failed.

Returning an error is OK too; I'll change it. In any case, if an error is returned, the caller shouldn't rely on any assumptions.
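Here is a sketch of the changed discard path. It assumes access is dropped first and the temp-image result is then passed up unchanged, with a missing CBW connection returning -EACCES as agreed above. ToyState, ToyDiscardFn, toy_fleecing_discard() and the two stubs are hypothetical stand-ins, not QEMU API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 8

typedef struct ToyState {
    bool connected;                  /* fleecing_drv_activate() was called */
    bool readable[TOY_NB_CLUSTERS];  /* per-cluster access bitmap */
} ToyState;

/* Discard on the temp image: 0 on success, negative errno on failure. */
typedef int (*ToyDiscardFn)(int64_t cluster);

static int toy_fleecing_discard(ToyState *s, int64_t cluster,
                                ToyDiscardFn temp_discard)
{
    if (!s->connected) {
        return -EACCES;
    }
    assert(cluster >= 0 && cluster < TOY_NB_CLUSTERS);
    /* The cluster becomes unreadable whatever happens below. */
    s->readable[cluster] = false;
    /* Report the temp image's result instead of swallowing it. */
    return temp_discard(cluster);
}

/* Example stubs for the temp image's discard. */
static int toy_temp_discard_ok(int64_t cluster)
{
    (void)cluster;
    return 0;
}

static int toy_temp_discard_eio(int64_t cluster)
{
    (void)cluster;
    return -EIO;
}
```

Even when -EIO comes back, the cluster stays unreadable. The caller just must not assume anything about the temp image's contents.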

> 
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
>> +        int64_t offset, int64_t bytes, BdrvRequestFlags flags)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +    if (!s->fleecing) {
>> +        /* fleecing_drv_activate() was not called */
>> +        return -EINVAL;
>> +    }
>> +
>> +    /*
>> +     * TODO: implement a cache, to give the fleecing user a chance to read
>> +     * and discard this data before it is actually written to the temporary
>> +     * image.
>> +     */
> 
> Is there a good reason why a cache shouldn’t be implemented as a separate block driver?

I don't remember. My last idea was just to implement all the features in a special fleecing driver. But you are right: if we see things that could be split into a separate small filter that makes sense by itself, it's _probably_ worth doing. I'll think about it when preparing the new version, as it is hard to imagine the whole picture without trying to implement it.

> 
>> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>> +}
>> +
>> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
>> +                                            int64_t offset,
>> +                                            int64_t bytes,
>> +                                            QEMUIOVector *qiov,
>> +                                            BdrvRequestFlags flags)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +    if (!s->fleecing) {
>> +        /* fleecing_drv_activate() was not called */
>> +        return -EINVAL;
>> +    }
>> +
>> +    /*
>> +     * TODO: implement a cache, to give the fleecing user a chance to read
>> +     * and discard this data before it is actually written to the temporary
>> +     * image.
>> +     */
>> +    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
>> +}
>> +
>> +
>> +static void fleecing_refresh_filename(BlockDriverState *bs)
>> +{
>> +    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>> +            bs->file->bs->filename);
>> +}
>> +
>> +static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
>> +                         Error **errp)
>> +{
>> +    BDRVFleecingState *s = bs->opaque;
>> +
>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
>> +                               false, errp);
>> +    if (!bs->file) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
>> +                               BDRV_CHILD_DATA, false, errp);
>> +    if (!s->source) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    bs->total_sectors = bs->file->bs->total_sectors;
>> +
>> +    return 0;
>> +}
>> +
>> +static void fleecing_child_perm(BlockDriverState *bs, BdrvChild *c,
>> +                                BdrvChildRole role,
>> +                                BlockReopenQueue *reopen_queue,
>> +                                uint64_t perm, uint64_t shared,
>> +                                uint64_t *nperm, uint64_t *nshared)
>> +{
>> +    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
>> +
>> +    if (role & BDRV_CHILD_PRIMARY) {
>> +        *nshared &= BLK_PERM_CONSISTENT_READ;
>> +    } else {
>> +        *nperm &= BLK_PERM_CONSISTENT_READ;
>> +
>> +        /*
>> +         * The copy-before-write filter is responsible for the source child
>> +         * and needs write access to it.
>> +         */
>> +        *nshared |= BLK_PERM_WRITE;
>> +    }
>> +}
>> +
>> +BlockDriver bdrv_fleecing_drv = {
>> +    .format_name = "fleecing",
>> +    .instance_size = sizeof(BDRVFleecingState),
>> +
>> +    .bdrv_open                  = fleecing_open,
>> +
>> +    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
>> +    .bdrv_co_pwritev            = fleecing_co_pwritev,
>> +    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
>> +    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
>> +    .bdrv_co_block_status       = fleecing_co_block_status,
>> +
>> +    .bdrv_refresh_filename      = fleecing_refresh_filename,
>> +
>> +    .bdrv_child_perm            = fleecing_child_perm,
>> +};
>> +
>> +bool is_fleecing_drv(BlockDriverState *bs)
>> +{
>> +    return bs && bs->drv == &bdrv_fleecing_drv;
>> +}
> 
> Besides the question whether the FleecingState should be part of CBW or the fleecing driver, I don’t like this very much.  As stated above, normally we go through the block layer to communicate between nodes, and this function for example prevents the possibility of having filters between CBW and the fleecing node.
> 
> Normally, I would expect a new BlockDriver method that the CBW driver would call to communicate with the fleecing driver.  Isn’t fleecing_mark_done_and_wait_readers() the only part where the CBW driver ever needs to tell the fleecing driver something?
> 
> Hm, actually, I wonder why we need fleecing_mark_done_and_wait_readers() to be called from CBW – can we not have the fleecing driver call this in its write implementations?  (It’s my understanding that the fleecing node is to be used read-only from the NBD export, besides discards.)

Interesting idea. That means we establish the following guarantee: a successful write to the fleecing node is the point after which the fleecing node will not touch this region of active-disk, and all in-flight reads have been awaited. Then we should propagate this guarantee to the block_copy() call. It seems like it should work; I'll try.
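To make that guarantee concrete, here is a single-threaded sketch with per-cluster state. A successful write into the temp image marks the cluster done, so new readers are routed to the temp image. The guest write to the source may only proceed once in-flight source readers have drained. In QEMU that wait would presumably be a coroutine yield inside the fleecing driver's write path. Here it is reduced to the predicate toy_source_writable(). All names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NB_CLUSTERS 8

typedef struct ToyFleecing {
    bool done[TOY_NB_CLUSTERS];    /* old data is stored in the temp image */
    int readers[TOY_NB_CLUSTERS];  /* in-flight reads from the source */
} ToyFleecing;

/* Start a read; returns true if it must read (and pin) the source. */
static bool toy_read_lock(ToyFleecing *f, int c)
{
    assert(c >= 0 && c < TOY_NB_CLUSTERS);
    if (f->done[c]) {
        return false;              /* served from the temp image */
    }
    f->readers[c]++;
    return true;
}

static void toy_read_unlock(ToyFleecing *f, int c)
{
    assert(f->readers[c] > 0);
    f->readers[c]--;
}

/* CBW writes old data to the fleecing node; success implies "done". */
static int toy_fleecing_write(ToyFleecing *f, char *temp, int c, char data)
{
    assert(c >= 0 && c < TOY_NB_CLUSTERS);
    temp[c] = data;
    f->done[c] = true;             /* new readers now go to the temp image */
    return 0;
}

/* Stand-in for "wait readers": may the guest now overwrite the source? */
static bool toy_source_writable(const ToyFleecing *f, int c)
{
    return f->done[c] && f->readers[c] == 0;
}
```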


Thanks a lot for the review; I now have enough material to work on v4. We'll see whether this can all become a bit more beautiful :)


-- 
Best regards,
Vladimir



* Re: [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2022-01-18 13:31   ` Hanna Reitz
@ 2022-01-26 10:56     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-26 10:56 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, kwolf,
	jsnow, nikita.lapshin

18.01.2022 16:31, Hanna Reitz wrote:
>> +/*
>> + * bdrv_dirty_bitmap_status:
>> + * @hb: The HBitmap to operate on
>> + * @start: the offset to start from
>> + * @end: end of requested area
>> + * @is_dirty: is bitmap dirty at @offset
>> + * @pnum: how many bits has same value starting from @offset
>> + */
>> +void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
> 
> In addition to the comment not fitting the parameter names, I also don’t find it ideal that the parameter names here don’t match the ones in the function’s definition.
> 
> I don’t have a preference between `start` or `offset` (although most other bitmap functions seem to prefer `start`), but I do prefer `count` over `bytes`, because...  Well, it’s a bit count, not a byte count, right?  (And from the bitmap user’s perspective, those bits might stand for any arbitrary unit.)
> 
> Apart from that, looks nice to me.  I am wondering a bit why this function doesn’t simply return the dirty bit status (like, well, the block-status functions do it), but I presume you simply found this interface to be better suited for its callers.

Hmm, there seems to be no real reason for it. I'll change it to use a normal return value.
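For illustration, here is the interface with a normal return value and the `start`/`count` naming suggested above. This sketch uses a plain byte array with a linear scan, not HBitmap. toy_bitmap_status() and toy_test_bit() are hypothetical names. A real HBitmap-based version would presumably jump ahead with HBitmap's next-zero/next-dirty search helpers rather than testing bit by bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test bit @i in a plain byte array (LSB-first within each byte). */
static bool toy_test_bit(const uint8_t *bits, int64_t i)
{
    return bits[i / 8] & (1u << (i % 8));
}

/*
 * toy_bitmap_status:
 * @bits: the bitmap to inspect
 * @start: first bit to look at
 * @count: number of bits in the requested range (must be > 0)
 * @pnum: set to the number of consecutive bits, starting at @start and
 *        not exceeding @count, that share the value of bit @start
 *
 * Returns: whether bit @start is set (dirty).
 */
static bool toy_bitmap_status(const uint8_t *bits, int64_t start,
                              int64_t count, int64_t *pnum)
{
    assert(count > 0);
    bool dirty = toy_test_bit(bits, start);
    int64_t n = 1;

    while (n < count && toy_test_bit(bits, start + n) == dirty) {
        n++;
    }
    *pnum = n;
    return dirty;
}
```

For example, with bits 0-3 set in a one-byte bitmap, toy_bitmap_status(bits, 0, 8, &pnum) returns true with pnum == 4.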

-- 
Best regards,
Vladimir



* Re: [PATCH v3 10/19] block: introduce fleecing block driver
  2022-01-21 10:46     ` Vladimir Sementsov-Ogievskiy
@ 2022-01-27 15:28       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-27 15:28 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, armbru, xiechanglong.d, wencongyang2, eblake, kwolf,
	jsnow, nikita.lapshin

21.01.2022 13:46, Vladimir Sementsov-Ogievskiy wrote:
> 20.01.2022 19:11, Hanna Reitz wrote:
>> On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:
>>> Introduce a new driver that works in a pair with copy-before-write to
>>> improve fleecing.
>>>
>>> Without the fleecing driver, the old fleecing scheme looks as follows:
>>>
>>> [guest]
>>>    |
>>>    |root
>>>    v
>>> [copy-before-write] -----> [temp.qcow2] <--- [nbd export]
>>>    |                 target  |
>>>    |file                     |backing
>>>    v                         |
>>> [active disk] <-------------+
>>>
>>> With the fleecing driver, the new scheme is:
>>>
>>> [guest]
>>>    |
>>>    |root
>>>    v
>>> [copy-before-write] -----> [fleecing] <--- [nbd export]
>>>    |                 target  |    |
>>>    |file                     |    |file
>>>    v                         |    v
>>> [active disk]<--source------+  [temp.img]
>>>
>>> Benefits of the new scheme:
>>>
>>> 1. Access control: if a remote client tries to read data that is not
>>>     covered by the original dirty bitmap used on copy-before-write open,
>>>     the client gets -EACCES.
>>>
>>> 2. Discard support: if a remote client issues DISCARD, then in addition
>>>     to discarding data in temp.img, this informs the block-copy process
>>>     not to copy these clusters. The next read from a discarded area will
>>>     return -EACCES. This is significant: once the fleecing user has read
>>>     and discarded data that was not yet copied to temp.img, we can avoid
>>>     copying it on a later guest write.
>>>
>>> 3. Synchronisation between client reads and block-copy writes is more
>>>     efficient: an intersecting block-copy write is not blocked during a
>>>     client read.
>>>
>>> 4. We don't rely on the backing feature: the active disk no longer has
>>>     to be the backing of the temp image, so we avoid some
>>>     permission-related difficulties, and the temp image is no longer
>>>     required to support backing; it may be a simple raw image.
>>>
>>> Note that nobody calls fleecing_drv_activate() yet, so the new driver is
>>> actually unusable. That is work for the following patch, which adds
>>> support for the fleecing block driver to the copy-before-write filter.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   qapi/block-core.json |  37 +++++-
>>>   block/fleecing.h     |  16 +++
>>>   block/fleecing-drv.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>>>   MAINTAINERS          |   1 +
>>>   block/meson.build    |   1 +
>>>   5 files changed, 315 insertions(+), 1 deletion(-)
>>>   create mode 100644 block/fleecing-drv.c
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 6904daeacf..b47351dbac 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -2917,13 +2917,14 @@
>>>   # @blkreplay: Since 4.2
>>>   # @compress: Since 5.0
>>>   # @copy-before-write: Since 6.2
>>> +# @fleecing: Since 7.0
>>>   #
>>>   # Since: 2.9
>>>   ##
>>>   { 'enum': 'BlockdevDriver',
>>>     'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
>>>               'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
>>> -            'file', 'ftp', 'ftps', 'gluster',
>>> +            'file', 'fleecing', 'ftp', 'ftps', 'gluster',
>>>               {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>>>               {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>>>               'http', 'https', 'iscsi',
>>> @@ -4181,6 +4182,39 @@
>>>     'base': 'BlockdevOptionsGenericFormat',
>>>     'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
>>> +##
>>> +# @BlockdevOptionsFleecing:
>>> +#
>>> +# Driver that works in a pair with the copy-before-write filter to make a
>>> +# fleecing scheme like this:
>>> +#
>>> +#    [guest]
>>> +#      |
>>> +#      |root
>>> +#      v
>>> +#    [copy-before-write] -----> [fleecing] <--- [nbd export]
>>> +#      |                 target  |    |
>>> +#      |file                     |    |file
>>> +#      v                         |    v
>>> +#    [active disk]<--source------+  [temp.img]
>>
>> When generating docs, my sphinx doesn’t like this very much.  I don’t know exactly which part of it, but it complains with:
>>
>> docs/../qapi/block-core.json:4190:Line block ends without a blank line.
>>
>> (Line 4190 is the “@BlockdevOptionsFleecing:” line, but there is no warning if I remove this ASCII art.)
> 
> I usually disable docs building so as not to waste time. But I should enable it at least once to check that I don't break it.
> 
>>
>>> +#
>>> +# The scheme works like this: on write, the fleecing driver saves data to
>>> +# its ``file`` child and remembers that this data is in the ``file`` child.
>>> +# On read, fleecing reads from the ``file`` child if the data is already
>>> +# stored there, and otherwise reads from the ``source`` child.
>>
>> I.e. it’s basically a COW format with the allocation bitmap stored as a block dirty bitmap.
>>
>>> +# At the same time, before each guest write, ``copy-before-write`` copies
>>> +# the corresponding old data from ``active disk`` to the ``fleecing`` node.
>>> +# This way, the ``fleecing`` node looks like a kind of snapshot to an
>>> +# external reader such as an NBD export.
>>
>> So this description sounds like the driver is just a COW driver with an in-memory allocation bitmap.  But it’s actually specifically tuned for fleecing, because it interacts with the CBW node to prevent conflicts, and discard requests result in the respective areas becoming unreadable.
>>
>> I find that important to mention, because if we don’t, then I’m wondering why this isn’t a generic “in-memory-cow” driver, and what makes it so useful for fleecing over any other COW driver.
>>
>> (In fact, I’m asking myself all the time whether we can’t pull this driver apart into more generic nodes, like one in-memory-cow driver, and another driver managing the discard feature, and so on.  Could be done e.g. like this:
>>
>>
>>                  Guest -> copy-before-write --file--> fleecing-lock --file--> disk image
>> ^        |                  ^
>> |      target               |
>> +-- cbw-child --+        |               backing
>> |           v                  |
>> NBD -> fleecing-discard --file--> in-memory-cow -----------+
>>                                          |
>>          file
>>            |
>>            v
>>        temp.img
> 
> Hmm, the ASCII art is broken for me. Let me try to fix it:
> 
> 
>                                      ┌──────────────────┐
>                                      │       NBD        │
>                                      └─┬────────────────┘
>                                        │
>                                        │ root
>                                        ▼
>     ┌──────────┐                     ┌──────────────────┐
>     │  guest   │     ┌───────────────┤ fleecing-discard │
>     └─┬────────┘     │ cbw-child     └─┬────────────────┘
>       │              │                 │
>       │ root         │                 │ file
>       ▼              ▼                 ▼
>     ┌──────────────────┐  target     ┌──────────────────┐
>     │       CBW        ├────────────►│  in-memory-cow   │
>     └─┬────────────────┘             └─┬───────────┬────┘
>       │                                │           │
>       │ file                           │           │ file
>       ▼                                │           ▼
>     ┌──────────────────┐     backing   │        ┌─────────────┐
>     │  fleecing-lock   │◄──────────────┘        │ temp.img    │
>     └─┬────────────────┘                        └─────────────┘
>       │
>       │ file
>       ▼
>     ┌──────────────────┐
>     │   active-disk    │
>     └──────────────────┘
> 
>>
>> I.e. fleecing-discard would handle discards (telling its cbw-child to drop those areas from the copy-bitmap, and forwarding discards to the in-memory-cow node)
> 
> , the in-memory-cow node would just be a generic implementation of COW (could be replaced by any other COW-implementing node, like qcow2),
> 
> Hmm, but then in-memory-cow would have to own the done_bitmap. But we want to use it for synchronization in upper layers...
> 
> 
>> and the fleecing-lock driver would prevent areas that are still being read from from being written to concurrently.
> 
> But we want to call fleecing_mark_done_and_wait_readers() exactly after the copy-before-write operation, so this call should be done in the CBW filter, not in fleecing-lock
> 
> [*] Update after answering the last comment: or maybe we don't want to.
> 
>>
>> Problem is, of course, that’s very complicated, I haven’t thought this through, and it’s extremely questionable whether we really need this modularity.  Most likely not.
> 
> Yes, I'm trying to get by with not too many filters.
> 
>>
>> I still feel compelled to think about such modularization, because the relationship between the CBW and the fleecing driver as laid out in this series doesn’t feel quite right to me.  They feel bolted together in a way that doesn’t fit in with the general design of the block layer where every node is basically self-contained.  I understand CBW and fleecing will need some communication, but I don’t (yet) like how in the next patch, the CBW driver looks for the fleecing driver and directly communicates with it through the FleecingState instead of going through the block layer, as we’d normally do when communicating between block nodes.
>>
>> That’s why I’m trying to pick apart the functionality of the fleecing block driver into self-contained “atomic” nodes that perform its different functionalities, so that perhaps I can eventually put it back together and find out whether we can do better than `is_fleecing_drv(unfiltered_target)`.)
> 
> A big part of the problem is that we somehow want to bind two filters together. But we can't make each of them a child of the other, as that would be a loop. Maybe we should introduce a "non-child" relationship in the graph, one that does not participate in permission updates but only in aio-context management?
> 
> We could add a parameter to the CBW filter that points directly to the fleecing filter instead of "is_fleecing_drv(unfiltered_target)". But it's just an extra argument which we can detect automatically.
> 
>>
>>> +#
>>> +# @source: node name of the source node of the fleecing scheme
>>> +#
>>> +# Since: 7.0
>>> +##
>>> +{ 'struct': 'BlockdevOptionsFleecing',
>>> +  'base': 'BlockdevOptionsGenericFormat',
>>> +  'data': { 'source': 'str' } }
>>> +
>>>   ##
>>>   # @BlockdevOptions:
>>>   #
>>> @@ -4237,6 +4271,7 @@
>>>         'copy-on-read':'BlockdevOptionsCor',
>>>         'dmg':        'BlockdevOptionsGenericFormat',
>>>         'file':       'BlockdevOptionsFile',
>>> +      'fleecing':   'BlockdevOptionsFleecing',
>>>         'ftp':        'BlockdevOptionsCurlFtp',
>>>         'ftps':       'BlockdevOptionsCurlFtps',
>>>         'gluster':    'BlockdevOptionsGluster',
>>> diff --git a/block/fleecing.h b/block/fleecing.h
>>> index fb7b2f86c4..75ad2f8b19 100644
>>> --- a/block/fleecing.h
>>> +++ b/block/fleecing.h
>>> @@ -80,6 +80,9 @@
>>>   #include "block/block-copy.h"
>>>   #include "block/reqlist.h"
>>> +
>>> +/* fleecing.c */
>>> +
>>>   typedef struct FleecingState FleecingState;
>>>   /*
>>> @@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, int64_t bytes);
>>>   void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
>>>                                            int64_t bytes);
>>> +
>>> +/* fleecing-drv.c */
>>> +
>>> +/* Returns true if @bs->drv is the fleecing block driver */
>>> +bool is_fleecing_drv(BlockDriverState *bs);
>>> +
>>> +/*
>>> + * Normally FleecingState is created by the copy-before-write filter. Then
>>> + * the copy-before-write filter calls fleecing_drv_activate() to share the
>>> + * FleecingState with the fleecing block driver.
>>> + */
>>> +void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
>>> +
>>>   #endif /* FLEECING_H */
>>> diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
>>> new file mode 100644
>>> index 0000000000..202208bb03
>>> --- /dev/null
>>> +++ b/block/fleecing-drv.c
>>> @@ -0,0 +1,261 @@
>>> +/*
>>> + * fleecing block driver
>>> + *
>>> + * Copyright (c) 2021 Virtuozzo International GmbH.
>>> + *
>>> + * Author:
>>> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +
>>> +#include "sysemu/block-backend.h"
>>> +#include "qemu/cutils.h"
>>> +#include "qapi/error.h"
>>> +#include "block/block_int.h"
>>> +#include "block/coroutines.h"
>>> +#include "block/qdict.h"
>>> +#include "block/block-copy.h"
>>> +#include "block/reqlist.h"
>>> +
>>> +#include "block/copy-before-write.h"
>>> +#include "block/fleecing.h"
>>> +
>>> +typedef struct BDRVFleecingState {
>>> +    FleecingState *fleecing;
>>> +    BdrvChild *source;
>>> +} BDRVFleecingState;
>>> +
>>> +static coroutine_fn int fleecing_co_preadv_part(
>>> +        BlockDriverState *bs, int64_t offset, int64_t bytes,
>>> +        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +    const BlockReq *req;
>>> +    int ret;
>>> +
>>> +    if (!s->fleecing) {
>>> +        /* fleecing_drv_activate() was not called */
>>> +        return -EINVAL;
>>
>> I'd rather treat a missing connection with a CBW driver as if we had an empty copy/access bitmap, and so return -EACCES in these places.
> 
> OK for me
> 
>>
>>> +    }
>>> +
>>> +    /* TODO: upgrade to async loop using AioTask */
>>> +    while (bytes) {
>>> +        int64_t cur_bytes;
>>> +
>>> +        ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
>>> +        if (ret < 0) {
>>> +            return ret;
>>> +        }
>>> +
>>> +        if (req) {
>>> +            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
>>> +                                      qiov, qiov_offset, flags);
>>> +            fleecing_read_unlock(s->fleecing, req);
>>> +        } else {
>>> +            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
>>> +                                      qiov, qiov_offset, flags);
>>> +        }
>>> +        if (ret < 0) {
>>> +            return ret;
>>> +        }
>>> +
>>> +        bytes -= cur_bytes;
>>> +        offset += cur_bytes;
>>> +        qiov_offset += cur_bytes;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int coroutine_fn fleecing_co_block_status(BlockDriverState *bs,
>>> +                                                 bool want_zero, int64_t offset,
>>> +                                                 int64_t bytes, int64_t *pnum,
>>> +                                                 int64_t *map,
>>> +                                                 BlockDriverState **file)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +    const BlockReq *req = NULL;
>>> +    int ret;
>>> +    int64_t cur_bytes;
>>> +
>>> +    if (!s->fleecing) {
>>> +        /* fleecing_drv_activate() was not called */
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    ret = fleecing_read_lock(s->fleecing, offset, bytes, &req, &cur_bytes);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    *pnum = cur_bytes;
>>> +    *map = offset;
>>> +
>>> +    if (req) {
>>> +        *file = s->source->bs;
>>> +        fleecing_read_unlock(s->fleecing, req);
>>> +    } else {
>>> +        *file = bs->file->bs;
>>> +    }
>>> +
>>> +    return ret;
>>
>> Is ret == 0 the right return value here?
> 
> Hmm, yes, that looks strange; it should be some combination of BDRV_BLOCK_* flags.
> 
>>
>>> +}
>>> +
>>> +static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
>>> +                                             int64_t offset, int64_t bytes)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +    if (!s->fleecing) {
>>> +        /* fleecing_drv_activate() was not called */
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    fleecing_discard(s->fleecing, offset, bytes);
>>> +
>>> +    bdrv_co_pdiscard(bs->file, offset, bytes);
>>> +
>>> +    /*
>>> +     * Ignore the bdrv_co_pdiscard() result: fleecing_discard() succeeded,
>>> +     * which means that the next read from this area will fail with -EACCES.
>>> +     * It is more correct to report success now.
>>> +     */
>>
>> I don’t know.  I’m asking myself why the caller in turn would care about the discard result (usually one doesn’t really care whether discarding succeeded or not), and I feel like if they care, they’d like to know that discarding the data from storage failed.
> 
> Returning an error is OK too; I'll change it. In any case, if an error is returned, the caller shouldn't rely on any assumptions.
> 
>>
>>> +    return 0;
>>> +}
>>> +
>>> +static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
>>> +        int64_t offset, int64_t bytes, BdrvRequestFlags flags)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +    if (!s->fleecing) {
>>> +        /* fleecing_drv_activate() was not called */
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /*
>>> +     * TODO: implement a cache, to give the fleecing user a chance to read
>>> +     * and discard this data before it is actually written to the temporary
>>> +     * image.
>>> +     */
>>
>> Is there a good reason why a cache shouldn’t be implemented as a separate block driver?
> 
> I don't remember. My last idea was just to implement all the features in a special fleecing driver. But you are right: if we see things that could be split into a separate small filter that makes sense by itself, it's _probably_ worth doing. I'll think about it when preparing the new version, as it is hard to imagine the whole picture without trying to implement it.
> 
>>
>>> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>>> +}
>>> +
>>> +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
>>> +                                            int64_t offset,
>>> +                                            int64_t bytes,
>>> +                                            QEMUIOVector *qiov,
>>> +                                            BdrvRequestFlags flags)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +    if (!s->fleecing) {
>>> +        /* fleecing_drv_activate() was not called */
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /*
>>> +     * TODO: implement a cache, to give the fleecing user a chance to read
>>> +     * and discard this data before it is actually written to the temporary
>>> +     * image.
>>> +     */
>>> +    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
>>> +}
>>> +
>>> +
>>> +static void fleecing_refresh_filename(BlockDriverState *bs)
>>> +{
>>> +    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>>> +            bs->file->bs->filename);
>>> +}
>>> +
>>> +static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
>>> +                         Error **errp)
>>> +{
>>> +    BDRVFleecingState *s = bs->opaque;
>>> +
>>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
>>> +                               false, errp);
>>> +    if (!bs->file) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
>>> +                               BDRV_CHILD_DATA, false, errp);
>>> +    if (!s->source) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    bs->total_sectors = bs->file->bs->total_sectors;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void fleecing_child_perm(BlockDriverState *bs, BdrvChild *c,
>>> +                                BdrvChildRole role,
>>> +                                BlockReopenQueue *reopen_queue,
>>> +                                uint64_t perm, uint64_t shared,
>>> +                                uint64_t *nperm, uint64_t *nshared)
>>> +{
>>> +    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
>>> +
>>> +    if (role & BDRV_CHILD_PRIMARY) {
>>> +        *nshared &= BLK_PERM_CONSISTENT_READ;
>>> +    } else {
>>> +        *nperm &= BLK_PERM_CONSISTENT_READ;
>>> +
>>> +        /*
>>> +         * copy-before-write filter is responsible for source child and need
>>> +         * write access to it.
>>> +         */
>>> +        *nshared |= BLK_PERM_WRITE;
>>> +    }
>>> +}
>>> +
>>> +BlockDriver bdrv_fleecing_drv = {
>>> +    .format_name = "fleecing",
>>> +    .instance_size = sizeof(BDRVFleecingState),
>>> +
>>> +    .bdrv_open                  = fleecing_open,
>>> +
>>> +    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
>>> +    .bdrv_co_pwritev            = fleecing_co_pwritev,
>>> +    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
>>> +    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
>>> +    .bdrv_co_block_status       = fleecing_co_block_status,
>>> +
>>> +    .bdrv_refresh_filename      = fleecing_refresh_filename,
>>> +
>>> +    .bdrv_child_perm            = fleecing_child_perm,
>>> +};
>>> +
>>> +bool is_fleecing_drv(BlockDriverState *bs)
>>> +{
>>> +    return bs && bs->drv == &bdrv_fleecing_drv;
>>> +}
>>
>> Besides the question whether the FleecingState should be part of CBW or the fleecing driver, I don’t like this very much.  As stated above, normally we go through the block layer to communicate between nodes, and this function for example prevents the possibility of having filters between CBW and the fleecing node.
>>
>> Normally, I would expect a new BlockDriver method that the CBW driver would call to communicate with the fleecing driver.  Isn’t fleecing_mark_done_and_wait_readers() the only part where the CBW driver ever needs to tell the fleecing driver something?
>>
>> Hm, actually, I wonder why we need fleecing_mark_done_and_wait_readers() to be called from CBW – can we not have the fleecing driver call this in its write implementations?  (It’s my understanding that the fleecing node is to be used read-only from the NBD export, besides discards.)
> 
> Interesting idea. That means we establish a guarantee: a successful write to the fleecing node is the point after which it will no longer touch this region in active-disk, and all in-flight reads have been awaited. Then we should propagate this guarantee to the block_copy() call. It seems it should work; I'll try.
> 
> 
> Thanks a lot for reviewing; I now have enough material to work on v4. We'll see whether this can all become a bit more beautiful :)
> 
> 

OK, I'm thinking it over now. Let me think out loud.

First, about the RAM cache. We should keep it in mind, but the scheme should also work well when the system cache is used instead of a RAM cache.

Now consider the CBW write operation:

CBW WRITE  (copy-before-write operation: copying old, unchanged data from active-disk to some target)

1. It should be guaranteed that the corresponding area in the "done" dirty bitmap is not dirty, and that there are no intersecting parallel writes. Both conditions hold for CBW writes. Should we check and assert them? Probably yes.

2. Do the write. We are safe, as the dirty bitmap is unset here, so all reads go to active-disk.

If we work only with the system cache, that's all we can do. If the write is fast, all is OK: the data becomes available for reading almost immediately. But if the write triggers a real flush, the fleecing reader will have to read from active-disk during this write, even though we actually have the data in RAM. That is a bit inefficient. It may be solved by a ram-cache node, for which writes are always fast, but then we should call some "flush" operation for that region so that RAM usage doesn't grow endlessly.

3. The data is written, so make it available to readers:

   - mutex_lock
   - set bits in dirty bitmap
   - mutex_unlock

4. If ram-cache is in use, trigger a cache flush and wait until the cache size is back to normal (we can't finish the CBW write before that, otherwise RAM usage will grow indefinitely).

5. Before starting the actual guest write to active-disk, we should wait for all in-flight fleecing reads from active-disk in this area. So, wait on the reqlist. [*]
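Steps 1, 2, 3 and 5 could be modeled roughly like this. It is a self-contained sketch under made-up assumptions, not QEMU code: the disk is split into fixed-size clusters, the "done" dirty bitmap is a plain bool array, and the reqlist of fleecing reads is reduced to a per-cluster reader counter (incremented by the read side, which is not shown here). CbwModel, cbw_write_begin() and cbw_write_finish() are hypothetical names; the RAM cache of step 4 and error handling for a failed copy are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_CLUSTERS 64  /* hypothetical disk size, in clusters */

typedef struct CbwModel {
    pthread_mutex_t lock;
    pthread_cond_t reads_finished;  /* signalled when a fleecing read of active-disk ends */
    bool done[NB_CLUSTERS];         /* "done" bitmap: old data already copied to target */
    bool writing[NB_CLUSTERS];      /* CBW writes in flight, for the step 1 check */
    int readers[NB_CLUSTERS];       /* stand-in for the reqlist of fleecing reads */
} CbwModel;

void cbw_model_init(CbwModel *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->reads_finished, NULL);
    for (int i = 0; i < NB_CLUSTERS; i++) {
        m->done[i] = false;
        m->writing[i] = false;
        m->readers[i] = 0;
    }
}

/* Step 1: the area must not be "done" and must not intersect another CBW write. */
void cbw_write_begin(CbwModel *m, int first, int count)
{
    pthread_mutex_lock(&m->lock);
    for (int i = first; i < first + count; i++) {
        assert(!m->done[i]);
        assert(!m->writing[i]);
        m->writing[i] = true;
    }
    pthread_mutex_unlock(&m->lock);
    /*
     * Step 2 (the copy itself) is done by the caller after this returns.
     * The bits are still unset, so fleecing reads of this area keep going
     * to active-disk in the meantime.
     */
}

/* Steps 3 and 5, called after the copy to the target has succeeded. */
void cbw_write_finish(CbwModel *m, int first, int count)
{
    pthread_mutex_lock(&m->lock);
    for (int i = first; i < first + count; i++) {
        m->writing[i] = false;
        m->done[i] = true;  /* step 3: new fleecing reads now go to the target */
    }
    for (int i = first; i < first + count; i++) {
        /* step 5: wait for fleecing reads that started on active-disk earlier */
        while (m->readers[i] > 0) {
            pthread_cond_wait(&m->reads_finished, &m->lock);
        }
    }
    pthread_mutex_unlock(&m->lock);
    /* Only after this may the guest write to active-disk proceed. */
}
```

The caller does the copy between the two calls and lets the guest write proceed only after cbw_write_finish() returns.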


Now think about the fleecing read:

FLEECING READ  (read operation done by the fleecing user, like an NBD export)

* mutex_lock()

* check the bitmap:

if the data is available in the cache or the underlying storage, we don't need any synchronization:

    * mutex_unlock()

    * read from the cache (or from the underlying storage through the cache)

else, we must read from active-disk, and we want a guarantee that active-disk will not change in this area during the read:

    * create a request in the reqlist

    * mutex_unlock()

    * read from active-disk

    * drop the request from the reqlist (the reqlist should most probably be protected by the same mutex as above... or should it have a separate mutex? Or do we want to abuse the bitmap's mutex? I don't like abusing anything)
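A matching sketch of the read side, under the same hypothetical cluster model (bool "done" bitmap, per-cluster counter instead of a real reqlist, invented names). One detail it makes explicit: a request may cover both done and not-done clusters, and only the not-done part may be read from active-disk, so the bitmap check here returns how many clusters share the status of the first one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_CLUSTERS 64  /* hypothetical disk size, in clusters */

typedef enum ReadSource { READ_FROM_TARGET, READ_FROM_ACTIVE_DISK } ReadSource;

typedef struct FleecingReadModel {
    pthread_mutex_t lock;
    pthread_cond_t reads_finished;  /* a CBW write may be waiting on this (step 5) */
    bool done[NB_CLUSTERS];         /* "done" bitmap */
    int readers[NB_CLUSTERS];       /* stand-in for the reqlist */
} FleecingReadModel;

void fleecing_read_model_init(FleecingReadModel *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->reads_finished, NULL);
    for (int i = 0; i < NB_CLUSTERS; i++) {
        m->done[i] = false;
        m->readers[i] = 0;
    }
}

/*
 * Decide where clusters [first, first + *n) must be read from. *n is the
 * length of the longest prefix of the request whose bitmap status equals
 * that of the first cluster. For active-disk reads, a request is
 * registered before the mutex is dropped.
 */
ReadSource fleecing_read_begin(FleecingReadModel *m, int first, int count, int *n)
{
    ReadSource src;
    int len = 1;

    assert(count > 0);
    pthread_mutex_lock(&m->lock);
    while (len < count && m->done[first + len] == m->done[first]) {
        len++;
    }
    if (m->done[first]) {
        src = READ_FROM_TARGET;  /* no further synchronization needed */
    } else {
        src = READ_FROM_ACTIVE_DISK;
        for (int i = first; i < first + len; i++) {
            m->readers[i]++;     /* "create request in reqlist" */
        }
    }
    pthread_mutex_unlock(&m->lock);
    *n = len;
    return src;
}

/* Drop the request once the read from active-disk has completed. */
void fleecing_read_end(FleecingReadModel *m, int first, int n)
{
    pthread_mutex_lock(&m->lock);
    for (int i = first; i < first + n; i++) {
        assert(m->readers[i] > 0);
        m->readers[i]--;
    }
    pthread_cond_broadcast(&m->reads_finished);
    pthread_mutex_unlock(&m->lock);
}
```

A caller would loop: call fleecing_read_begin(), read *n clusters from the returned source, call fleecing_read_end() if that source was active-disk, then continue from first + *n.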



OK, now let's think about how to spread all this functionality between nodes.

If we have one "fleecing" node that does everything, it's simple: it owns all the objects (mutex, dirty bitmap, reqlist, ram-cache).


But what if we want to split it? The decision about where to read from and the creation of the reqlist request must be done under the mutex, so they would have to live in the in-ram-cow node. But this brings a kind of synchronization that a generic in-ram-cow node doesn't need. And once we start caring about CBW/fleecing synchronization in this node, there is no reason not to do the [*] waiting there too. So it no longer looks like a generic in-ram-cow node, but like a specific fleecing driver: a COW driver, but a rather specific one.

Could we split out the RAM cache? It seems we could. A write to it is always fast and may be done under the mutex; after the write, the cache size may exceed the maximum. So we need an API to wait until the cache size is back to normal.
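The "wait for cache size normalized" API could be as small as a byte counter plus a condition variable. A minimal sketch, assuming a soft limit that writes are allowed to exceed and a flusher that reports how many bytes it has written back; RamCacheAccounting and the ram_cache_* functions are hypothetical, and the cached data itself is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RamCacheAccounting {
    pthread_mutex_t lock;
    pthread_cond_t shrunk;  /* signalled whenever cached data has been flushed */
    uint64_t size;          /* bytes currently held in RAM */
    uint64_t max_size;      /* soft limit: may be exceeded temporarily */
} RamCacheAccounting;

void ram_cache_init(RamCacheAccounting *c, uint64_t max_size)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->shrunk, NULL);
    c->size = 0;
    c->max_size = max_size;
}

/* A write into the cache is always accepted; the limit is not checked here. */
void ram_cache_account_write(RamCacheAccounting *c, uint64_t bytes)
{
    pthread_mutex_lock(&c->lock);
    c->size += bytes;
    pthread_mutex_unlock(&c->lock);
}

/* Called when a cached region has been flushed to the underlying node. */
void ram_cache_account_flush(RamCacheAccounting *c, uint64_t bytes)
{
    pthread_mutex_lock(&c->lock);
    assert(c->size >= bytes);
    c->size -= bytes;
    pthread_cond_broadcast(&c->shrunk);
    pthread_mutex_unlock(&c->lock);
}

/* Block until the cache is back under its limit. */
void ram_cache_wait_normalized(RamCacheAccounting *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->size > c->max_size) {
        pthread_cond_wait(&c->shrunk, &c->lock);
    }
    pthread_mutex_unlock(&c->lock);
}

uint64_t ram_cache_size(RamCacheAccounting *c)
{
    pthread_mutex_lock(&c->lock);
    uint64_t size = c->size;
    pthread_mutex_unlock(&c->lock);
    return size;
}
```

A CBW write would account its data with ram_cache_account_write(), trigger a flush, and block in ram_cache_wait_normalized() before completing, which is what keeps RAM usage bounded in step 4.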

On the other hand, the simplest, minimal implementation of a RAM cache is just a list of in-flight write requests inside the fleecing node, plus relying on the system cache. So the operations would look like this:


CBW WRITE to FLEECING node

- mutex lock
- check that the corresponding bits in the bitmap are unset and there are no intersecting write requests
- add a write request (with the data buffer copied or stolen) to the write request list
- set corresponding bits in the bitmap
- mutex unlock
- great: starting from this point, the data is already available for reads
- write the data to the underlying node (thanks to the system cache, this almost never results in a real write to disk)
- mutex lock
- drop the write request from the in-flight write request list
- mutex unlock
- wait for in-flight read requests to active-disk in this area

READ from FLEECING node
- mutex lock
- if the data is in the in-flight write request list, just copy it, unlock the mutex, and we are done
- else if the bitmap is dirty, unlock the mutex and read the data from the underlying temporary storage
- else
    - create an in-flight read request in the reqlist
    - mutex unlock
    - read from active-disk
    - mutex lock
    - drop the in-flight read request from the reqlist
    - mutex unlock
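Both lists could be modeled together in one file. This is a sketch under loud assumptions, not driver code: requests cover exactly one cluster, in-memory arrays stand in for active-disk and temp.img, the in-flight write list is a fixed-size array whose entries hold a copy (not a stolen reference) of the data, and a per-cluster counter replaces the reqlist. FleecingNode, fleecing_cbw_write() and fleecing_read() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NB_CLUSTERS  16  /* hypothetical geometry */
#define CLUSTER_SIZE 8
#define MAX_WRITES   4   /* capacity of the in-flight write list */

typedef struct InflightWrite {
    bool used;
    int cluster;
    uint8_t data[CLUSTER_SIZE];  /* copy of the data being written */
} InflightWrite;

typedef struct FleecingNode {
    pthread_mutex_t lock;
    pthread_cond_t reads_finished;
    bool bitmap[NB_CLUSTERS];         /* set: data is in writes[] or temp_img */
    int active_readers[NB_CLUSTERS];  /* stand-in for the reqlist */
    InflightWrite writes[MAX_WRITES];
    uint8_t active_disk[NB_CLUSTERS][CLUSTER_SIZE];  /* in-memory stand-ins */
    uint8_t temp_img[NB_CLUSTERS][CLUSTER_SIZE];
} FleecingNode;

void fleecing_node_init(FleecingNode *s)
{
    memset(s, 0, sizeof(*s));
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->reads_finished, NULL);
}

/* CBW WRITE to the fleecing node: buf holds the old data of one cluster. */
int fleecing_cbw_write(FleecingNode *s, int cluster, const uint8_t *buf)
{
    InflightWrite *w = NULL;

    pthread_mutex_lock(&s->lock);
    assert(!s->bitmap[cluster]);
    for (int i = 0; i < MAX_WRITES; i++) {
        assert(!(s->writes[i].used && s->writes[i].cluster == cluster));
        if (!w && !s->writes[i].used) {
            w = &s->writes[i];
        }
    }
    if (!w) {
        pthread_mutex_unlock(&s->lock);
        return -EAGAIN;  /* list is full; a real driver would wait instead */
    }
    w->used = true;
    w->cluster = cluster;
    memcpy(w->data, buf, CLUSTER_SIZE);
    s->bitmap[cluster] = true;  /* from here on, readers get the data from w */
    pthread_mutex_unlock(&s->lock);

    memcpy(s->temp_img[cluster], w->data, CLUSTER_SIZE);  /* the real write */

    pthread_mutex_lock(&s->lock);
    w->used = false;  /* drop from the in-flight write list */
    while (s->active_readers[cluster] > 0) {
        pthread_cond_wait(&s->reads_finished, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* READ from the fleecing node, one cluster at a time. */
void fleecing_read(FleecingNode *s, int cluster, uint8_t *buf)
{
    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < MAX_WRITES; i++) {
        if (s->writes[i].used && s->writes[i].cluster == cluster) {
            memcpy(buf, s->writes[i].data, CLUSTER_SIZE);
            pthread_mutex_unlock(&s->lock);
            return;
        }
    }
    if (s->bitmap[cluster]) {
        pthread_mutex_unlock(&s->lock);
        memcpy(buf, s->temp_img[cluster], CLUSTER_SIZE);
        return;
    }
    s->active_readers[cluster]++;  /* in-flight read request */
    pthread_mutex_unlock(&s->lock);
    memcpy(buf, s->active_disk[cluster], CLUSTER_SIZE);
    pthread_mutex_lock(&s->lock);
    s->active_readers[cluster]--;
    pthread_cond_broadcast(&s->reads_finished);
    pthread_mutex_unlock(&s->lock);
}
```

The point of the in-flight write list shows up in fleecing_read(): as soon as the bit is set, a reader gets the data from RAM even if the write to temp.img is still in progress.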


That's all for the synchronization, plus a simple improvement that makes the data of in-flight writes available for reads. Do we really need it? Or is the old synchronization based on serializing requests enough?


Another thing is the access/discard feature. It may simply be split out, so we finally have something like:



                                    ┌─────────────────┐
                                    │ fleecing-access │
                                    └────┬───┬────────┘
                          cbw-child      │   │
                  ┌──────────────────────┘   │file
                  │                          │
    ┌─────────────▼───┐   target    ┌────────▼────────┐
    │ CBW             ├─────────────► fleecing-sync   │
    └────────┬────────┘             └──┬─────┬────────┘
             │                         │     │
       file  │    ┌────────────────────┘     │file
             │    │     source               │
    ┌────────▼────▼───┐             ┌────────▼────────┐
    │ active-disk     │             │ temp.img        │
    └─────────────────┘             └─────────────────┘


And in this scheme, the question becomes meaningful: is it worth the complexity? Or can we simply live with the old fleecing synchronization based on a qcow2 temporary image and serializing requests, and then have only one fleecing-access driver?

The only operation that fleecing-access performs on cbw-child would be some new special operation, like bdrv_discard_cbw, or a new flag for discard. We could support automatic pass-through of this new operation through filters, but I don't think that would be useful. So the only reason to have two nodes is the cbw-child relation, whereas combining the discard feature and the dirty-bitmap check on read into one fleecing driver seems reasonable: it becomes a complete fleecing driver, with significant actions for all block operations (read, write, discard). And that makes sense.

So what we actually want is something like this:


                          cbw-friend
                  ┌──────────────────────────┐
                  │                          │
    ┌─────────────▼───┐   target    ┌────────┴────────┐
    │ CBW             ├─────────────► fleecing        │
    └────────┬────────┘             └──┬─────┬────────┘
             │                         │     │
       file  │    ┌────────────────────┘     │file
             │    │     source               │
    ┌────────▼────▼───┐             ┌────────▼────────┐
    │ active-disk     │             │ temp.img        │
    └─────────────────┘             └─────────────────┘


Here, the cbw-friend link is used to perform discard_cbw() operations. Keeping in mind that discard_cbw() is a very specific operation, there is no reason to support this relationship in the generic layer, so all we need is to keep a reference to the CBW node in the fleecing node, which doesn't seem so bad.

This way, the question of whether the new synchronization and caching are worth the complexity goes away: there is no additional complexity for the user, we have only one additional node (which is needed for the discard functionality anyway) and a simple relationship. So for the user, the whole difference is the links [fleecing] --source-> [active-disk] instead of [temp.qcow2] --backing-> [active-disk]. And we get some small improvements: a somewhat more optimal synchronization scheme, reuse of data that is already in RAM, and the possibility to use a simple raw file as temp.img.


Hmm. But of course, such a relationship raises some questions. What about aio-context switches? We would have to rely on the "target" relationship.

Interestingly, looking at the last picture, I see that CBW and fleecing look like _one_ block node. Why can't we merge them? Because guest reads and fleecing-client reads are different operations and must be handled differently. The same goes for guest discards and fleecing-client discards.

But we can implement such operations as separate bdrv_* operations, or as flags for read and discard. Then the scheme will look like this:

                             ┌──────────────────┐
                             │  NBD export      │
                             └─────────┬────────┘
                                       │
                                       │root
                                       │
┌─────────────────┐         ┌─────────▼────────┐
│  Guest          │         │ fleecing-access  │
└───────┬─────────┘         └─────────┬────────┘
         │                             │
         │root                         │file
         │                             │
┌───────▼─────────────────────────────▼────────┐
│                  CBW                         │
└───────┬─────────────────────────────┬────────┘
         │                             │
         │file                         │target
         │                             │
┌───────▼─────────┐         ┌─────────▼────────┐
│   active-disk   │         │  temp.img        │
└─────────────────┘         └──────────────────┘


So the whole logic is in the CBW driver, and the fleecing-related logic is implemented as new bdrv handlers: .bdrv_co_fleecing_preadv() and .bdrv_co_fleecing_pdiscard().

And fleecing-access is a very simple driver: its .bdrv_co_preadv() just calls bdrv_co_fleecing_preadv(bs->file), and its .bdrv_co_pdiscard() calls bdrv_co_fleecing_pdiscard(bs->file).

This way, the fleecing-access driver is fully independent of CBW, all relations in the graph are QEMU-native, and the scheme is very simple to understand.
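To show how thin fleecing-access would be, here is a self-contained C sketch. Node and Driver are hypothetical, heavily reduced stand-ins for BlockDriverState and BlockDriver (not QEMU's real types), and the fleecing_preadv/fleecing_pdiscard members model the proposed .bdrv_co_fleecing_preadv() and .bdrv_co_fleecing_pdiscard() handlers without coroutines. A toy CBW driver is included only so that the forwarding can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Node Node;

/* Hypothetical, heavily reduced stand-in for a BlockDriver. */
typedef struct Driver {
    const char *format_name;
    int (*preadv)(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf);
    int (*pdiscard)(Node *bs, int64_t offset, int64_t bytes);
    /* The proposed fleecing handlers; NULL if the driver has none. */
    int (*fleecing_preadv)(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf);
    int (*fleecing_pdiscard)(Node *bs, int64_t offset, int64_t bytes);
} Driver;

/* Hypothetical stand-in for a BlockDriverState with a single file child. */
struct Node {
    const Driver *drv;
    Node *file;
};

static int node_fleecing_preadv(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf)
{
    if (!bs || !bs->drv->fleecing_preadv) {
        return -ENOTSUP;
    }
    return bs->drv->fleecing_preadv(bs, offset, bytes, buf);
}

static int node_fleecing_pdiscard(Node *bs, int64_t offset, int64_t bytes)
{
    if (!bs || !bs->drv->fleecing_pdiscard) {
        return -ENOTSUP;
    }
    return bs->drv->fleecing_pdiscard(bs, offset, bytes);
}

/* fleecing-access: plain read/discard become fleecing read/discard on file. */
static int access_preadv(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf)
{
    return node_fleecing_preadv(bs->file, offset, bytes, buf);
}

static int access_pdiscard(Node *bs, int64_t offset, int64_t bytes)
{
    return node_fleecing_pdiscard(bs->file, offset, bytes);
}

const Driver fleecing_access_drv = {
    .format_name = "fleecing-access",
    .preadv = access_preadv,
    .pdiscard = access_pdiscard,
};

/* Toy CBW stand-in: fills 'G' for guest reads, 'S' for snapshot-view reads. */
static int toy_cbw_preadv(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf)
{
    (void)bs;
    (void)offset;
    memset(buf, 'G', (size_t)bytes);
    return 0;
}

static int toy_cbw_fleecing_preadv(Node *bs, int64_t offset, int64_t bytes, uint8_t *buf)
{
    (void)bs;
    (void)offset;
    memset(buf, 'S', (size_t)bytes);
    return 0;
}

const Driver toy_cbw_drv = {
    .format_name = "toy-cbw",
    .preadv = toy_cbw_preadv,
    .fleecing_preadv = toy_cbw_fleecing_preadv,
};
```

Returning -ENOTSUP when the child lacks the handler is just one possible choice in this sketch; the relevant property is that fleecing-access holds only an ordinary file child and knows nothing about CBW internals.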



Moreover, it reminds me of my old idea of making it possible to read a qcow2 internal snapshot directly. It may look like this:

                              ┌──────────────────┐
                              │  NBD export      │
                              └─────────┬────────┘
                                        │
                                        │root
                                        │
  ┌─────────────────┐         ┌─────────▼────────┐
  │  Guest          │         │ fleecing-access  │
  └───────┬─────────┘         └─────────┬────────┘
          │                             │
          │root                         │file
          │                             │
  ┌───────▼─────────────────────────────▼────────┐
  │                qcow2 active disk             │
  └──────────────────────────────────────────────┘


For this to work, we implement .bdrv_co_fleecing_preadv() in qcow2, which reads data directly from the internal snapshot. This also means it is better to rename fleecing-access to snapshot-access, and the handlers to .bdrv_co_snapshot_preadv(). This way, the CBW driver just provides a kind of internal snapshot and the corresponding interface for it.

And at some point we could even implement a reverse snapshot inside the qcow2 driver, so it would work like the fleecing scheme: instead of COW operations, CBW operations are done, and the snapshot doesn't affect disk fragmentation.


-- 
Best regards,
Vladimir



end of thread, other threads:[~2022-01-27 16:08 UTC | newest]

Thread overview: 38+ messages
2021-12-22 17:39 [PATCH v3 00/19] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
2022-01-14 16:54   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
2022-01-14 16:55   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
2022-01-14 16:58   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
2022-01-14 17:47   ` Hanna Reitz
2022-01-17 11:36     ` Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 05/19] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
2022-01-14 17:51   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 06/19] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
2022-01-14 18:20   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
2022-01-17 10:06   ` Nikta Lapshin
2022-01-17 12:02     ` Vladimir Sementsov-Ogievskiy
2022-01-18 13:31   ` Hanna Reitz
2022-01-26 10:56     ` Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 08/19] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
2022-01-17 12:34   ` Nikta Lapshin
2022-01-18 13:44   ` Hanna Reitz
2021-12-22 17:40 ` [PATCH v3 09/19] block: introduce FleecingState class Vladimir Sementsov-Ogievskiy
2022-01-18 16:37   ` Hanna Reitz
2022-01-18 18:35     ` Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 10/19] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy
2022-01-20 16:11   ` Hanna Reitz
2022-01-21 10:46     ` Vladimir Sementsov-Ogievskiy
2022-01-27 15:28       ` Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 11/19] block/copy-before-write: support " Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 12/19] block/block-copy: add write-unchanged mode Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 13/19] block/copy-before-write: use write-unchanged in fleecing mode Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 14/19] iotests/image-fleecing: add test-case for fleecing format node Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 15/19] iotests.py: add qemu_io_pipe_and_status() Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 16/19] iotests/image-fleecing: add test case with bitmap Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 17/19] block: blk_root(): return non-const pointer Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 18/19] qapi: backup: add immutable-source parameter Vladimir Sementsov-Ogievskiy
2021-12-22 17:40 ` [PATCH v3 19/19] iotests/image-fleecing: test push backup with fleecing Vladimir Sementsov-Ogievskiy
