All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC DRAFT 00/11] Make image fleecing more usable
@ 2021-08-04 13:17 Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 01/11] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Hi all!

That's an untested draft. I'll be on vocation 05-22, so no reason for
this just lay in my hard drive. Any comments are welcome (mostly about
general design), but don't waste time on careful reviewing.

What this series brings to image-fleecing:

1. support for bitmap (patch 04). So, we can do incremental external
backups and not do extra copy-before-write operations for non-dirty
regions.

2. fleecing block driver - see lat patch for details and list of
benefits.

Based-on: <20210804093813.20688-1-vsementsov@virtuozzo.com>
  ([PATCH v7 00/33] block: publish backup-top filter)

Vladimir Sementsov-Ogievskiy (11):
  block/block-copy: move copy_bitmap initialization to
    block_copy_state_new()
  block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  block/block-copy: block_copy_state_new(): add bitmap parameter
  block/copy-before-write: add bitmap open parameter
  block/block-copy: add block_copy_reset()
  block: intoduce reqlist
  block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  block/reqlist: add reqlist_wait_all()
  block/copy-before-write: add cbw_snapshot_read_{lock,unlock}()
  block/copy-before-write: add cbw_snapshot_discard()
  block: introduce fleecing block driver

 qapi/block-core.json            |  10 +-
 block/copy-before-write.h       |   6 ++
 include/block/block-copy.h      |   3 +-
 include/block/dirty-bitmap.h    |   4 +-
 include/block/reqlist.h         |  47 ++++++++
 include/qemu/hbitmap.h          |  11 ++
 block/block-copy.c              | 152 +++++++++++---------------
 block/copy-before-write.c       | 143 ++++++++++++++++++++++++-
 block/dirty-bitmap.c            |  14 ++-
 block/fleecing.c                | 183 ++++++++++++++++++++++++++++++++
 block/monitor/bitmap-qmp-cmds.c |   5 +-
 block/reqlist.c                 | 100 +++++++++++++++++
 util/hbitmap.c                  |  36 +++++++
 block/meson.build               |   2 +
 14 files changed, 613 insertions(+), 103 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/fleecing.c
 create mode 100644 block/reqlist.c

-- 
2.29.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/11] block/block-copy: move copy_bitmap initialization to block_copy_state_new()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 02/11] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

We are going to complicate bitmap initialization in the following
commit. And in future, backup job will be able to work without filter
(when source is immutable), so we'll need same bitmap initialization in
copy-before-write filter and in backup job. So, it's reasonable to do
it in block-copy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/block-copy.c        | 1 +
 block/copy-before-write.c | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 90664ee0ab..307045a59f 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -418,6 +418,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
+    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
 
     s = g_new(BlockCopyState, 1);
     *s = (BlockCopyState) {
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2cd68b480a..b36ede3186 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -148,7 +148,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
-    BdrvDirtyBitmap *copy_bitmap;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -176,9 +175,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    copy_bitmap = block_copy_dirty_bitmap(s->bcs);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
-
     return 0;
 }
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 02/11] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 01/11] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 03/11] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

That simplifies handling failure in existing code and in further new
usage of bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h    | 2 +-
 block/dirty-bitmap.c            | 8 ++++++--
 block/monitor/bitmap-qmp-cmds.c | 5 +----
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 40950ae3d5..f95d350b70 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,7 +77,7 @@ void bdrv_dirty_bitmap_set_persistence(BdrvDirtyBitmap *bitmap,
                                        bool persistent);
 void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, bool busy);
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip);
 bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 0ef46163e3..8f8b428baa 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -881,10 +881,10 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
  *
  * @backup: If provided, make a copy of dest here prior to merge.
  */
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
                              HBitmap **backup, Error **errp)
 {
-    bool ret;
+    bool ret = false;
 
     bdrv_dirty_bitmaps_lock(dest->bs);
     if (src->bs != dest->bs) {
@@ -907,11 +907,15 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
     ret = bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
     assert(ret);
 
+    ret = true;
+
 out:
     bdrv_dirty_bitmaps_unlock(dest->bs);
     if (src->bs != dest->bs) {
         bdrv_dirty_bitmaps_unlock(src->bs);
     }
+
+    return ret;
 }
 
 /**
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 9f11deec64..83970b22fa 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -259,7 +259,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
     BlockDriverState *bs;
     BdrvDirtyBitmap *dst, *src, *anon;
     BlockDirtyBitmapMergeSourceList *lst;
-    Error *local_err = NULL;
 
     dst = block_dirty_bitmap_lookup(node, target, &bs, errp);
     if (!dst) {
@@ -297,9 +296,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
             abort();
         }
 
-        bdrv_merge_dirty_bitmap(anon, src, NULL, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
+        if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
             dst = NULL;
             goto out;
         }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 03/11] block/block-copy: block_copy_state_new(): add bitmap parameter
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 01/11] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 02/11] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 04/11] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

This will be used in the following commit to bring "incremental" mode
to copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h |  2 +-
 block/block-copy.c         | 16 +++++++++++++---
 block/copy-before-write.c  |  3 ++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 9c24e8f38e..7d40bf2ac2 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -26,7 +26,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      bool use_copy_range, bool compress,
-                                     Error **errp);
+                                     BdrvDirtyBitmap *bitmap, Error **errp);
 
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
                               bool compress);
diff --git a/block/block-copy.c b/block/block-copy.c
index 307045a59f..ec6a39b2ed 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -400,9 +400,10 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                     bool use_copy_range,
-                                     bool compress, Error **errp)
+                                     bool use_copy_range, bool compress,
+                                     BdrvDirtyBitmap *bitmap, Error **errp)
 {
+    ERRP_GUARD();
     BlockCopyState *s;
     int64_t cluster_size;
     BdrvDirtyBitmap *copy_bitmap;
@@ -418,7 +419,16 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
     bdrv_disable_dirty_bitmap(copy_bitmap);
-    bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+    if (bitmap) {
+        if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
+            error_prepend(errp, "Failed to merge bitmap '%s' to internal "
+                          "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
+            return NULL;
+        }
+    } else {
+        bdrv_set_dirty_bitmap(copy_bitmap, 0,
+                              bdrv_dirty_bitmap_size(copy_bitmap));
+    }
 
     s = g_new(BlockCopyState, 1);
     *s = (BlockCopyState) {
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b36ede3186..dbafee1f03 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -169,7 +169,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, false, false, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, false, false, NULL,
+                                  errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         return -EINVAL;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/11] block/copy-before-write: add bitmap open parameter
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 03/11] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 05/11] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

This brings "incremental" mode to copy-before-write filter: user can
specify bitmap so that filter will copy only "dirty" areas.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json      | 10 +++++++++-
 block/copy-before-write.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 59d3e5e42d..d061204cb0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4063,11 +4063,19 @@
 #
 # @target: The target for copy-before-write operations.
 #
+# @bitmap: If specified, copy-before-write filter will do
+#          copy-before-write operations only for dirty regions of the
+#          bitmap. Bitmap size must be equal to length of file and
+#          target child of the filter. Note also, that bitmap is used
+#          only to initialize internal bitmap of the process, so further
+#          modifications (or removing) of specified bitmap doesn't
+#          influence the filter.
+#
 # Since: 6.1
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { 'target': 'BlockdevRef' } }
+  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
 ##
 # @BlockdevOptions:
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index dbafee1f03..b58a5e8b48 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -148,6 +148,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    BdrvDirtyBitmap *bitmap = NULL;
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -162,6 +163,31 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    if (qdict_haskey(options, "bitmap.node") ||
+        qdict_haskey(options, "bitmap.name"))
+    {
+        const char *bitmap_node, *bitmap_name;
+
+        if (!qdict_haskey(options, "bitmap.node")) {
+            error_setg(errp, "bitmap.node is not specified");
+            return -EINVAL;
+        }
+
+        if (!qdict_haskey(options, "bitmap.name")) {
+            error_setg(errp, "bitmap.name is not specified");
+            return -EINVAL;
+        }
+
+        bitmap_node = qdict_get_str(options, "bitmap.node");
+        bitmap_name = qdict_get_str(options, "bitmap.name");
+
+        bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
+                                           errp);
+        if (!bitmap) {
+            return -EINVAL;
+        }
+    }
+
     bs->total_sectors = bs->file->bs->total_sectors;
     bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
             (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
@@ -169,7 +195,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, false, false, NULL,
+    s->bcs = block_copy_state_new(bs->file, s->target, false, false, bitmap,
                                   errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/11] block/block-copy: add block_copy_reset()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 04/11] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 06/11] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Split block_copy_reset() out of block_copy_reset_unallocated() to be
used in separate later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block-copy.h |  1 +
 block/block-copy.c         | 21 +++++++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 7d40bf2ac2..599af02bef 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -34,6 +34,7 @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
                                      int64_t offset, int64_t *count);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index ec6a39b2ed..5cd291727b 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -689,6 +689,18 @@ static int block_copy_is_cluster_allocated(BlockCopyState *s, int64_t offset,
     }
 }
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes)
+{
+    QEMU_LOCK_GUARD(&s->lock);
+
+    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+    if (s->progress) {
+        progress_set_remaining(s->progress,
+                               bdrv_get_dirty_count(s->copy_bitmap) +
+                               s->in_flight_bytes);
+    }
+}
+
 /*
  * Reset bits in copy_bitmap starting at offset if they represent unallocated
  * data in the image. May reset subsequent contiguous bits.
@@ -709,14 +721,7 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
     bytes = clusters * s->cluster_size;
 
     if (!ret) {
-        qemu_co_mutex_lock(&s->lock);
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-        if (s->progress) {
-            progress_set_remaining(s->progress,
-                                   bdrv_get_dirty_count(s->copy_bitmap) +
-                                   s->in_flight_bytes);
-        }
-        qemu_co_mutex_unlock(&s->lock);
+        block_copy_reset(s, offset, bytes);
     }
 
     *count = bytes;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/11] block: intoduce reqlist
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 05/11] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 07/11] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Split intersecting-requests functionality out of block-copy to be
reused in copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h |  45 ++++++++++++++++
 block/block-copy.c      | 116 +++++++++++++---------------------------
 block/reqlist.c         |  94 ++++++++++++++++++++++++++++++++
 block/meson.build       |   1 +
 4 files changed, 177 insertions(+), 79 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/reqlist.c

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
new file mode 100644
index 0000000000..2de86be300
--- /dev/null
+++ b/include/block/reqlist.h
@@ -0,0 +1,45 @@
+/*
+ * block_copy API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REQLIST_H
+#define REQLIST_H
+
+#include "qemu/coroutine.h"
+
+/*
+ * The API is not thread-safe and shouldn't be. The struct is public to be part
+ * of other structures and protected by third-party locks, see
+ * block/block-copy.c for example.
+ */
+
+typedef struct BlockReq {
+    int64_t offset;
+    int64_t bytes;
+
+    CoQueue wait_queue; /* coroutines blocked on this req */
+    QLIST_ENTRY(BlockReq) list;
+} BlockReq;
+
+typedef QLIST_HEAD(, BlockReq) BlockReqList;
+
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes);
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes);
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
+void coroutine_fn reqlist_remove_req(BlockReq *req);
+
+#endif /* REQLIST_H */
diff --git a/block/block-copy.c b/block/block-copy.c
index 5cd291727b..de388b7b96 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 #include "sysemu/block-backend.h"
 #include "qemu/units.h"
 #include "qemu/coroutine.h"
@@ -82,7 +83,6 @@ typedef struct BlockCopyTask {
      */
     BlockCopyState *s;
     BlockCopyCallState *call_state;
-    int64_t offset;
     /*
      * @method can also be set again in the while loop of
      * block_copy_dirty_clusters(), but it is never accessed concurrently
@@ -93,21 +93,17 @@ typedef struct BlockCopyTask {
     BlockCopyMethod method;
 
     /*
-     * Fields whose state changes throughout the execution
-     * Protected by lock in BlockCopyState.
+     * Generally, req is protected by lock in BlockCopyState, Still req.offset
+     * is only set on task creation, so may be read concurrently after creation.
+     * req.bytes is changed at most once, and need only protecting the case of
+     * parallel read while updating @bytes value in block_copy_task_shrink().
      */
-    CoQueue wait_queue; /* coroutines blocked on this task */
-    /*
-     * Only protect the case of parallel read while updating @bytes
-     * value in block_copy_task_shrink().
-     */
-    int64_t bytes;
-    QLIST_ENTRY(BlockCopyTask) list;
+    BlockReq req;
 } BlockCopyTask;
 
 static int64_t task_end(BlockCopyTask *task)
 {
-    return task->offset + task->bytes;
+    return task->req.offset + task->req.bytes;
 }
 
 typedef struct BlockCopyState {
@@ -135,7 +131,7 @@ typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
-    QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
+    BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
      * skip_unallocated:
@@ -159,42 +155,6 @@ typedef struct BlockCopyState {
     RateLimit rate_limit;
 } BlockCopyState;
 
-/* Called with lock held */
-static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-                                            int64_t offset, int64_t bytes)
-{
-    BlockCopyTask *t;
-
-    QLIST_FOREACH(t, &s->tasks, list) {
-        if (offset + bytes > t->offset && offset < t->offset + t->bytes) {
-            return t;
-        }
-    }
-
-    return NULL;
-}
-
-/*
- * If there are no intersecting tasks return false. Otherwise, wait for the
- * first found intersecting tasks to finish and return true.
- *
- * Called with lock held. May temporary release the lock.
- * Return value of 0 proves that lock was NOT released.
- */
-static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
-                                             int64_t bytes)
-{
-    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
-
-    if (!task) {
-        return false;
-    }
-
-    qemu_co_queue_wait(&task->wait_queue, &s->lock);
-
-    return true;
-}
-
 /* Called with lock held */
 static int64_t block_copy_chunk_size(BlockCopyState *s)
 {
@@ -238,7 +198,7 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
     bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
 
     /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!reqlist_find_conflict(&s->reqs, offset, bytes));
 
     bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
     s->in_flight_bytes += bytes;
@@ -248,12 +208,9 @@ block_copy_task_create(BlockCopyState *s, BlockCopyCallState *call_state,
         .task.func = block_copy_task_entry,
         .s = s,
         .call_state = call_state,
-        .offset = offset,
-        .bytes = bytes,
         .method = s->method,
     };
-    qemu_co_queue_init(&task->wait_queue);
-    QLIST_INSERT_HEAD(&s->tasks, task, list);
+    reqlist_init_req(&s->reqs, &task->req, offset, bytes);
 
     return task;
 }
@@ -269,34 +226,34 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
                                                 int64_t new_bytes)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    if (new_bytes == task->bytes) {
+    if (new_bytes == task->req.bytes) {
         return;
     }
 
-    assert(new_bytes > 0 && new_bytes < task->bytes);
+    assert(new_bytes > 0 && new_bytes < task->req.bytes);
 
-    task->s->in_flight_bytes -= task->bytes - new_bytes;
+    task->s->in_flight_bytes -= task->req.bytes - new_bytes;
     bdrv_set_dirty_bitmap(task->s->copy_bitmap,
-                          task->offset + new_bytes, task->bytes - new_bytes);
+                          task->req.offset + new_bytes,
+                          task->req.bytes - new_bytes);
 
-    task->bytes = new_bytes;
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_shrink_req(&task->req, new_bytes);
 }
 
 static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
 {
     QEMU_LOCK_GUARD(&task->s->lock);
-    task->s->in_flight_bytes -= task->bytes;
+    task->s->in_flight_bytes -= task->req.bytes;
     if (ret < 0) {
-        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
+        bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->req.offset,
+                              task->req.bytes);
     }
-    QLIST_REMOVE(task, list);
     if (task->s->progress) {
         progress_set_remaining(task->s->progress,
                                bdrv_get_dirty_count(task->s->copy_bitmap) +
                                task->s->in_flight_bytes);
     }
-    qemu_co_queue_restart_all(&task->wait_queue);
+    reqlist_remove_req(&task->req);
 }
 
 void block_copy_state_free(BlockCopyState *s)
@@ -447,7 +404,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 
     ratelimit_init(&s->rate_limit);
     qemu_co_mutex_init(&s->lock);
-    QLIST_INIT(&s->tasks);
+    QLIST_INIT(&s->reqs);
     QLIST_INIT(&s->calls);
 
     return s;
@@ -480,7 +437,7 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
 
     aio_task_pool_wait_slot(pool);
     if (aio_task_pool_status(pool) < 0) {
-        co_put_to_shres(task->s->mem, task->bytes);
+        co_put_to_shres(task->s->mem, task->req.bytes);
         block_copy_task_end(task, -ECANCELED);
         g_free(task);
         return -ECANCELED;
@@ -593,7 +550,8 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     BlockCopyMethod method = t->method;
     int ret;
 
-    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
+    ret = block_copy_do_copy(s, t->req.offset, t->req.bytes, &method,
+                             &error_is_read);
 
     WITH_QEMU_LOCK_GUARD(&s->lock) {
         if (s->method == t->method) {
@@ -606,10 +564,10 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
                 t->call_state->error_is_read = error_is_read;
             }
         } else if (s->progress) {
-            progress_work_done(s->progress, t->bytes);
+            progress_work_done(s->progress, t->req.bytes);
         }
     }
-    co_put_to_shres(s->mem, t->bytes);
+    co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
     return ret;
@@ -768,22 +726,22 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             trace_block_copy_skip_range(s, offset, bytes);
             break;
         }
-        if (task->offset > offset) {
-            trace_block_copy_skip_range(s, offset, task->offset - offset);
+        if (task->req.offset > offset) {
+            trace_block_copy_skip_range(s, offset, task->req.offset - offset);
         }
 
         found_dirty = true;
 
-        ret = block_copy_block_status(s, task->offset, task->bytes,
+        ret = block_copy_block_status(s, task->req.offset, task->req.bytes,
                                       &status_bytes);
         assert(ret >= 0); /* never fail */
-        if (status_bytes < task->bytes) {
+        if (status_bytes < task->req.bytes) {
             block_copy_task_shrink(task, status_bytes);
         }
         if (qatomic_read(&s->skip_unallocated) &&
             !(ret & BDRV_BLOCK_ALLOCATED)) {
             block_copy_task_end(task, 0);
-            trace_block_copy_skip_range(s, task->offset, task->bytes);
+            trace_block_copy_skip_range(s, task->req.offset, task->req.bytes);
             offset = task_end(task);
             bytes = end - offset;
             g_free(task);
@@ -804,11 +762,11 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             }
         }
 
-        ratelimit_calculate_delay(&s->rate_limit, task->bytes);
+        ratelimit_calculate_delay(&s->rate_limit, task->req.bytes);
 
-        trace_block_copy_process(s, task->offset);
+        trace_block_copy_process(s, task->req.offset);
 
-        co_get_from_shres(s->mem, task->bytes);
+        co_get_from_shres(s->mem, task->req.bytes);
 
         offset = task_end(task);
         bytes = end - offset;
@@ -876,8 +834,8 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                  * Check that there is no task we still need to
                  * wait to complete
                  */
-                ret = block_copy_wait_one(s, call_state->offset,
-                                          call_state->bytes);
+                ret = reqlist_wait_one(&s->reqs, call_state->offset,
+                                       call_state->bytes, &s->lock);
                 if (ret == 0) {
                     /*
                      * No pending tasks, but check again the bitmap in this
@@ -885,7 +843,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
                      * between this and the critical section in
                      * block_copy_dirty_clusters().
                      *
-                     * block_copy_wait_one return value 0 also means that it
+                     * reqlist_wait_one return value 0 also means that it
                      * didn't release the lock. So, we are still in the same
                      * critical section, not interrupted by any concurrent
                      * access to state.
diff --git a/block/reqlist.c b/block/reqlist.c
new file mode 100644
index 0000000000..c41415c16a
--- /dev/null
+++ b/block/reqlist.c
@@ -0,0 +1,94 @@
+/*
+ * block_copy API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/reqlist.h"
+
+/*
+ * Initialize new request and add it to the list. Caller should be sure that
+ * there are no conflicting requests in the list.
+ */
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+                      int64_t bytes)
+{
+    assert(!reqlist_find_conflict(reqs, offset, bytes));
+
+    *req = (BlockReq) {
+        .offset = offset,
+        .bytes = bytes,
+    };
+    qemu_co_queue_init(&req->wait_queue);
+    QLIST_INSERT_HEAD(reqs, req, list);
+}
+
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+                                int64_t bytes)
+{
+    BlockReq *r;
+
+    QLIST_FOREACH(r, reqs, list) {
+        if (offset + bytes > r->offset && offset < r->offset + r->bytes) {
+            return r;
+        }
+    }
+
+    return NULL;
+}
+
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ *
+ * @lock is passed to qemu_co_queue_wait()
+ * False return value proves that lock was NOT released.
+ */
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    BlockReq *r = reqlist_find_conflict(reqs, offset, bytes);
+
+    if (!r) {
+        return false;
+    }
+
+    qemu_co_queue_wait(&r->wait_queue, lock);
+
+    return true;
+}
+
+/*
+ * Shrink request and wake all waiting coroutines (may be some of them are not
+ * intersecting with shrunk request).
+ */
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
+{
+    if (new_bytes == req->bytes) {
+        return;
+    }
+
+    assert(new_bytes > 0 && new_bytes < req->bytes);
+
+    req->bytes = new_bytes;
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
+
+/*
+ * Remove request and wake all waiting coroutines. Do not release any memory.
+ */
+void coroutine_fn reqlist_remove_req(BlockReq *req)
+{
+    QLIST_REMOVE(req, list);
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
diff --git a/block/meson.build b/block/meson.build
index 66ee11e62c..c630e9374a 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -32,6 +32,7 @@ block_ss.add(files(
   'qcow2.c',
   'quorum.c',
   'raw-format.c',
+  'reqlist.c',
   'snapshot.c',
   'throttle-groups.c',
   'throttle.c',
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/11] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 06/11] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 08/11] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Add a convenient function similar with bdrv_block_status() to get
status of dirty bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h       | 11 +++++++++++
 block/dirty-bitmap.c         |  6 ++++++
 util/hbitmap.c               | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index f95d350b70..2ae7dc3d1d 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, int64_t offset,
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
         int64_t start, int64_t end, int64_t max_dirty_count,
         int64_t *dirty_start, int64_t *dirty_count);
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, bool *is_dirty, int64_t *count);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
                                                   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 5e71b6d6f7..845fda12db 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
                              int64_t max_dirty_count,
                              int64_t *dirty_start, int64_t *dirty_count);
 
+/*
+ * bdrv_dirty_bitmap_status:
+ * @hb: The HBitmap to operate on
+ * @start: the offset to start from
+ * @end: end of requested area
+ * @is_dirty: is bitmap dirty at @offset
+ * @pnum: how many bits has same value starting from @offset
+ */
+void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
+                    bool *is_dirty, int64_t *pnum);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 8f8b428baa..ecbd6703dc 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
                                    dirty_start, dirty_count);
 }
 
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+                              int64_t bytes, bool *is_dirty, int64_t *count)
+{
+    hbitmap_status(bitmap->bitmap, offset, bytes, is_dirty, count);
+}
+
 /**
  * bdrv_merge_dirty_bitmap: merge src into dest.
  * Ensures permissions on bitmaps are reasonable; use for public API.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 305b894a63..91956263ef 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -301,6 +301,42 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t start, int64_t end,
     return true;
 }
 
+void hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
+                    bool *is_dirty, int64_t *pnum)
+{
+    int64_t next_dirty, next_zero;
+
+    assert(start >= 0);
+    assert(count > 0);
+    assert(start + count <= hb->orig_size);
+
+    next_dirty = hbitmap_next_dirty(hb, start, count);
+    if (next_dirty == -1) {
+        *pnum = count;
+        *is_dirty = false;
+        return;
+    }
+
+    if (next_dirty > start) {
+        *pnum = next_dirty - start;
+        *is_dirty = false;
+        return;
+    }
+
+    assert(next_dirty == start);
+
+    next_zero = hbitmap_next_zero(hb, start, count);
+    if (next_zero == -1) {
+        *pnum = next_zero - start;
+        *is_dirty = true;
+        return;
+    }
+
+    assert(next_zero > start);
+    *pnum = next_zero - start;
+    *is_dirty = false;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
     return hb->count == 0;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 08/11] block/reqlist: add reqlist_wait_all()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 07/11] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 09/11] block/copy-before-write: add cbw_snapshot_read_{lock, unlock}() Vladimir Sementsov-Ogievskiy via
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

To be used in the next commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/reqlist.h | 2 ++
 block/reqlist.c         | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index 2de86be300..59d5c24cda 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -39,6 +39,8 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
                                 int64_t bytes);
 bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
                                    int64_t bytes, CoMutex *lock);
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock);
 void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
 void coroutine_fn reqlist_remove_req(BlockReq *req);
 
diff --git a/block/reqlist.c b/block/reqlist.c
index c41415c16a..939437621d 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -68,6 +68,12 @@ bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
     return true;
 }
 
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+                                   int64_t bytes, CoMutex *lock)
+{
+    while (reqlist_wait_one(reqs, offset, bytes, lock)) { }
+}
+
 /*
  * Shrink request and wake all waiting coroutines (may be some of them are not
  * intersecting with shrunk request).
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/11] block/copy-before-write: add cbw_snapshot_read_{lock, unlock}()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 08/11] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy via
  2021-08-04 13:17 ` [PATCH 10/11] block/copy-before-write: add cbw_snapshot_discard() Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 11/11] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy via @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Add interface which help to do fleecing read. To be used in the next
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-before-write.h |   5 ++
 block/copy-before-write.c | 103 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 51847e711a..a7e286620c 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -28,6 +28,7 @@
 
 #include "block/block_int.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
@@ -36,4 +37,8 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
 
+int cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset,
+                           int64_t bytes, const BlockReq **req, int64_t *pnum);
+void cbw_snapshot_read_unlock(BlockDriverState *bs, const BlockReq *req);
+
 #endif /* COPY_BEFORE_WRITE_H */
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b58a5e8b48..a96131358e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -31,14 +31,78 @@
 #include "block/block_int.h"
 #include "block/qdict.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 
 #include "block/copy-before-write.h"
 
 typedef struct BDRVCopyBeforeWriteState {
     BlockCopyState *bcs;
     BdrvChild *target;
+    CoMutex lock;
+
+    BdrvDirtyBitmap *access_bitmap;
+    BdrvDirtyBitmap *done_bitmap;
+
+    BlockReqList frozen_read_reqs;
 } BDRVCopyBeforeWriteState;
 
+static BlockReq *add_read_req(BDRVCopyBeforeWriteState *s, uint64_t offset,
+                              uint64_t bytes)
+{
+    BlockReq *req = g_new(BlockReq, 1);
+
+    reqlist_init_req(&s->frozen_read_reqs, req, offset, bytes);
+
+    return req;
+}
+
+static void drop_read_req(BDRVCopyBeforeWriteState *s, BlockReq *req)
+{
+    reqlist_remove_req(req);
+    g_free(req);
+}
+
+/*
+ * Convenient function for thous who want to do fleecing read.
+ *
+ * If requested region starts in "done" area, i.e. data is already copied to
+ * copy-before-write target node, req is set to NULL, pnum is set to available
+ * bytes to read from target. User is free to read @pnum bytes from target.
+ * Still, user is responsible for concurrent discards on target.
+ *
+ * If requests region starts in "not done" area, i.e. we have to read from
+ * source node directly, than @pnum bytes of source node are frozen and
+ * guaranteed not be rewritten until user calls cbw_snapshot_read_unlock().
+ */
+int cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset,
+                           int64_t bytes, const BlockReq **req, int64_t *pnum)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+    bool done;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
+        return -EACCES;
+    }
+
+    bdrv_dirty_bitmap_status(s->done_bitmap, offset, bytes, &done, pnum);
+    if (!done) {
+        *req = add_read_req(s, offset, *pnum);
+    }
+
+    return 0;
+}
+
+void cbw_snapshot_read_unlock(BlockDriverState *bs, const BlockReq *req)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    drop_read_req(s, (BlockReq *)req);
+}
+
 static coroutine_fn int cbw_co_preadv(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, int flags)
@@ -50,6 +114,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
         uint64_t offset, uint64_t bytes, BdrvRequestFlags flags)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    int ret;
     uint64_t off, end;
     int64_t cluster_size = block_copy_cluster_size(s->bcs);
 
@@ -60,7 +125,17 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
     off = QEMU_ALIGN_DOWN(offset, cluster_size);
     end = QEMU_ALIGN_UP(offset + bytes, cluster_size);
 
-    return block_copy(s->bcs, off, end - off, true);
+    ret = block_copy(s->bcs, off, end - off, true);
+    if (ret < 0) {
+        return ret;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+        reqlist_wait_all(&s->frozen_read_reqs, off, end - off, &s->lock);
+    }
+
+    return 0;
 }
 
 static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
@@ -148,7 +223,11 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
-    BdrvDirtyBitmap *bitmap = NULL;
+    BdrvDirtyBitmap *bcs_bitmap, *bitmap = NULL;
+    bool ok;
+
+    qemu_co_mutex_init(&s->lock);
+    QLIST_INIT(&s->frozen_read_reqs);
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -202,6 +281,23 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    bcs_bitmap = block_copy_dirty_bitmap(s->bcs);
+
+    /* done_bitmap starts empty */
+    s->done_bitmap =
+        bdrv_create_dirty_bitmap(bs, block_copy_cluster_size(s->bcs), NULL,
+                                 errp);
+    bdrv_disable_dirty_bitmap(s->done_bitmap);
+    /* access_bitmap starts equal to bcs_bitmap */
+    s->access_bitmap =
+        bdrv_create_dirty_bitmap(bs, block_copy_cluster_size(s->bcs), NULL,
+                                 errp);
+    bdrv_disable_dirty_bitmap(s->access_bitmap);
+    ok = bdrv_dirty_bitmap_merge_internal(s->access_bitmap, bcs_bitmap, NULL,
+                                          true);
+    /* Merge fails iff bitmaps has different size */
+    assert(ok);
+
     return 0;
 }
 
@@ -211,6 +307,9 @@ static void cbw_close(BlockDriverState *bs)
 
     block_copy_state_free(s->bcs);
     s->bcs = NULL;
+
+    bdrv_release_dirty_bitmap(s->access_bitmap);
+    bdrv_release_dirty_bitmap(s->done_bitmap);
 }
 
 BlockDriver bdrv_cbw_filter = {
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 10/11] block/copy-before-write: add cbw_snapshot_discard()
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 09/11] block/copy-before-write: add cbw_snapshot_read_{lock, unlock}() Vladimir Sementsov-Ogievskiy via
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  2021-08-04 13:17 ` [PATCH 11/11] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

To be used soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-before-write.h |  1 +
 block/copy-before-write.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index a7e286620c..5809ffc7d0 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,5 +40,6 @@ void bdrv_cbw_drop(BlockDriverState *bs);
 int cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset,
                            int64_t bytes, const BlockReq **req, int64_t *pnum);
 void cbw_snapshot_read_unlock(BlockDriverState *bs, const BlockReq *req);
+void cbw_snapshot_discard(BlockDriverState *bs, int64_t offset, int64_t bytes);
 
 #endif /* COPY_BEFORE_WRITE_H */
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a96131358e..a9fc8e34e9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -103,6 +103,17 @@ void cbw_snapshot_read_unlock(BlockDriverState *bs, const BlockReq *req)
     drop_read_req(s, (BlockReq *)req);
 }
 
+void cbw_snapshot_discard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+    }
+
+    block_copy_reset(s->bcs, offset, bytes);
+}
+
 static coroutine_fn int cbw_co_preadv(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, int flags)
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 11/11] block: introduce fleecing block driver
  2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2021-08-04 13:17 ` [PATCH 10/11] block/copy-before-write: add cbw_snapshot_discard() Vladimir Sementsov-Ogievskiy
@ 2021-08-04 13:17 ` Vladimir Sementsov-Ogievskiy
  10 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-04 13:17 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, eblake, mreitz, vsementsov, jsnow, kwolf

Introduce a new driver, that works in pair with copy-before-write to
improve fleecing.

Without fleecing driver, fleecing scheme may look as follows:

[guest]
  |
  |root
  v
[copy-before-write] -----> [temp.qcow2] <--- [nbd export]
  |                 target  |
  |file                     |backing
  v                         |
[active disk] <-------------+

With fleecing driver, new scheme is:

           bdrv_ref()
[guest]<~~~~~~~~~~~~~~~~~~~[fleecing] <--- [nbd export]
  |                         |    |
  |root      +--source------+    |file
  v          v                   v
[copy-before-write] -----> [temp.img]
  |                 target
  |file
  v
[active disk]

Benefits of new scheme:

1. Access control: if remote client try to read data that not covered
   by original dirty bitmap used on copy-before-write open, client gets
   -EACCES.

2. Discard support: if remote client do DISCARD, this additionally to
   discarding data in temp.img informs block-copy process to not copy
   these clusters. Next read from discarded area will return -EACCES.

3. Synchronisation between client reads and block-copy write is more
   efficient: it doesn't block intersecting block-copy write during
   client read (hmm, we still needlesly block it, as block-copy
   always serialize writes, it's a TODO to stop doing so).

4. We don't rely on backing feature: active disk should not be backing
   of temp image, so we avoid some permission-related difficulties
   (cleaning them up in copy-before-write filter is a TODO) and temp
   image now is not required to support backing, it may be simple raw
   image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/fleecing.c  | 183 ++++++++++++++++++++++++++++++++++++++++++++++
 block/meson.build |   1 +
 2 files changed, 184 insertions(+)
 create mode 100644 block/fleecing.c

diff --git a/block/fleecing.c b/block/fleecing.c
new file mode 100644
index 0000000000..7d213da1ca
--- /dev/null
+++ b/block/fleecing.c
@@ -0,0 +1,183 @@
+/*
+ * copy-before-write filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018-2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/qdict.h"
+#include "block/block-copy.h"
+
+#include "block/copy-before-write.h"
+
+typedef struct BDRVFleecingState {
+    BlockDriverState *cbw;
+    BdrvChild *source;
+} BDRVFleecingState;
+
+static coroutine_fn int fleecing_co_preadv_part(
+        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+        QEMUIOVector *qiov, size_t qiov_offset, int flags)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const BlockReq *req;
+    int ret;
+
+    /* TODO: upgrade to async loop using AioTask */
+    while (bytes) {
+        int64_t cur_bytes;
+
+        ret = cbw_snapshot_read_lock(s->cbw, offset, bytes, &req, &cur_bytes);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (req) {
+            ret = bdrv_co_preadv_part(s->source, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+            cbw_snapshot_read_unlock(s->cbw, req);
+        } else {
+            ret = bdrv_co_preadv_part(bs->file, offset, cur_bytes,
+                                      qiov, qiov_offset, flags);
+        }
+        if (ret < 0) {
+            return ret;
+        }
+
+        bytes -= cur_bytes;
+        offset += cur_bytes;
+        qiov_offset += cur_bytes;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int bytes)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    cbw_snapshot_discard(s->cbw, offset, bytes);
+
+    bdrv_co_pdiscard(bs->file, offset, bytes);
+
+    /*
+     * Ignore bdrv_co_pdiscard() result: cbw_snapshot_discard() succeeded, that
+     * means that next read from this area will fail with -EACCES. More correct
+     * to report success now.
+     */
+    return 0;
+}
+
+static int coroutine_fn fleecing_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    return -EACCES;
+}
+
+static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs,
+                                       uint64_t offset,
+                                       uint64_t bytes,
+                                       QEMUIOVector *qiov, int flags)
+{
+    return -EACCES;
+}
+
+
+static void fleecing_refresh_filename(BlockDriverState *bs)
+{
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->file->bs->filename);
+}
+
+static int fleecing_open(BlockDriverState *bs, QDict *options, int flags,
+                         Error **errp)
+{
+    BDRVFleecingState *s = bs->opaque;
+    const char *cbw_node_name = qdict_get_str(options,
+                                              "copy-before-write-node");
+    BlockDriverState *cbw;
+
+    cbw = bdrv_find_node(cbw_node_name);
+    if (!cbw) {
+        error_setg(errp, "Node '%s' not found", cbw_node_name);
+        return -EINVAL;
+    }
+
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    s->source = bdrv_open_child(NULL, options, "source", bs, &child_of_bds,
+                               BDRV_CHILD_DATA, false, errp);
+    if (!s->source) {
+        return -EINVAL;
+    }
+
+    bs->total_sectors = bs->file->bs->total_sectors;
+
+    s->cbw = cbw;
+    bdrv_ref(cbw);
+
+    return 0;
+}
+
+static void fleecing_close(BlockDriverState *bs)
+{
+    BDRVFleecingState *s = bs->opaque;
+
+    bdrv_unref(s->cbw);
+}
+
+BlockDriver bdrv_fleecing_filter = {
+    .format_name = "fleecing",
+    .instance_size = sizeof(BDRVFleecingState),
+
+    .bdrv_open                  = fleecing_open,
+    .bdrv_close                 = fleecing_close,
+
+    .bdrv_co_preadv_part        = fleecing_co_preadv_part,
+    .bdrv_co_pwritev            = fleecing_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = fleecing_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = fleecing_co_pdiscard,
+
+    .bdrv_refresh_filename      = fleecing_refresh_filename,
+
+    .bdrv_child_perm            = bdrv_default_perms,
+
+    .is_filter = true,
+};
+
+static void fleecing_init(void)
+{
+    bdrv_register(&bdrv_fleecing_filter);
+}
+
+block_init(fleecing_init);
diff --git a/block/meson.build b/block/meson.build
index c630e9374a..caa52f49c2 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -18,6 +18,7 @@ block_ss.add(files(
   'crypto.c',
   'dirty-bitmap.c',
   'filter-compress.c',
+  'fleecing.c',
   'io.c',
   'mirror.c',
   'nbd.c',
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-08-04 13:30 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-04 13:17 [PATCH RFC DRAFT 00/11] Make image fleecing more usable Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 01/11] block/block-copy: move copy_bitmap initialization to block_copy_state_new() Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 02/11] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 03/11] block/block-copy: block_copy_state_new(): add bitmap parameter Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 04/11] block/copy-before-write: add bitmap open parameter Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 05/11] block/block-copy: add block_copy_reset() Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 06/11] block: intoduce reqlist Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 07/11] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status() Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 08/11] block/reqlist: add reqlist_wait_all() Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 09/11] block/copy-before-write: add cbw_snapshot_read_{lock, unlock}() Vladimir Sementsov-Ogievskiy via
2021-08-04 13:17 ` [PATCH 10/11] block/copy-before-write: add cbw_snapshot_discard() Vladimir Sementsov-Ogievskiy
2021-08-04 13:17 ` [PATCH 11/11] block: introduce fleecing block driver Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.