All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] qemu-img: rebase: add compression support
@ 2023-09-15 16:20 Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file Andrey Drobyshev via
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

v1 --> v2:
 * Choose proper BlockBackend when aligning buf_old;
 * Add new patch ("qemu-img: add chunk size parameter to
   compare_buffers()");
 * Rework write alignment logic; now writes are aligned to either
   subcluster or cluster size, depending on whether compressionis enabled;
 * Add new patch ("iotests/{024, 271}: add testcases for qemu-img
   rebase");
 * Add another compressed rebase testcase for images having subclusters.

v1: https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00068.html

NOTE: compressed rebase testcase for subclusters assume "compressed"
field in "qemu-img map" output.  This series is currently in the block
branch and is likely to be merged into master soon:

https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg01489.html


Andrey Drobyshev (8):
  qemu-img: rebase: stop when reaching EOF of old backing file
  qemu-iotests: 024: add rebasing test case for overlay_size >
    backing_size
  qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  qemu-img: add chunk size parameter to compare_buffers()
  qemu-img: rebase: avoid unnecessary COW operations
  iotests/{024, 271}: add testcases for qemu-img rebase
  qemu-img: add compression option to rebase subcommand
  iotests: add tests for "qemu-img rebase" with compression

 docs/tools/qemu-img.rst    |   6 +-
 qemu-img-cmds.hx           |   4 +-
 qemu-img.c                 | 136 +++++++++++++++++++++++-------
 tests/qemu-iotests/024     | 117 ++++++++++++++++++++++++++
 tests/qemu-iotests/024.out |  73 ++++++++++++++++
 tests/qemu-iotests/271     | 131 +++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out |  82 ++++++++++++++++++
 tests/qemu-iotests/314     | 165 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/314.out |  75 +++++++++++++++++
 9 files changed, 753 insertions(+), 36 deletions(-)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out

-- 
2.39.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 2/8] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Andrey Drobyshev via
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
---
 qemu-img.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index a48edb7101..50660ba920 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3805,6 +3805,8 @@ static int img_rebase(int argc, char **argv)
             }
 
             if (prefix_chain_bs) {
+                uint64_t bytes = n;
+
                 /*
                  * If cluster wasn't changed since prefix_chain, we don't need
                  * to take action
@@ -3817,9 +3819,18 @@ static int img_rebase(int argc, char **argv)
                                  strerror(-ret));
                     goto out;
                 }
-                if (!ret) {
+                if (!ret && n) {
                     continue;
                 }
+                if (!n) {
+                    /*
+                     * If we've reached EOF of the old backing, it means that
+                     * offsets beyond the old backing size were read as zeroes.
+                     * Now we will need to explicitly zero the cluster in
+                     * order to preserve that state after the rebase.
+                     */
+                    n = bytes;
+                }
             }
 
             /*
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/8] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Andrey Drobyshev via
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

Before previous commit, rebase was getting infitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
---
 tests/qemu-iotests/024     | 57 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/024.out | 30 ++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 25a564a150..98a7c8fd65 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -199,6 +199,63 @@ echo
 # $BASE_OLD and $BASE_NEW)
 $QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
 
+# Check that rebase within the chain is working when
+# overlay_size > old_backing_size
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): 11 11 11 11 11
+# Backing (old): 22 22 22 22
+# Overlay:       -- -- -- -- --
+#
+# As a result, overlay should contain data identical to base_old, with the
+# last cluster remaining unallocated.
+
+echo
+echo "=== Test rebase within one backing chain ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 5 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 4 ))
+TEST_IMG=$OVERLAY _make_test_img -b "$BASE_OLD" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 5 ))
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_NEW" -c "write -P 0x11 0 $(( CLUSTER_SIZE * 5 ))" \
+    | _filter_qemu_io
+$QEMU_IO "$BASE_OLD" -c "write -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+    | _filter_qemu_io
+
+echo
+echo "Check the last cluster is zeroed in overlay before the rebase"
+echo
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+# Verify the first 4 clusters are still read the same as in the old base
+$QEMU_IO "$OVERLAY" -c "read -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+    | _filter_qemu_io
+# Verify the last cluster still reads as zeroes
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 973a5a3711..245fe8b1d1 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -171,4 +171,34 @@ read 65536/65536 bytes at offset 196608
 Offset          Length          File
 0               0x30000         TEST_DIR/subdir/t.IMGFMT
 0x30000         0x10000         TEST_DIR/subdir/t.IMGFMT.base_new
+
+=== Test rebase within one backing chain ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=327680
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=262144 backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=327680 backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+
+Fill backing files with data
+
+wrote 327680/327680 bytes at offset 0
+320 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check the last cluster is zeroed in overlay before the rebase
+
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 *** done
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 2/8] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 18:39   ` Eric Blake
  2023-09-19  8:18   ` Hanna Czenczek
  2023-09-15 16:20 ` [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers() Andrey Drobyshev via
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
the data read from the old and new backing files are aligned using
BlockDriverState (or BlockBackend later on) referring to the target image.
However, this isn't quite right, because buf_new is only being used for
reading from the new backing, while buf_old is being used for both reading
from the old backing and writing to the target.  Let's take that into account
and use more appropriate values as alignments.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 qemu-img.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 50660ba920..d12e4a4753 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3750,8 +3750,13 @@ static int img_rebase(int argc, char **argv)
         int64_t n;
         float local_progress = 0;
 
-        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
-        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
+        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
+            bdrv_opt_mem_align(blk_bs(blk_old_backing))) {
+            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
+        } else {
+            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
+        }
+        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
 
         size = blk_getlength(blk);
         if (size < 0) {
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers()
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
                   ` (2 preceding siblings ...)
  2023-09-15 16:20 ` [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 19:28   ` Eric Blake
  2023-09-19  8:32   ` Hanna Czenczek
  2023-09-15 16:20 ` [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations Andrey Drobyshev via
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

Add @chsize param to the function which, if non-zero, would represent
the chunk size to be used for comparison.  If it's zero, then
BDRV_SECTOR_SIZE is used as default chunk size, which is the previous
behaviour.

In particular, we're going to use this param in img_rebase() to make the
write requests aligned to a predefined alignment value.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 qemu-img.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index d12e4a4753..fcd31d7b5b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1274,23 +1274,29 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
 }
 
 /*
- * Compares two buffers sector by sector. Returns 0 if the first
- * sector of each buffer matches, non-zero otherwise.
+ * Compares two buffers chunk by chunk, where @chsize is the chunk size.
+ * If @chsize is 0, default chunk size of BDRV_SECTOR_SIZE is used.
+ * Returns 0 if the first chunk of each buffer matches, non-zero otherwise.
  *
- * pnum is set to the sector-aligned size of the buffer prefix that
+ * @pnum is set to the size of the buffer prefix aligned to @chsize that
  * has the same matching status as the first sector.
  */
 static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
-                           int64_t bytes, int64_t *pnum)
+                           int64_t bytes, uint64_t chsize, int64_t *pnum)
 {
     bool res;
-    int64_t i = MIN(bytes, BDRV_SECTOR_SIZE);
+    int64_t i;
 
     assert(bytes > 0);
 
+    if (!chsize) {
+        chsize = BDRV_SECTOR_SIZE;
+    }
+    i = MIN(bytes, chsize);
+
     res = !!memcmp(buf1, buf2, i);
     while (i < bytes) {
-        int64_t len = MIN(bytes - i, BDRV_SECTOR_SIZE);
+        int64_t len = MIN(bytes - i, chsize);
 
         if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
             break;
@@ -1559,7 +1565,7 @@ static int img_compare(int argc, char **argv)
                     ret = 4;
                     goto out;
                 }
-                ret = compare_buffers(buf1, buf2, chunk, &pnum);
+                ret = compare_buffers(buf1, buf2, chunk, 0, &pnum);
                 if (ret || pnum != chunk) {
                     qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
                             offset + (ret ? 0 : pnum));
@@ -3878,7 +3884,7 @@ static int img_rebase(int argc, char **argv)
                 int64_t pnum;
 
                 if (compare_buffers(buf_old + written, buf_new + written,
-                                    n - written, &pnum))
+                                    n - written, 0, &pnum))
                 {
                     if (buf_old_is_zero) {
                         ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
                   ` (3 preceding siblings ...)
  2023-09-15 16:20 ` [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers() Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 21:52   ` Eric Blake
  2023-09-19 10:46   ` Hanna Czenczek
  2023-09-15 16:20 ` [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase Andrey Drobyshev via
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

When rebasing an image from one backing file to another, we need to
compare data from old and new backings.  If the diff between that data
happens to be unaligned to the target cluster size, we might end up
doing partial writes, which would lead to copy-on-write and additional IO.

Consider the following simple case (virtual_size == cluster_size == 64K):

base <-- inc1 <-- inc2

qemu-io -c "write -P 0xaa 0 32K" base.qcow2
qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2

While doing rebase, we'll write a half of the cluster to inc2, and block
layer will have to read the 2nd half of the same cluster from the base image
inc1 while doing this write operation, although the whole cluster is already
read earlier to perform data comparison.

In order to avoid these unnecessary IO cycles, let's make sure every
write request is aligned to the overlay subcluster boundaries.  Using
subcluster size is universal as for the images which don't have them
this size equals to the cluster size, so in any case we end up aligning
to the smallest unit of allocation.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 56 insertions(+), 20 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index fcd31d7b5b..83950af42b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3523,6 +3523,7 @@ static int img_rebase(int argc, char **argv)
     uint8_t *buf_new = NULL;
     BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
     BlockDriverState *unfiltered_bs;
+    BlockDriverInfo bdi = {0};
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
@@ -3533,6 +3534,7 @@ static int img_rebase(int argc, char **argv)
     bool quiet = false;
     Error *local_err = NULL;
     bool image_opts = false;
+    int64_t write_align;
 
     /* Parse commandline parameters */
     fmt = NULL;
@@ -3656,6 +3658,20 @@ static int img_rebase(int argc, char **argv)
         }
     }
 
+    /*
+     * We need overlay subcluster size to make sure write requests are
+     * aligned.
+     */
+    ret = bdrv_get_info(unfiltered_bs, &bdi);
+    if (ret < 0) {
+        error_report("could not get block driver info");
+        goto out;
+    } else if (bdi.subcluster_size == 0) {
+        bdi.subcluster_size = 1;
+    }
+
+    write_align = bdi.subcluster_size;
+
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
         QDict *options = NULL;
@@ -3753,7 +3769,7 @@ static int img_rebase(int argc, char **argv)
         int64_t old_backing_size = 0;
         int64_t new_backing_size = 0;
         uint64_t offset;
-        int64_t n;
+        int64_t n, n_old = 0, n_new = 0;
         float local_progress = 0;
 
         if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
@@ -3799,7 +3815,8 @@ static int img_rebase(int argc, char **argv)
         }
 
         for (offset = 0; offset < size; offset += n) {
-            bool buf_old_is_zero = false;
+            bool old_backing_eof = false;
+            int64_t n_alloc;
 
             /* How many bytes can we handle with the next read? */
             n = MIN(IO_BUF_SIZE, size - offset);
@@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
                 }
             }
 
+            /*
+             * At this point we know that the region [offset; offset + n)
+             * is unallocated within the target image.  This region might be
+             * unaligned to the target image's (sub)cluster boundaries, as
+             * old backing may have smaller clusters (or have subclusters).
+             * We extend it to the aligned boundaries to avoid CoW on
+             * partial writes in blk_pwrite(),
+             */
+            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
+            offset = QEMU_ALIGN_DOWN(offset, write_align);
+            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
+            n = MIN(n, size - offset);
+            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
+                   n_alloc == n);
+
+            /*
+             * Much like the with the target image, we'll try to read as much
+             * of the old and new backings as we can.
+             */
+            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
+            if (blk_new_backing) {
+                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
+            }
+
             /*
              * Read old and new backing file and take into consideration that
              * backing files may be smaller than the COW image.
              */
-            if (offset >= old_backing_size) {
-                memset(buf_old, 0, n);
-                buf_old_is_zero = true;
+            memset(buf_old + n_old, 0, n - n_old);
+            if (!n_old) {
+                old_backing_eof = true;
             } else {
-                if (offset + n > old_backing_size) {
-                    n = old_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
+                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
                 if (ret < 0) {
                     error_report("error while reading from old backing file");
                     goto out;
                 }
             }
 
-            if (offset >= new_backing_size || !blk_new_backing) {
-                memset(buf_new, 0, n);
-            } else {
-                if (offset + n > new_backing_size) {
-                    n = new_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
+            memset(buf_new + n_new, 0, n - n_new);
+            if (blk_new_backing && n_new) {
+                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
                 if (ret < 0) {
                     error_report("error while reading from new backing file");
                     goto out;
@@ -3884,11 +3916,12 @@ static int img_rebase(int argc, char **argv)
                 int64_t pnum;
 
                 if (compare_buffers(buf_old + written, buf_new + written,
-                                    n - written, 0, &pnum))
+                                    n - written, write_align, &pnum))
                 {
-                    if (buf_old_is_zero) {
+                    if (old_backing_eof) {
                         ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);
                     } else {
+                        assert(written + pnum <= IO_BUF_SIZE);
                         ret = blk_pwrite(blk, offset + written, pnum,
                                          buf_old + written, 0);
                     }
@@ -3900,6 +3933,9 @@ static int img_rebase(int argc, char **argv)
                 }
 
                 written += pnum;
+                if (offset + written >= old_backing_size) {
+                    old_backing_eof = true;
+                }
             }
             qemu_progress_print(local_progress, 100);
         }
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
                   ` (4 preceding siblings ...)
  2023-09-15 16:20 ` [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-19 11:08   ` Hanna Czenczek
  2023-09-15 16:20 ` [PATCH v2 7/8] qemu-img: add compression option to rebase subcommand Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression Andrey Drobyshev via
  7 siblings, 1 reply; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

As the previous commit changes the logic of "qemu-img rebase" (it's using
write alignment now), let's add a couple more test cases which would
ensure it works correctly.  In particular, the following scenarios:

024: add test case for rebase within one backing chain when the overlay
     cluster size > backings cluster size;
271: add test case for rebase images that contain subclusters.  Check
     that no extra allocations are being made.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 tests/qemu-iotests/024     | 60 ++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/024.out | 43 +++++++++++++++++++++++++
 tests/qemu-iotests/271     | 66 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out | 42 ++++++++++++++++++++++++
 4 files changed, 211 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 98a7c8fd65..285f17e79f 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -257,6 +257,66 @@ $QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
 
 echo
 
+# Check that rebase within the chain is working when
+# overlay cluster size > backings cluster size
+# (here overlay cluster size == 2 * backings cluster size)
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): -- -- -- -- -- --
+# Backing (old): -- 11 -- -- 22 --
+# Overlay:      |-- --|-- --|-- --|
+#
+# We should end up having 1st and 3rd cluster allocated, and their halves
+# being read as zeroes.
+
+echo
+echo "=== Test rebase with different cluster sizes ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 6 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 6 ))
+CLUSTER_SIZE=$(( CLUSTER_SIZE * 2 )) TEST_IMG=$OVERLAY \
+    _make_test_img -b "$BASE_OLD" -F $IMGFMT $(( CLUSTER_SIZE * 6 ))
+
+TEST_IMG=$OVERLAY _img_info
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_OLD" -c "write -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" \
+    -c "write -P 0x22 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 0 $CLUSTER_SIZE" \
+    -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" \
+    -c "read -P 0x00 $(( CLUSTER_SIZE * 2 )) $(( CLUSTER_SIZE * 2 ))" \
+    -c "read -P 0x22 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    -c "read -P 0x00 $(( CLUSTER_SIZE * 5 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Verify that untouched cluster remains unallocated"
+echo
+
+$QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
+
+echo
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 245fe8b1d1..e1e8eea863 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -201,4 +201,47 @@ read 262144/262144 bytes at offset 0
 read 65536/65536 bytes at offset 262144
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+
+=== Test rebase with different cluster sizes ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=393216
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=393216 backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=393216 backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+image: TEST_DIR/subdir/t.IMGFMT
+file format: IMGFMT
+virtual size: 384 KiB (393216 bytes)
+cluster_size: 131072
+backing file: TEST_DIR/subdir/t.IMGFMT.base_old
+backing file format: IMGFMT
+
+Fill backing files with data
+
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 131072
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 327680
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Verify that untouched cluster remains unallocated
+
+Offset          Length          File
+0               0x20000         TEST_DIR/subdir/t.IMGFMT
+0x40000         0x20000         TEST_DIR/subdir/t.IMGFMT
+
 *** done
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index c7c2cadda0..e243f57ba7 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -899,6 +899,72 @@ _concurrent_io     | $QEMU_IO | _filter_qemu_io | \
     sed -e 's/\(20480\|40960\)/OFFSET/'
 _concurrent_verify | $QEMU_IO | _filter_qemu_io
 
+############################################################
+############################################################
+############################################################
+
+echo
+echo "### Rebase of qcow2 images with subclusters ###"
+echo
+
+l2_offset=$((0x400000))
+
+# Check that rebase operation preserve holes between allocated subclusters
+# within one cluster (i.e. does not allocate extra space).  Check that the
+# data is preserved as well.
+#
+# Base (new backing): -- -- -- ... -- -- --
+# Mid (old backing):  -- 11 -- ... -- 22 --
+# Top:                -- -- -- ... -- -- --
+
+echo "### Preservation of unallocated holes after rebase ###"
+echo
+
+echo "# create backing chain"
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img -o cluster_size=1M,extended_l2=on 1M
+TEST_IMG="$TEST_IMG.mid" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.base" -F qcow2 1M
+TEST_IMG="$TEST_IMG.top" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.mid" -F qcow2 1M
+
+echo
+echo "# fill old backing with data (separate subclusters within cluster)"
+echo
+
+$QEMU_IO -c "write -P 0x11 32k 32k" \
+         -c "write -P 0x22 $(( 30 * 32 ))k 32k" \
+         "$TEST_IMG.mid" | _filter_qemu_io
+
+echo
+echo "# rebase topmost image onto the new backing"
+echo
+
+$QEMU_IMG rebase -b "$TEST_IMG.base" -F qcow2 "$TEST_IMG.top"
+
+echo "# verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO -c "read -P 0x00 0 32k" \
+         -c "read -P 0x11 32k 32k" \
+         -c "read -P 0x00 64k $(( 28 * 32 ))k" \
+         -c "read -P 0x22 $(( 30 * 32 ))k 32k" \
+         -c "read -P 0x00 $(( 31 * 32 ))k 32k" \
+         "$TEST_IMG.top" | _filter_qemu_io
+
+echo
+echo "# verify that only selected subclusters remain allocated"
+echo
+
+$QEMU_IMG map "$TEST_IMG.top" | _filter_testdir
+
+echo
+echo "# verify image bitmap"
+echo
+
+TEST_IMG="$TEST_IMG.top" alloc="1 30" zero="" _verify_l2_bitmap 0
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
index 5be780de76..c335a6c608 100644
--- a/tests/qemu-iotests/271.out
+++ b/tests/qemu-iotests/271.out
@@ -723,4 +723,46 @@ wrote 2048/2048 bytes at offset OFFSET
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 2048/2048 bytes at offset OFFSET
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+### Rebase of qcow2 images with subclusters ###
+
+### Preservation of unallocated holes after rebase ###
+
+# create backing chain
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'TEST_DIR/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.top', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.mid backing_fmt=IMGFMT
+
+# fill old backing with data (separate subclusters within cluster)
+
+wrote 32768/32768 bytes at offset 32768
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 983040
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# rebase topmost image onto the new backing
+
+# verify that data is read the same before and after rebase
+
+read 32768/32768 bytes at offset 0
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 32768
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 917504/917504 bytes at offset 65536
+896 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 983040
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# verify that only selected subclusters remain allocated
+
+Offset          Length          Mapped to       File
+0x8000          0x8000          0x508000        TEST_DIR/t.qcow2.top
+0xf0000         0x8000          0x5f0000        TEST_DIR/t.qcow2.top
+
+# verify image bitmap
+
+L2 entry #0: 0x8000000000500000 0000000040000002
 *** done
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/8] qemu-img: add compression option to rebase subcommand
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
                   ` (5 preceding siblings ...)
  2023-09-15 16:20 ` [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-15 16:20 ` [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression Andrey Drobyshev via
  7 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

If we rebase an image whose backing file has compressed clusters, we
might end up wasting disk space since the copied clusters are now
uncompressed.  In order to have better control over this, let's add
"--compress" option to the "qemu-img rebase" command.

Note that this option affects only the clusters which are actually being
copied from the original backing file.  The clusters which were
uncompressed in the target image will remain so.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
---
 docs/tools/qemu-img.rst |  6 ++++--
 qemu-img-cmds.hx        |  4 ++--
 qemu-img.c              | 26 ++++++++++++++++++++------
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index ca5a2773cf..4459c065f1 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -667,7 +667,7 @@ Command description:
 
   List, apply, create or delete snapshots in image *FILENAME*.
 
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 
   Changes the backing file of an image. Only the formats ``qcow2`` and
   ``qed`` support changing the backing file.
@@ -694,7 +694,9 @@ Command description:
 
     In order to achieve this, any clusters that differ between
     *BACKING_FILE* and the old backing file of *FILENAME* are merged
-    into *FILENAME* before actually changing the backing file.
+    into *FILENAME* before actually changing the backing file. With the
+    ``-c`` option specified, the clusters which are being merged (but not
+    the entire *FILENAME* image) are compressed when written.
 
     Note that the safe mode is an expensive operation, comparable to
     converting an image. It only works if the old backing file still
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..068692d13e 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -88,9 +88,9 @@ SRST
 ERST
 
 DEF("rebase", img_rebase,
-    "rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+    "rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] [-c] -b backing_file [-F backing_fmt] filename")
 SRST
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 ERST
 
 DEF("resize", img_resize,
diff --git a/qemu-img.c b/qemu-img.c
index 83950af42b..8f39dae187 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3527,11 +3527,13 @@ static int img_rebase(int argc, char **argv)
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
+    BdrvRequestFlags write_flags = 0;
     bool writethrough, src_writethrough;
     int unsafe = 0;
     bool force_share = false;
     int progress = 0;
     bool quiet = false;
+    bool compress = false;
     Error *local_err = NULL;
     bool image_opts = false;
     int64_t write_align;
@@ -3548,9 +3550,10 @@ static int img_rebase(int argc, char **argv)
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
             {"force-share", no_argument, 0, 'U'},
+            {"compress", no_argument, 0, 'c'},
             {0, 0, 0, 0}
         };
-        c = getopt_long(argc, argv, ":hf:F:b:upt:T:qU",
+        c = getopt_long(argc, argv, ":hf:F:b:upt:T:qUc",
                         long_options, NULL);
         if (c == -1) {
             break;
@@ -3598,6 +3601,9 @@ static int img_rebase(int argc, char **argv)
         case 'U':
             force_share = true;
             break;
+        case 'c':
+            compress = true;
+            break;
         }
     }
 
@@ -3650,6 +3656,14 @@ static int img_rebase(int argc, char **argv)
 
     unfiltered_bs = bdrv_skip_filters(bs);
 
+    if (compress && !block_driver_can_compress(unfiltered_bs->drv)) {
+        error_report("Compression not supported for this file format");
+        ret = -1;
+        goto out;
+    } else if (compress) {
+        write_flags |= BDRV_REQ_WRITE_COMPRESSED;
+    }
+
     if (out_basefmt != NULL) {
         if (bdrv_find_format(out_basefmt) == NULL) {
             error_report("Invalid format name: '%s'", out_basefmt);
@@ -3659,18 +3673,18 @@ static int img_rebase(int argc, char **argv)
     }
 
     /*
-     * We need overlay subcluster size to make sure write requests are
-     * aligned.
+     * We need overlay subcluster size (or cluster size in case writes are
+     * compressed) to make sure write requests are aligned.
      */
     ret = bdrv_get_info(unfiltered_bs, &bdi);
     if (ret < 0) {
         error_report("could not get block driver info");
         goto out;
     } else if (bdi.subcluster_size == 0) {
-        bdi.subcluster_size = 1;
+        bdi.cluster_size = bdi.subcluster_size = 1;
     }
 
-    write_align = bdi.subcluster_size;
+    write_align = compress ? bdi.cluster_size : bdi.subcluster_size;
 
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
@@ -3923,7 +3937,7 @@ static int img_rebase(int argc, char **argv)
                     } else {
                         assert(written + pnum <= IO_BUF_SIZE);
                         ret = blk_pwrite(blk, offset + written, pnum,
-                                         buf_old + written, 0);
+                                         buf_old + written, write_flags);
                     }
                     if (ret < 0) {
                         error_report("Error while writing to COW image: %s",
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression
  2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
                   ` (6 preceding siblings ...)
  2023-09-15 16:20 ` [PATCH v2 7/8] qemu-img: add compression option to rebase subcommand Andrey Drobyshev via
@ 2023-09-15 16:20 ` Andrey Drobyshev via
  2023-09-19 11:14   ` Hanna Czenczek
  7 siblings, 1 reply; 21+ messages in thread
From: Andrey Drobyshev via @ 2023-09-15 16:20 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, hreitz, kwolf, eblake, andrey.drobyshev, den

The test cases considered so far:

314 (new test suite):

1. Check that compression mode isn't compatible with "-f raw" (raw
   format doesn't support compression).
2. Check that rebasing an image onto no backing file preserves the data
   and writes the copied clusters actually compressed.
3. Same as 2, but with a raw backing file (i.e. the clusters copied from the
   backing are originally uncompressed -- we check they end up compressed
   after being merged).
4. Remove a single delta from a backing chain, perform the same checks
   as in 2.
5. Check that even when backing and overlay are initially uncompressed,
   copied clusters end up compressed when rebase with compression is
   performed.

271:

1. Check that when target image has subclusters, rebase with compression
   will make an entire cluster containing the written subcluster
   compressed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 tests/qemu-iotests/271     |  65 +++++++++++++++
 tests/qemu-iotests/271.out |  40 +++++++++
 tests/qemu-iotests/314     | 165 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/314.out |  75 +++++++++++++++++
 4 files changed, 345 insertions(+)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index e243f57ba7..59a6fafa2f 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -965,6 +965,71 @@ echo
 
 TEST_IMG="$TEST_IMG.top" alloc="1 30" zero="" _verify_l2_bitmap 0
 
+# Check that rebase with compression works correctly with images containing
+# subclusters.  When compression is enabled and we allocate a new
+# subcluster within the target (overlay) image, we expect the entire cluster
+# containing that subcluster to become compressed.
+#
+# Here we expect 1st and 3rd clusters of the top (overlay) image to become
+# compressed after the rebase, while cluster 2 to remain unallocated and
+# be read from the base (new backing) image.
+#
+# Base (new backing): |-- -- .. -- --|11 11 .. 11 11|-- -- .. -- --|
+# Mid (old backing):  |-- -- .. -- 22|-- -- .. -- --|33 -- .. -- --|
+# Top:                |-- -- .. -- --|-- -- -- -- --|-- -- .. -- --|
+
+echo
+echo "### Rebase with compression for images with subclusters ###"
+echo
+
+echo "# create backing chain"
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img -o cluster_size=1M,extended_l2=on 3M
+TEST_IMG="$TEST_IMG.mid" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.base" -F qcow2 3M
+TEST_IMG="$TEST_IMG.top" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.mid" -F qcow2 3M
+
+echo
+echo "# fill old and new backing with data"
+echo
+
+$QEMU_IO -c "write -P 0x11 1M 1M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -P 0x22 $(( 31 * 32 ))k 32k" \
+         -c "write -P 0x33 $(( 64 * 32 ))k 32k" \
+         "$TEST_IMG.mid" | _filter_qemu_io
+
+echo
+echo "# rebase topmost image onto the new backing, with compression"
+echo
+
+$QEMU_IMG rebase -c -b "$TEST_IMG.base" -F qcow2 "$TEST_IMG.top"
+
+echo "# verify that the 1st and 3rd clusters've become compressed"
+echo
+
+$QEMU_IMG map --output=json "$TEST_IMG.top" | _filter_testdir
+
+echo
+echo "# verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO -c "read -P 0x22 $(( 31 * 32 ))k 32k" \
+         -c "read -P 0x11 1M 1M" \
+         -c "read -P 0x33 $(( 64 * 32 ))k 32k" \
+         "$TEST_IMG.top" | _filter_qemu_io
+
+echo
+echo "# verify image bitmap"
+echo
+
+# For compressed clusters bitmap is always 0.  For unallocated cluster
+# there should be no entry at all, thus bitmap is also 0.
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 0
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 1
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 2
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
index c335a6c608..0b24d50159 100644
--- a/tests/qemu-iotests/271.out
+++ b/tests/qemu-iotests/271.out
@@ -765,4 +765,44 @@ Offset          Length          Mapped to       File
 # verify image bitmap
 
 L2 entry #0: 0x8000000000500000 0000000040000002
+
+### Rebase with compression for images with subclusters ###
+
+# create backing chain
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=3145728
+Formatting 'TEST_DIR/t.IMGFMT.mid', fmt=IMGFMT size=3145728 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.top', fmt=IMGFMT size=3145728 backing_file=TEST_DIR/t.IMGFMT.mid backing_fmt=IMGFMT
+
+# fill old and new backing with data
+
+wrote 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 2097152
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# rebase topmost image onto the new backing, with compression
+
+# verify that the 1st and 3rd clusters've become compressed
+
+[{ "start": 0, "length": 1048576, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 1048576, "length": 1048576, "depth": 1, "present": true, "zero": false, "data": true, "compressed": false, "offset": 5242880},
+{ "start": 2097152, "length": 1048576, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true}]
+
+# verify that data is read the same before and after rebase
+
+read 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 2097152
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# verify image bitmap
+
+L2 entry #0: 0x4008000000500000 0000000000000000
+L2 entry #1: 0x0000000000000000 0000000000000000
+L2 entry #2: 0x400800000050040b 0000000000000000
 *** done
diff --git a/tests/qemu-iotests/314 b/tests/qemu-iotests/314
new file mode 100755
index 0000000000..96d7b4d258
--- /dev/null
+++ b/tests/qemu-iotests/314
@@ -0,0 +1,165 @@
+#!/usr/bin/env bash
+# group: rw backing auto quick
+#
+# Test qemu-img rebase with compression
+#
+# Copyright (c) 2023 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=andrey.drobyshev@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    _rm_test_img "$TEST_IMG.base"
+    _rm_test_img "$TEST_IMG.itmd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Want the size divisible by 2 and 3
+size=$(( 48 * 1024 * 1024 ))
+half_size=$(( size / 2 ))
+third_size=$(( size / 3 ))
+
+# 1. "qemu-img rebase -c" should refuse working with any format which doesn't
+# support compression.  We only check "-f raw" here.
+echo
+echo "=== Testing compressed rebase format compatibility ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG" "$size" | _filter_img_create
+$QEMU_IMG rebase -c -f raw -b "" "$TEST_IMG"
+
+# 2. Write the 1st half of $size to backing file (compressed), 2nd half -- to
+# the top image (also compressed).  Rebase the top image onto no backing file,
+# with compression (i.e. "qemu-img -c -b ''").  Check that the resulting image
+# has the written data preserved, and "qemu-img check" reports 100% clusters
+# as compressed.
+echo
+echo "=== Testing rebase with compression onto no backing file ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $half_size" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" "$TEST_IMG" \
+    | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 3. Same as the previous one, but with raw backing file (hence write to
+# the backing is uncompressed).
+echo
+echo "=== Testing rebase with compression with raw backing file ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$half_size" | _filter_img_create
+_make_test_img -b "$TEST_IMG.base" -F raw $size
+
+$QEMU_IO -f raw -c "write -P 0xaa 0 $half_size" "$TEST_IMG.base" \
+    | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 4. Create a backing chain base<--itmd<--img, filling 1st, 2nd and 3rd
+# thirds of them, respectively (with compression).  Rebase img onto base,
+# effectively deleting itmd from the chain, and check that written data is
+# preserved in the resulting image.  Also check that "qemu-img check" reports
+# 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase removing single delta from the chain ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+_make_test_img -b "$TEST_IMG.itmd" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $third_size" \
+    "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $third_size $third_size" \
+    "$TEST_IMG.itmd" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xcc $((third_size * 2 )) $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "$TEST_IMG.base" -F $IMGFMT "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $third_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $third_size $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xcc $(( third_size * 2 )) $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 5. Create one-cluster backing and overlay images, and fill only the first
+# (half - 1) bytes of the backing with data (uncompressed).  Rebase the
+# overlay onto no backing file with compression.  Check that data is still
+# read correctly, and that cluster is now really compressed ("qemu-img check"
+# reports 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase with unaligned unmerged data ==="
+echo
+
+CLUSTER_SIZE=65536
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $CLUSTER_SIZE
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $CLUSTER_SIZE
+
+$QEMU_IO -c "write -P 0xaa 0 $(( CLUSTER_SIZE / 2 - 1 ))" $TEST_IMG.base \
+    | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $(( CLUSTER_SIZE / 2 - 1 ))" "$TEST_IMG" \
+    | _filter_qemu_io
+$QEMU_IO -c \
+    "read -P 0x00 $(( CLUSTER_SIZE / 2 - 1 )) $(( CLUSTER_SIZE / 2 + 1 ))" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# success, all done
+echo
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/314.out b/tests/qemu-iotests/314.out
new file mode 100644
index 0000000000..ac9337a543
--- /dev/null
+++ b/tests/qemu-iotests/314.out
@@ -0,0 +1,75 @@
+QA output created by 314
+
+=== Testing compressed rebase format compatibility ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=raw size=50331648
+qemu-img: Compression not supported for this file format
+
+=== Testing rebase with compression onto no backing file ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=50331648
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+wrote 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+768/768 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing rebase with compression with raw backing file ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=raw size=25165824
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=raw
+wrote 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+768/768 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing compressed rebase removing single delta from the chain ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=50331648
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.itmd backing_fmt=IMGFMT
+wrote 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 16777216/16777216 bytes at offset 16777216
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 16777216/16777216 bytes at offset 33554432
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 16777216
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 33554432
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+512/768 = 66.67% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing compressed rebase with unaligned unmerged data ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=65536
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+wrote 32767/32767 bytes at offset 0
+31.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32767/32767 bytes at offset 0
+31.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32769/32769 bytes at offset 32767
+32.001 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+1/1 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 393216
+
+*** done
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-09-15 16:20 ` [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Andrey Drobyshev via
@ 2023-09-15 18:39   ` Eric Blake
  2023-09-15 19:27     ` Andrey Drobyshev
  2023-09-19  8:18   ` Hanna Czenczek
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Blake @ 2023-09-15 18:39 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: qemu-block, qemu-devel, hreitz, kwolf, den

On Fri, Sep 15, 2023 at 07:20:11PM +0300, Andrey Drobyshev wrote:
> Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
> the data read from the old and new backing files are aligned using
> BlockDriverState (or BlockBackend later on) referring to the target image.
> However, this isn't quite right, because buf_new is only being used for
> reading from the new backing, while buf_old is being used for both reading
> from the old backing and writing to the target.  Let's take that into account
> and use more appropriate values as alignments.
> 
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>  qemu-img.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 50660ba920..d12e4a4753 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3750,8 +3750,13 @@ static int img_rebase(int argc, char **argv)
>          int64_t n;
>          float local_progress = 0;
>  
> -        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> -        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
> +        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
> +            bdrv_opt_mem_align(blk_bs(blk_old_backing))) {
> +            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> +        } else {
> +            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
> +        }

Since bdrv_opt_mem_align(NULL) is safe, could we just simplify this to:

buf_old = qemu_memalign(MAX(bdrv_opt_mem_align(blk_old_backing),
                            bdrv_opt_mem_align(blk)), IO_BUF_SIZE);

instead of going through an if statement?  Or is the problem that
bdrv_opt_mem_align(NULL) can return the host page size (perhaps 64k),
which may be larger than technically needed in some scenarios?

> +        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
>  
>          size = blk_getlength(blk);
>          if (size < 0) {
> -- 
> 2.39.3

At any rate, aligning the buffers by how they will be used makes sense
(if the destination blk has looser requirements than the source
blk_old_backing, then accesses into blk_old are suspect).

Reviewed-by: Eric Blake <eblake@redhat.com.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-09-15 18:39   ` Eric Blake
@ 2023-09-15 19:27     ` Andrey Drobyshev
  0 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev @ 2023-09-15 19:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, qemu-devel, hreitz, kwolf, den

On 9/15/23 21:39, Eric Blake wrote:
> On Fri, Sep 15, 2023 at 07:20:11PM +0300, Andrey Drobyshev wrote:
>> Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
>> the data read from the old and new backing files are aligned using
>> BlockDriverState (or BlockBackend later on) referring to the target image.
>> However, this isn't quite right, because buf_new is only being used for
>> reading from the new backing, while buf_old is being used for both reading
>> from the old backing and writing to the target.  Let's take that into account
>> and use more appropriate values as alignments.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>  qemu-img.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 50660ba920..d12e4a4753 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -3750,8 +3750,13 @@ static int img_rebase(int argc, char **argv)
>>          int64_t n;
>>          float local_progress = 0;
>>  
>> -        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> -        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
>> +        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
>> +            bdrv_opt_mem_align(blk_bs(blk_old_backing))) {
>> +            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> +        } else {
>> +            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
>> +        }
> 
> Since bdrv_opt_mem_align(NULL) is safe, could we just simplify this to:
> 
> buf_old = qemu_memalign(MAX(bdrv_opt_mem_align(blk_old_backing),
>                             bdrv_opt_mem_align(blk)), IO_BUF_SIZE);
> 
> instead of going through an if statement?  Or is the problem that
> bdrv_opt_mem_align(NULL) can return the host page size (perhaps 64k),
> which may be larger than technically needed in some scenarios?
>

Although bdrv_opt_mem_align(NULL) is safe, blk_bs(NULL) is not.  And
bdrv_opt_mem_align() takes BlockDriverState* not BlockBackend*, so we
would have to perform the same check and there would be no simplification.

>> +        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
>>  
>>          size = blk_getlength(blk);
>>          if (size < 0) {
>> -- 
>> 2.39.3
> 
> At any rate, aligning the buffers by how they will be used makes sense
> (if the destination blk has looser requirements than the source
> blk_old_backing, then accesses into blk_old are suspect).
> 
> Reviewed-by: Eric Blake <eblake@redhat.com.
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers()
  2023-09-15 16:20 ` [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers() Andrey Drobyshev via
@ 2023-09-15 19:28   ` Eric Blake
  2023-09-19  8:32   ` Hanna Czenczek
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2023-09-15 19:28 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: qemu-block, qemu-devel, hreitz, kwolf, den

On Fri, Sep 15, 2023 at 07:20:12PM +0300, Andrey Drobyshev wrote:
> Add @chsize param to the function which, if non-zero, would represent
> the chunk size to be used for comparison.  If it's zero, then
> BDRV_SECTOR_SIZE is used as default chunk size, which is the previous
> behaviour.
> 
> In particular, we're going to use this param in img_rebase() to make the
> write requests aligned to a predefined alignment value.
> 
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>  qemu-img.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index d12e4a4753..fcd31d7b5b 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1274,23 +1274,29 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
>  }
>  
>  /*
> - * Compares two buffers sector by sector. Returns 0 if the first
> - * sector of each buffer matches, non-zero otherwise.
> + * Compares two buffers chunk by chunk, where @chsize is the chunk size.
> + * If @chsize is 0, default chunk size of BDRV_SECTOR_SIZE is used.
> + * Returns 0 if the first chunk of each buffer matches, non-zero otherwise.
>   *
> - * pnum is set to the sector-aligned size of the buffer prefix that
> + * @pnum is set to the size of the buffer prefix aligned to @chsize that
>   * has the same matching status as the first sector.

s/sector/chunk/

With that,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
  2023-09-15 16:20 ` [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations Andrey Drobyshev via
@ 2023-09-15 21:52   ` Eric Blake
  2023-09-16 17:45     ` Andrey Drobyshev
  2023-09-19 10:46   ` Hanna Czenczek
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Blake @ 2023-09-15 21:52 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: qemu-block, qemu-devel, hreitz, kwolf, den

On Fri, Sep 15, 2023 at 07:20:13PM +0300, Andrey Drobyshev wrote:
> When rebasing an image from one backing file to another, we need to
> compare data from old and new backings.  If the diff between that data
> happens to be unaligned to the target cluster size, we might end up
> doing partial writes, which would lead to copy-on-write and additional IO.
> 
> Consider the following simple case (virtual_size == cluster_size == 64K):
> 
> base <-- inc1 <-- inc2
> 
> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
> 
> While doing rebase, we'll write a half of the cluster to inc2, and block
> layer will have to read the 2nd half of the same cluster from the base image
> inc1 while doing this write operation, although the whole cluster is already
> read earlier to perform data comparison.
> 
> In order to avoid these unnecessary IO cycles, let's make sure every
> write request is aligned to the overlay subcluster boundaries.  Using
> subcluster size is universal as for the images which don't have them
> this size equals to the cluster size, so in any case we end up aligning
> to the smallest unit of allocation.
> 
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>  qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 56 insertions(+), 20 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index fcd31d7b5b..83950af42b 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3523,6 +3523,7 @@ static int img_rebase(int argc, char **argv)
>      uint8_t *buf_new = NULL;
>      BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
>      BlockDriverState *unfiltered_bs;
> +    BlockDriverInfo bdi = {0};
>      char *filename;
>      const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>      int c, flags, src_flags, ret;
> @@ -3533,6 +3534,7 @@ static int img_rebase(int argc, char **argv)
>      bool quiet = false;
>      Error *local_err = NULL;
>      bool image_opts = false;
> +    int64_t write_align;
>  
>      /* Parse commandline parameters */
>      fmt = NULL;
> @@ -3656,6 +3658,20 @@ static int img_rebase(int argc, char **argv)
>          }
>      }
>  
> +    /*
> +     * We need overlay subcluster size to make sure write requests are
> +     * aligned.
> +     */
> +    ret = bdrv_get_info(unfiltered_bs, &bdi);
> +    if (ret < 0) {
> +        error_report("could not get block driver info");
> +        goto out;
> +    } else if (bdi.subcluster_size == 0) {
> +        bdi.subcluster_size = 1;
> +    }
> +
> +    write_align = bdi.subcluster_size;
> +
>      /* For safe rebasing we need to compare old and new backing file */
>      if (!unsafe) {
>          QDict *options = NULL;
> @@ -3753,7 +3769,7 @@ static int img_rebase(int argc, char **argv)
>          int64_t old_backing_size = 0;
>          int64_t new_backing_size = 0;
>          uint64_t offset;
> -        int64_t n;
> +        int64_t n, n_old = 0, n_new = 0;
>          float local_progress = 0;
>  
>          if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
> @@ -3799,7 +3815,8 @@ static int img_rebase(int argc, char **argv)
>          }
>  
>          for (offset = 0; offset < size; offset += n) {
> -            bool buf_old_is_zero = false;
> +            bool old_backing_eof = false;
> +            int64_t n_alloc;
>  
>              /* How many bytes can we handle with the next read? */
>              n = MIN(IO_BUF_SIZE, size - offset);
> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>                  }
>              }
>  
> +            /*
> +             * At this point we know that the region [offset; offset + n)
> +             * is unallocated within the target image.  This region might be
> +             * unaligned to the target image's (sub)cluster boundaries, as
> +             * old backing may have smaller clusters (or have subclusters).
> +             * We extend it to the aligned boundaries to avoid CoW on
> +             * partial writes in blk_pwrite(),
> +             */
> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
> +            offset = QEMU_ALIGN_DOWN(offset, write_align);

If we are always aligning to write_align on each iteration of this
loop, won't this round down always be a no-op?

> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
> +            n = MIN(n, size - offset);

However, I can see how this round up can matter.

> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
> +                   n_alloc == n);

This assertion feels a bit heavyweight.  I see what you're trying to
say: if we found a (partial) unallocated region in the destination,
then since write_align is the minimum alignment of such allocation,
our rounding up to alignment boundaries should not change the fact
that we still have an unallocated region in the destination.  But we
already checked the data from the original offset through the original
n, and I argue that the original offset was unchanged; so really, all
the more you'd need to assert (if the assertion is even necessary)
could be something like

assert(new_offset == orig_offset);
tail = new_n - orig_n;
assert(!bdrv_is_allocated(unfiltered_bs, orig_offset+orig_n, tail, &n_alloc) && n_alloc == tail);

> +
> +            /*
> +             * Much like the with the target image, we'll try to read as much
> +             * of the old and new backings as we can.
> +             */
> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
> +            if (blk_new_backing) {
> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
> +            }
> +
>              /*
>               * Read old and new backing file and take into consideration that
>               * backing files may be smaller than the COW image.
>               */
> -            if (offset >= old_backing_size) {
> -                memset(buf_old, 0, n);
> -                buf_old_is_zero = true;
> +            memset(buf_old + n_old, 0, n - n_old);
> +            if (!n_old) {
> +                old_backing_eof = true;
>              } else {
> -                if (offset + n > old_backing_size) {
> -                    n = old_backing_size - offset;
> -                }
> -
> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
> +                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);

Here's a more fundamental question.  Why are we reading from the old
backing file?  At this point in time, isn't unfiltered_bs (the target
image) still chained to the old backing file?  Why not just do a
blk_pread() from the destination?  It will cause the block layer to
read through the backing layers on our behalf, but the block layer
will then take care of any needed zeroing without us having to do a
memset here.

Then, once we have the contents of the disk (as seen through the
destination backed by the old image), we can compare to what the new
image would read, to see if we still need to write into the
destination or can just let the destination rely on backing from the
new image.

>                  if (ret < 0) {
>                      error_report("error while reading from old backing file");
>                      goto out;
>                  }
>              }
>  
> -            if (offset >= new_backing_size || !blk_new_backing) {
> -                memset(buf_new, 0, n);
> -            } else {
> -                if (offset + n > new_backing_size) {
> -                    n = new_backing_size - offset;
> -                }
> -
> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
> +            memset(buf_new + n_new, 0, n - n_new);
> +            if (blk_new_backing && n_new) {
> +                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
>                  if (ret < 0) {
>                      error_report("error while reading from new backing file");
>                      goto out;
> @@ -3884,11 +3916,12 @@ static int img_rebase(int argc, char **argv)
>                  int64_t pnum;
>  
>                  if (compare_buffers(buf_old + written, buf_new + written,
> -                                    n - written, 0, &pnum))
> +                                    n - written, write_align, &pnum))
>                  {
> -                    if (buf_old_is_zero) {
> +                    if (old_backing_eof) {
>                          ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);

Deciding whether to write zeroes (which can be more efficient) is
possible for more than just when the old backing file has already
reached EOF.

>                      } else {
> +                        assert(written + pnum <= IO_BUF_SIZE);
>                          ret = blk_pwrite(blk, offset + written, pnum,
>                                           buf_old + written, 0);
>                      }
> @@ -3900,6 +3933,9 @@ static int img_rebase(int argc, char **argv)
>                  }
>  
>                  written += pnum;
> +                if (offset + written >= old_backing_size) {
> +                    old_backing_eof = true;
> +                }
>              }
>              qemu_progress_print(local_progress, 100);
>          }
> -- 
> 2.39.3
>

The idea behind this patch makes sense, but the function is already so
long and complicated and you are adding more complexity.  I'll
continue reviewing the rest of the series, but I'll be interested in
seeing if any other block maintainers has an opinion on this patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
  2023-09-15 21:52   ` Eric Blake
@ 2023-09-16 17:45     ` Andrey Drobyshev
  0 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev @ 2023-09-16 17:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, qemu-devel, hreitz, kwolf, den

On 9/16/23 00:52, Eric Blake wrote:
> On Fri, Sep 15, 2023 at 07:20:13PM +0300, Andrey Drobyshev wrote:
>> When rebasing an image from one backing file to another, we need to
>> compare data from old and new backings.  If the diff between that data
>> happens to be unaligned to the target cluster size, we might end up
>> doing partial writes, which would lead to copy-on-write and additional IO.
>>
>> Consider the following simple case (virtual_size == cluster_size == 64K):
>>
>> base <-- inc1 <-- inc2
>>
>> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
>> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
>> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>>
>> While doing rebase, we'll write a half of the cluster to inc2, and block
>> layer will have to read the 2nd half of the same cluster from the base image
>> inc1 while doing this write operation, although the whole cluster is already
>> read earlier to perform data comparison.
>>
>> In order to avoid these unnecessary IO cycles, let's make sure every
>> write request is aligned to the overlay subcluster boundaries.  Using
>> subcluster size is universal as for the images which don't have them
>> this size equals to the cluster size, so in any case we end up aligning
>> to the smallest unit of allocation.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>  qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 56 insertions(+), 20 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index fcd31d7b5b..83950af42b 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -3523,6 +3523,7 @@ static int img_rebase(int argc, char **argv)
>>      uint8_t *buf_new = NULL;
>>      BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
>>      BlockDriverState *unfiltered_bs;
>> +    BlockDriverInfo bdi = {0};
>>      char *filename;
>>      const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>>      int c, flags, src_flags, ret;
>> @@ -3533,6 +3534,7 @@ static int img_rebase(int argc, char **argv)
>>      bool quiet = false;
>>      Error *local_err = NULL;
>>      bool image_opts = false;
>> +    int64_t write_align;
>>  
>>      /* Parse commandline parameters */
>>      fmt = NULL;
>> @@ -3656,6 +3658,20 @@ static int img_rebase(int argc, char **argv)
>>          }
>>      }
>>  
>> +    /*
>> +     * We need overlay subcluster size to make sure write requests are
>> +     * aligned.
>> +     */
>> +    ret = bdrv_get_info(unfiltered_bs, &bdi);
>> +    if (ret < 0) {
>> +        error_report("could not get block driver info");
>> +        goto out;
>> +    } else if (bdi.subcluster_size == 0) {
>> +        bdi.subcluster_size = 1;
>> +    }
>> +
>> +    write_align = bdi.subcluster_size;
>> +
>>      /* For safe rebasing we need to compare old and new backing file */
>>      if (!unsafe) {
>>          QDict *options = NULL;
>> @@ -3753,7 +3769,7 @@ static int img_rebase(int argc, char **argv)
>>          int64_t old_backing_size = 0;
>>          int64_t new_backing_size = 0;
>>          uint64_t offset;
>> -        int64_t n;
>> +        int64_t n, n_old = 0, n_new = 0;
>>          float local_progress = 0;
>>  
>>          if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
>> @@ -3799,7 +3815,8 @@ static int img_rebase(int argc, char **argv)
>>          }
>>  
>>          for (offset = 0; offset < size; offset += n) {
>> -            bool buf_old_is_zero = false;
>> +            bool old_backing_eof = false;
>> +            int64_t n_alloc;
>>  
>>              /* How many bytes can we handle with the next read? */
>>              n = MIN(IO_BUF_SIZE, size - offset);
>> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>>                  }
>>              }
>>  
>> +            /*
>> +             * At this point we know that the region [offset; offset + n)
>> +             * is unallocated within the target image.  This region might be
>> +             * unaligned to the target image's (sub)cluster boundaries, as
>> +             * old backing may have smaller clusters (or have subclusters).
>> +             * We extend it to the aligned boundaries to avoid CoW on
>> +             * partial writes in blk_pwrite(),
>> +             */
>> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
>> +            offset = QEMU_ALIGN_DOWN(offset, write_align);
> 
> If we are always aligning to write_align on each iteration of this
> loop, won't this round down always be a no-op?
> 

No, it's not always the case.  Bear in mind that when the call to
bdrv_is_allocated() operates on the target's granularity, the call to
bdrv_is_allocated_above() would make n aligned to the old backing's
granularity.  Consider the following case (it was brought by Hanna under
v1 and is now covered in tests by commit "iotests/{024, 271}: add
testcases for qemu-img rebase"):

qemu-img create -f qcow2 base.qcow2 1M
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 mid.qcow2 1M
qemu-img create -f qcow2 -o cluster_size=1M -b mid.qcow2 -F qcow2
top.qcow2 1M

qemu-io -c 'write 64k 64k' mid.qcow2
qemu-img rebase -b base.qcow2 -F qcow2 top.qcow2


Base: |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|    64K
Mid:  |-|*|-|-|-|-|-|-|-|-|-|-|-|-|-|-|    64K
Top:  |- - - - - - - - - - - - - - - -|    1M


Then we'll have (without rounding down):

offset == 0, n == 1M

1st iteration:
bdrv_is_allocated() == 0 --> n == 1M
bdrv_is_allocated_above() == 0 --> n == 64K

offset == 64K, n == 960K

2nd iteration:
bdrv_is_allocated() == 0 --> n == 960K
bdrv_is_allocated_above() == 1 --> n == 64K

And then we'll proceed writing 64K w/o rounding up or 960K w/ rounding up.

So the whole point of expanding the unaligned region to its natural
borders is to write (sub)cluster aligned chunks.  I agree that in this
particular case we don't win anything in terms of CoW since in either
case we'll end up writing exactly 1M.  However, respecting cluster
boundaries will matter when we'll do compressed writes.

If you want to make things more orderly, I can move the rounding down
part to the commit ("qemu-img: add compression option to rebase
subcommand").


>> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
>> +            n = MIN(n, size - offset);
> 
> However, I can see how this round up can matter.
> 
>> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
>> +                   n_alloc == n);
> 
> This assertion feels a bit heavyweight.  I see what you're trying to
> say: if we found a (partial) unallocated region in the destination,
> then since write_align is the minimum alignment of such allocation,
> our rounding up to alignment boundaries should not change the fact
> that we still have an unallocated region in the destination.  But we
> already checked the data from the original offset through the original
> n, and I argue that the original offset was unchanged; so really, all
> the more you'd need to assert (if the assertion is even necessary)
> could be something like
> 
> assert(new_offset == orig_offset);
> tail = new_n - orig_n;
> assert(!bdrv_is_allocated(unfiltered_bs, orig_offset+orig_n, tail, &n_alloc) && n_alloc == tail);
>

Since the original offset may in fact change as we've established above,
I still think that checking the whole expanded region for being
unallocated makes more sense, than checking just the tail.

>> +
>> +            /*
>> +             * Much like the with the target image, we'll try to read as much
>> +             * of the old and new backings as we can.
>> +             */
>> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
>> +            if (blk_new_backing) {
>> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
>> +            }
>> +
>>              /*
>>               * Read old and new backing file and take into consideration that
>>               * backing files may be smaller than the COW image.
>>               */
>> -            if (offset >= old_backing_size) {
>> -                memset(buf_old, 0, n);
>> -                buf_old_is_zero = true;
>> +            memset(buf_old + n_old, 0, n - n_old);
>> +            if (!n_old) {
>> +                old_backing_eof = true;
>>              } else {
>> -                if (offset + n > old_backing_size) {
>> -                    n = old_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
>> +                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
> 
> Here's a more fundamental question.  Why are we reading from the old
> backing file?  At this point in time, isn't unfiltered_bs (the target
> image) still chained to the old backing file?  Why not just do a
> blk_pread() from the destination?  It will cause the block layer to
> read through the backing layers on our behalf, but the block layer
> will then take care of any needed zeroing without us having to do a
> memset here.
> 
> Then, once we have the contents of the disk (as seen through the
> destination backed by the old image), we can compare to what the new
> image would read, to see if we still need to write into the
> destination or can just let the destination rely on backing from the
> new image.
> 

Hmm, that's actually a great question.  At first glance there're indeed
no obstacles to entirely get rid of blk_old_backing.  The only drawback
in your scheme above is that we wouldn't keep track of the old backing
size thus lacking the possibility of calling blk_pwrite_zeroes()
directly right after we've reached the end of the old backing.  But I
suppose we can still keep that size in mind while reading from the
target as you've described.

This optimization hasn't crossed my mind as I was mainly focused on the
alignments here, but I suppose we can try it in one of the upcoming
series (unless someone'll point to some other drawbacks of reading
directly from target).

>>                  if (ret < 0) {
>>                      error_report("error while reading from old backing file");
>>                      goto out;
>>                  }
>>              }
>>  
>> -            if (offset >= new_backing_size || !blk_new_backing) {
>> -                memset(buf_new, 0, n);
>> -            } else {
>> -                if (offset + n > new_backing_size) {
>> -                    n = new_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
>> +            memset(buf_new + n_new, 0, n - n_new);
>> +            if (blk_new_backing && n_new) {
>> +                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
>>                  if (ret < 0) {
>>                      error_report("error while reading from new backing file");
>>                      goto out;
>> @@ -3884,11 +3916,12 @@ static int img_rebase(int argc, char **argv)
>>                  int64_t pnum;
>>  
>>                  if (compare_buffers(buf_old + written, buf_new + written,
>> -                                    n - written, 0, &pnum))
>> +                                    n - written, write_align, &pnum))
>>                  {
>> -                    if (buf_old_is_zero) {
>> +                    if (old_backing_eof) {
>>                          ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);
> 
> Deciding whether to write zeroes (which can be more efficient) is
> possible for more than just when the old backing file has already
> reached EOF.
> 

Sure, as you can see I haven't changed the logic here, simply renamed
the flag to indicate the condition more accurately.  But if we end up
reading directly from the target as you've suggested, the only way I can
think of to know for sure whether we should write zeroes is iteratively
compare portions of the read buffer to a predefined zero buffer.  That's
because at this point we know for sure that the region is unallocated in
the target, and we can't query bdrv_block_status() to know that it's
reading as zeroes.

>>                      } else {
>> +                        assert(written + pnum <= IO_BUF_SIZE);
>>                          ret = blk_pwrite(blk, offset + written, pnum,
>>                                           buf_old + written, 0);
>>                      }
>> @@ -3900,6 +3933,9 @@ static int img_rebase(int argc, char **argv)
>>                  }
>>  
>>                  written += pnum;
>> +                if (offset + written >= old_backing_size) {
>> +                    old_backing_eof = true;
>> +                }
>>              }
>>              qemu_progress_print(local_progress, 100);
>>          }
>> -- 
>> 2.39.3
>>
> 
> The idea behind this patch makes sense, but the function is already so
> long and complicated and you are adding more complexity.  I'll
> continue reviewing the rest of the series, but I'll be interested in
> seeing if any other block maintainers has an opinion on this patch.
> 

Sure, with the optimization comes complexity.  Due to the fact that we
send fewer read requests and avoid redundant CoW during rebase, the
speed improves significantly.  E.g. that's the numbers I get rebasing
the topmost image in the backup chain of the VM I have at hand:

base.qcow2 <-- delta.qcow2

# qemu-img info delta.qcow2 | grep size
virtual size: 64 GiB (68719476736 bytes)
disk size: 1.6 GiB
cluster_size: 65536

# qemu-img check delta.qcow2
No errors were found on the image.
34266/1048576 = 3.27% allocated, 99.76% fragmented, 99.47% compressed
clusters
Image end offset: 1722548224

# qemu-img info base.qcow2 | grep size
virtual size: 64 GiB (68719476736 bytes)
disk size: 4.06 GiB
cluster_size: 65536

# qemu-img check base.qcow2
No errors were found on the image.
133839/1048576 = 12.76% allocated, 87.92% fragmented, 76.32% compressed
clusters
Image end offset: 4391895040

vanilla:

# time qemu-img rebase -f qcow2 -b '' delta.qcow2.vanilla

real    0m49.192s
user    0m27.253s
sys     0m11.894s

patched:

# time qemu-img rebase -f qcow2 -b '' delta.qcow2.patched

real    0m33.071s
user    0m15.475s
sys     0m8.446s

#

# diff <(qemu-img map --output=json delta.qcow2.patched) <(qemu-img map
--output=json delta.qcow2.vanilla) && echo $?
0

# du --block-size=1 delta.qcow2.patched delta.qcow2.vanilla
10409897984     delta.qcow2.patched
10409897984     delta.qcow2.vanilla

# qemu-img compare delta.qcow2.vanilla delta.qcow2.patched
Images are identical.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-09-15 16:20 ` [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Andrey Drobyshev via
  2023-09-15 18:39   ` Eric Blake
@ 2023-09-19  8:18   ` Hanna Czenczek
  2023-09-19  9:26     ` Andrey Drobyshev
  1 sibling, 1 reply; 21+ messages in thread
From: Hanna Czenczek @ 2023-09-19  8:18 UTC (permalink / raw)
  To: Andrey Drobyshev, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 15.09.23 18:20, Andrey Drobyshev wrote:
> Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
> the data read from the old and new backing files are aligned using
> BlockDriverState (or BlockBackend later on) referring to the target image.
> However, this isn't quite right, because buf_new is only being used for
> reading from the new backing, while buf_old is being used for both reading
> from the old backing and writing to the target.  Let's take that into account
> and use more appropriate values as alignments.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>   qemu-img.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 50660ba920..d12e4a4753 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3750,8 +3750,13 @@ static int img_rebase(int argc, char **argv)
>           int64_t n;
>           float local_progress = 0;
>   
> -        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> -        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
> +        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
> +            bdrv_opt_mem_align(blk_bs(blk_old_backing))) {
> +            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> +        } else {
> +            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
> +        }

As I read this, if blk_old_backing is NULL, we will go to the 
blk_blockalign(blk_old_backing, IO_BUF_SIZE) path.  I think if it is 
NULL, we should align on blk instead.

Hanna

> +        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
>   
>           size = blk_getlength(blk);
>           if (size < 0) {



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers()
  2023-09-15 16:20 ` [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers() Andrey Drobyshev via
  2023-09-15 19:28   ` Eric Blake
@ 2023-09-19  8:32   ` Hanna Czenczek
  1 sibling, 0 replies; 21+ messages in thread
From: Hanna Czenczek @ 2023-09-19  8:32 UTC (permalink / raw)
  To: Andrey Drobyshev, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 15.09.23 18:20, Andrey Drobyshev wrote:
> Add @chsize param to the function which, if non-zero, would represent
> the chunk size to be used for comparison.  If it's zero, then
> BDRV_SECTOR_SIZE is used as default chunk size, which is the previous
> behaviour.
>
> In particular, we're going to use this param in img_rebase() to make the
> write requests aligned to a predefined alignment value.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>   qemu-img.c | 22 ++++++++++++++--------
>   1 file changed, 14 insertions(+), 8 deletions(-)

With the comment fix Eric has suggested:

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-09-19  8:18   ` Hanna Czenczek
@ 2023-09-19  9:26     ` Andrey Drobyshev
  0 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev @ 2023-09-19  9:26 UTC (permalink / raw)
  To: Hanna Czenczek, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 9/19/23 11:18, Hanna Czenczek wrote:
> On 15.09.23 18:20, Andrey Drobyshev wrote:
>> Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
>> the data read from the old and new backing files are aligned using
>> BlockDriverState (or BlockBackend later on) referring to the target
>> image.
>> However, this isn't quite right, because buf_new is only being used for
>> reading from the new backing, while buf_old is being used for both
>> reading
>> from the old backing and writing to the target.  Let's take that into
>> account
>> and use more appropriate values as alignments.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>   qemu-img.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 50660ba920..d12e4a4753 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -3750,8 +3750,13 @@ static int img_rebase(int argc, char **argv)
>>           int64_t n;
>>           float local_progress = 0;
>>   -        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> -        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
>> +        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk)) >
>> +            bdrv_opt_mem_align(blk_bs(blk_old_backing))) {
>> +            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> +        } else {
>> +            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
>> +        }
> 
> As I read this, if blk_old_backing is NULL, we will go to the
> blk_blockalign(blk_old_backing, IO_BUF_SIZE) path.  I think if it is
> NULL, we should align on blk instead.
> 
> Hanna

You're right, thanks for noticing.  Will fix that.

> 
>> +        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
>>             size = blk_getlength(blk);
>>           if (size < 0) {
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
  2023-09-15 16:20 ` [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations Andrey Drobyshev via
  2023-09-15 21:52   ` Eric Blake
@ 2023-09-19 10:46   ` Hanna Czenczek
  2023-09-19 16:21     ` Andrey Drobyshev
  1 sibling, 1 reply; 21+ messages in thread
From: Hanna Czenczek @ 2023-09-19 10:46 UTC (permalink / raw)
  To: Andrey Drobyshev, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 15.09.23 18:20, Andrey Drobyshev wrote:
> When rebasing an image from one backing file to another, we need to
> compare data from old and new backings.  If the diff between that data
> happens to be unaligned to the target cluster size, we might end up
> doing partial writes, which would lead to copy-on-write and additional IO.
>
> Consider the following simple case (virtual_size == cluster_size == 64K):
>
> base <-- inc1 <-- inc2
>
> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>
> While doing rebase, we'll write a half of the cluster to inc2, and block
> layer will have to read the 2nd half of the same cluster from the base image
> inc1 while doing this write operation, although the whole cluster is already
> read earlier to perform data comparison.
>
> In order to avoid these unnecessary IO cycles, let's make sure every
> write request is aligned to the overlay subcluster boundaries.  Using
> subcluster size is universal as for the images which don't have them
> this size equals to the cluster size, so in any case we end up aligning
> to the smallest unit of allocation.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>   qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 56 insertions(+), 20 deletions(-)

Looks good, I like the changes from v1!  Two minor things:

> diff --git a/qemu-img.c b/qemu-img.c
> index fcd31d7b5b..83950af42b 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c

[...]

> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>                   }
>               }
>   
> +            /*
> +             * At this point we know that the region [offset; offset + n)
> +             * is unallocated within the target image.  This region might be
> +             * unaligned to the target image's (sub)cluster boundaries, as
> +             * old backing may have smaller clusters (or have subclusters).
> +             * We extend it to the aligned boundaries to avoid CoW on
> +             * partial writes in blk_pwrite(),
> +             */
> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
> +            offset = QEMU_ALIGN_DOWN(offset, write_align);
> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
> +            n = MIN(n, size - offset);
> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
> +                   n_alloc == n);
> +
> +            /*
> +             * Much like the with the target image, we'll try to read as much

s/the with the/with the/

> +             * of the old and new backings as we can.
> +             */
> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
> +            if (blk_new_backing) {
> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
> +            }

If we don’t have a check for blk_old_backing (old_backing_size is 0 if 
blk_old_backing is NULL), why do we have a check for blk_new_backing 
(new_backing_size is 0 if blk_new_backing is NULL)?

(Perhaps because the previous check was `offset >= new_backing_size || 
!blk_new_backing`, i.e. included exactly such a check – but I don’t 
think it’s necessary, new_backing_size will be 0 if blk_new_backing is 
NULL.)

> +
>               /*
>                * Read old and new backing file and take into consideration that
>                * backing files may be smaller than the COW image.
>                */
> -            if (offset >= old_backing_size) {
> -                memset(buf_old, 0, n);
> -                buf_old_is_zero = true;
> +            memset(buf_old + n_old, 0, n - n_old);
> +            if (!n_old) {
> +                old_backing_eof = true;
>               } else {
> -                if (offset + n > old_backing_size) {
> -                    n = old_backing_size - offset;
> -                }
> -
> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
> +                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
>                   if (ret < 0) {
>                       error_report("error while reading from old backing file");
>                       goto out;
>                   }
>               }
>   
> -            if (offset >= new_backing_size || !blk_new_backing) {
> -                memset(buf_new, 0, n);
> -            } else {
> -                if (offset + n > new_backing_size) {
> -                    n = new_backing_size - offset;
> -                }
> -
> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
> +            memset(buf_new + n_new, 0, n - n_new);
> +            if (blk_new_backing && n_new) {

Same as above, I think we can drop the blk_new_backing check, just so 
that both blocks (for old and new) look the same.

(Also, the memset() already has to trust that n_new is 0 if 
blk_new_backing is NULL, so the check doesn’t make much sense from that 
perspective either, and makes the memset() look wrong.)

> +                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
>                   if (ret < 0) {
>                       error_report("error while reading from new backing file");
>                       goto out;



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase
  2023-09-15 16:20 ` [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase Andrey Drobyshev via
@ 2023-09-19 11:08   ` Hanna Czenczek
  0 siblings, 0 replies; 21+ messages in thread
From: Hanna Czenczek @ 2023-09-19 11:08 UTC (permalink / raw)
  To: Andrey Drobyshev, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 15.09.23 18:20, Andrey Drobyshev wrote:
> As the previous commit changes the logic of "qemu-img rebase" (it's using
> write alignment now), let's add a couple more test cases which would
> ensure it works correctly.  In particular, the following scenarios:
>
> 024: add test case for rebase within one backing chain when the overlay
>       cluster size > backings cluster size;
> 271: add test case for rebase images that contain subclusters.  Check
>       that no extra allocations are being made.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>   tests/qemu-iotests/024     | 60 ++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/024.out | 43 +++++++++++++++++++++++++
>   tests/qemu-iotests/271     | 66 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/271.out | 42 ++++++++++++++++++++++++
>   4 files changed, 211 insertions(+)

Interestingly, the new case in 024 hangs before this series.  So it 
isn’t just an optimization, but a fix, actually!

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression
  2023-09-15 16:20 ` [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression Andrey Drobyshev via
@ 2023-09-19 11:14   ` Hanna Czenczek
  0 siblings, 0 replies; 21+ messages in thread
From: Hanna Czenczek @ 2023-09-19 11:14 UTC (permalink / raw)
  To: Andrey Drobyshev, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 15.09.23 18:20, Andrey Drobyshev wrote:
> The test cases considered so far:
>
> 314 (new test suite):
>
> 1. Check that compression mode isn't compatible with "-f raw" (raw
>     format doesn't support compression).
> 2. Check that rebasing an image onto no backing file preserves the data
>     and writes the copied clusters actually compressed.
> 3. Same as 2, but with a raw backing file (i.e. the clusters copied from the
>     backing are originally uncompressed -- we check they end up compressed
>     after being merged).
> 4. Remove a single delta from a backing chain, perform the same checks
>     as in 2.
> 5. Check that even when backing and overlay are initially uncompressed,
>     copied clusters end up compressed when rebase with compression is
>     performed.
>
> 271:
>
> 1. Check that when target image has subclusters, rebase with compression
>     will make an entire cluster containing the written subcluster
>     compressed.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>   tests/qemu-iotests/271     |  65 +++++++++++++++
>   tests/qemu-iotests/271.out |  40 +++++++++
>   tests/qemu-iotests/314     | 165 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/314.out |  75 +++++++++++++++++
>   4 files changed, 345 insertions(+)
>   create mode 100755 tests/qemu-iotests/314
>   create mode 100644 tests/qemu-iotests/314.out

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
  2023-09-19 10:46   ` Hanna Czenczek
@ 2023-09-19 16:21     ` Andrey Drobyshev
  0 siblings, 0 replies; 21+ messages in thread
From: Andrey Drobyshev @ 2023-09-19 16:21 UTC (permalink / raw)
  To: Hanna Czenczek, qemu-block; +Cc: qemu-devel, kwolf, eblake, den

On 9/19/23 13:46, Hanna Czenczek wrote:
> On 15.09.23 18:20, Andrey Drobyshev wrote:
>> When rebasing an image from one backing file to another, we need to
>> compare data from old and new backings.  If the diff between that data
>> happens to be unaligned to the target cluster size, we might end up
>> doing partial writes, which would lead to copy-on-write and additional
>> IO.
>>
>> Consider the following simple case (virtual_size == cluster_size == 64K):
>>
>> base <-- inc1 <-- inc2
>>
>> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
>> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
>> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>>
>> While doing rebase, we'll write a half of the cluster to inc2, and block
>> layer will have to read the 2nd half of the same cluster from the base
>> image
>> inc1 while doing this write operation, although the whole cluster is
>> already
>> read earlier to perform data comparison.
>>
>> In order to avoid these unnecessary IO cycles, let's make sure every
>> write request is aligned to the overlay subcluster boundaries.  Using
>> subcluster size is universal as for the images which don't have them
>> this size equals to the cluster size, so in any case we end up aligning
>> to the smallest unit of allocation.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>   qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 56 insertions(+), 20 deletions(-)
> 
> Looks good, I like the changes from v1!  Two minor things:
> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index fcd31d7b5b..83950af42b 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
> 
> [...]
> 
>> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>>                   }
>>               }
>>   +            /*
>> +             * At this point we know that the region [offset; offset
>> + n)
>> +             * is unallocated within the target image.  This region
>> might be
>> +             * unaligned to the target image's (sub)cluster
>> boundaries, as
>> +             * old backing may have smaller clusters (or have
>> subclusters).
>> +             * We extend it to the aligned boundaries to avoid CoW on
>> +             * partial writes in blk_pwrite(),
>> +             */
>> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
>> +            offset = QEMU_ALIGN_DOWN(offset, write_align);
>> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
>> +            n = MIN(n, size - offset);
>> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n,
>> &n_alloc) &&
>> +                   n_alloc == n);
>> +
>> +            /*
>> +             * Much like the with the target image, we'll try to read
>> as much
> 
> s/the with the/with the/
>

Noted.

>> +             * of the old and new backings as we can.
>> +             */
>> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
>> +            if (blk_new_backing) {
>> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t)
>> offset));
>> +            }
> 
> If we don’t have a check for blk_old_backing (old_backing_size is 0 if
> blk_old_backing is NULL), why do we have a check for blk_new_backing
> (new_backing_size is 0 if blk_new_backing is NULL)?
> 
> (Perhaps because the previous check was `offset >= new_backing_size ||
> !blk_new_backing`, i.e. included exactly such a check – but I don’t
> think it’s necessary, new_backing_size will be 0 if blk_new_backing is
> NULL.)
> 
>> +
>>               /*
>>                * Read old and new backing file and take into
>> consideration that
>>                * backing files may be smaller than the COW image.
>>                */
>> -            if (offset >= old_backing_size) {
>> -                memset(buf_old, 0, n);
>> -                buf_old_is_zero = true;
>> +            memset(buf_old + n_old, 0, n - n_old);
>> +            if (!n_old) {
>> +                old_backing_eof = true;
>>               } else {
>> -                if (offset + n > old_backing_size) {
>> -                    n = old_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
>> +                ret = blk_pread(blk_old_backing, offset, n_old,
>> buf_old, 0);
>>                   if (ret < 0) {
>>                       error_report("error while reading from old
>> backing file");
>>                       goto out;
>>                   }
>>               }
>>   -            if (offset >= new_backing_size || !blk_new_backing) {
>> -                memset(buf_new, 0, n);
>> -            } else {
>> -                if (offset + n > new_backing_size) {
>> -                    n = new_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
>> +            memset(buf_new + n_new, 0, n - n_new);
>> +            if (blk_new_backing && n_new) {
> 
> Same as above, I think we can drop the blk_new_backing check, just so
> that both blocks (for old and new) look the same.
> 
> (Also, the memset() already has to trust that n_new is 0 if
> blk_new_backing is NULL, so the check doesn’t make much sense from that
> perspective either, and makes the memset() look wrong.)
>

Good point, thank you.  Looks like we indeed can safely remove these
extra checks.

>> +                ret = blk_pread(blk_new_backing, offset, n_new,
>> buf_new, 0);
>>                   if (ret < 0) {
>>                       error_report("error while reading from new
>> backing file");
>>                       goto out;
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-09-19 16:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-15 16:20 [PATCH v2 0/8] qemu-img: rebase: add compression support Andrey Drobyshev via
2023-09-15 16:20 ` [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file Andrey Drobyshev via
2023-09-15 16:20 ` [PATCH v2 2/8] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Andrey Drobyshev via
2023-09-15 16:20 ` [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Andrey Drobyshev via
2023-09-15 18:39   ` Eric Blake
2023-09-15 19:27     ` Andrey Drobyshev
2023-09-19  8:18   ` Hanna Czenczek
2023-09-19  9:26     ` Andrey Drobyshev
2023-09-15 16:20 ` [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers() Andrey Drobyshev via
2023-09-15 19:28   ` Eric Blake
2023-09-19  8:32   ` Hanna Czenczek
2023-09-15 16:20 ` [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations Andrey Drobyshev via
2023-09-15 21:52   ` Eric Blake
2023-09-16 17:45     ` Andrey Drobyshev
2023-09-19 10:46   ` Hanna Czenczek
2023-09-19 16:21     ` Andrey Drobyshev
2023-09-15 16:20 ` [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase Andrey Drobyshev via
2023-09-19 11:08   ` Hanna Czenczek
2023-09-15 16:20 ` [PATCH v2 7/8] qemu-img: add compression option to rebase subcommand Andrey Drobyshev via
2023-09-15 16:20 ` [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression Andrey Drobyshev via
2023-09-19 11:14   ` Hanna Czenczek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.