qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] qemu-img compare --stat
@ 2021-10-21 10:12 Vladimir Sementsov-Ogievskiy
  2021-10-21 10:12 ` [PATCH v2 1/4] qemu-img: implement " Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-21 10:12 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hreitz, kwolf, nsoffer, eblake, jsnow, vsementsov,
	den, nikita.lapshin

Hi all!

v2: allow using --shallow without --stat

Recently we faced the following task:

Customer comes and say: incremental backup images are too fat. Does you
incremental backup works correct?

What to answer? We should check something. At least check that
incremental images doesn't store same data twice. And we don't have a
tool for it. I just wrote a simple python script to compare raw files
cluster-by-cluster. Then we've mounted the qcow2 images with help of
qemu-nbd, the resulting /dev/nbd* were compared and we proved that
incremental backups don't store same data.

But that leads to idea that some kind of that script would be good to
have at hand.

So, here is a new option for qemu-img compare, that is a lot more
powerful and effective than original script, and allows to compare and
calculate statistics, i.e. how many clusters differs, how many
clusters changed from unallocated to data, and so on.

For examples of output look at the test in patch 04.

Vladimir Sementsov-Ogievskiy (4):
  qemu-img: implement compare --stat
  qemu-img: make --block-size optional for compare --stat
  qemu-img: add --shallow option for qemu-img compare
  iotests: add qemu-img-compare-stat test

 docs/tools/qemu-img.rst                       |  30 +-
 qemu-img.c                                    | 269 +++++++++++++++++-
 qemu-img-cmds.hx                              |   4 +-
 .../qemu-iotests/tests/qemu-img-compare-stat  |  88 ++++++
 .../tests/qemu-img-compare-stat.out           | 106 +++++++
 5 files changed, 479 insertions(+), 18 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-compare-stat
 create mode 100644 tests/qemu-iotests/tests/qemu-img-compare-stat.out

-- 
2.31.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-21 10:12 [PATCH v2 0/4] qemu-img compare --stat Vladimir Sementsov-Ogievskiy
@ 2021-10-21 10:12 ` Vladimir Sementsov-Ogievskiy
  2021-10-25 16:40   ` Hanna Reitz
  2021-10-21 10:12 ` [PATCH v2 2/4] qemu-img: make --block-size optional for " Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-21 10:12 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hreitz, kwolf, nsoffer, eblake, jsnow, vsementsov,
	den, nikita.lapshin

With new option qemu-img compare will not stop at first mismatch, but
instead calculate statistics: how many clusters with different data,
how many clusters with equal data, how many clusters were unallocated
but become data and so on.

We compare images chunk by chunk. Chunk size depends on what
block_status returns for both images. It may return less than cluster
(remember about qcow2 subclusters), it may return more than cluster (if
several consecutive clusters share same status). Finally images may
have different cluster sizes. This all leads to ambiguity in how to
finally compare the data.

What we can say for sure is that, when we compare two qcow2 images with
same cluster size, we should compare clusters with data separately.
Otherwise, if we for example compare 10 consecutive clusters of data
where only one byte differs we'll report 10 different clusters.
Expected result in this case is 1 different cluster and 9 equal ones.

So, to serve this case and just to have some defined rule let's do the
following:

1. Select some block-size for compare procedure. In this commit it must
   be specified by user, next commit will add some automatic logic and
   make --block-size optional.

2. Go chunk-by-chunk using block_status as we do now with only one
   differency:
   If block_status() returns DATA region that intersects block-size
   aligned boundary, crop this region at this boundary.

This way it's still possible to compare less than cluster and report
subcluster-level accuracy, but we newer compare more than one cluster
of data.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 docs/tools/qemu-img.rst |  18 +++-
 qemu-img.c              | 206 +++++++++++++++++++++++++++++++++++++---
 qemu-img-cmds.hx        |   4 +-
 3 files changed, 212 insertions(+), 16 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index d58980aef8..21164253d4 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -159,6 +159,18 @@ Parameters to compare subcommand:
 
   Strict mode - fail on different image size or sector allocation
 
+.. option:: --stat
+
+  Instead of exit on first mismatch compare the whole images and print
+  statistics on amount of different pairs of clusters, based on their
+  block-status and are they equal or not.
+
+.. option:: --block-size BLOCK_SIZE
+
+  Block size for comparing with ``--stat``. This doesn't guarantee exact
+  size of comparing chunks, but that guarantee that data chunks being
+  compared will never cross aligned block-size boundary.
+
 Parameters to convert subcommand:
 
 .. program:: qemu-img-convert
@@ -378,7 +390,7 @@ Command description:
 
   The rate limit for the commit process is specified by ``-r``.
 
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] FILENAME1 FILENAME2
 
   Check if two images have the same content. You can compare images with
   different format or settings.
@@ -405,9 +417,9 @@ Command description:
   The following table sumarizes all exit codes of the compare subcommand:
 
   0
-    Images are identical (or requested help was printed)
+    Images are identical (or requested help was printed, or ``--stat`` was used)
   1
-    Images differ
+    Images differ (1 is never returned when ``--stat`` option specified)
   2
     Error on opening an image
   3
diff --git a/qemu-img.c b/qemu-img.c
index f036a1d428..79a0589167 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -83,6 +83,8 @@ enum {
     OPTION_BITMAPS = 275,
     OPTION_FORCE = 276,
     OPTION_SKIP_BROKEN = 277,
+    OPTION_STAT = 277,
+    OPTION_BLOCK_SIZE = 278,
 };
 
 typedef enum OutputFormat {
@@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend *blk, int64_t offset,
     return 0;
 }
 
+#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
+                             BDRV_BLOCK_ALLOCATED)
+#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
+
+typedef struct ImgCmpStat {
+    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> n_bytes */
+    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
+} ImgCmpStat;
+
+/*
+ * To denote chunks after EOF when compared files are of different size, use
+ * corresponding status = -1
+ */
+static void cmp_stat_account(ImgCmpStat *stat, bool differs,
+                             int status1, int status2, int64_t bytes)
+{
+    assert(status1 >= -1);
+    if (status1 == -1) {
+        /*
+         * Note BDRV_BLOCK_EOF: we abuse it here a bit. User is not interested
+         * in EOF flag in the last chunk of file (especially when we trim the
+         * chunk and EOF becomes incorrect), so BDRV_BLOCK_EOF is not in
+         * IMG_CMP_STATUS_MASK. We instead use BDRV_BLOCK_EOF to illustrate
+         * chunks after-end-of-file in shorter file.
+         */
+        status1 = BDRV_BLOCK_EOF | BDRV_BLOCK_ZERO;
+    } else {
+        status1 &= IMG_CMP_STATUS_MASK;
+    }
+
+    assert(status2 >= -1);
+    if (status2 == -1) {
+        status2 = BDRV_BLOCK_EOF | BDRV_BLOCK_ZERO;
+    } else {
+        status2 &= IMG_CMP_STATUS_MASK;
+    }
+
+    stat->stat[differs][status1][status2] += bytes;
+}
+
+static void cmp_stat_print_agenda(void)
+{
+    printf("Compare stats:\n"
+           "Agenda\n"
+           "D: DATA\n"
+           "Z: ZERO\n"
+           "A: ALLOCATED\n"
+           "E: after end of file\n\n");
+}
+
+static void cmp_stat_print_status(int status)
+{
+    printf("%s%s%s%s",
+           status & BDRV_BLOCK_DATA ? "D" : "_",
+           status & BDRV_BLOCK_ZERO ? "Z" : "_",
+           status & BDRV_BLOCK_ALLOCATED ? "A" : "_",
+           status & BDRV_BLOCK_EOF ? "E" : "_");
+}
+
+static void cmp_stat_print(ImgCmpStat *stat, int64_t total_bytes)
+{
+    int ret, status1, status2;
+
+    cmp_stat_print_agenda();
+
+    for (ret = 0; ret <= 1; ret++) {
+        uint64_t total_in_group = 0;
+
+        if (ret == 0) {
+            printf("Equal:\n");
+        } else {
+            printf("\nUnequal:\n");
+        }
+
+        for (status1 = 0; status1 <= IMG_CMP_STATUS_MAX; status1++) {
+            for (status2 = 0; status2 <= IMG_CMP_STATUS_MAX; status2++) {
+                uint64_t bytes = stat->stat[ret][status1][status2];
+                g_autofree char *bytes_str = NULL;
+
+                if (!bytes) {
+                    continue;
+                }
+
+                total_in_group += bytes;
+
+                cmp_stat_print_status(status1);
+                printf(" -> ");
+                cmp_stat_print_status(status2);
+
+                bytes_str = size_to_str(bytes);
+                printf(" %" PRIu64 " bytes (%s) %.1f%%\n", bytes, bytes_str,
+                       100.0 * bytes / total_bytes);
+            }
+        }
+
+        if (!total_in_group) {
+            printf("<nothing>\n");
+        }
+    }
+}
+
 /*
  * Compares two images. Exit codes:
  *
@@ -1320,6 +1423,8 @@ static int img_compare(int argc, char **argv)
     uint8_t *buf1 = NULL, *buf2 = NULL;
     int64_t pnum1, pnum2;
     int allocated1, allocated2;
+    int status1, status2;
+    int64_t block_end;
     int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
     bool progress = false, quiet = false, strict = false;
     int flags;
@@ -1331,6 +1436,9 @@ static int img_compare(int argc, char **argv)
     uint64_t progress_base;
     bool image_opts = false;
     bool force_share = false;
+    ImgCmpStat stat = {0};
+    bool do_stat;
+    int64_t block_size = 0;
 
     cache = BDRV_DEFAULT_CACHE;
     for (;;) {
@@ -1339,6 +1447,8 @@ static int img_compare(int argc, char **argv)
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
             {"force-share", no_argument, 0, 'U'},
+            {"stat", no_argument, 0, OPTION_STAT},
+            {"block-size", required_argument, 0, OPTION_BLOCK_SIZE},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":hf:F:T:pqsU",
@@ -1395,6 +1505,15 @@ static int img_compare(int argc, char **argv)
         case OPTION_IMAGE_OPTS:
             image_opts = true;
             break;
+        case OPTION_STAT:
+            do_stat = true;
+            break;
+        case OPTION_BLOCK_SIZE:
+            block_size = cvtnum_full("block size", optarg, 1, INT64_MAX);
+            if (block_size < 0) {
+                exit(EXIT_SUCCESS);
+            }
+            break;
         }
     }
 
@@ -1410,6 +1529,20 @@ static int img_compare(int argc, char **argv)
     filename1 = argv[optind++];
     filename2 = argv[optind++];
 
+    if (!do_stat && block_size) {
+        error_report("--block-size can be used only together with --stat");
+        ret = 1;
+        goto out;
+    }
+
+    if (do_stat && !block_size) {
+        /* TODO: make block-size optional */
+        error_report("You must specify --block-size together with --stat");
+        ret = 1;
+        goto out;
+    }
+
+
     /* Initialize before goto out */
     qemu_progress_init(progress, 2.0);
 
@@ -1465,7 +1598,7 @@ static int img_compare(int argc, char **argv)
     }
 
     while (offset < total_size) {
-        int status1, status2;
+        block_end = QEMU_ALIGN_UP(offset + 1, block_size);
 
         status1 = bdrv_block_status_above(bs1, NULL, offset,
                                           total_size1 - offset, &pnum1, NULL,
@@ -1476,6 +1609,10 @@ static int img_compare(int argc, char **argv)
             goto out;
         }
         allocated1 = status1 & BDRV_BLOCK_ALLOCATED;
+        if (do_stat && (status1 & BDRV_BLOCK_DATA)) {
+            /* Don't compare cross-block data */
+            pnum1 = MIN(block_end, offset + pnum1) - offset;
+        }
 
         status2 = bdrv_block_status_above(bs2, NULL, offset,
                                           total_size2 - offset, &pnum2, NULL,
@@ -1486,11 +1623,15 @@ static int img_compare(int argc, char **argv)
             goto out;
         }
         allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
+        if (do_stat && (status2 & BDRV_BLOCK_DATA)) {
+            /* Don't compare cross-block data */
+            pnum2 = MIN(block_end, offset + pnum2) - offset;
+        }
 
         assert(pnum1 && pnum2);
         chunk = MIN(pnum1, pnum2);
 
-        if (strict) {
+        if (strict && !do_stat) {
             if (status1 != status2) {
                 ret = 1;
                 qprintf(quiet, "Strict mode: Offset %" PRId64
@@ -1499,7 +1640,8 @@ static int img_compare(int argc, char **argv)
             }
         }
         if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) {
-            /* nothing to do */
+            /* nothing to do, equal: */
+            ret = 0;
         } else if (allocated1 == allocated2) {
             if (allocated1) {
                 int64_t pnum;
@@ -1523,25 +1665,37 @@ static int img_compare(int argc, char **argv)
                 }
                 ret = compare_buffers(buf1, buf2, chunk, &pnum);
                 if (ret || pnum != chunk) {
-                    qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
+                    qprintf(quiet || do_stat,
+                            "Content mismatch at offset %" PRId64 "!\n",
                             offset + (ret ? 0 : pnum));
                     ret = 1;
-                    goto out;
                 }
+            } else {
+                /* Consider unallocated areas equal. */
+                ret = 0;
             }
         } else {
             chunk = MIN(chunk, IO_BUF_SIZE);
             if (allocated1) {
                 ret = check_empty_sectors(blk1, offset, chunk,
-                                          filename1, buf1, quiet);
+                                          filename1, buf1, quiet || do_stat);
             } else {
                 ret = check_empty_sectors(blk2, offset, chunk,
-                                          filename2, buf1, quiet);
+                                          filename2, buf1, quiet || do_stat);
             }
-            if (ret) {
+            assert(ret == 0 || ret == 1 || ret == 4);
+            if (ret == 4) {
                 goto out;
             }
         }
+        assert(ret == 0 || ret == 1);
+
+        if (do_stat) {
+            cmp_stat_account(&stat, ret, status1, status2, chunk);
+        } else if (ret) {
+            goto out;
+        }
+
         offset += chunk;
         qemu_progress_print(((float) chunk / progress_base) * 100, 100);
     }
@@ -1549,17 +1703,24 @@ static int img_compare(int argc, char **argv)
     if (total_size1 != total_size2) {
         BlockBackend *blk_over;
         const char *filename_over;
+        int *status_over;
 
         qprintf(quiet, "Warning: Image size mismatch!\n");
         if (total_size1 > total_size2) {
             blk_over = blk1;
             filename_over = filename1;
+            status_over = &status1;
+            status2 = -1; /* Denote after-EOF for cmp_stat_account() */
         } else {
             blk_over = blk2;
             filename_over = filename2;
+            status1 = -1; /* Denote after-EOF for cmp_stat_account() */
+            status_over = &status2;
         }
 
         while (offset < progress_base) {
+            block_end = QEMU_ALIGN_UP(offset + 1, block_size);
+
             ret = bdrv_block_status_above(blk_bs(blk_over), NULL, offset,
                                           progress_base - offset, &chunk,
                                           NULL, NULL);
@@ -1570,20 +1731,43 @@ static int img_compare(int argc, char **argv)
                 goto out;
 
             }
+            if (do_stat && (ret & BDRV_BLOCK_DATA)) {
+                /* Don't compare cross-block data */
+                chunk = MIN(block_end, offset + chunk) - offset;
+            }
+            *status_over = ret;
+
             if (ret & BDRV_BLOCK_ALLOCATED && !(ret & BDRV_BLOCK_ZERO)) {
                 chunk = MIN(chunk, IO_BUF_SIZE);
                 ret = check_empty_sectors(blk_over, offset, chunk,
-                                          filename_over, buf1, quiet);
-                if (ret) {
+                                          filename_over, buf1,
+                                          quiet || do_stat);
+                assert(ret == 0 || ret == 1 || ret == 4);
+                if (ret == 4) {
                     goto out;
                 }
+            } else {
+                ret = 0;
             }
+            assert(ret == 0 || ret == 1);
+
+            if (do_stat) {
+                cmp_stat_account(&stat, ret, status1, status2, chunk);
+            } else if (ret) {
+                goto out;
+            }
+
             offset += chunk;
             qemu_progress_print(((float) chunk / progress_base) * 100, 100);
         }
     }
 
-    qprintf(quiet, "Images are identical.\n");
+    if (do_stat) {
+        cmp_stat_print(&stat, progress_base);
+    } else {
+        qprintf(quiet, "Images are identical.\n");
+    }
+
     ret = 0;
 
 out:
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 4c4d94ab22..0b2d63ca5f 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ SRST
 ERST
 
 DEF("compare", img_compare,
-    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] filename1 filename2")
+    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] filename1 filename2")
 SRST
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] FILENAME1 FILENAME2
 ERST
 
 DEF("convert", img_convert,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat
  2021-10-21 10:12 [PATCH v2 0/4] qemu-img compare --stat Vladimir Sementsov-Ogievskiy
  2021-10-21 10:12 ` [PATCH v2 1/4] qemu-img: implement " Vladimir Sementsov-Ogievskiy
@ 2021-10-21 10:12 ` Vladimir Sementsov-Ogievskiy
  2021-10-26  8:07   ` Hanna Reitz
  2021-10-21 10:12 ` [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare Vladimir Sementsov-Ogievskiy
  2021-10-21 10:12 ` [PATCH v2 4/4] iotests: add qemu-img-compare-stat test Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-21 10:12 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hreitz, kwolf, nsoffer, eblake, jsnow, vsementsov,
	den, nikita.lapshin

Let's detect block-size automatically if not specified by user:

 If both files define cluster-size, use minimum to be more precise.
 If both files don't specify cluster-size, use default of 64K
 If only one file specify cluster-size, just use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 docs/tools/qemu-img.rst |  7 +++-
 qemu-img.c              | 71 ++++++++++++++++++++++++++++++++++++-----
 qemu-img-cmds.hx        |  4 +--
 3 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 21164253d4..4b382ca2b0 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -170,6 +170,11 @@ Parameters to compare subcommand:
   Block size for comparing with ``--stat``. This doesn't guarantee exact
   size of comparing chunks, but that guarantee that data chunks being
   compared will never cross aligned block-size boundary.
+  When unspecified the following logic is used:
+
+    - If both files define cluster-size, use minimum to be more precise.
+    - If both files don't specify cluster-size, use default of 64K
+    - If only one file specify cluster-size, just use it.
 
 Parameters to convert subcommand:
 
@@ -390,7 +395,7 @@ Command description:
 
   The rate limit for the commit process is specified by ``-r``.
 
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
 
   Check if two images have the same content. You can compare images with
   different format or settings.
diff --git a/qemu-img.c b/qemu-img.c
index 79a0589167..61e7f470bb 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1407,6 +1407,61 @@ static void cmp_stat_print(ImgCmpStat *stat, int64_t total_bytes)
     }
 }
 
+/* Get default value for qemu-img compare --block-size option. */
+static int img_compare_block_size(BlockDriverState *bs1,
+                                  BlockDriverState *bs2,
+                                  bool quiet)
+{
+    const int default_block_size = 64 * 1024; /* 64K */
+
+    int ret;
+    BlockDriverInfo bdi;
+    int cluster_size1, cluster_size2, block_size;
+    const char *note = "Note: to alter it, set --block-size option.";
+    const char *fname1 = bs1->filename;
+    const char *fname2 = bs2->filename;
+
+    ret = bdrv_get_info(bs1, &bdi);
+    if (ret < 0 && ret != -ENOTSUP) {
+        error_report("Failed to get info of %s: %s", fname1, strerror(-ret));
+        return ret;
+    }
+    cluster_size1 = ret < 0 ? 0 : bdi.cluster_size;
+
+    ret = bdrv_get_info(bs2, &bdi);
+    if (ret < 0 && ret != -ENOTSUP) {
+        error_report("Failed to get info of %s: %s", fname2, strerror(-ret));
+        return ret;
+    }
+    cluster_size2 = ret < 0 ? 0 : bdi.cluster_size;
+
+    if (cluster_size1 > 0 && cluster_size2 > 0) {
+        if (cluster_size1 == cluster_size2) {
+            block_size = cluster_size1;
+        } else {
+            block_size = MIN(cluster_size1, cluster_size2);
+            qprintf(quiet, "%s and %s has different cluster sizes: %d and %d "
+                    "correspondingly. Use minimum as block-size for "
+                    "accuracy: %d. %s\n",
+                    fname1, fname2, cluster_size1,
+                    cluster_size2, block_size, note);
+        }
+    } else if (cluster_size1 == 0 && cluster_size2 == 0) {
+        block_size = default_block_size;
+        qprintf(quiet, "Neither of %s and %s has explicit cluster size. Use "
+                "default of %d bytes. %s\n", fname1, fname2, block_size, note);
+    } else {
+        block_size = MAX(cluster_size1, cluster_size2);
+        qprintf(quiet, "%s has explicit cluster size of %d and %s "
+                "doesn't have one. Use %d as block-size. %s\n",
+                cluster_size1 ? fname1 : fname2, block_size,
+                cluster_size1 ? fname2 : fname1,
+                block_size, note);
+    }
+
+    return block_size;
+}
+
 /*
  * Compares two images. Exit codes:
  *
@@ -1535,14 +1590,6 @@ static int img_compare(int argc, char **argv)
         goto out;
     }
 
-    if (do_stat && !block_size) {
-        /* TODO: make block-size optional */
-        error_report("You must specify --block-size together with --stat");
-        ret = 1;
-        goto out;
-    }
-
-
     /* Initialize before goto out */
     qemu_progress_init(progress, 2.0);
 
@@ -1589,6 +1636,14 @@ static int img_compare(int argc, char **argv)
     total_size = MIN(total_size1, total_size2);
     progress_base = MAX(total_size1, total_size2);
 
+    if (do_stat && !block_size) {
+        block_size = img_compare_block_size(bs1, bs2, quiet);
+        if (block_size <= 0) {
+            ret = 4;
+            goto out;
+        }
+    }
+
     qemu_progress_print(0, 100);
 
     if (strict && total_size1 != total_size2) {
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 0b2d63ca5f..96a193eea8 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ SRST
 ERST
 
 DEF("compare", img_compare,
-    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] filename1 filename2")
+    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] filename1 filename2")
 SRST
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat --block-size BLOCK_SIZE] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
 ERST
 
 DEF("convert", img_convert,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare
  2021-10-21 10:12 [PATCH v2 0/4] qemu-img compare --stat Vladimir Sementsov-Ogievskiy
  2021-10-21 10:12 ` [PATCH v2 1/4] qemu-img: implement " Vladimir Sementsov-Ogievskiy
  2021-10-21 10:12 ` [PATCH v2 2/4] qemu-img: make --block-size optional for " Vladimir Sementsov-Ogievskiy
@ 2021-10-21 10:12 ` Vladimir Sementsov-Ogievskiy
  2021-10-26  8:11   ` Hanna Reitz
  2021-10-21 10:12 ` [PATCH v2 4/4] iotests: add qemu-img-compare-stat test Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-21 10:12 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hreitz, kwolf, nsoffer, eblake, jsnow, vsementsov,
	den, nikita.lapshin

Allow compare only top images of backing chains. This is useful to
compare images with same backing file or to compare incremental images
from the chain of incremental backups with "--stat" option.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 docs/tools/qemu-img.rst | 9 ++++++++-
 qemu-img.c              | 8 ++++++--
 qemu-img-cmds.hx        | 4 ++--
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 4b382ca2b0..4ae9543472 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -176,6 +176,13 @@ Parameters to compare subcommand:
     - If both files don't specify cluster-size, use default of 64K
     - If only one file specify cluster-size, just use it.
 
+.. option:: --shallow
+
+  This option prevents opening and comparing any backing files.
+  This is useful to compare images with same backing file or to compare
+  incremental images from the chain of incremental backups with
+  ``--stat`` option.
+
 Parameters to convert subcommand:
 
 .. program:: qemu-img-convert
@@ -395,7 +402,7 @@ Command description:
 
   The rate limit for the commit process is specified by ``-r``.
 
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] [--shallow] FILENAME1 FILENAME2
 
   Check if two images have the same content. You can compare images with
   different format or settings.
diff --git a/qemu-img.c b/qemu-img.c
index 61e7f470bb..c9b5067353 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -85,6 +85,7 @@ enum {
     OPTION_SKIP_BROKEN = 277,
     OPTION_STAT = 277,
     OPTION_BLOCK_SIZE = 278,
+    OPTION_SHALLOW = 279,
 };
 
 typedef enum OutputFormat {
@@ -1482,7 +1483,7 @@ static int img_compare(int argc, char **argv)
     int64_t block_end;
     int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
     bool progress = false, quiet = false, strict = false;
-    int flags;
+    int flags = 0;
     bool writethrough;
     int64_t total_size;
     int64_t offset = 0;
@@ -1504,6 +1505,7 @@ static int img_compare(int argc, char **argv)
             {"force-share", no_argument, 0, 'U'},
             {"stat", no_argument, 0, OPTION_STAT},
             {"block-size", required_argument, 0, OPTION_BLOCK_SIZE},
+            {"shallow", no_argument, 0, OPTION_SHALLOW},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":hf:F:T:pqsU",
@@ -1569,6 +1571,9 @@ static int img_compare(int argc, char **argv)
                 exit(EXIT_SUCCESS);
             }
             break;
+        case OPTION_SHALLOW:
+            flags |= BDRV_O_NO_BACKING;
+            break;
         }
     }
 
@@ -1593,7 +1598,6 @@ static int img_compare(int argc, char **argv)
     /* Initialize before goto out */
     qemu_progress_init(progress, 2.0);
 
-    flags = 0;
     ret = bdrv_parse_cache_mode(cache, &flags, &writethrough);
     if (ret < 0) {
         error_report("Invalid source cache option: %s", cache);
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 96a193eea8..6b164767fd 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ SRST
 ERST
 
 DEF("compare", img_compare,
-    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] filename1 filename2")
+    "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] [--shallow] filename1 filename2")
 SRST
-.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
+.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] [--shallow] FILENAME1 FILENAME2
 ERST
 
 DEF("convert", img_convert,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/4] iotests: add qemu-img-compare-stat test
  2021-10-21 10:12 [PATCH v2 0/4] qemu-img compare --stat Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2021-10-21 10:12 ` [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare Vladimir Sementsov-Ogievskiy
@ 2021-10-21 10:12 ` Vladimir Sementsov-Ogievskiy
  2021-10-26  8:26   ` Hanna Reitz
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-21 10:12 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hreitz, kwolf, nsoffer, eblake, jsnow, vsementsov,
	den, nikita.lapshin

Test new feature qemu-img compare --stat.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 .../qemu-iotests/tests/qemu-img-compare-stat  |  88 +++++++++++++++
 .../tests/qemu-img-compare-stat.out           | 106 ++++++++++++++++++
 2 files changed, 194 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-compare-stat
 create mode 100644 tests/qemu-iotests/tests/qemu-img-compare-stat.out

diff --git a/tests/qemu-iotests/tests/qemu-img-compare-stat b/tests/qemu-iotests/tests/qemu-img-compare-stat
new file mode 100755
index 0000000000..e2c0bcc7ef
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-compare-stat
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+#
+# Test qemu-img compare --stat
+#
+# Copyright (c) 2021 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+from iotests import qemu_img_create, qemu_io, qemu_img_log, log
+
+iotests.script_initialize(supported_fmts=['qcow2'])
+
+a, b, c = iotests.file_path('a', 'b', 'c')
+
+log('= compare two images =\n')
+
+qemu_img_create('-f', iotests.imgfmt, a, '1M')
+qemu_img_create('-f', iotests.imgfmt, b, '1M')
+
+# equal data and zero
+qemu_io('-c', 'write -z 0 64K', a)
+qemu_io('-c', 'write -P 0 0 64K', b)
+
+# different data
+qemu_io('-c', 'write -P 1 64K 64K', a)
+qemu_io('-c', 'write -P 0 64K 64K', b)
+
+# equal data
+qemu_io('-c', 'write -P 2 128K 64K', a)
+qemu_io('-c', 'write -P 2 128K 64K', b)
+
+# equal unallocated and allocated zero
+qemu_io('-c', 'write -z 192K 64K', b)
+
+# unequal data and unallocated zero
+qemu_io('-c', 'write -P 3 256K 64K', a)
+
+qemu_img_log('compare', '--stat', a, b)
+
+log('\n= compare two increments =\n')
+
+qemu_img_create('-f', iotests.imgfmt, a, '1M')
+qemu_img_create('-f', iotests.imgfmt, '-b', a, '-F', iotests.imgfmt, b, '1M')
+qemu_img_create('-f', iotests.imgfmt, '-b', b, '-F', iotests.imgfmt, c, '1M')
+
+qemu_io('-c', 'write -P 1 0 1M', a)
+qemu_io('-c', 'write -P 2 0 64K', b)
+qemu_io('-c', 'write -P 3 64K 64K', c)
+qemu_img_log('compare', '--stat', b, c)
+
+log('\n= compare two increments with --shallow=\n')
+qemu_img_log('compare', '--stat', '--shallow', b, c)
+
+log('\n= compare images of different size =\n')
+qemu_img_create('-f', iotests.imgfmt, a, '1M')
+qemu_img_create('-f', iotests.imgfmt, b, '2M')
+qemu_io('-c', 'write -P 1 0 1M', a)
+qemu_io('-c', 'write -P 2 0 1M', b)
+qemu_io('-c', 'write -P 1 1M 64K', b)
+qemu_io('-c', f'write -z {1024 + 64 * 2}K 64K', b)
+qemu_io('-c', f'write -P 0 {1024 + 64 * 3}K 64K', b)
+qemu_img_log('compare', '--stat', a, b)
+
+log('\n= compare images with only 512 bytes different =\n')
+qemu_img_create('-f', iotests.imgfmt, a, '1M')
+qemu_img_create('-f', iotests.imgfmt, b, '1M')
+qemu_io('-c', 'write -P 1 0 1M', a)
+qemu_io('-c', 'write -P 2 0 512', b)
+qemu_io('-c', f'write -P 1 512 {1024 * 1024 - 512}', b)
+qemu_img_log('compare', '--stat', a, b)
+
+log('\n= compare images with only 512 bytes different, block-size=4K =\n')
+qemu_img_log('compare', '--stat', '--block-size', '4K', a, b)
+
+log('\n= end =')
diff --git a/tests/qemu-iotests/tests/qemu-img-compare-stat.out b/tests/qemu-iotests/tests/qemu-img-compare-stat.out
new file mode 100644
index 0000000000..0dec76feb6
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-compare-stat.out
@@ -0,0 +1,106 @@
+= compare two images =
+
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+_Z__ -> _Z__ 720896 bytes (704 KiB) 68.8%
+_Z__ -> _ZA_ 65536 bytes (64 KiB) 6.2%
+D_A_ -> D_A_ 65536 bytes (64 KiB) 6.2%
+_ZA_ -> D_A_ 65536 bytes (64 KiB) 6.2%
+
+Unequal:
+D_A_ -> _Z__ 65536 bytes (64 KiB) 6.2%
+D_A_ -> D_A_ 65536 bytes (64 KiB) 6.2%
+
+
+= compare two increments =
+
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+D_A_ -> D_A_ 983040 bytes (960 KiB) 93.8%
+
+Unequal:
+D_A_ -> D_A_ 65536 bytes (64 KiB) 6.2%
+
+
+= compare two increments with --shallow=
+
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+_Z__ -> _Z__ 917504 bytes (896 KiB) 87.5%
+
+Unequal:
+_Z__ -> D_A_ 65536 bytes (64 KiB) 6.2%
+D_A_ -> _Z__ 65536 bytes (64 KiB) 6.2%
+
+
+= compare images of different size =
+
+Warning: Image size mismatch!
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+_Z_E -> _Z__ 851968 bytes (832 KiB) 40.6%
+_Z_E -> D_A_ 65536 bytes (64 KiB) 3.1%
+_Z_E -> _ZA_ 65536 bytes (64 KiB) 3.1%
+
+Unequal:
+D_A_ -> D_A_ 1048576 bytes (1 MiB) 50.0%
+_Z_E -> D_A_ 65536 bytes (64 KiB) 3.1%
+
+
+= compare images with only 512 bytes different =
+
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+D_A_ -> D_A_ 983040 bytes (960 KiB) 93.8%
+
+Unequal:
+D_A_ -> D_A_ 65536 bytes (64 KiB) 6.2%
+
+
+= compare images with only 512 bytes different, block-size=4K =
+
+Compare stats:
+Agenda
+D: DATA
+Z: ZERO
+A: ALLOCATED
+E: after end of file
+
+Equal:
+D_A_ -> D_A_ 1044480 bytes (0.996 MiB) 99.6%
+
+Unequal:
+D_A_ -> D_A_ 4096 bytes (4 KiB) 0.4%
+
+
+= end =
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-21 10:12 ` [PATCH v2 1/4] qemu-img: implement " Vladimir Sementsov-Ogievskiy
@ 2021-10-25 16:40   ` Hanna Reitz
  2021-10-26  7:53     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Hanna Reitz @ 2021-10-25 16:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
> With new option qemu-img compare will not stop at first mismatch, but
> instead calculate statistics: how many clusters with different data,
> how many clusters with equal data, how many clusters were unallocated
> but become data and so on.
>
> We compare images chunk by chunk. Chunk size depends on what
> block_status returns for both images. It may return less than cluster
> (remember about qcow2 subclusters), it may return more than cluster (if
> several consecutive clusters share same status). Finally images may
> have different cluster sizes. This all leads to ambiguity in how to
> finally compare the data.
>
> What we can say for sure is that, when we compare two qcow2 images with
> same cluster size, we should compare clusters with data separately.
> Otherwise, if we for example compare 10 consecutive clusters of data
> where only one byte differs we'll report 10 different clusters.
> Expected result in this case is 1 different cluster and 9 equal ones.
>
> So, to serve this case and just to have some defined rule let's do the
> following:
>
> 1. Select some block-size for compare procedure. In this commit it must
>     be specified by user, next commit will add some automatic logic and
>     make --block-size optional.
>
> 2. Go chunk-by-chunk using block_status as we do now with only one
>     differency:
>     If block_status() returns DATA region that intersects block-size
>     aligned boundary, crop this region at this boundary.
>
> This way it's still possible to compare less than cluster and report
> subcluster-level accuracy, but we newer compare more than one cluster
> of data.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   docs/tools/qemu-img.rst |  18 +++-
>   qemu-img.c              | 206 +++++++++++++++++++++++++++++++++++++---
>   qemu-img-cmds.hx        |   4 +-
>   3 files changed, 212 insertions(+), 16 deletions(-)

Looks good to me overall!  Just some technical comments below.

> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index d58980aef8..21164253d4 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -159,6 +159,18 @@ Parameters to compare subcommand:
>   
>     Strict mode - fail on different image size or sector allocation
>   
> +.. option:: --stat
> +
> +  Instead of exit on first mismatch compare the whole images and print
> +  statistics on amount of different pairs of clusters, based on their
> +  block-status and are they equal or not.

I’d phrase this as:

Instead of exiting on the first mismatch, compare the whole images and 
print statistics on how much they differ in terms of block status (i.e. 
are blocks allocated or not, do they contain data, are they marked as 
containing only zeroes) and block content (a block of data that contains 
only zero still has the same content as a marked-zero block).

> +
> +.. option:: --block-size BLOCK_SIZE
> +
> +  Block size for comparing with ``--stat``. This doesn't guarantee exact
> +  size of comparing chunks, but that guarantee that data chunks being
> +  compared will never cross aligned block-size boundary.

I’d do just some minor tweaks to the second sentence:

This doesn't guarantee an exact size for comparing chunks, but it does 
guarantee that those chunks will never cross a block-size-aligned boundary.

> +
>   Parameters to convert subcommand:
>   
>   .. program:: qemu-img-convert

[...]

> diff --git a/qemu-img.c b/qemu-img.c
> index f036a1d428..79a0589167 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -83,6 +83,8 @@ enum {
>       OPTION_BITMAPS = 275,
>       OPTION_FORCE = 276,
>       OPTION_SKIP_BROKEN = 277,
> +    OPTION_STAT = 277,

That doesn’t look ideal, I believe `OPTION_STAT` should have a different 
value than `OPTION_SKIP_BROKEN`.  (I guess a rebase is to blame?)

> +    OPTION_BLOCK_SIZE = 278,
>   };
>   
>   typedef enum OutputFormat {
> @@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend *blk, int64_t offset,
>       return 0;
>   }
>   
> +#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
> +                             BDRV_BLOCK_ALLOCATED)
> +#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
> +
> +typedef struct ImgCmpStat {
> +    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> n_bytes */
> +    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];

`IMG_CMP_STATUS_MAX` isn’t packed tightly because it only has four bits 
set (0x33).  That in itself isn’t a problem, but it means that 
`IMG_CMP_STATUS_MAX + 1` is 52, and so this array’s size is 52 * 52 * 2 
* sizeof(uint64_t) = 43264.  Again, that isn’t a problem in itself 
(although it is a bit sad that this could fit into 16 * 16 * 2 * 8 = 4 
kB), but in `img_compare()` [1], you put this structure on the stack, 
and I believe it’s too big for that.

> +} ImgCmpStat;

[...]

> +static void cmp_stat_print_agenda(void)
> +{
> +    printf("Compare stats:\n"
> +           "Agenda\n"

I’m more used to the term “Key” for this, but my experience is mostly 
limited to the git-backport-diff output, so it’s not that strong.

> +           "D: DATA\n"
> +           "Z: ZERO\n"
> +           "A: ALLOCATED\n"
> +           "E: after end of file\n\n");
> +}

[...]

> @@ -1331,6 +1436,9 @@ static int img_compare(int argc, char **argv)
>       uint64_t progress_base;
>       bool image_opts = false;
>       bool force_share = false;
> +    ImgCmpStat stat = {0};

[1] (Here is where `ImgCmpStat` goes on the stack, which it shouldn’t, 
considering it’s over 40 kB in size.)

> +    bool do_stat;
> +    int64_t block_size = 0;
>   
>       cache = BDRV_DEFAULT_CACHE;
>       for (;;) {

[...]

> @@ -1395,6 +1505,15 @@ static int img_compare(int argc, char **argv)
>           case OPTION_IMAGE_OPTS:
>               image_opts = true;
>               break;
> +        case OPTION_STAT:
> +            do_stat = true;
> +            break;
> +        case OPTION_BLOCK_SIZE:
> +            block_size = cvtnum_full("block size", optarg, 1, INT64_MAX);
> +            if (block_size < 0) {
> +                exit(EXIT_SUCCESS);

Shouldn’t this be 2 instead of `EXIT_SUCCESS`?

(Above for --object we use `EXIT_SUCCESS`, but only for the case where a 
help text was printed.  When parsing fails, we exit with 2, which is 
documented as the error code (“>1”).)

> +            }
> +            break;
>           }
>       }
>   
> @@ -1410,6 +1529,20 @@ static int img_compare(int argc, char **argv)
>       filename1 = argv[optind++];
>       filename2 = argv[optind++];
>   
> +    if (!do_stat && block_size) {
> +        error_report("--block-size can be used only together with --stat");
> +        ret = 1;

2 is the error code, 1 means “success, but images differ”.

> +        goto out;

The `out` label frees `buf1` and `buf2`, and unrefs `blk1` and `blk2`.  
My gcc complains that `blk1` and `blk2` have not been initialized by 
this point, though.  I believe we should go to `out3` here.

> +    }
> +
> +    if (do_stat && !block_size) {
> +        /* TODO: make block-size optional */
> +        error_report("You must specify --block-size together with --stat");
> +        ret = 1;
> +        goto out;

Same here.

> +    }
> +
> +
>       /* Initialize before goto out */
>       qemu_progress_init(progress, 2.0);
>   

[...]

> @@ -1486,11 +1623,15 @@ static int img_compare(int argc, char **argv)
>               goto out;
>           }
>           allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
> +        if (do_stat && (status2 & BDRV_BLOCK_DATA)) {
> +            /* Don't compare cross-block data */
> +            pnum2 = MIN(block_end, offset + pnum2) - offset;
> +        }
>   
>           assert(pnum1 && pnum2);
>           chunk = MIN(pnum1, pnum2);
>   
> -        if (strict) {
> +        if (strict && !do_stat) {

Question is, do we want to allow strict mode together with --stat at 
all?  I think I’d prefer making it an outright error.

Hanna

>               if (status1 != status2) {
>                   ret = 1;
>                   qprintf(quiet, "Strict mode: Offset %" PRId64



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-25 16:40   ` Hanna Reitz
@ 2021-10-26  7:53     ` Vladimir Sementsov-Ogievskiy
  2021-10-26  8:47       ` Hanna Reitz
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-26  7:53 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, kwolf, nsoffer, eblake, jsnow, den, nikita.lapshin

25.10.2021 19:40, Hanna Reitz wrote:
> On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
>> With new option qemu-img compare will not stop at first mismatch, but
>> instead calculate statistics: how many clusters with different data,
>> how many clusters with equal data, how many clusters were unallocated
>> but become data and so on.
>>
>> We compare images chunk by chunk. Chunk size depends on what
>> block_status returns for both images. It may return less than cluster
>> (remember about qcow2 subclusters), it may return more than cluster (if
>> several consecutive clusters share same status). Finally images may
>> have different cluster sizes. This all leads to ambiguity in how to
>> finally compare the data.
>>
>> What we can say for sure is that, when we compare two qcow2 images with
>> same cluster size, we should compare clusters with data separately.
>> Otherwise, if we for example compare 10 consecutive clusters of data
>> where only one byte differs we'll report 10 different clusters.
>> Expected result in this case is 1 different cluster and 9 equal ones.
>>
>> So, to serve this case and just to have some defined rule let's do the
>> following:
>>
>> 1. Select some block-size for compare procedure. In this commit it must
>>     be specified by user, next commit will add some automatic logic and
>>     make --block-size optional.
>>
>> 2. Go chunk-by-chunk using block_status as we do now with only one
>>     differency:
>>     If block_status() returns DATA region that intersects block-size
>>     aligned boundary, crop this region at this boundary.
>>
>> This way it's still possible to compare less than cluster and report
>> subcluster-level accuracy, but we newer compare more than one cluster
>> of data.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/tools/qemu-img.rst |  18 +++-
>>   qemu-img.c              | 206 +++++++++++++++++++++++++++++++++++++---
>>   qemu-img-cmds.hx        |   4 +-
>>   3 files changed, 212 insertions(+), 16 deletions(-)
> 
> Looks good to me overall!  Just some technical comments below.
> 
>> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>> index d58980aef8..21164253d4 100644
>> --- a/docs/tools/qemu-img.rst
>> +++ b/docs/tools/qemu-img.rst
>> @@ -159,6 +159,18 @@ Parameters to compare subcommand:
>>     Strict mode - fail on different image size or sector allocation
>> +.. option:: --stat
>> +
>> +  Instead of exit on first mismatch compare the whole images and print
>> +  statistics on amount of different pairs of clusters, based on their
>> +  block-status and are they equal or not.
> 
> I’d phrase this as:
> 
> Instead of exiting on the first mismatch, compare the whole images and print statistics on how much they differ in terms of block status (i.e. are blocks allocated or not, do they contain data, are they marked as containing only zeroes) and block content (a block of data that contains only zero still has the same content as a marked-zero block).

For me the rest starting from "and block content" sounds unclear, seems doesn't add any information to previous (i.e. are blocks allocated ...)

> 
>> +
>> +.. option:: --block-size BLOCK_SIZE
>> +
>> +  Block size for comparing with ``--stat``. This doesn't guarantee exact
>> +  size of comparing chunks, but that guarantee that data chunks being
>> +  compared will never cross aligned block-size boundary.
> 
> I’d do just some minor tweaks to the second sentence:
> 
> This doesn't guarantee an exact size for comparing chunks, but it does guarantee that those chunks will never cross a block-size-aligned boundary.

OK, sounds good

> 
>> +
>>   Parameters to convert subcommand:
>>   .. program:: qemu-img-convert
> 
> [...]
> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index f036a1d428..79a0589167 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -83,6 +83,8 @@ enum {
>>       OPTION_BITMAPS = 275,
>>       OPTION_FORCE = 276,
>>       OPTION_SKIP_BROKEN = 277,
>> +    OPTION_STAT = 277,
> 
> That doesn’t look ideal, I believe `OPTION_STAT` should have a different value than `OPTION_SKIP_BROKEN`.  (I guess a rebase is to blame?)

Oops

> 
>> +    OPTION_BLOCK_SIZE = 278,
>>   };
>>   typedef enum OutputFormat {
>> @@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend *blk, int64_t offset,
>>       return 0;
>>   }
>> +#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
>> +                             BDRV_BLOCK_ALLOCATED)
>> +#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
>> +
>> +typedef struct ImgCmpStat {
>> +    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> n_bytes */
>> +    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
> 
> `IMG_CMP_STATUS_MAX` isn’t packed tightly because it only has four bits set (0x33).  That in itself isn’t a problem, but it means that `IMG_CMP_STATUS_MAX + 1` is 52, and so this array’s size is 52 * 52 * 2 * sizeof(uint64_t) = 43264.  Again, that isn’t a problem in itself (although it is a bit sad that this could fit into 16 * 16 * 2 * 8 = 4 kB), but in `img_compare()` [1], you put this structure on the stack, and I believe it’s too big for that.

Hmm. May be, it's better just use GHashTables and don't bother with these sparse arrays

> 
>> +} ImgCmpStat;
> 
> [...]
> 
>> +static void cmp_stat_print_agenda(void)
>> +{
>> +    printf("Compare stats:\n"
>> +           "Agenda\n"
> 
> I’m more used to the term “Key” for this, but my experience is mostly limited to the git-backport-diff output, so it’s not that strong.

I don't care, "key" is OK for me.

> 
>> +           "D: DATA\n"
>> +           "Z: ZERO\n"
>> +           "A: ALLOCATED\n"
>> +           "E: after end of file\n\n");
>> +}
> 
> [...]
> 
>> @@ -1331,6 +1436,9 @@ static int img_compare(int argc, char **argv)
>>       uint64_t progress_base;
>>       bool image_opts = false;
>>       bool force_share = false;
>> +    ImgCmpStat stat = {0};
> 
> [1] (Here is where `ImgCmpStat` goes on the stack, which it shouldn’t, considering it’s over 40 kB in size.)
> 
>> +    bool do_stat;
>> +    int64_t block_size = 0;
>>       cache = BDRV_DEFAULT_CACHE;
>>       for (;;) {
> 
> [...]
> 
>> @@ -1395,6 +1505,15 @@ static int img_compare(int argc, char **argv)
>>           case OPTION_IMAGE_OPTS:
>>               image_opts = true;
>>               break;
>> +        case OPTION_STAT:
>> +            do_stat = true;
>> +            break;
>> +        case OPTION_BLOCK_SIZE:
>> +            block_size = cvtnum_full("block size", optarg, 1, INT64_MAX);
>> +            if (block_size < 0) {
>> +                exit(EXIT_SUCCESS);
> 
> Shouldn’t this be 2 instead of `EXIT_SUCCESS`?

Oops, right, it should be an error

> 
> (Above for --object we use `EXIT_SUCCESS`, but only for the case where a help text was printed.  When parsing fails, we exit with 2, which is documented as the error code (“>1”).)
> 
>> +            }
>> +            break;
>>           }
>>       }
>> @@ -1410,6 +1529,20 @@ static int img_compare(int argc, char **argv)
>>       filename1 = argv[optind++];
>>       filename2 = argv[optind++];
>> +    if (!do_stat && block_size) {
>> +        error_report("--block-size can be used only together with --stat");
>> +        ret = 1;
> 
> 2 is the error code, 1 means “success, but images differ”.

Will fix

> 
>> +        goto out;
> 
> The `out` label frees `buf1` and `buf2`, and unrefs `blk1` and `blk2`. My gcc complains that `blk1` and `blk2` have not been initialized by this point, though.  I believe we should go to `out3` here.
> 
>> +    }
>> +
>> +    if (do_stat && !block_size) {
>> +        /* TODO: make block-size optional */
>> +        error_report("You must specify --block-size together with --stat");
>> +        ret = 1;
>> +        goto out;
> 
> Same here.
> 
>> +    }
>> +
>> +
>>       /* Initialize before goto out */
>>       qemu_progress_init(progress, 2.0);
> 
> [...]
> 
>> @@ -1486,11 +1623,15 @@ static int img_compare(int argc, char **argv)
>>               goto out;
>>           }
>>           allocated2 = status2 & BDRV_BLOCK_ALLOCATED;
>> +        if (do_stat && (status2 & BDRV_BLOCK_DATA)) {
>> +            /* Don't compare cross-block data */
>> +            pnum2 = MIN(block_end, offset + pnum2) - offset;
>> +        }
>>           assert(pnum1 && pnum2);
>>           chunk = MIN(pnum1, pnum2);
>> -        if (strict) {
>> +        if (strict && !do_stat) {
> 
> Question is, do we want to allow strict mode together with --stat at all?  I think I’d prefer making it an outright error.
> 

Hmm. I think, I tend to agree, will do.

> 
>>               if (status1 != status2) {
>>                   ret = 1;
>>                   qprintf(quiet, "Strict mode: Offset %" PRId64
> 


Thanks!

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat
  2021-10-21 10:12 ` [PATCH v2 2/4] qemu-img: make --block-size optional for " Vladimir Sementsov-Ogievskiy
@ 2021-10-26  8:07   ` Hanna Reitz
  2021-10-28 10:04     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Hanna Reitz @ 2021-10-26  8:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
> Let's detect block-size automatically if not specified by user:
>
>   If both files define cluster-size, use minimum to be more precise.
>   If both files don't specify cluster-size, use default of 64K
>   If only one file specify cluster-size, just use it.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   docs/tools/qemu-img.rst |  7 +++-
>   qemu-img.c              | 71 ++++++++++++++++++++++++++++++++++++-----
>   qemu-img-cmds.hx        |  4 +--
>   3 files changed, 71 insertions(+), 11 deletions(-)

Looks good, just grammar nits and a request for an assertion below.

> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 21164253d4..4b382ca2b0 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -170,6 +170,11 @@ Parameters to compare subcommand:
>     Block size for comparing with ``--stat``. This doesn't guarantee exact
>     size of comparing chunks, but that guarantee that data chunks being
>     compared will never cross aligned block-size boundary.
> +  When unspecified the following logic is used:
> +
> +    - If both files define cluster-size, use minimum to be more precise.
> +    - If both files don't specify cluster-size, use default of 64K
> +    - If only one file specify cluster-size, just use it.

s/specify/specifies/; and perhaps s/it/that/

[...]

> diff --git a/qemu-img.c b/qemu-img.c
> index 79a0589167..61e7f470bb 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1407,6 +1407,61 @@ static void cmp_stat_print(ImgCmpStat *stat, int64_t total_bytes)
>       }
>   }
>   
> +/* Get default value for qemu-img compare --block-size option. */
> +static int img_compare_block_size(BlockDriverState *bs1,
> +                                  BlockDriverState *bs2,
> +                                  bool quiet)
> +{
> +    const int default_block_size = 64 * 1024; /* 64K */
> +
> +    int ret;
> +    BlockDriverInfo bdi;
> +    int cluster_size1, cluster_size2, block_size;
> +    const char *note = "Note: to alter it, set --block-size option.";
> +    const char *fname1 = bs1->filename;
> +    const char *fname2 = bs2->filename;
> +
> +    ret = bdrv_get_info(bs1, &bdi);
> +    if (ret < 0 && ret != -ENOTSUP) {
> +        error_report("Failed to get info of %s: %s", fname1, strerror(-ret));
> +        return ret;
> +    }
> +    cluster_size1 = ret < 0 ? 0 : bdi.cluster_size;
> +
> +    ret = bdrv_get_info(bs2, &bdi);
> +    if (ret < 0 && ret != -ENOTSUP) {
> +        error_report("Failed to get info of %s: %s", fname2, strerror(-ret));
> +        return ret;
> +    }
> +    cluster_size2 = ret < 0 ? 0 : bdi.cluster_size;
> +

I’d feel better with an `assert(cluster_size1 >= 0 && cluster_size2 >= 
0);` here.

> +    if (cluster_size1 > 0 && cluster_size2 > 0) {
> +        if (cluster_size1 == cluster_size2) {
> +            block_size = cluster_size1;
> +        } else {
> +            block_size = MIN(cluster_size1, cluster_size2);
> +            qprintf(quiet, "%s and %s has different cluster sizes: %d and %d "

s/has/have/

> +                    "correspondingly. Use minimum as block-size for "

s/correspondingly/respectively/; s/Use/Using/ (“Use” sounds like an 
imperative)

> +                    "accuracy: %d. %s\n",
> +                    fname1, fname2, cluster_size1,
> +                    cluster_size2, block_size, note);
> +        }
> +    } else if (cluster_size1 == 0 && cluster_size2 == 0) {
> +        block_size = default_block_size;
> +        qprintf(quiet, "Neither of %s and %s has explicit cluster size. Use "

s/has/have an/; s/Use/Using/

> +                "default of %d bytes. %s\n", fname1, fname2, block_size, note);
> +    } else {
> +        block_size = MAX(cluster_size1, cluster_size2);
> +        qprintf(quiet, "%s has explicit cluster size of %d and %s "

s/has/has an/

> +                "doesn't have one. Use %d as block-size. %s\n",

s/Use/Using/

Hanna

> +                cluster_size1 ? fname1 : fname2, block_size,
> +                cluster_size1 ? fname2 : fname1,
> +                block_size, note);
> +    }
> +
> +    return block_size;
> +}
> +
>   /*
>    * Compares two images. Exit codes:
>    *



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare
  2021-10-21 10:12 ` [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare Vladimir Sementsov-Ogievskiy
@ 2021-10-26  8:11   ` Hanna Reitz
  0 siblings, 0 replies; 14+ messages in thread
From: Hanna Reitz @ 2021-10-26  8:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
> Allow compare only top images of backing chains. This is useful to
> compare images with same backing file or to compare incremental images
> from the chain of incremental backups with "--stat" option.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   docs/tools/qemu-img.rst | 9 ++++++++-
>   qemu-img.c              | 8 ++++++--
>   qemu-img-cmds.hx        | 4 ++--
>   3 files changed, 16 insertions(+), 5 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] iotests: add qemu-img-compare-stat test
  2021-10-21 10:12 ` [PATCH v2 4/4] iotests: add qemu-img-compare-stat test Vladimir Sementsov-Ogievskiy
@ 2021-10-26  8:26   ` Hanna Reitz
  0 siblings, 0 replies; 14+ messages in thread
From: Hanna Reitz @ 2021-10-26  8:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
> Test new feature qemu-img compare --stat.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   .../qemu-iotests/tests/qemu-img-compare-stat  |  88 +++++++++++++++
>   .../tests/qemu-img-compare-stat.out           | 106 ++++++++++++++++++
>   2 files changed, 194 insertions(+)
>   create mode 100755 tests/qemu-iotests/tests/qemu-img-compare-stat
>   create mode 100644 tests/qemu-iotests/tests/qemu-img-compare-stat.out

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-26  7:53     ` Vladimir Sementsov-Ogievskiy
@ 2021-10-26  8:47       ` Hanna Reitz
  2021-10-26  9:18         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Hanna Reitz @ 2021-10-26  8:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 26.10.21 09:53, Vladimir Sementsov-Ogievskiy wrote:
> 25.10.2021 19:40, Hanna Reitz wrote:
>> On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
>>> With new option qemu-img compare will not stop at first mismatch, but
>>> instead calculate statistics: how many clusters with different data,
>>> how many clusters with equal data, how many clusters were unallocated
>>> but become data and so on.
>>>
>>> We compare images chunk by chunk. Chunk size depends on what
>>> block_status returns for both images. It may return less than cluster
>>> (remember about qcow2 subclusters), it may return more than cluster (if
>>> several consecutive clusters share same status). Finally images may
>>> have different cluster sizes. This all leads to ambiguity in how to
>>> finally compare the data.
>>>
>>> What we can say for sure is that, when we compare two qcow2 images with
>>> same cluster size, we should compare clusters with data separately.
>>> Otherwise, if we for example compare 10 consecutive clusters of data
>>> where only one byte differs we'll report 10 different clusters.
>>> Expected result in this case is 1 different cluster and 9 equal ones.
>>>
>>> So, to serve this case and just to have some defined rule let's do the
>>> following:
>>>
>>> 1. Select some block-size for compare procedure. In this commit it must
>>>     be specified by user, next commit will add some automatic logic and
>>>     make --block-size optional.
>>>
>>> 2. Go chunk-by-chunk using block_status as we do now with only one
>>>     differency:
>>>     If block_status() returns DATA region that intersects block-size
>>>     aligned boundary, crop this region at this boundary.
>>>
>>> This way it's still possible to compare less than cluster and report
>>> subcluster-level accuracy, but we newer compare more than one cluster
>>> of data.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   docs/tools/qemu-img.rst |  18 +++-
>>>   qemu-img.c              | 206 
>>> +++++++++++++++++++++++++++++++++++++---
>>>   qemu-img-cmds.hx        |   4 +-
>>>   3 files changed, 212 insertions(+), 16 deletions(-)
>>
>> Looks good to me overall!  Just some technical comments below.
>>
>>> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>>> index d58980aef8..21164253d4 100644
>>> --- a/docs/tools/qemu-img.rst
>>> +++ b/docs/tools/qemu-img.rst
>>> @@ -159,6 +159,18 @@ Parameters to compare subcommand:
>>>     Strict mode - fail on different image size or sector allocation
>>> +.. option:: --stat
>>> +
>>> +  Instead of exit on first mismatch compare the whole images and print
>>> +  statistics on amount of different pairs of clusters, based on their
>>> +  block-status and are they equal or not.
>>
>> I’d phrase this as:
>>
>> Instead of exiting on the first mismatch, compare the whole images 
>> and print statistics on how much they differ in terms of block status 
>> (i.e. are blocks allocated or not, do they contain data, are they 
>> marked as containing only zeroes) and block content (a block of data 
>> that contains only zero still has the same content as a marked-zero 
>> block).
>
> For me the rest starting from "and block content" sounds unclear, 
> seems doesn't add any information to previous (i.e. are blocks 
> allocated ...)

By “block content” I meant what you said by “equal or not”, i.e. what is 
returned when reading from the block.

Now that I think about it again, I believe we should go with your 
original “equal or not”, though, because that reflects what qemu-img 
--stat prints, like so perhaps:

Instead of exiting on the first mismatch, compare the whole images and 
print statistics on the amount of different pairs of blocks, based on 
their block status and whether they are equal or not.

I’d still like to add something like what I had in parentheses, though, 
because as a user, I’d find the “block status” and “equal or not” terms 
to be a bit handwave-y.  I don’t think “block status” is a common term 
in our documentation, so I wanted to add some examples; and I wanted to 
show by example that “equal” blocks don’t need to have the same block 
status.

[...]

>>> @@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend 
>>> *blk, int64_t offset,
>>>       return 0;
>>>   }
>>> +#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
>>> +                             BDRV_BLOCK_ALLOCATED)
>>> +#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
>>> +
>>> +typedef struct ImgCmpStat {
>>> +    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> 
>>> n_bytes */
>>> +    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
>>
>> `IMG_CMP_STATUS_MAX` isn’t packed tightly because it only has four 
>> bits set (0x33).  That in itself isn’t a problem, but it means that 
>> `IMG_CMP_STATUS_MAX + 1` is 52, and so this array’s size is 52 * 52 * 
>> 2 * sizeof(uint64_t) = 43264.  Again, that isn’t a problem in itself 
>> (although it is a bit sad that this could fit into 16 * 16 * 2 * 8 = 
>> 4 kB), but in `img_compare()` [1], you put this structure on the 
>> stack, and I believe it’s too big for that.
>
> Hmm. May be, it's better just use GHashTables and don't bother with 
> these sparse arrays

Or we could use our own bits here (ALLOCATED = (1 << 2), EOF = (1 << 3)) 
and have a small function that translates BDRV_BLOCK_* values into them.

In any case, I don’t mind the sparseness too much, it’s just that it 
shouldn’t go on the stack.

Hanna



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-26  8:47       ` Hanna Reitz
@ 2021-10-26  9:18         ` Vladimir Sementsov-Ogievskiy
  2021-10-27  8:35           ` Hanna Reitz
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-26  9:18 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, kwolf, nsoffer, eblake, jsnow, den, nikita.lapshin

26.10.2021 11:47, Hanna Reitz wrote:
> On 26.10.21 09:53, Vladimir Sementsov-Ogievskiy wrote:
>> 25.10.2021 19:40, Hanna Reitz wrote:
>>> On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
>>>> With new option qemu-img compare will not stop at first mismatch, but
>>>> instead calculate statistics: how many clusters with different data,
>>>> how many clusters with equal data, how many clusters were unallocated
>>>> but become data and so on.
>>>>
>>>> We compare images chunk by chunk. Chunk size depends on what
>>>> block_status returns for both images. It may return less than cluster
>>>> (remember about qcow2 subclusters), it may return more than cluster (if
>>>> several consecutive clusters share same status). Finally images may
>>>> have different cluster sizes. This all leads to ambiguity in how to
>>>> finally compare the data.
>>>>
>>>> What we can say for sure is that, when we compare two qcow2 images with
>>>> same cluster size, we should compare clusters with data separately.
>>>> Otherwise, if we for example compare 10 consecutive clusters of data
>>>> where only one byte differs we'll report 10 different clusters.
>>>> Expected result in this case is 1 different cluster and 9 equal ones.
>>>>
>>>> So, to serve this case and just to have some defined rule let's do the
>>>> following:
>>>>
>>>> 1. Select some block-size for compare procedure. In this commit it must
>>>>     be specified by user, next commit will add some automatic logic and
>>>>     make --block-size optional.
>>>>
>>>> 2. Go chunk-by-chunk using block_status as we do now with only one
>>>>     differency:
>>>>     If block_status() returns DATA region that intersects block-size
>>>>     aligned boundary, crop this region at this boundary.
>>>>
>>>> This way it's still possible to compare less than cluster and report
>>>> subcluster-level accuracy, but we newer compare more than one cluster
>>>> of data.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>   docs/tools/qemu-img.rst |  18 +++-
>>>>   qemu-img.c              | 206 +++++++++++++++++++++++++++++++++++++---
>>>>   qemu-img-cmds.hx        |   4 +-
>>>>   3 files changed, 212 insertions(+), 16 deletions(-)
>>>
>>> Looks good to me overall!  Just some technical comments below.
>>>
>>>> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>>>> index d58980aef8..21164253d4 100644
>>>> --- a/docs/tools/qemu-img.rst
>>>> +++ b/docs/tools/qemu-img.rst
>>>> @@ -159,6 +159,18 @@ Parameters to compare subcommand:
>>>>     Strict mode - fail on different image size or sector allocation
>>>> +.. option:: --stat
>>>> +
>>>> +  Instead of exit on first mismatch compare the whole images and print
>>>> +  statistics on amount of different pairs of clusters, based on their
>>>> +  block-status and are they equal or not.
>>>
>>> I’d phrase this as:
>>>
>>> Instead of exiting on the first mismatch, compare the whole images and print statistics on how much they differ in terms of block status (i.e. are blocks allocated or not, do they contain data, are they marked as containing only zeroes) and block content (a block of data that contains only zero still has the same content as a marked-zero block).
>>
>> For me the rest starting from "and block content" sounds unclear, seems doesn't add any information to previous (i.e. are blocks allocated ...)
> 
> By “block content” I meant what you said by “equal or not”, i.e. what is returned when reading from the block.
> 
> Now that I think about it again, I believe we should go with your original “equal or not”, though, because that reflects what qemu-img --stat prints, like so perhaps:
> 
> Instead of exiting on the first mismatch, compare the whole images and print statistics on the amount of different pairs of blocks, based on their block status and whether they are equal or not.

OK

> 
> I’d still like to add something like what I had in parentheses, though, because as a user, I’d find the “block status” and “equal or not” terms to be a bit handwave-y.  I don’t think “block status” is a common term in our documentation, so I wanted to add some examples; and I wanted to show by example that “equal” blocks don’t need to have the same block status.

Actually, I think, that the resulting tool is anyway developer-oriented, to use it you should know what this block status really means.. May be just s/block status/block type/, will it be more human friendly?

> 
> [...]
> 
>>>> @@ -1304,6 +1306,107 @@ static int check_empty_sectors(BlockBackend *blk, int64_t offset,
>>>>       return 0;
>>>>   }
>>>> +#define IMG_CMP_STATUS_MASK (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO | \
>>>> +                             BDRV_BLOCK_ALLOCATED)
>>>> +#define IMG_CMP_STATUS_MAX (IMG_CMP_STATUS_MASK | BDRV_BLOCK_EOF)
>>>> +
>>>> +typedef struct ImgCmpStat {
>>>> +    /* stat: [ret: 0 is equal, 1 is not][status1][status2] -> n_bytes */
>>>> +    uint64_t stat[2][IMG_CMP_STATUS_MAX + 1][IMG_CMP_STATUS_MAX + 1];
>>>
>>> `IMG_CMP_STATUS_MAX` isn’t packed tightly because it only has four bits set (0x33).  That in itself isn’t a problem, but it means that `IMG_CMP_STATUS_MAX + 1` is 52, and so this array’s size is 52 * 52 * 2 * sizeof(uint64_t) = 43264.  Again, that isn’t a problem in itself (although it is a bit sad that this could fit into 16 * 16 * 2 * 8 = 4 kB), but in `img_compare()` [1], you put this structure on the stack, and I believe it’s too big for that.
>>
>> Hmm. May be, it's better just use GHashTables and don't bother with these sparse arrays
> 
> Or we could use our own bits here (ALLOCATED = (1 << 2), EOF = (1 << 3)) and have a small function that translates BDRV_BLOCK_* values into them.
> 
> In any case, I don’t mind the sparseness too much, it’s just that it shouldn’t go on the stack.
> 
> Hanna
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] qemu-img: implement compare --stat
  2021-10-26  9:18         ` Vladimir Sementsov-Ogievskiy
@ 2021-10-27  8:35           ` Hanna Reitz
  0 siblings, 0 replies; 14+ messages in thread
From: Hanna Reitz @ 2021-10-27  8:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, eblake, qemu-devel, nsoffer, nikita.lapshin, den, jsnow

On 26.10.21 11:18, Vladimir Sementsov-Ogievskiy wrote:
> 26.10.2021 11:47, Hanna Reitz wrote:
>> On 26.10.21 09:53, Vladimir Sementsov-Ogievskiy wrote:
>>> 25.10.2021 19:40, Hanna Reitz wrote:
>>>> On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
>>>>> With new option qemu-img compare will not stop at first mismatch, but
>>>>> instead calculate statistics: how many clusters with different data,
>>>>> how many clusters with equal data, how many clusters were unallocated
>>>>> but become data and so on.
>>>>>
>>>>> We compare images chunk by chunk. Chunk size depends on what
>>>>> block_status returns for both images. It may return less than cluster
>>>>> (remember about qcow2 subclusters), it may return more than 
>>>>> cluster (if
>>>>> several consecutive clusters share same status). Finally images may
>>>>> have different cluster sizes. This all leads to ambiguity in how to
>>>>> finally compare the data.
>>>>>
>>>>> What we can say for sure is that, when we compare two qcow2 images 
>>>>> with
>>>>> same cluster size, we should compare clusters with data separately.
>>>>> Otherwise, if we for example compare 10 consecutive clusters of data
>>>>> where only one byte differs we'll report 10 different clusters.
>>>>> Expected result in this case is 1 different cluster and 9 equal ones.
>>>>>
>>>>> So, to serve this case and just to have some defined rule let's do 
>>>>> the
>>>>> following:
>>>>>
>>>>> 1. Select some block-size for compare procedure. In this commit it 
>>>>> must
>>>>>     be specified by user, next commit will add some automatic 
>>>>> logic and
>>>>>     make --block-size optional.
>>>>>
>>>>> 2. Go chunk-by-chunk using block_status as we do now with only one
>>>>>     differency:
>>>>>     If block_status() returns DATA region that intersects block-size
>>>>>     aligned boundary, crop this region at this boundary.
>>>>>
>>>>> This way it's still possible to compare less than cluster and report
>>>>> subcluster-level accuracy, but we newer compare more than one cluster
>>>>> of data.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>>>> <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>   docs/tools/qemu-img.rst |  18 +++-
>>>>>   qemu-img.c              | 206 
>>>>> +++++++++++++++++++++++++++++++++++++---
>>>>>   qemu-img-cmds.hx        |   4 +-
>>>>>   3 files changed, 212 insertions(+), 16 deletions(-)
>>>>
>>>> Looks good to me overall!  Just some technical comments below.
>>>>
>>>>> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>>>>> index d58980aef8..21164253d4 100644
>>>>> --- a/docs/tools/qemu-img.rst
>>>>> +++ b/docs/tools/qemu-img.rst
>>>>> @@ -159,6 +159,18 @@ Parameters to compare subcommand:
>>>>>     Strict mode - fail on different image size or sector allocation
>>>>> +.. option:: --stat
>>>>> +
>>>>> +  Instead of exit on first mismatch compare the whole images and 
>>>>> print
>>>>> +  statistics on amount of different pairs of clusters, based on 
>>>>> their
>>>>> +  block-status and are they equal or not.
>>>>
>>>> I’d phrase this as:
>>>>
>>>> Instead of exiting on the first mismatch, compare the whole images 
>>>> and print statistics on how much they differ in terms of block 
>>>> status (i.e. are blocks allocated or not, do they contain data, are 
>>>> they marked as containing only zeroes) and block content (a block 
>>>> of data that contains only zero still has the same content as a 
>>>> marked-zero block).
>>>
>>> For me the rest starting from "and block content" sounds unclear, 
>>> seems doesn't add any information to previous (i.e. are blocks 
>>> allocated ...)
>>
>> By “block content” I meant what you said by “equal or not”, i.e. what 
>> is returned when reading from the block.
>>
>> Now that I think about it again, I believe we should go with your 
>> original “equal or not”, though, because that reflects what qemu-img 
>> --stat prints, like so perhaps:
>>
>> Instead of exiting on the first mismatch, compare the whole images 
>> and print statistics on the amount of different pairs of blocks, 
>> based on their block status and whether they are equal or not.
>
> OK
>
>>
>> I’d still like to add something like what I had in parentheses, 
>> though, because as a user, I’d find the “block status” and “equal or 
>> not” terms to be a bit handwave-y.  I don’t think “block status” is a 
>> common term in our documentation, so I wanted to add some examples; 
>> and I wanted to show by example that “equal” blocks don’t need to 
>> have the same block status.
>
> Actually, I think, that the resulting tool is anyway 
> developer-oriented, to use it you should know what this block status 
> really means.. May be just s/block status/block type/, will it be more 
> human friendly?

Oh, OK.  I think I’d prefer “block status” still, because that’s what we 
use in the code.

Hanna



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] qemu-img: make --block-size optional for compare --stat
  2021-10-26  8:07   ` Hanna Reitz
@ 2021-10-28 10:04     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-28 10:04 UTC (permalink / raw)
  To: Hanna Reitz, qemu-block
  Cc: qemu-devel, kwolf, nsoffer, eblake, jsnow, den, nikita.lapshin

26.10.2021 11:07, Hanna Reitz wrote:
> On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:
>> Let's detect block-size automatically if not specified by user:
>>
>>   If both files define cluster-size, use minimum to be more precise.
>>   If both files don't specify cluster-size, use default of 64K
>>   If only one file specify cluster-size, just use it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/tools/qemu-img.rst |  7 +++-
>>   qemu-img.c              | 71 ++++++++++++++++++++++++++++++++++++-----
>>   qemu-img-cmds.hx        |  4 +--
>>   3 files changed, 71 insertions(+), 11 deletions(-)
> 
> Looks good, just grammar nits and a request for an assertion below.
> 
>> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>> index 21164253d4..4b382ca2b0 100644
>> --- a/docs/tools/qemu-img.rst
>> +++ b/docs/tools/qemu-img.rst
>> @@ -170,6 +170,11 @@ Parameters to compare subcommand:
>>     Block size for comparing with ``--stat``. This doesn't guarantee exact
>>     size of comparing chunks, but that guarantee that data chunks being
>>     compared will never cross aligned block-size boundary.
>> +  When unspecified the following logic is used:
>> +
>> +    - If both files define cluster-size, use minimum to be more precise.
>> +    - If both files don't specify cluster-size, use default of 64K
>> +    - If only one file specify cluster-size, just use it.
> 
> s/specify/specifies/; and perhaps s/it/that/
> 
> [...]
> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 79a0589167..61e7f470bb 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1407,6 +1407,61 @@ static void cmp_stat_print(ImgCmpStat *stat, int64_t total_bytes)
>>       }
>>   }
>> +/* Get default value for qemu-img compare --block-size option. */
>> +static int img_compare_block_size(BlockDriverState *bs1,
>> +                                  BlockDriverState *bs2,
>> +                                  bool quiet)
>> +{
>> +    const int default_block_size = 64 * 1024; /* 64K */
>> +
>> +    int ret;
>> +    BlockDriverInfo bdi;
>> +    int cluster_size1, cluster_size2, block_size;
>> +    const char *note = "Note: to alter it, set --block-size option.";
>> +    const char *fname1 = bs1->filename;
>> +    const char *fname2 = bs2->filename;
>> +
>> +    ret = bdrv_get_info(bs1, &bdi);
>> +    if (ret < 0 && ret != -ENOTSUP) {
>> +        error_report("Failed to get info of %s: %s", fname1, strerror(-ret));
>> +        return ret;
>> +    }
>> +    cluster_size1 = ret < 0 ? 0 : bdi.cluster_size;
>> +
>> +    ret = bdrv_get_info(bs2, &bdi);
>> +    if (ret < 0 && ret != -ENOTSUP) {
>> +        error_report("Failed to get info of %s: %s", fname2, strerror(-ret));
>> +        return ret;
>> +    }
>> +    cluster_size2 = ret < 0 ? 0 : bdi.cluster_size;
>> +
> 
> I’d feel better with an `assert(cluster_size1 >= 0 && cluster_size2 >= 0);` here.

Hmm.. But it seems obvious: bdi.cluster_size should not be <0 on success of bdrv_get_info.

> 
>> +    if (cluster_size1 > 0 && cluster_size2 > 0) {
>> +        if (cluster_size1 == cluster_size2) {
>> +            block_size = cluster_size1;
>> +        } else {
>> +            block_size = MIN(cluster_size1, cluster_size2);
>> +            qprintf(quiet, "%s and %s has different cluster sizes: %d and %d "
> 
> s/has/have/
> 
>> +                    "correspondingly. Use minimum as block-size for "
> 
> s/correspondingly/respectively/; s/Use/Using/ (“Use” sounds like an imperative)
> 
>> +                    "accuracy: %d. %s\n",
>> +                    fname1, fname2, cluster_size1,
>> +                    cluster_size2, block_size, note);
>> +        }
>> +    } else if (cluster_size1 == 0 && cluster_size2 == 0) {
>> +        block_size = default_block_size;
>> +        qprintf(quiet, "Neither of %s and %s has explicit cluster size. Use "
> 
> s/has/have an/; s/Use/Using/
> 
>> +                "default of %d bytes. %s\n", fname1, fname2, block_size, note);
>> +    } else {
>> +        block_size = MAX(cluster_size1, cluster_size2);
>> +        qprintf(quiet, "%s has explicit cluster size of %d and %s "
> 
> s/has/has an/
> 
>> +                "doesn't have one. Use %d as block-size. %s\n",
> 
> s/Use/Using/
> 
> Hanna
> 
>> +                cluster_size1 ? fname1 : fname2, block_size,
>> +                cluster_size1 ? fname2 : fname1,
>> +                block_size, note);
>> +    }
>> +
>> +    return block_size;
>> +}
>> +
>>   /*
>>    * Compares two images. Exit codes:
>>    *
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-10-28 10:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 10:12 [PATCH v2 0/4] qemu-img compare --stat Vladimir Sementsov-Ogievskiy
2021-10-21 10:12 ` [PATCH v2 1/4] qemu-img: implement " Vladimir Sementsov-Ogievskiy
2021-10-25 16:40   ` Hanna Reitz
2021-10-26  7:53     ` Vladimir Sementsov-Ogievskiy
2021-10-26  8:47       ` Hanna Reitz
2021-10-26  9:18         ` Vladimir Sementsov-Ogievskiy
2021-10-27  8:35           ` Hanna Reitz
2021-10-21 10:12 ` [PATCH v2 2/4] qemu-img: make --block-size optional for " Vladimir Sementsov-Ogievskiy
2021-10-26  8:07   ` Hanna Reitz
2021-10-28 10:04     ` Vladimir Sementsov-Ogievskiy
2021-10-21 10:12 ` [PATCH v2 3/4] qemu-img: add --shallow option for qemu-img compare Vladimir Sementsov-Ogievskiy
2021-10-26  8:11   ` Hanna Reitz
2021-10-21 10:12 ` [PATCH v2 4/4] iotests: add qemu-img-compare-stat test Vladimir Sementsov-Ogievskiy
2021-10-26  8:26   ` Hanna Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).