All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Add qemu-img checksum command using blkhash
@ 2022-09-01 14:32 Nir Soffer
  2022-09-01 14:32 ` [PATCH 1/3] qemu-img: Add checksum command Nir Soffer
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Nir Soffer @ 2022-09-01 14:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz, Nir Soffer

Since blkhash is available only via copr now, the new command is added as
optional feature, built only if blkhash-devel package is installed.

Nir Soffer (3):
  qemu-img: Add checksum command
  iotests: Test qemu-img checksum
  qemu-img: Speed up checksum

 docs/tools/qemu-img.rst                       |  22 +
 meson.build                                   |  10 +-
 meson_options.txt                             |   2 +
 qemu-img-cmds.hx                              |   8 +
 qemu-img.c                                    | 388 ++++++++++++++++++
 tests/qemu-iotests/tests/qemu-img-checksum    | 149 +++++++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 ++++
 7 files changed, 652 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

-- 
2.37.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] qemu-img: Add checksum command
  2022-09-01 14:32 [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
@ 2022-09-01 14:32 ` Nir Soffer
  2022-10-26 13:00   ` Hanna Reitz
  2022-09-01 14:32 ` [PATCH 2/3] iotests: Test qemu-img checksum Nir Soffer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-09-01 14:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz, Nir Soffer

The checksum command compute a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

    $ ./qemu-img checksum -p fedora-35.qcow2
    6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2

The block checksum is constructed by splitting the image to fixed sized
blocks and computing a digest of every block. The image checksum is the
digest of the all block digests.

The checksum uses internally the "sha256" algorithm but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.

Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

    $ ./qemu-img info /scratch/50p.raw
    file format: raw
    virtual size: 6 GiB (6442450944 bytes)
    disk size: 2.91 GiB

    $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
                                     "sha256sum /scratch/50p.raw"
    Benchmark 1: ./qemu-img checksum /scratch/50p.raw
      Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
      Range (min … max):    1.813 s …  1.908 s    5 runs

    Benchmark 2: sha256sum /scratch/50p.raw
      Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
      Range (min … max):   14.501 s … 14.697 s    5 runs

    Summary
      './qemu-img checksum /scratch/50p.raw' ran
        7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

    $ dnf copr enable nsoffer/blkhash
    $ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
    sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
 docs/tools/qemu-img.rst |  22 +++++
 meson.build             |  10 ++-
 meson_options.txt       |   2 +
 qemu-img-cmds.hx        |   8 ++
 qemu-img.c              | 191 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 85a6e05b35..8be9c45cbf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,42 @@ Command description:
     Check completed, image is corrupted
   3
     Check completed, image has leaked clusters, but is not corrupted
   63
     Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+
+  Print a checksum for image *FILENAME* guest visible content. Images with
+  different format or settings wil have the same checksum.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are read as zeros so they will
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksums if the guest will see the same content when
+  reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest during pull mode incremental backup using NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.build
index 20fddbd707..56b648d8a7 100644
--- a/meson.build
+++ b/meson.build
@@ -727,20 +727,24 @@ if not get_option('curl').auto() or have_block
                     kwargs: static_kwargs)
 endif
 libudev = not_found
 if targetos == 'linux' and (have_system or have_tools)
   libudev = dependency('libudev',
                        method: 'pkg-config',
                        required: get_option('libudev'),
                        kwargs: static_kwargs)
 endif
 
+blkhash = dependency('blkhash', version: '>=0.4.1',
+                     method: 'pkg-config',
+                     required: get_option('blkhash'))
+
 mpathlibs = [libudev]
 mpathpersist = not_found
 mpathpersist_new_api = false
 if targetos == 'linux' and have_tools and get_option('mpath').allowed()
   mpath_test_source_new = '''
     #include <libudev.h>
     #include <mpath_persist.h>
     unsigned mpath_mx_alloc_len = 1024;
     int logsink;
     static struct config *multipath_conf;
@@ -1852,20 +1856,22 @@ config_host_data.set('CONFIG_CFI', get_option('cfi'))
 config_host_data.set('CONFIG_SELINUX', selinux.found())
 config_host_data.set('CONFIG_XEN_BACKEND', xen.found())
 if xen.found()
   # protect from xen.version() having less than three components
   xen_version = xen.version().split('.') + ['0', '0']
   xen_ctrl_version = xen_version[0] + \
     ('0' + xen_version[1]).substring(-2) + \
     ('0' + xen_version[2]).substring(-2)
   config_host_data.set('CONFIG_XEN_CTRL_INTERFACE_VERSION', xen_ctrl_version)
 endif
+config_host_data.set('CONFIG_BLKHASH', blkhash.found())
+
 config_host_data.set('QEMU_VERSION', '"@0@"'.format(meson.project_version()))
 config_host_data.set('QEMU_VERSION_MAJOR', meson.project_version().split('.')[0])
 config_host_data.set('QEMU_VERSION_MINOR', meson.project_version().split('.')[1])
 config_host_data.set('QEMU_VERSION_MICRO', meson.project_version().split('.')[2])
 
 config_host_data.set_quoted('CONFIG_HOST_DSOSUF', host_dsosuf)
 config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
 
 have_coroutine_pool = get_option('coroutine_pool')
 if get_option('debug_stack_usage') and have_coroutine_pool
@@ -3594,21 +3600,22 @@ subdir('qga')
 # Don't build qemu-keymap if xkbcommon is not explicitly enabled
 # when we don't build tools or system
 if xkbcommon.found()
   # used for the update-keymaps target, so include rules even if !have_tools
   qemu_keymap = executable('qemu-keymap', files('qemu-keymap.c', 'ui/input-keymap.c') + genh,
                            dependencies: [qemuutil, xkbcommon], install: have_tools)
 endif
 
 if have_tools
   qemu_img = executable('qemu-img', [files('qemu-img.c'), hxdep],
-             dependencies: [authz, block, crypto, io, qom, qemuutil], install: true)
+             dependencies: [authz, block, crypto, io, qom, qemuutil, blkhash],
+             install: true)
   qemu_io = executable('qemu-io', files('qemu-io.c'),
              dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
                dependencies: [blockdev, qemuutil, gnutls, selinux],
                install: true)
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
   subdir('contrib/elf2dmp')
 
@@ -3977,20 +3984,21 @@ summary_info += {'bzip2 support':     libbzip2}
 summary_info += {'lzfse support':     liblzfse}
 summary_info += {'zstd support':      zstd}
 summary_info += {'NUMA host support': numa}
 summary_info += {'capstone':          capstone}
 summary_info += {'libpmem support':   libpmem}
 summary_info += {'libdaxctl support': libdaxctl}
 summary_info += {'libudev':           libudev}
 # Dummy dependency, keep .found()
 summary_info += {'FUSE lseek':        fuse_lseek.found()}
 summary_info += {'selinux':           selinux}
+summary_info += {'blkhash':           blkhash}
 summary(summary_info, bool_yn: true, section: 'Dependencies')
 
 if not supported_cpus.contains(cpu)
   message()
   warning('SUPPORT FOR THIS HOST CPU WILL GO AWAY IN FUTURE RELEASES!')
   message()
   message('CPU host architecture ' + cpu + ' support is not currently maintained.')
   message('The QEMU project intends to remove support for this host CPU in')
   message('a future release if nobody volunteers to maintain it and to')
   message('provide a build host for our continuous integration setup.')
diff --git a/meson_options.txt b/meson_options.txt
index e58e158396..8bf1dece12 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -304,10 +304,12 @@ option('debug_mutex', type: 'boolean', value: false,
 option('debug_stack_usage', type: 'boolean', value: false,
        description: 'measure coroutine stack usage')
 option('qom_cast_debug', type: 'boolean', value: false,
        description: 'cast debugging support')
 option('gprof', type: 'boolean', value: false,
        description: 'QEMU profiling with gprof')
 option('profiler', type: 'boolean', value: false,
        description: 'profiler support')
 option('slirp_smbd', type : 'feature', value : 'auto',
        description: 'use smbd (at path --smbd=*) in slirp networking')
+option('blkhash', type: 'feature', value: 'auto',
+       description: 'blkhash support for computing image checksum')
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..c4fc935a37 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -26,20 +26,28 @@ DEF("bitmap", img_bitmap,
 SRST
 .. option:: bitmap (--merge SOURCE | --add | --remove | --clear | --enable | --disable)... [-b SOURCE_FILE [-F SOURCE_FMT]] [-g GRANULARITY] [--object OBJECTDEF] [--image-opts | -f FMT] FILENAME BITMAP
 ERST
 
 DEF("check", img_check,
     "check [--object objectdef] [--image-opts] [-q] [-f fmt] [--output=ofmt] [-r [leaks | all]] [-T src_cache] [-U] filename")
 SRST
 .. option:: check [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [--output=OFMT] [-r [leaks | all]] [-T SRC_CACHE] [-U] FILENAME
 ERST
 
+#ifdef CONFIG_BLKHASH
+DEF("checksum", img_checksum,
+    "checksum [--object objectdef] [--image-opts] [-f fmt] [-T src_cache] [-p] filename")
+SRST
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+ERST
+#endif /* CONFIG_BLKHASH */
+
 DEF("commit", img_commit,
     "commit [--object objectdef] [--image-opts] [-q] [-f fmt] [-t cache] [-b base] [-r rate_limit] [-d] [-p] filename")
 SRST
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 ERST
 
 DEF("compare", img_compare,
     "compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] filename1 filename2")
 SRST
 .. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] FILENAME1 FILENAME2
diff --git a/qemu-img.c b/qemu-img.c
index 7d4b33b3da..7edcfe4bc8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -17,20 +17,21 @@
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
 
 #include "qemu/osdep.h"
 #include <getopt.h>
+#include <blkhash.h>
 
 #include "qemu/help-texts.h"
 #include "qemu/qemu-progress.h"
 #include "qemu-version.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qobject-output-visitor.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qdict.h"
@@ -1611,20 +1612,210 @@ out:
     qemu_vfree(buf1);
     qemu_vfree(buf2);
     blk_unref(blk2);
 out2:
     blk_unref(blk1);
 out3:
     qemu_progress_end();
     return ret;
 }
 
+#ifdef CONFIG_BLKHASH
+/*
+ * Compute image checksum.
+ */
+static int img_checksum(int argc, char **argv)
+{
+    const char *digest_name = "sha256";
+    const size_t block_size = 64 * KiB;
+
+    const char *format = NULL;
+    const char *cache = BDRV_DEFAULT_CACHE;
+    const char *filename;
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    uint8_t *buf = NULL;
+    int64_t pnum;
+    bool progress = false;
+    int flags = 0;
+    bool writethrough;
+    int64_t total_size;
+    int64_t offset = 0;
+    int c;
+    bool image_opts = false;
+    struct blkhash *h = NULL;
+    unsigned char digest[64];
+    unsigned int digest_len;
+    int ret = 1;
+    int err;
+
+    for (;;) {
+        static const struct option long_options[] = {
+            {"help", no_argument, 0, 'h'},
+            {"object", required_argument, 0, OPTION_OBJECT},
+            {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+            {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, ":hf:T:p",
+                        long_options, NULL);
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case ':':
+            missing_argument(argv[optind - 1]);
+            break;
+        case '?':
+            unrecognized_option(argv[optind - 1]);
+            break;
+        case 'h':
+            help();
+            break;
+        case 'f':
+            format = optarg;
+            break;
+        case 'T':
+            cache = optarg;
+            break;
+        case 'p':
+            progress = true;
+            break;
+        case OPTION_OBJECT:
+            {
+                Error *local_err = NULL;
+
+                if (!user_creatable_add_from_str(optarg, &local_err)) {
+                    if (local_err) {
+                        error_report_err(local_err);
+                        exit(2);
+                    } else {
+                        /* Help was printed */
+                        exit(EXIT_SUCCESS);
+                    }
+                }
+                break;
+            }
+        case OPTION_IMAGE_OPTS:
+            image_opts = true;
+            break;
+        }
+    }
+
+    if (optind != argc - 1) {
+        error_exit("Expecting image file name");
+    }
+
+    filename = argv[optind++];
+
+    err = bdrv_parse_cache_mode(cache, &flags, &writethrough);
+    if (err < 0) {
+        error_report("Invalid source cache option: %s", cache);
+        return ret;
+    }
+
+    blk = img_open(image_opts, filename, format, flags, writethrough, false,
+                   false);
+    if (!blk) {
+        return ret;
+    }
+
+    /* Initialize before using goto out. */
+    qemu_progress_init(progress, 2.0);
+
+    bs = blk_bs(blk);
+    buf = blk_blockalign(blk, IO_BUF_SIZE);
+
+    total_size = blk_getlength(blk);
+    if (total_size < 0) {
+        error_report("Can't get size of %s: %s",
+                     filename, strerror(-total_size));
+        goto out;
+    }
+
+    h = blkhash_new(block_size, digest_name);
+    if (!h) {
+        error_report("Can't create blkhash: %s", strerror(errno));
+        goto out;
+    }
+
+    qemu_progress_print(0, 100);
+
+    while (offset < total_size) {
+        int status;
+        int64_t chunk;
+
+        status = bdrv_block_status_above(bs, NULL, offset,
+                                         total_size - offset, &pnum, NULL,
+                                         NULL);
+        if (status < 0) {
+            error_report("Error checking status at offset %" PRId64 " for %s",
+                         offset, filename);
+            goto out;
+        }
+
+        assert(pnum);
+        chunk = pnum;
+
+        if (status & BDRV_BLOCK_ZERO) {
+            chunk = MIN(chunk, SIZE_MAX);
+            err = blkhash_zero(h, chunk);
+            if (err) {
+                error_report("Error zeroing hash at offset %" PRId64
+                             " of %s: %s",
+                             offset, filename, strerror(err));
+                goto out;
+            }
+        } else {
+            chunk = MIN(chunk, IO_BUF_SIZE);
+            err = blk_pread(blk, offset, chunk, buf, 0);
+            if (err < 0) {
+                error_report("Error reading at offset %" PRId64 " of %s: %s",
+                             offset, filename, strerror(-err));
+                goto out;
+            }
+            err = blkhash_update(h, buf, chunk);
+            if (err) {
+                error_report("Error updating hash at offset %" PRId64
+                             " of %s: %s",
+                             offset, filename, strerror(err));
+                goto out;
+            }
+        }
+
+        offset += chunk;
+        qemu_progress_print(((float) chunk / total_size) * 100, 100);
+    }
+
+    err = blkhash_final(h, digest, &digest_len);
+    if (err) {
+        error_report("Error finalizing hash of %s: %s",
+                     filename, strerror(err));
+        goto out;
+    }
+
+    for (unsigned i = 0; i < digest_len; i++) {
+        printf("%02x", digest[i]);
+    }
+    printf("  %s%s", filename, progress ? "" : "\n");
+
+    ret = 0;
+
+out:
+    blkhash_free(h);
+    qemu_vfree(buf);
+    blk_unref(blk);
+    qemu_progress_end();
+
+    return ret;
+}
+#endif /* CONFIG_BLKHASH */
+
 /* Convenience wrapper around qmp_block_dirty_bitmap_merge */
 static void do_dirty_bitmap_merge(const char *dst_node, const char *dst_name,
                                   const char *src_node, const char *src_name,
                                   Error **errp)
 {
     BlockDirtyBitmapOrStr *merge_src;
     BlockDirtyBitmapOrStrList *list = NULL;
 
     merge_src = g_new0(BlockDirtyBitmapOrStr, 1);
     merge_src->type = QTYPE_QDICT;
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] iotests: Test qemu-img checksum
  2022-09-01 14:32 [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
  2022-09-01 14:32 ` [PATCH 1/3] qemu-img: Add checksum command Nir Soffer
@ 2022-09-01 14:32 ` Nir Soffer
  2022-10-26 13:30   ` Hanna Reitz
  2022-09-01 14:32 ` [PATCH 3/3] qemu-img: Speed up checksum Nir Soffer
  2022-09-18  9:35 ` [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
  3 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-09-01 14:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz, Nir Soffer

Add simple tests creating an image with all kinds of extents, different
formats, different backing chain, different protocol, and different
image options. Since all images have the same guest visible content they
must have the same checksum.

To help debugging in case of failures, the output includes a json map of
every test image.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
 tests/qemu-iotests/tests/qemu-img-checksum    | 149 ++++++++++++++++++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +++++++++
 2 files changed, 223 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 0000000000..3a85ba33f2
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+    filter_testfiles,
+    qemu_img,
+    qemu_img_log,
+    qemu_io,
+    qemu_nbd_popen,
+)
+
+
+def checksum_available():
+    out = qemu_img("--help").stdout
+    return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+    iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+    supported_fmts=["raw", "qcow2"],
+    supported_cache_modes=["none", "writeback"],
+    supported_protocols=["file", "nbd"],
+    required_fmts=["raw", "qcow2"],
+)
+
+print()
+print("=== Test images ===")
+print()
+
+disk_raw = iotests.file_path('raw')
+qemu_img("create", "-f", "raw", disk_raw, "10m")
+qemu_io("-f", "raw",
+        "-c", "write -P 0x1 0 2m",      # data
+        "-c", "write -P 0x0 2m 2m",     # data with zeroes
+        "-c", "write -z 4m 2m",         # zero allocated
+        "-c", "write -z -u 6m 2m",      # zero hole
+                                        # unallocated
+        disk_raw)
+print(filter_testfiles(disk_raw))
+qemu_img_log("map", "--output", "json", disk_raw)
+
+disk_qcow2 = iotests.file_path('qcow2')
+qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
+qemu_io("-f", "qcow2",
+        "-c", "write -P 0x1 0 2m",      # data
+        "-c", "write -P 0x0 2m 2m",     # data with zeroes
+        "-c", "write -z 4m 2m",         # zero allocated
+        "-c", "write -z -u 6m 2m",      # zero hole
+                                        # unallocated
+        disk_qcow2)
+print(filter_testfiles(disk_qcow2))
+qemu_img_log("map", "--output", "json", disk_qcow2)
+
+disk_compressed = iotests.file_path('compressed')
+qemu_img("convert", "-f", "qcow2", "-O", "qcow2", "-c",
+         disk_qcow2, disk_compressed)
+print(filter_testfiles(disk_compressed))
+qemu_img_log("map", "--output", "json", disk_compressed)
+
+disk_base = iotests.file_path('base')
+qemu_img("create", "-f", "raw", disk_base, "10m")
+qemu_io("-f", "raw",
+        "-c", "write -P 0x1 0 2m",
+        "-c", "write -P 0x0 2m 2m",
+        disk_base)
+print(filter_testfiles(disk_base))
+qemu_img_log("map", "--output", "json", disk_base)
+
+disk_top = iotests.file_path('top')
+qemu_img("create", "-f", "qcow2", "-b", disk_base, "-F", "raw",
+         disk_top)
+qemu_io("-f", "qcow2",
+        "-c", "write -z 4m 2m",
+        "-c", "write -z -u 6m 2m",
+        disk_top)
+print(filter_testfiles(disk_top))
+qemu_img_log("map", "--output", "json", disk_top)
+
+print()
+print("=== Checksums - file ===")
+print()
+
+qemu_img_log("checksum", disk_raw)
+qemu_img_log("checksum", disk_qcow2)
+qemu_img_log("checksum", disk_compressed)
+qemu_img_log("checksum", disk_top)
+qemu_img_log("checksum", disk_base)
+
+print()
+print("=== Checksums - nbd ===")
+print()
+
+nbd_sock = iotests.file_path("nbd.sock", base_dir=iotests.sock_dir)
+nbd_uri = f"nbd+unix:///{{}}?socket={nbd_sock}"
+
+with qemu_nbd_popen("-k", nbd_sock, "-f", "raw", "-x", "raw", disk_raw):
+    qemu_img_log("checksum", nbd_uri.format("raw"))
+
+with qemu_nbd_popen("-k", nbd_sock, "-f", "qcow2", "-x", "qcow2", disk_qcow2):
+    qemu_img_log("checksum", nbd_uri.format("qcow2"))
+
+with qemu_nbd_popen("-k", nbd_sock, "-f", "qcow2", "-x", "compressed", disk_compressed):
+    qemu_img_log("checksum", nbd_uri.format("compressed"))
+
+with qemu_nbd_popen("-k", nbd_sock, "-f", "raw", "-x", "base", disk_base):
+    qemu_img_log("checksum", nbd_uri.format("base"))
+
+with qemu_nbd_popen("-k", nbd_sock, "-f", "qcow2", "-x", "top", disk_top):
+    qemu_img_log("checksum", nbd_uri.format("top"))
+
+print()
+print("=== Command line options ===")
+print()
+
+qemu_img_log("checksum", "-f", "qcow2", disk_top)
+qemu_img_log("checksum", "-T", "none", disk_top)
+
+out = qemu_img("checksum", "-p", disk_top).stdout
+last = out.splitlines()[-1]  # Filter progress lines.
+print(filter_testfiles(last))
+
+print()
+print("=== Incorrect usage ===")
+print()
+
+qemu_img_log("checksum", "-f", "qcow2", disk_raw, check=False)
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out b/tests/qemu-iotests/tests/qemu-img-checksum.out
new file mode 100644
index 0000000000..2cff03439f
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out
@@ -0,0 +1,74 @@
+
+=== Test images ===
+
+TEST_DIR/PID-raw
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
+{ "start": 4194304, "length": 6291456, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4194304}]
+
+TEST_DIR/PID-qcow2
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, "data": true, "offset": 327680},
+{ "start": 4194304, "length": 4194304, "depth": 0, "present": true, "zero": true, "data": false},
+{ "start": 8388608, "length": 2097152, "depth": 0, "present": false, "zero": true, "data": false}]
+
+TEST_DIR/PID-compressed
+[{ "start": 0, "length": 2097152, "depth": 0, "present": true, "zero": false, "data": true},
+{ "start": 2097152, "length": 8388608, "depth": 0, "present": false, "zero": true, "data": false}]
+
+TEST_DIR/PID-base
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
+{ "start": 4194304, "length": 6291456, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4194304}]
+
+TEST_DIR/PID-top
+[{ "start": 0, "length": 4194304, "depth": 1, "present": true, "zero": false, "data": true, "offset": 0},
+{ "start": 4194304, "length": 4194304, "depth": 0, "present": true, "zero": true, "data": false},
+{ "start": 8388608, "length": 2097152, "depth": 1, "present": true, "zero": true, "data": false, "offset": 8388608}]
+
+
+=== Checksums - file ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-raw
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-qcow2
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-compressed
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-top
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-base
+
+
+=== Checksums - nbd ===
+
+Start NBD server
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  nbd+unix:///raw?socket=SOCK_DIR/PID-nbd.sock
+
+Kill NBD server
+Start NBD server
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  nbd+unix:///qcow2?socket=SOCK_DIR/PID-nbd.sock
+
+Kill NBD server
+Start NBD server
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  nbd+unix:///compressed?socket=SOCK_DIR/PID-nbd.sock
+
+Kill NBD server
+Start NBD server
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  nbd+unix:///base?socket=SOCK_DIR/PID-nbd.sock
+
+Kill NBD server
+Start NBD server
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  nbd+unix:///top?socket=SOCK_DIR/PID-nbd.sock
+
+Kill NBD server
+
+=== Command line options ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-top
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-top
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-top
+
+=== Incorrect usage ===
+
+qemu-img: Could not open 'TEST_DIR/PID-raw': Image is not in qcow2 format
+
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] qemu-img: Speed up checksum
  2022-09-01 14:32 [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
  2022-09-01 14:32 ` [PATCH 1/3] qemu-img: Add checksum command Nir Soffer
  2022-09-01 14:32 ` [PATCH 2/3] iotests: Test qemu-img checksum Nir Soffer
@ 2022-09-01 14:32 ` Nir Soffer
  2022-10-26 13:54   ` Hanna Reitz
  2022-09-18  9:35 ` [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
  3 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-09-01 14:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz, Nir Soffer

Add coroutine based loop inspired by `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState

- State for single worker coroutine is kept in ImgChecksumworker.

- "Writes" are always in-order, ensured using a queue.

- Calling block status once per image extent, when the current extent is
  consumed by the workers.

- Using 1m buffer size - testings shows that this gives best read
  performance both with buffered and direct I/O.

- Number of coroutines is not configurable. Testing does not show
  improvement when using more than 8 coroutines.

- Progress include entire image, not only the allocated state.

Comparing to the simple read loop shows that this version is up to 4.67
times faster when computing a checksum for an image full of zeroes. For
real images it is 1.59 times faster with direct I/O, and with buffered
I/O there is no difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image    | size | i/o       | before         | after          | change |
|----------|------|-----------|----------------|----------------|--------|
| zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero     |   6g | direct    | 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real     |   6g | direct    | 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd      |   6g | direct    | 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
 qemu-img.c | 343 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 270 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 7edcfe4bc8..bfa8e2862f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1613,48 +1613,288 @@ out:
     qemu_vfree(buf2);
     blk_unref(blk2);
 out2:
     blk_unref(blk1);
 out3:
     qemu_progress_end();
     return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+    QTAILQ_ENTRY(ImgChecksumWorker) entry;
+    ImgChecksumState *state;
+    Coroutine *co;
+    uint8_t *buf;
+
+    /* The current chunk. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    /* Always true for zero extent, false for data extent. Set to true
+     * when reading the chunk completes. */
+    bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+    const char *filename;
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    int64_t total_size;
+
+    /* Current extent, modified in checksum_co_next. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    int running_coroutines;
+    CoMutex lock;
+    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+    /* Ensure in-order updates. Update are scheduled at the tail of the
+     * queue and processed from the head of the queue when a worker is
+     * ready. */
+    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+    struct blkhash *hash;
+    int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+    int64_t length;
+    int status;
+
+    /* Must be called when current extent is consumed. */
+    assert(s->length == 0);
+
+    status = bdrv_block_status_above(s->bs, NULL, s->offset,
+                                     s->total_size - s->offset, &length, NULL,
+                                     NULL);
+    if (status < 0) {
+        error_report("Error checking status at offset %" PRId64 " for %s",
+                     s->offset, s->filename);
+        s->ret = status;
+        return -1;
+    }
+
+    assert(length > 0);
+
+    s->length = length;
+    s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+    return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schecule the next update at the end fo the update queue.
+ *
+ * Retrun true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+
+    qemu_co_mutex_lock(&s->lock);
+
+    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    if (s->length == 0 && checksum_block_status(s)) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    /* Grab one chunk from current extent. */
+    w->offset = s->offset;
+    w->length = MIN(s->length,
+                    s->zero ?  CHECKSUM_ZERO_SIZE : CHECKSUM_BUF_SIZE);
+    w->zero = s->zero;
+    w->ready = s->zero;
+
+    /* Advance state to the next chunk. */
+    s->offset += w->length;
+    s->length -= w->length;
+
+    /* Schedule this chunk update. */
+    QTAILQ_INSERT_TAIL(&s->update_queue, w, entry);
+
+    qemu_co_mutex_unlock(&s->lock);
+
+    return true;
+}
+
+/* Wait until this chunk is in the head of the update queue. */
+static void coroutine_fn checksum_co_wait_for_update(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+
+    /* Must be called only when we are ready to update the hash. */
+    assert(w->ready);
+
+    while (QTAILQ_FIRST(&s->update_queue) != w) {
+        qemu_coroutine_yield();
+    }
+
+    QTAILQ_REMOVE(&s->update_queue, w, entry);
+}
+
+/* Enter the next worker coroutine if the worker is ready. */
+static void coroutine_fn checksum_co_enter_next(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+    ImgChecksumWorker *next;
+
+    if (!QTAILQ_EMPTY(&s->update_queue)) {
+        next = QTAILQ_FIRST(&s->update_queue);
+        if (next->ready)
+            qemu_coroutine_enter(next->co);
+    }
+}
+
+static int coroutine_fn checksum_co_read(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+    int err;
+
+    err = blk_co_pread(s->blk, w->offset, w->length, w->buf, 0);
+    if (err < 0) {
+        error_report("Error reading at offset %" PRId64 " of %s: %s",
+                     w->offset, s->filename, strerror(-err));
+        /* Unschedule this chunk. */
+        QTAILQ_REMOVE(&s->update_queue, w, entry);
+        s->ret = err;
+        return -1;
+    }
+
+    w->ready = true;
+
+    return 0;
+}
+
+static int checksum_update_hash(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+    int err;
+
+    if (w->zero) {
+        err = blkhash_zero(s->hash, w->length);
+        if (err) {
+            error_report("Error zeroing hash at offset %" PRId64 " of %s: %s",
+                         w->offset, s->filename, strerror(err));
+            s->ret = -err;
+            return -1;
+        }
+    } else {
+        err = blkhash_update(s->hash, w->buf, w->length);
+        if (err) {
+            error_report("Error updating hash at offset %" PRId64 " of %s: %s",
+                         w->offset, s->filename, strerror(err));
+            s->ret = -err;
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static void coroutine_fn checksum_co_loop(void *opaque)
+{
+    ImgChecksumWorker *w = opaque;
+    ImgChecksumState *s = w->state;
+
+    s->running_coroutines++;
+    w->buf = blk_blockalign(s->blk, CHECKSUM_BUF_SIZE);
+
+    while (checksum_co_next(w)) {
+        if (!w->zero && checksum_co_read(w)) {
+            break;
+        }
+        checksum_co_wait_for_update(w);
+        if (s->ret == -EINPROGRESS) {
+            if (checksum_update_hash(w)) {
+                break;
+            }
+            qemu_progress_print(((float) w->length / s->total_size) * 100, 100);
+        }
+        checksum_co_enter_next(w);
+    }
+
+    qemu_vfree(w->buf);
+    w->co = NULL;
+    s->running_coroutines--;
+
+    if (s->running_coroutines == 0 && s->ret == -EINPROGRESS) {
+        /* the checksum job finished successfully */
+        s->ret = 0;
+    }
+}
+
+static int checksum_loop(ImgChecksumState *s)
+{
+    int i;
+
+    s->ret = -EINPROGRESS;
+
+    qemu_co_mutex_init(&s->lock);
+    QTAILQ_INIT(&s->update_queue);
+
+    for (i = 0; i < CHECKSUM_COROUTINES; i++) {
+        ImgChecksumWorker *w = &s->workers[i];
+        w->state = s;
+        w->co = qemu_coroutine_create(checksum_co_loop, w);
+        qemu_coroutine_enter(w->co);
+    }
+
+    while (s->running_coroutines) {
+        main_loop_wait(false);
+    }
+
+    return s->ret;
+}
+
 /*
  * Compute image checksum.
  */
 static int img_checksum(int argc, char **argv)
 {
     const char *digest_name = "sha256";
     const size_t block_size = 64 * KiB;
 
     const char *format = NULL;
     const char *cache = BDRV_DEFAULT_CACHE;
-    const char *filename;
-    BlockBackend *blk;
-    BlockDriverState *bs;
-    uint8_t *buf = NULL;
-    int64_t pnum;
     bool progress = false;
+    bool image_opts = false;
+
     int flags = 0;
-    bool writethrough;
-    int64_t total_size;
-    int64_t offset = 0;
+    bool writethrough = false;
     int c;
-    bool image_opts = false;
-    struct blkhash *h = NULL;
-    unsigned char digest[64];
-    unsigned int digest_len;
+
     int ret = 1;
     int err;
 
+    ImgChecksumState s = {0};
+    unsigned char digest[64];
+    unsigned int digest_len;
+
     for (;;) {
         static const struct option long_options[] = {
             {"help", no_argument, 0, 'h'},
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
             {0, 0, 0, 0}
         };
         c = getopt_long(argc, argv, ":hf:T:p",
                         long_options, NULL);
         if (c == -1) {
@@ -1697,119 +1937,76 @@ static int img_checksum(int argc, char **argv)
         case OPTION_IMAGE_OPTS:
             image_opts = true;
             break;
         }
     }
 
     if (optind != argc - 1) {
         error_exit("Expecting image file name");
     }
 
-    filename = argv[optind++];
+    s.filename = argv[optind++];
 
     err = bdrv_parse_cache_mode(cache, &flags, &writethrough);
     if (err < 0) {
         error_report("Invalid source cache option: %s", cache);
         return ret;
     }
 
-    blk = img_open(image_opts, filename, format, flags, writethrough, false,
-                   false);
-    if (!blk) {
+    s.blk = img_open(image_opts, s.filename, format, flags, writethrough,
+                     false, false);
+    if (!s.blk) {
         return ret;
     }
 
     /* Initialize before using goto out. */
     qemu_progress_init(progress, 2.0);
 
-    bs = blk_bs(blk);
-    buf = blk_blockalign(blk, IO_BUF_SIZE);
+    s.bs = blk_bs(s.blk);
 
-    total_size = blk_getlength(blk);
-    if (total_size < 0) {
+    s.total_size = blk_getlength(s.blk);
+    if (s.total_size < 0) {
         error_report("Can't get size of %s: %s",
-                     filename, strerror(-total_size));
+                     s.filename, strerror(-s.total_size));
         goto out;
     }
 
-    h = blkhash_new(block_size, digest_name);
-    if (!h) {
+    s.hash = blkhash_new(block_size, digest_name);
+    if (!s.hash) {
         error_report("Can't create blkhash: %s", strerror(errno));
         goto out;
     }
 
     qemu_progress_print(0, 100);
 
-    while (offset < total_size) {
-        int status;
-        int64_t chunk;
-
-        status = bdrv_block_status_above(bs, NULL, offset,
-                                         total_size - offset, &pnum, NULL,
-                                         NULL);
-        if (status < 0) {
-            error_report("Error checking status at offset %" PRId64 " for %s",
-                         offset, filename);
-            goto out;
-        }
-
-        assert(pnum);
-        chunk = pnum;
-
-        if (status & BDRV_BLOCK_ZERO) {
-            chunk = MIN(chunk, SIZE_MAX);
-            err = blkhash_zero(h, chunk);
-            if (err) {
-                error_report("Error zeroing hash at offset %" PRId64
-                             " of %s: %s",
-                             offset, filename, strerror(err));
-                goto out;
-            }
-        } else {
-            chunk = MIN(chunk, IO_BUF_SIZE);
-            err = blk_pread(blk, offset, chunk, buf, 0);
-            if (err < 0) {
-                error_report("Error reading at offset %" PRId64 " of %s: %s",
-                             offset, filename, strerror(-err));
-                goto out;
-            }
-            err = blkhash_update(h, buf, chunk);
-            if (err) {
-                error_report("Error updating hash at offset %" PRId64
-                             " of %s: %s",
-                             offset, filename, strerror(err));
-                goto out;
-            }
-        }
-
-        offset += chunk;
-        qemu_progress_print(((float) chunk / total_size) * 100, 100);
+    err = checksum_loop(&s);
+    if (err) {
+        goto out;
     }
 
-    err = blkhash_final(h, digest, &digest_len);
+    err = blkhash_final(s.hash, digest, &digest_len);
     if (err) {
         error_report("Error finalizing hash of %s: %s",
-                     filename, strerror(err));
+                     s.filename, strerror(err));
         goto out;
     }
 
     for (unsigned i = 0; i < digest_len; i++) {
         printf("%02x", digest[i]);
     }
-    printf("  %s%s", filename, progress ? "" : "\n");
+    printf("  %s%s", s.filename, progress ? "" : "\n");
 
     ret = 0;
 
 out:
-    blkhash_free(h);
-    qemu_vfree(buf);
-    blk_unref(blk);
+    blkhash_free(s.hash);
+    blk_unref(s.blk);
     qemu_progress_end();
 
     return ret;
 }
 #endif /* CONFIG_BLKHASH */
 
 /* Convenience wrapper around qmp_block_dirty_bitmap_merge */
 static void do_dirty_bitmap_merge(const char *dst_node, const char *dst_name,
                                   const char *src_node, const char *src_name,
                                   Error **errp)
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Add qemu-img checksum command using blkhash
  2022-09-01 14:32 [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
                   ` (2 preceding siblings ...)
  2022-09-01 14:32 ` [PATCH 3/3] qemu-img: Speed up checksum Nir Soffer
@ 2022-09-18  9:35 ` Nir Soffer
  2022-10-18 17:49   ` Nir Soffer
  3 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-09-18  9:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz

ping

Kevin, Hanna, I hope you have time to take a look.

https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer <nsoffer@redhat.com> wrote:
>
> Since blkhash is available only via copr now, the new command is added as
> optional feature, built only if blkhash-devel package is installed.
>
> Nir Soffer (3):
>   qemu-img: Add checksum command
>   iotests: Test qemu-img checksum
>   qemu-img: Speed up checksum
>
>  docs/tools/qemu-img.rst                       |  22 +
>  meson.build                                   |  10 +-
>  meson_options.txt                             |   2 +
>  qemu-img-cmds.hx                              |   8 +
>  qemu-img.c                                    | 388 ++++++++++++++++++
>  tests/qemu-iotests/tests/qemu-img-checksum    | 149 +++++++
>  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 ++++
>  7 files changed, 652 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
>  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
>
> --
> 2.37.2
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Add qemu-img checksum command using blkhash
  2022-09-18  9:35 ` [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
@ 2022-10-18 17:49   ` Nir Soffer
  0 siblings, 0 replies; 18+ messages in thread
From: Nir Soffer @ 2022-10-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

On Sun, Sep 18, 2022 at 12:35 PM Nir Soffer <nsoffer@redhat.com> wrote:

> ping
>
> Kevin, Hanna, I hope you have time to take a look.
>
> https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


Ping again, hopefully someone has time to look at this :-)


>
>
>
> On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer <nsoffer@redhat.com> wrote:
> >
> > Since blkhash is available only via copr now, the new command is added as
> > optional feature, built only if blkhash-devel package is installed.
> >
> > Nir Soffer (3):
> >   qemu-img: Add checksum command
> >   iotests: Test qemu-img checksum
> >   qemu-img: Speed up checksum
> >
> >  docs/tools/qemu-img.rst                       |  22 +
> >  meson.build                                   |  10 +-
> >  meson_options.txt                             |   2 +
> >  qemu-img-cmds.hx                              |   8 +
> >  qemu-img.c                                    | 388 ++++++++++++++++++
> >  tests/qemu-iotests/tests/qemu-img-checksum    | 149 +++++++
> >  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 ++++
> >  7 files changed, 652 insertions(+), 1 deletion(-)
> >  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > --
> > 2.37.2
> >
>

[-- Attachment #2: Type: text/html, Size: 2175 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] qemu-img: Add checksum command
  2022-09-01 14:32 ` [PATCH 1/3] qemu-img: Add checksum command Nir Soffer
@ 2022-10-26 13:00   ` Hanna Reitz
  2022-10-30 17:37     ` Nir Soffer
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-10-26 13:00 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: qemu-block, Kevin Wolf

On 01.09.22 16:32, Nir Soffer wrote:
> The checksum command compute a checksum for disk image content using the
> blkhash library[1]. The blkhash library is not packaged yet, but it is
> available via copr[2].
>
> Example run:
>
>      $ ./qemu-img checksum -p fedora-35.qcow2
>      6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2
>
> The block checksum is constructed by splitting the image to fixed sized
> blocks and computing a digest of every block. The image checksum is the
> digest of the all block digests.
>
> The checksum uses internally the "sha256" algorithm but it cannot be
> compared with checksums created by other tools such as `sha256sum`.
>
> The blkhash library supports sparse images, zero detection, and
> optimizes zero block hashing (they are practically free). The library
> uses multiple threads to speed up the computation.
>
> Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> faster, depending on the amount of data in the image:
>
>      $ ./qemu-img info /scratch/50p.raw
>      file format: raw
>      virtual size: 6 GiB (6442450944 bytes)
>      disk size: 2.91 GiB
>
>      $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
>                                       "sha256sum /scratch/50p.raw"
>      Benchmark 1: ./qemu-img checksum /scratch/50p.raw
>        Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
>        Range (min … max):    1.813 s …  1.908 s    5 runs
>
>      Benchmark 2: sha256sum /scratch/50p.raw
>        Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
>        Range (min … max):   14.501 s … 14.697 s    5 runs
>
>      Summary
>        './qemu-img checksum /scratch/50p.raw' ran
>          7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
>
> The new command is available only when `blkhash` is available during
> build. To test the new command please install the `blkhash-devel`
> package:
>
>      $ dnf copr enable nsoffer/blkhash
>      $ sudo dnf install blkhash-devel
>
> [1] https://gitlab.com/nirs/blkhash
> [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
>      sha256sum (estimate): 17,749s
>
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>   docs/tools/qemu-img.rst |  22 +++++
>   meson.build             |  10 ++-
>   meson_options.txt       |   2 +
>   qemu-img-cmds.hx        |   8 ++
>   qemu-img.c              | 191 ++++++++++++++++++++++++++++++++++++++++
>   5 files changed, 232 insertions(+), 1 deletion(-)
>
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 85a6e05b35..8be9c45cbf 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -347,20 +347,42 @@ Command description:
>       Check completed, image is corrupted
>     3
>       Check completed, image has leaked clusters, but is not corrupted
>     63
>       Checks are not supported by the image format
>   
>     If ``-r`` is specified, exit codes representing the image state refer to the
>     state after (the attempt at) repairing it. That is, a successful ``-r all``
>     will yield the exit code 0, independently of the image state before.
>   
> +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> +
> +  Print a checksum for image *FILENAME* guest visible content.

Why not say which kind of checksum it is?

>                                                                  Images with
> +  different format or settings wil have the same checksum.

s/wil/will/

> +
> +  The format is probed unless you specify it by ``-f``.
> +
> +  The checksum is computed for guest visible content. Allocated areas full of
> +  zeroes, zero clusters, and unallocated areas are read as zeros so they will
> +  have the same checksum. Images with single or multiple files or backing files
> +  will have the same checksums if the guest will see the same content when
> +  reading the image.
> +
> +  Image metadata that is not visible to the guest such as dirty bitmaps does
> +  not affect the checksum.
> +
> +  Computing a checksum requires a read-only image. You cannot compute a
> +  checksum of an active image used by a guest,

Makes me ask: Why not?  Other subcommands have the -U flag for this.

>                                                  but you can compute a checksum
> +  of a guest during pull mode incremental backup using NBD URL.
> +
> +  The checksum is not compatible with other tools such as *sha256sum*.

Why not?  I can see it differs even for raw images, but why?  I would 
have very much assumed that this gives me exactly what sha256sum in the 
guest on the guest device would yield.

> +
>   .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
>   
>     Commit the changes recorded in *FILENAME* in its base image or backing file.
>     If the backing file is smaller than the snapshot, then the backing file will be
>     resized to be the same size as the snapshot.  If the snapshot is smaller than
>     the backing file, the backing file will not be truncated.  If you want the
>     backing file to match the size of the smaller snapshot, you can safely truncate
>     it yourself once the commit operation successfully completes.
>   
>     The image *FILENAME* is emptied after the operation has succeeded. If you do

[...]

> diff --git a/qemu-img.c b/qemu-img.c
> index 7d4b33b3da..7edcfe4bc8 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -17,20 +17,21 @@
>    * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>    * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>    * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>    * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>    * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>    * THE SOFTWARE.
>    */
>   
>   #include "qemu/osdep.h"
>   #include <getopt.h>
> +#include <blkhash.h>

This must be guarded by CONFIG_BLKHASH.

>   #include "qemu/help-texts.h"
>   #include "qemu/qemu-progress.h"
>   #include "qemu-version.h"
>   #include "qapi/error.h"
>   #include "qapi/qapi-commands-block-core.h"
>   #include "qapi/qapi-visit-block-core.h"
>   #include "qapi/qobject-output-visitor.h"
>   #include "qapi/qmp/qjson.h"
>   #include "qapi/qmp/qdict.h"
> @@ -1611,20 +1612,210 @@ out:
>       qemu_vfree(buf1);
>       qemu_vfree(buf2);
>       blk_unref(blk2);
>   out2:
>       blk_unref(blk1);
>   out3:
>       qemu_progress_end();
>       return ret;
>   }
>   
> +#ifdef CONFIG_BLKHASH
> +/*
> + * Compute image checksum.
> + */
> +static int img_checksum(int argc, char **argv)
> +{
> +    const char *digest_name = "sha256";

Why don’t we make this configurable?

> +    const size_t block_size = 64 * KiB;
> +
> +    const char *format = NULL;
> +    const char *cache = BDRV_DEFAULT_CACHE;
> +    const char *filename;
> +    BlockBackend *blk;
> +    BlockDriverState *bs;
> +    uint8_t *buf = NULL;
> +    int64_t pnum;
> +    bool progress = false;
> +    int flags = 0;
> +    bool writethrough;
> +    int64_t total_size;
> +    int64_t offset = 0;
> +    int c;
> +    bool image_opts = false;
> +    struct blkhash *h = NULL;
> +    unsigned char digest[64];
> +    unsigned int digest_len;
> +    int ret = 1;
> +    int err;
> +
> +    for (;;) {
> +        static const struct option long_options[] = {
> +            {"help", no_argument, 0, 'h'},
> +            {"object", required_argument, 0, OPTION_OBJECT},
> +            {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
> +            {0, 0, 0, 0}
> +        };
> +        c = getopt_long(argc, argv, ":hf:T:p",
> +                        long_options, NULL);
> +        if (c == -1) {
> +            break;
> +        }
> +        switch (c) {
> +        case ':':
> +            missing_argument(argv[optind - 1]);
> +            break;
> +        case '?':
> +            unrecognized_option(argv[optind - 1]);
> +            break;
> +        case 'h':
> +            help();
> +            break;
> +        case 'f':
> +            format = optarg;
> +            break;
> +        case 'T':
> +            cache = optarg;
> +            break;
> +        case 'p':
> +            progress = true;
> +            break;
> +        case OPTION_OBJECT:
> +            {
> +                Error *local_err = NULL;
> +
> +                if (!user_creatable_add_from_str(optarg, &local_err)) {
> +                    if (local_err) {
> +                        error_report_err(local_err);
> +                        exit(2);
> +                    } else {
> +                        /* Help was printed */
> +                        exit(EXIT_SUCCESS);
> +                    }
> +                }

How about simply using user_creatable_process_cmdline(optarg)? (which 
most other subcommands use)

(I believe img_compare() has to have its own code, because exit code 1 
is reserved there.  But that isn’t a concern here.)

> +                break;
> +            }
> +        case OPTION_IMAGE_OPTS:
> +            image_opts = true;
> +            break;
> +        }
> +    }
> +
> +    if (optind != argc - 1) {
> +        error_exit("Expecting image file name");
> +    }
> +
> +    filename = argv[optind++];
> +
> +    err = bdrv_parse_cache_mode(cache, &flags, &writethrough);
> +    if (err < 0) {
> +        error_report("Invalid source cache option: %s", cache);
> +        return ret;

Personally, I’m not too big of a fan of using `ret` here, because it was 
set so far above.  Why not just `return 1;`?

(Same in other error cases below.)

> +    }
> +
> +    blk = img_open(image_opts, filename, format, flags, writethrough, false,
> +                   false);
> +    if (!blk) {
> +        return ret;
> +    }
> +
> +    /* Initialize before using goto out. */
> +    qemu_progress_init(progress, 2.0);
> +
> +    bs = blk_bs(blk);
> +    buf = blk_blockalign(blk, IO_BUF_SIZE);

It looks like this macro is kind of specific to `img_compare()`, and is 
currently somewhere in the middle of its code.  I don’t mind reusing it 
here, but if so, we might want to move it up to the top of this file, 
and add a comment that this is the buffer size used for commands like 
compare or checksum.

> +
> +    total_size = blk_getlength(blk);
> +    if (total_size < 0) {
> +        error_report("Can't get size of %s: %s",
> +                     filename, strerror(-total_size));
> +        goto out;

I suggest adding `ret = 1;` before such a `goto out;` (not just here), 
so it is clear exactly what value we are going to return (it isn’t 
trivial to track that `ret` isn’t used for anything else). But I can see 
that it’s extra code, so maybe you don’t like that.

If we keep this as-is, perhaps we could rename `ret` to something more 
explicit like `exit_status`, though?  That way, it’s pretty clear that 
we won’t accidentally reuse it for anything else.  (I know this isn’t 
the first subcommand to use `ret` for the process exist status, but I 
don’t think those other subcommands are great role models in this regard.)

> +    }
> +
> +    h = blkhash_new(block_size, digest_name);

Should we somehow make sure that IO_BUF_SIZE is a multiple of 
block_size?  I mean, it is, but it isn’t obvious at least, and I guess 
maybe at some point someone might want to make block_size a parameter.  
Would a static assertion work?  (Would stop working once someone decide 
to make block_size a parameter, which is good, because it draws attention.)

> +    if (!h) {
> +        error_report("Can't create blkhash: %s", strerror(errno));
> +        goto out;
> +    }
> +
> +    qemu_progress_print(0, 100);
> +
> +    while (offset < total_size) {
> +        int status;
> +        int64_t chunk;
> +
> +        status = bdrv_block_status_above(bs, NULL, offset,
> +                                         total_size - offset, &pnum, NULL,
> +                                         NULL);
> +        if (status < 0) {
> +            error_report("Error checking status at offset %" PRId64 " for %s",
> +                         offset, filename);
> +            goto out;
> +        }
> +
> +        assert(pnum);
> +        chunk = pnum;

Can’t we drop `pnum` and replace it by `chunk` in all cases?

> +
> +        if (status & BDRV_BLOCK_ZERO) {
> +            chunk = MIN(chunk, SIZE_MAX);

Itches me a bit to propose rounding SIZE_MAX down to block_size, but I 
guess given the magnitude of SIZE_MAX on 64-bit systems, it doesn’t matter.

> +            err = blkhash_zero(h, chunk);
> +            if (err) {
> +                error_report("Error zeroing hash at offset %" PRId64
> +                             " of %s: %s",
> +                             offset, filename, strerror(err));
> +                goto out;
> +            }
> +        } else {
> +            chunk = MIN(chunk, IO_BUF_SIZE);
> +            err = blk_pread(blk, offset, chunk, buf, 0);
> +            if (err < 0) {
> +                error_report("Error reading at offset %" PRId64 " of %s: %s",
> +                             offset, filename, strerror(-err));
> +                goto out;
> +            }
> +            err = blkhash_update(h, buf, chunk);
> +            if (err) {
> +                error_report("Error updating hash at offset %" PRId64
> +                             " of %s: %s",
> +                             offset, filename, strerror(err));
> +                goto out;
> +            }
> +        }
> +
> +        offset += chunk;
> +        qemu_progress_print(((float) chunk / total_size) * 100, 100);
> +    }
> +
> +    err = blkhash_final(h, digest, &digest_len);

How does this verify that `digest` is sufficiently large?

I mean, it is, given that we only have sha256 now.  But it still seems 
rather dangerous to me.

> +    if (err) {
> +        error_report("Error finalizing hash of %s: %s",
> +                     filename, strerror(err));
> +        goto out;
> +    }
> +
> +    for (unsigned i = 0; i < digest_len; i++) {

I always such declarations weren’t allowed in qemu, but this isn’t the 
first place, and I don’t mind.  Good to know. :)

Hanna

> +        printf("%02x", digest[i]);
> +    }
> +    printf("  %s%s", filename, progress ? "" : "\n");
> +
> +    ret = 0;
> +
> +out:
> +    blkhash_free(h);
> +    qemu_vfree(buf);
> +    blk_unref(blk);
> +    qemu_progress_end();
> +
> +    return ret;
> +}
> +#endif /* CONFIG_BLKHASH */
> +
>   /* Convenience wrapper around qmp_block_dirty_bitmap_merge */
>   static void do_dirty_bitmap_merge(const char *dst_node, const char *dst_name,
>                                     const char *src_node, const char *src_name,
>                                     Error **errp)
>   {
>       BlockDirtyBitmapOrStr *merge_src;
>       BlockDirtyBitmapOrStrList *list = NULL;
>   
>       merge_src = g_new0(BlockDirtyBitmapOrStr, 1);
>       merge_src->type = QTYPE_QDICT;



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] iotests: Test qemu-img checksum
  2022-09-01 14:32 ` [PATCH 2/3] iotests: Test qemu-img checksum Nir Soffer
@ 2022-10-26 13:30   ` Hanna Reitz
  2022-10-30 17:38     ` Nir Soffer
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-10-26 13:30 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: qemu-block, Kevin Wolf

On 01.09.22 16:32, Nir Soffer wrote:
> Add simple tests creating an image with all kinds of extents, different
> formats, different backing chain, different protocol, and different
> image options. Since all images have the same guest visible content they
> must have the same checksum.
>
> To help debugging in case of failures, the output includes a json map of
> every test image.
>
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>   tests/qemu-iotests/tests/qemu-img-checksum    | 149 ++++++++++++++++++
>   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +++++++++
>   2 files changed, 223 insertions(+)
>   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
>   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
>
> diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
> new file mode 100755
> index 0000000000..3a85ba33f2
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> @@ -0,0 +1,149 @@
> +#!/usr/bin/env python3
> +# group: rw auto quick
> +#
> +# Test cases for qemu-img checksum.
> +#
> +# Copyright (C) 2022 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +import re
> +
> +import iotests
> +
> +from iotests import (
> +    filter_testfiles,
> +    qemu_img,
> +    qemu_img_log,
> +    qemu_io,
> +    qemu_nbd_popen,
> +)
> +
> +
> +def checksum_available():
> +    out = qemu_img("--help").stdout
> +    return re.search(r"\bchecksum .+ filename\b", out) is not None
> +
> +
> +if not checksum_available():
> +    iotests.notrun("checksum command not available")
> +
> +iotests.script_initialize(
> +    supported_fmts=["raw", "qcow2"],
> +    supported_cache_modes=["none", "writeback"],

It doesn’t work with writeback, though, because it uses -T none below.

Which by the way is a heavy cost, because I usually run tests in tmpfs, 
where this won’t work.  Is there any way of not doing the -T none below?

> +    supported_protocols=["file", "nbd"],
> +    required_fmts=["raw", "qcow2"],
> +)
> +
> +print()
> +print("=== Test images ===")
> +print()
> +
> +disk_raw = iotests.file_path('raw')
> +qemu_img("create", "-f", "raw", disk_raw, "10m")
> +qemu_io("-f", "raw",
> +        "-c", "write -P 0x1 0 2m",      # data
> +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> +        "-c", "write -z 4m 2m",         # zero allocated
> +        "-c", "write -z -u 6m 2m",      # zero hole
> +                                        # unallocated
> +        disk_raw)
> +print(filter_testfiles(disk_raw))
> +qemu_img_log("map", "--output", "json", disk_raw)
> +
> +disk_qcow2 = iotests.file_path('qcow2')
> +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
> +qemu_io("-f", "qcow2",
> +        "-c", "write -P 0x1 0 2m",      # data
> +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> +        "-c", "write -z 4m 2m",         # zero allocated
> +        "-c", "write -z -u 6m 2m",      # zero hole
> +                                        # unallocated
> +        disk_qcow2)
> +print(filter_testfiles(disk_qcow2))
> +qemu_img_log("map", "--output", "json", disk_qcow2)

This isn’t how iotests work, generally.  When run with -qcow2 -file, it 
should only test qcow2 on file, not raw on file, not raw on nbd. Perhaps 
this way this test could even support other formats than qcow2 and raw.

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] qemu-img: Speed up checksum
  2022-09-01 14:32 ` [PATCH 3/3] qemu-img: Speed up checksum Nir Soffer
@ 2022-10-26 13:54   ` Hanna Reitz
  2022-10-30 17:38     ` Nir Soffer
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-10-26 13:54 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: qemu-block, Kevin Wolf

On 01.09.22 16:32, Nir Soffer wrote:
> Add coroutine based loop inspired by `qemu-img convert` design.
>
> Changes compared to `qemu-img convert`:
>
> - State for the entire image is kept in ImgChecksumState
>
> - State for single worker coroutine is kept in ImgChecksumworker.
>
> - "Writes" are always in-order, ensured using a queue.
>
> - Calling block status once per image extent, when the current extent is
>    consumed by the workers.
>
> - Using 1m buffer size - testings shows that this gives best read
>    performance both with buffered and direct I/O.

Why does patch 1 then choose to use 2 MB?

> - Number of coroutines is not configurable. Testing does not show
>    improvement when using more than 8 coroutines.
>
> - Progress include entire image, not only the allocated state.
>
> Comparing to the simple read loop shows that this version is up to 4.67
> times faster when computing a checksum for an image full of zeroes. For
> real images it is 1.59 times faster with direct I/O, and with buffered
> I/O there is no difference.
>
> Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
>
> | image    | size | i/o       | before         | after          | change |
> |----------|------|-----------|----------------|----------------|--------|
> | zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
> | zero     |   6g | direct    | 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
> | real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
> | real     |   6g | direct    | 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
> | nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
> | nbd      |   6g | direct    | 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |
>
> [1] raw image full of zeroes
> [2] raw fedora 35 image with additional random data, 50% full
> [3] image [2] exported by qemu-nbd via unix socket
>
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>   qemu-img.c | 343 +++++++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 270 insertions(+), 73 deletions(-)

Looks good!

Just a couple of style comments below.

> diff --git a/qemu-img.c b/qemu-img.c
> index 7edcfe4bc8..bfa8e2862f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1613,48 +1613,288 @@ out:
>       qemu_vfree(buf2);
>       blk_unref(blk2);
>   out2:
>       blk_unref(blk1);
>   out3:
>       qemu_progress_end();
>       return ret;
>   }
>   
>   #ifdef CONFIG_BLKHASH
> +
> +#define CHECKSUM_COROUTINES 8
> +#define CHECKSUM_BUF_SIZE (1 * MiB)
> +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
> +
> +typedef struct ImgChecksumState ImgChecksumState;
> +
> +typedef struct ImgChecksumWorker {
> +    QTAILQ_ENTRY(ImgChecksumWorker) entry;
> +    ImgChecksumState *state;
> +    Coroutine *co;
> +    uint8_t *buf;
> +
> +    /* The current chunk. */
> +    int64_t offset;
> +    int64_t length;
> +    bool zero;
> +
> +    /* Always true for zero extent, false for data extent. Set to true
> +     * when reading the chunk completes. */

Qemu codestyle requires /* and */ to be on separate lines for multi-line 
comments (see checkpatch.pl).

> +    bool ready;
> +} ImgChecksumWorker;
> +
> +struct ImgChecksumState {
> +    const char *filename;
> +    BlockBackend *blk;
> +    BlockDriverState *bs;
> +    int64_t total_size;
> +
> +    /* Current extent, modified in checksum_co_next. */
> +    int64_t offset;
> +    int64_t length;
> +    bool zero;
> +
> +    int running_coroutines;
> +    CoMutex lock;
> +    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
> +
> +    /* Ensure in-order updates. Update are scheduled at the tail of the
> +     * queue and processed from the head of the queue when a worker is
> +     * ready. */

Qemu codestyle requires /* and */ to be on separate lines for multi-line 
comments.

> +    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
> +
> +    struct blkhash *hash;
> +    int ret;
> +};

[...]

> +static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
> +{
> +    ImgChecksumState *s = w->state;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +
> +    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
> +        qemu_co_mutex_unlock(&s->lock);
> +        return false;
> +    }
> +
> +    if (s->length == 0 && checksum_block_status(s)) {

I’d prefer `checksum_block_status(s) < 0` so that it is clear that 
negative values indicate errors.  Otherwise, based on this code alone, 
it looks to me more like `checksum_block_status()` returned a boolean, 
where `false` would generally indicate an error, which is confusing.

Same in other places below.

> +        qemu_co_mutex_unlock(&s->lock);
> +        return false;
> +    }

[...]

> +/* Enter the next worker coroutine if the worker is ready. */
> +static void coroutine_fn checksum_co_enter_next(ImgChecksumWorker *w)
> +{
> +    ImgChecksumState *s = w->state;
> +    ImgChecksumWorker *next;
> +
> +    if (!QTAILQ_EMPTY(&s->update_queue)) {
> +        next = QTAILQ_FIRST(&s->update_queue);
> +        if (next->ready)
> +            qemu_coroutine_enter(next->co);

Qemu codestyle requires braces here.

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] qemu-img: Add checksum command
  2022-10-26 13:00   ` Hanna Reitz
@ 2022-10-30 17:37     ` Nir Soffer
  2022-11-07 10:20       ` Hanna Reitz
  0 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-10-30 17:37 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 19094 bytes --]

On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz <hreitz@redhat.com> wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command compute a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >      $ ./qemu-img checksum -p fedora-35.qcow2
> >      6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5
> fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image to fixed sized
> > blocks and computing a digest of every block. The image checksum is the
> > digest of the all block digests.
> >
> > The checksum uses internally the "sha256" algorithm but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >      $ ./qemu-img info /scratch/50p.raw
> >      file format: raw
> >      virtual size: 6 GiB (6442450944 bytes)
> >      disk size: 2.91 GiB
> >
> >      $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum
> /scratch/50p.raw" \
> >                                       "sha256sum /scratch/50p.raw"
> >      Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >        Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s,
> System: 0.962 s]
> >        Range (min … max):    1.813 s …  1.908 s    5 runs
> >
> >      Benchmark 2: sha256sum /scratch/50p.raw
> >        Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s,
> System: 1.003 s]
> >        Range (min … max):   14.501 s … 14.697 s    5 runs
> >
> >      Summary
> >        './qemu-img checksum /scratch/50p.raw' ran
> >          7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >      $ dnf copr enable nsoffer/blkhash
> >      $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >      sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> > ---
> >   docs/tools/qemu-img.rst |  22 +++++
> >   meson.build             |  10 ++-
> >   meson_options.txt       |   2 +
> >   qemu-img-cmds.hx        |   8 ++
> >   qemu-img.c              | 191 ++++++++++++++++++++++++++++++++++++++++
> >   5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >       Check completed, image is corrupted
> >     3
> >       Check completed, image has leaked clusters, but is not corrupted
> >     63
> >       Checks are not supported by the image format
> >
> >     If ``-r`` is specified, exit codes representing the image state
> refer to the
> >     state after (the attempt at) repairing it. That is, a successful
> ``-r all``
> >     will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T
> SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing, for example we write

   Print a sha256 checksum ...

User will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?


>
> >                                                                  Images
> with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing


>
> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas
> full of
> > +  zeroes, zero clusters, and unallocated areas are read as zeros so
> they will
> > +  have the same checksum. Images with single or multiple files or
> backing files
> > +  will have the same checksums if the guest will see the same content
> when
> > +  reading the image.
> > +
> > +  Image metadata that is not visible to the guest such as dirty bitmaps
> does
> > +  not affect the checksum.
> > +
> > +  Computing a checksum requires a read-only image. You cannot compute a
> > +  checksum of an active image used by a guest,
>
> Makes me ask: Why not?  Other subcommands have the -U flag for this.
>

The text is not precise enough, the issue is not active image but having a
read only
image.

What we really want is to take a shared read lock on the image when opening
it to ensure
that qemu or another instance of qemu-img is not holding a write lock and
cannot upgrade
a read lock to write lock on this image.

If the image can be changed while we read from it there is no point in
computing
a checksum.

If the -U/--force-share flag has these semantics we can add it, but based
on testing it
works for images open for writing (e.g. qemu-img info, qemu-img measure).


>
> >                                                  but you can compute a
> checksum
> > +  of a guest during pull mode incremental backup using NBD URL.
> > +
> > +  The checksum is not compatible with other tools such as *sha256sum*.
>
> Why not?  I can see it differs even for raw images, but why?  I would
> have very much assumed that this gives me exactly what sha256sum in the
> guest on the guest device would yield.


The blkhash is a construction based on other cryptographic hash functions
(e.g. sha256).
The way the hash is constructed is explained here:
https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52

We can provide a very slow version using a single thread and no zero
optimization
that will create the same hash as sha256sum for raw image. I don't think
this is very
interesting since it is not practical for large images. If we want
something like this,
we can add it later when we add an option to configure the algorithm.

We can also start with a slow version using available hash function and
enable
blkhash only if the library is available in build time.

[...]
>
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 7d4b33b3da..7edcfe4bc8 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -17,20 +17,21 @@
> >    * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> >    * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
> SHALL
> >    * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
> OR OTHER
> >    * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> ARISING FROM,
> >    * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN
> >    * THE SOFTWARE.
> >    */
> >
> >   #include "qemu/osdep.h"
> >   #include <getopt.h>
> > +#include <blkhash.h>
>
> This must be guarded by CONFIG_BLKHASH.
>

Right

[...]

> > +#ifdef CONFIG_BLKHASH
> > +/*
> > + * Compute image checksum.
> > + */
> > +static int img_checksum(int argc, char **argv)
> > +{
> > +    const char *digest_name = "sha256";
>
> Why don’t we make this configurable?
>

We can, but this is not easy:
- which hash functions should be supported?
- how to encode the algorithm used in the hash function?

The blksum tool (part of blkhash) has a --digest and --list-digests options
and
can use any hash function supported by openssl. Currently it does not
include
the algorithm, in the result, so the user has to know 2 values to compare
hashes
of different images.

If we don't support configuration, all these issues are gone, and the tool
is
easier and safer to use.

If we want to make this configurable, I think we need to include the
algorithm
in the result, something like:

    $ qemu-img checksum disk.img

blk-sha256:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1
disk.img

    $ qemu-img checksum --digest sha1 disk.img
    blk-sha1:6d7e6d55bed36e733bf50127d151f8af7bb5077f  disk.img

Or if we want to support standard checksums, maybe:

    $ qemu-img checksum --digest sha256 disk.img

sha256:49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14
disk.img

    $ qemu-img checksum --digest blk-sha256 disk.img

blk-sha256:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1
disk.img


> > +    const size_t block_size = 64 * KiB;
> > +
> > +    const char *format = NULL;
> > +    const char *cache = BDRV_DEFAULT_CACHE;
> > +    const char *filename;
> > +    BlockBackend *blk;
> > +    BlockDriverState *bs;
> > +    uint8_t *buf = NULL;
> > +    int64_t pnum;
> > +    bool progress = false;
> > +    int flags = 0;
> > +    bool writethrough;
> > +    int64_t total_size;
> > +    int64_t offset = 0;
> > +    int c;
> > +    bool image_opts = false;
> > +    struct blkhash *h = NULL;
> > +    unsigned char digest[64];
> > +    unsigned int digest_len;
> > +    int ret = 1;
> > +    int err;
> > +
> > +    for (;;) {
> > +        static const struct option long_options[] = {
> > +            {"help", no_argument, 0, 'h'},
> > +            {"object", required_argument, 0, OPTION_OBJECT},
> > +            {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
> > +            {0, 0, 0, 0}
> > +        };
> > +        c = getopt_long(argc, argv, ":hf:T:p",
> > +                        long_options, NULL);
> > +        if (c == -1) {
> > +            break;
> > +        }
> > +        switch (c) {
> > +        case ':':
> > +            missing_argument(argv[optind - 1]);
> > +            break;
> > +        case '?':
> > +            unrecognized_option(argv[optind - 1]);
> > +            break;
> > +        case 'h':
> > +            help();
> > +            break;
> > +        case 'f':
> > +            format = optarg;
> > +            break;
> > +        case 'T':
> > +            cache = optarg;
> > +            break;
> > +        case 'p':
> > +            progress = true;
> > +            break;
> > +        case OPTION_OBJECT:
> > +            {
> > +                Error *local_err = NULL;
> > +
> > +                if (!user_creatable_add_from_str(optarg, &local_err)) {
> > +                    if (local_err) {
> > +                        error_report_err(local_err);
> > +                        exit(2);
> > +                    } else {
> > +                        /* Help was printed */
> > +                        exit(EXIT_SUCCESS);
> > +                    }
> > +                }
>
> How about simply using user_creatable_process_cmdline(optarg)? (which
> most other subcommands use)
>
> (I believe img_compare() has to have its own code, because exit code 1
> is reserved there.  But that isn’t a concern here.)
>

Sure I can use it, I did not know about this.


>
> > +                break;
> > +            }
> > +        case OPTION_IMAGE_OPTS:
> > +            image_opts = true;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (optind != argc - 1) {
> > +        error_exit("Expecting image file name");
> > +    }
> > +
> > +    filename = argv[optind++];
> > +
> > +    err = bdrv_parse_cache_mode(cache, &flags, &writethrough);
> > +    if (err < 0) {
> > +        error_report("Invalid source cache option: %s", cache);
> > +        return ret;
>
> Personally, I’m not too big of a fan of using `ret` here, because it was
> set so far above.  Why not just `return 1;`?
>
> (Same in other error cases below.)
>

ret is set at the start to error value (1) and modified to 0 only if
everything
completed successfully.  This way we can return the same value in all
the exit paths.

> +    }
> > +
> > +    blk = img_open(image_opts, filename, format, flags, writethrough,
> false,
> > +                   false);
> > +    if (!blk) {
> > +        return ret;
> > +    }
> > +
> > +    /* Initialize before using goto out. */
> > +    qemu_progress_init(progress, 2.0);
> > +
> > +    bs = blk_bs(blk);
> > +    buf = blk_blockalign(blk, IO_BUF_SIZE);
>
> It looks like this macro is kind of specific to `img_compare()`, and is
> currently somewhere in the middle of its code.  I don’t mind reusing it
> here, but if so, we might want to move it up to the top of this file,
> and add a comment that this is the buffer size used for commands like
> compare or checksum.
>

IO_BUF_SIZE is also used by qemu-img convert. In the next commits
I'm adding another macro for the checksum code, so this change is not
needed, but I can move the macro to the top of the file to make it clear
that this is a default for many commands, and not specific qemu-img compare
default.


>
> > +
> > +    total_size = blk_getlength(blk);
> > +    if (total_size < 0) {
> > +        error_report("Can't get size of %s: %s",
> > +                     filename, strerror(-total_size));
> > +        goto out;
>
> I suggest adding `ret = 1;` before such a `goto out;` (not just here),
> so it is clear exactly what value we are going to return (it isn’t
> trivial to track that `ret` isn’t used for anything else). But I can see
> that it’s extra code, so maybe you don’t like that.
>
> If we keep this as-is, perhaps we could rename `ret` to something more
> explicit like `exit_status`, though?  That way, it’s pretty clear that
> we won’t accidentally reuse it for anything else.  (I know this isn’t
> the first subcommand to use `ret` for the process exist status, but I
> don’t think those other subcommands are great role models in this regard.)
>

exit_status is much better!

I'll rename ret to exit_status in the next version, and keep the way it is
handled
(initialized to 1, set to 0 on success).


>
> > +    }
> > +
> > +    h = blkhash_new(block_size, digest_name);
>
> Should we somehow make sure that IO_BUF_SIZE is a multiple of
> block_size?  I mean, it is, but it isn’t obvious at least, and I guess
> maybe at some point someone might want to make block_size a parameter.
> Would a static assertion work?  (Would stop working once someone decide
> to make block_size a parameter, which is good, because it draws attention.)


Using a multiple of block_size is no required but it can improve
performance by
minimizing copies in the pending buffer managed by blkhash.

Making block_size configurable is not good way, since it changes the value
of the
hash. In we do in this way we maybe wan to encode the block size in the
result
so someone can restore the arguments for comparing an image against a hash:


blk-sha256/64k:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1

This starts to look too complicated to me. In blksum I do not allow
changing the block size.

I'll add an assert in the next version to keeps this default optimal.


> > +    if (!h) {
> > +        error_report("Can't create blkhash: %s", strerror(errno));
> > +        goto out;
> > +    }
> > +
> > +    qemu_progress_print(0, 100);
> > +
> > +    while (offset < total_size) {
> > +        int status;
> > +        int64_t chunk;
> > +
> > +        status = bdrv_block_status_above(bs, NULL, offset,
> > +                                         total_size - offset, &pnum,
> NULL,
> > +                                         NULL);
> > +        if (status < 0) {
> > +            error_report("Error checking status at offset %" PRId64 "
> for %s",
> > +                         offset, filename);
> > +            goto out;
> > +        }
> > +
> > +        assert(pnum);
> > +        chunk = pnum;
>
> Can’t we drop `pnum` and replace it by `chunk` in all cases?
>

I did this in the next commit, I can update it in this commit.


>
> > +
> > +        if (status & BDRV_BLOCK_ZERO) {
> > +            chunk = MIN(chunk, SIZE_MAX);
>
> Itches me a bit to propose rounding SIZE_MAX down to block_size, but I
> guess given the magnitude of SIZE_MAX on 64-bit systems, it doesn’t matter.
>

We round down to SIZE_MAX, since blkash_zero() accept size_t.

I can also change blkhash to use int64_t instead. Testing show that 16g is
a good
value to ensure timely progress updates.


>
> > +            err = blkhash_zero(h, chunk);
> > +            if (err) {
> > +                error_report("Error zeroing hash at offset %" PRId64
> > +                             " of %s: %s",
> > +                             offset, filename, strerror(err));
> > +                goto out;
> > +            }
> > +        } else {
> > +            chunk = MIN(chunk, IO_BUF_SIZE);
> > +            err = blk_pread(blk, offset, chunk, buf, 0);
> > +            if (err < 0) {
> > +                error_report("Error reading at offset %" PRId64 " of
> %s: %s",
> > +                             offset, filename, strerror(-err));
> > +                goto out;
> > +            }
> > +            err = blkhash_update(h, buf, chunk);
> > +            if (err) {
> > +                error_report("Error updating hash at offset %" PRId64
> > +                             " of %s: %s",
> > +                             offset, filename, strerror(err));
> > +                goto out;
> > +            }
> > +        }
> > +
> > +        offset += chunk;
> > +        qemu_progress_print(((float) chunk / total_size) * 100, 100);
> > +    }
> > +
> > +    err = blkhash_final(h, digest, &digest_len);
>
> How does this verify that `digest` is sufficiently large?
>
> I mean, it is, given that we only have sha256 now.  But it still seems
> rather dangerous to me.
>

In blksum I'm using EVP_MAX_MD_SIZE (from <openssl/evp.h>) which is 64
(e.g. sha512).

I'll add a similar macro in blkhash.h so users do not have to assume
anything.


>
> > +    if (err) {
> > +        error_report("Error finalizing hash of %s: %s",
> > +                     filename, strerror(err));
> > +        goto out;
> > +    }
> > +
> > +    for (unsigned i = 0; i < digest_len; i++) {
>
> I always such declarations weren’t allowed in qemu, but this isn’t the
> first place, and I don’t mind.  Good to know. :)
>

meson.build uses:

    'c_std=gnu99',

So the more modern and safer way should be allowed.

Thanks for reviewing, this is very helpful!

Nir

[-- Attachment #2: Type: text/html, Size: 26072 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] iotests: Test qemu-img checksum
  2022-10-26 13:30   ` Hanna Reitz
@ 2022-10-30 17:38     ` Nir Soffer
  2022-11-07 11:41       ` Hanna Reitz
  0 siblings, 1 reply; 18+ messages in thread
From: Nir Soffer @ 2022-10-30 17:38 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 5221 bytes --]

On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz <hreitz@redhat.com> wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add simple tests creating an image with all kinds of extents, different
> > formats, different backing chain, different protocol, and different
> > image options. Since all images have the same guest visible content they
> > must have the same checksum.
> >
> > To help debugging in case of failures, the output includes a json map of
> > every test image.
> >
> > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> > ---
> >   tests/qemu-iotests/tests/qemu-img-checksum    | 149 ++++++++++++++++++
> >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +++++++++
> >   2 files changed, 223 insertions(+)
> >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum
> b/tests/qemu-iotests/tests/qemu-img-checksum
> > new file mode 100755
> > index 0000000000..3a85ba33f2
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> > @@ -0,0 +1,149 @@
> > +#!/usr/bin/env python3
> > +# group: rw auto quick
> > +#
> > +# Test cases for qemu-img checksum.
> > +#
> > +# Copyright (C) 2022 Red Hat, Inc.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 2 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > +
> > +import re
> > +
> > +import iotests
> > +
> > +from iotests import (
> > +    filter_testfiles,
> > +    qemu_img,
> > +    qemu_img_log,
> > +    qemu_io,
> > +    qemu_nbd_popen,
> > +)
> > +
> > +
> > +def checksum_available():
> > +    out = qemu_img("--help").stdout
> > +    return re.search(r"\bchecksum .+ filename\b", out) is not None
> > +
> > +
> > +if not checksum_available():
> > +    iotests.notrun("checksum command not available")
> > +
> > +iotests.script_initialize(
> > +    supported_fmts=["raw", "qcow2"],
> > +    supported_cache_modes=["none", "writeback"],
>
> It doesn’t work with writeback, though, because it uses -T none below.
>

Good point


>
> Which by the way is a heavy cost, because I usually run tests in tmpfs,
> where this won’t work.  Is there any way of not doing the -T none below?
>

Testing using tempfs is problematic since you cannot test -T none. In oVirt
we alway use /var/tmp which usually uses something that supports direct I/O.

Do we have a way to specify cache mode in the tests, so we can use -T none
only when the option is set?


>
> > +    supported_protocols=["file", "nbd"],
> > +    required_fmts=["raw", "qcow2"],
> > +)
> > +
> > +print()
> > +print("=== Test images ===")
> > +print()
> > +
> > +disk_raw = iotests.file_path('raw')
> > +qemu_img("create", "-f", "raw", disk_raw, "10m")
> > +qemu_io("-f", "raw",
> > +        "-c", "write -P 0x1 0 2m",      # data
> > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> > +        "-c", "write -z 4m 2m",         # zero allocated
> > +        "-c", "write -z -u 6m 2m",      # zero hole
> > +                                        # unallocated
> > +        disk_raw)
> > +print(filter_testfiles(disk_raw))
> > +qemu_img_log("map", "--output", "json", disk_raw)
> > +
> > +disk_qcow2 = iotests.file_path('qcow2')
> > +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
> > +qemu_io("-f", "qcow2",
> > +        "-c", "write -P 0x1 0 2m",      # data
> > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> > +        "-c", "write -z 4m 2m",         # zero allocated
> > +        "-c", "write -z -u 6m 2m",      # zero hole
> > +                                        # unallocated
> > +        disk_qcow2)
> > +print(filter_testfiles(disk_qcow2))
> > +qemu_img_log("map", "--output", "json", disk_qcow2)
>
> This isn’t how iotests work, generally.  When run with -qcow2 -file, it
> should only test qcow2 on file, not raw on file, not raw on nbd. Perhaps
> this way this test could even support other formats than qcow2 and raw.
>

For this type of tests, running the same code with raw, qcow2 (and other
formats)
and using file or nbd is the best way to test this feature - one test
verifies all the
use cases.

I can change this to use the current format (raw, qcow2, ...), protocol
(file, nbd, ...)
and cache value (none, writeback), but I'm not sure how this can work with
the
out files. The output from nbd is different from file. Maybe we need
different out
files for file and nbd (qemu-img-checksum.file.out,
qemu-img-checksum.nbd.out)?

Nir

[-- Attachment #2: Type: text/html, Size: 7244 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] qemu-img: Speed up checksum
  2022-10-26 13:54   ` Hanna Reitz
@ 2022-10-30 17:38     ` Nir Soffer
  2022-10-30 19:15       ` Nir Soffer
  2022-11-07 10:38       ` Hanna Reitz
  0 siblings, 2 replies; 18+ messages in thread
From: Nir Soffer @ 2022-10-30 17:38 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 6287 bytes --]

On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz <hreitz@redhat.com> wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add coroutine based loop inspired by `qemu-img convert` design.
> >
> > Changes compared to `qemu-img convert`:
> >
> > - State for the entire image is kept in ImgChecksumState
> >
> > - State for single worker coroutine is kept in ImgChecksumworker.
> >
> > - "Writes" are always in-order, ensured using a queue.
> >
> > - Calling block status once per image extent, when the current extent is
> >    consumed by the workers.
> >
> > - Using 1m buffer size - testings shows that this gives best read
> >    performance both with buffered and direct I/O.
>
> Why does patch 1 then choose to use 2 MB?
>

The first patch uses sync I/O, and in this case 2 MB is a little faster.


> > - Number of coroutines is not configurable. Testing does not show
> >    improvement when using more than 8 coroutines.
> >
> > - Progress include entire image, not only the allocated state.
> >
> > Comparing to the simple read loop shows that this version is up to 4.67
> > times faster when computing a checksum for an image full of zeroes. For
> > real images it is 1.59 times faster with direct I/O, and with buffered
> > I/O there is no difference.
> >
> > Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
> >
> > | image    | size | i/o       | before         | after          | change
> |
> >
> |----------|------|-----------|----------------|----------------|--------|
> > | zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67
> |
> > | zero     |   6g | direct    | 4.684s ±0.093s | 2.211s ±0.009s |  x2.12
> |
> > | real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02
> |
> > | real     |   6g | direct    | 3.094s ±0.079s | 1.947s ±0.017s |  x1.59
> |
> > | nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36
> |
> > | nbd      |   6g | direct    | 3.540s ±0.020s | 1.749s ±0.018s |  x2.02
> |
> >
> > [1] raw image full of zeroes
> > [2] raw fedora 35 image with additional random data, 50% full
> > [3] image [2] exported by qemu-nbd via unix socket
> >
> > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> > ---
> >   qemu-img.c | 343 +++++++++++++++++++++++++++++++++++++++++------------
> >   1 file changed, 270 insertions(+), 73 deletions(-)
>
> Looks good!
>
> Just a couple of style comments below.
>
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 7edcfe4bc8..bfa8e2862f 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -1613,48 +1613,288 @@ out:
> >       qemu_vfree(buf2);
> >       blk_unref(blk2);
> >   out2:
> >       blk_unref(blk1);
> >   out3:
> >       qemu_progress_end();
> >       return ret;
> >   }
> >
> >   #ifdef CONFIG_BLKHASH
> > +
> > +#define CHECKSUM_COROUTINES 8
> > +#define CHECKSUM_BUF_SIZE (1 * MiB)
> > +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
> > +
> > +typedef struct ImgChecksumState ImgChecksumState;
> > +
> > +typedef struct ImgChecksumWorker {
> > +    QTAILQ_ENTRY(ImgChecksumWorker) entry;
> > +    ImgChecksumState *state;
> > +    Coroutine *co;
> > +    uint8_t *buf;
> > +
> > +    /* The current chunk. */
> > +    int64_t offset;
> > +    int64_t length;
> > +    bool zero;
> > +
> > +    /* Always true for zero extent, false for data extent. Set to true
> > +     * when reading the chunk completes. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments (see checkpatch.pl).
>

I'll change that. Do we have a good way to run checkpatch.pl when using
git-publish?

Maybe a way to run checkpatch.pl on all patches generated by git publish
automatically?


> > +    bool ready;
> > +} ImgChecksumWorker;
> > +
> > +struct ImgChecksumState {
> > +    const char *filename;
> > +    BlockBackend *blk;
> > +    BlockDriverState *bs;
> > +    int64_t total_size;
> > +
> > +    /* Current extent, modified in checksum_co_next. */
> > +    int64_t offset;
> > +    int64_t length;
> > +    bool zero;
> > +
> > +    int running_coroutines;
> > +    CoMutex lock;
> > +    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
> > +
> > +    /* Ensure in-order updates. Update are scheduled at the tail of the
> > +     * queue and processed from the head of the queue when a worker is
> > +     * ready. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments.
>
> > +    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
> > +
> > +    struct blkhash *hash;
> > +    int ret;
> > +};
>
> [...]
>
> > +static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
> > +{
> > +    ImgChecksumState *s = w->state;
> > +
> > +    qemu_co_mutex_lock(&s->lock);
> > +
> > +    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
> > +        qemu_co_mutex_unlock(&s->lock);
> > +        return false;
> > +    }
> > +
> > +    if (s->length == 0 && checksum_block_status(s)) {
>
> I’d prefer `checksum_block_status(s) < 0` so that it is clear that
> negative values indicate errors.  Otherwise, based on this code alone,
> it looks to me more like `checksum_block_status()` returned a boolean,
> where `false` would generally indicate an error, which is confusing.
>

Sure, will change.


>
> Same in other places below.
>
> > +        qemu_co_mutex_unlock(&s->lock);
> > +        return false;
> > +    }
>
> [...]
>
> > +/* Enter the next worker coroutine if the worker is ready. */
> > +static void coroutine_fn checksum_co_enter_next(ImgChecksumWorker *w)
> > +{
> > +    ImgChecksumState *s = w->state;
> > +    ImgChecksumWorker *next;
> > +
> > +    if (!QTAILQ_EMPTY(&s->update_queue)) {
> > +        next = QTAILQ_FIRST(&s->update_queue);
> > +        if (next->ready)
> > +            qemu_coroutine_enter(next->co);
>
> Qemu codestyle requires braces here.
>

Will add in next version.

Before beautifying the first commit, should I squash this commit into it?

The first commit was an attempt to do the simplest possible implementation,
but since this commit looks good (with some style issues), do we really
need the
first one?

Nir

[-- Attachment #2: Type: text/html, Size: 8439 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] qemu-img: Speed up checksum
  2022-10-30 17:38     ` Nir Soffer
@ 2022-10-30 19:15       ` Nir Soffer
  2022-11-07 10:38       ` Hanna Reitz
  1 sibling, 0 replies; 18+ messages in thread
From: Nir Soffer @ 2022-10-30 19:15 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

On Sun, Oct 30, 2022 at 7:38 PM Nir Soffer <nsoffer@redhat.com> wrote:

> On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz <hreitz@redhat.com> wrote:
>
>> On 01.09.22 16:32, Nir Soffer wrote:
>>
> [...]

> > +    /* The current chunk. */
>> > +    int64_t offset;
>> > +    int64_t length;
>> > +    bool zero;
>> > +
>> > +    /* Always true for zero extent, false for data extent. Set to true
>> > +     * when reading the chunk completes. */
>>
>> Qemu codestyle requires /* and */ to be on separate lines for multi-line
>> comments (see checkpatch.pl).
>>
>
> I'll change that. Do we have a good way to run checkpatch.pl when using
> git-publish?
>
> Maybe a way to run checkpatch.pl on all patches generated by git publish
> automatically?
>

I found
https://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html
and it seems to work well.

[-- Attachment #2: Type: text/html, Size: 2135 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] qemu-img: Add checksum command
  2022-10-30 17:37     ` Nir Soffer
@ 2022-11-07 10:20       ` Hanna Reitz
  2022-11-28 10:37         ` Nir Soffer
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-11-07 10:20 UTC (permalink / raw)
  To: Nir Soffer; +Cc: qemu-devel, qemu-block, Kevin Wolf

On 30.10.22 18:37, Nir Soffer wrote:
> On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz <hreitz@redhat.com> wrote:
>
>     On 01.09.22 16:32, Nir Soffer wrote:
>     > The checksum command compute a checksum for disk image content
>     using the
>     > blkhash library[1]. The blkhash library is not packaged yet, but
>     it is
>     > available via copr[2].
>     >
>     > Example run:
>     >
>     >      $ ./qemu-img checksum -p fedora-35.qcow2
>     > 6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5
>     fedora-35.qcow2
>     >
>     > The block checksum is constructed by splitting the image to
>     fixed sized
>     > blocks and computing a digest of every block. The image checksum
>     is the
>     > digest of the all block digests.
>     >
>     > The checksum uses internally the "sha256" algorithm but it cannot be
>     > compared with checksums created by other tools such as `sha256sum`.
>     >
>     > The blkhash library supports sparse images, zero detection, and
>     > optimizes zero block hashing (they are practically free). The
>     library
>     > uses multiple threads to speed up the computation.
>     >
>     > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
>     > faster, depending on the amount of data in the image:
>     >
>     >      $ ./qemu-img info /scratch/50p.raw
>     >      file format: raw
>     >      virtual size: 6 GiB (6442450944 bytes)
>     >      disk size: 2.91 GiB
>     >
>     >      $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum
>     /scratch/50p.raw" \
>     >                                       "sha256sum /scratch/50p.raw"
>     >      Benchmark 1: ./qemu-img checksum /scratch/50p.raw
>     >        Time (mean ± σ):      1.849 s ±  0.037 s [User: 7.764 s,
>     System: 0.962 s]
>     >        Range (min … max):    1.813 s …  1.908 s    5 runs
>     >
>     >      Benchmark 2: sha256sum /scratch/50p.raw
>     >        Time (mean ± σ):     14.585 s ±  0.072 s [User: 13.537 s,
>     System: 1.003 s]
>     >        Range (min … max):   14.501 s … 14.697 s    5 runs
>     >
>     >      Summary
>     >        './qemu-img checksum /scratch/50p.raw' ran
>     >          7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
>     >
>     > The new command is available only when `blkhash` is available during
>     > build. To test the new command please install the `blkhash-devel`
>     > package:
>     >
>     >      $ dnf copr enable nsoffer/blkhash
>     >      $ sudo dnf install blkhash-devel
>     >
>     > [1] https://gitlab.com/nirs/blkhash
>     > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
>     > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
>     >      sha256sum (estimate): 17,749s
>     >
>     > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
>     > ---
>     >   docs/tools/qemu-img.rst |  22 +++++
>     >   meson.build             |  10 ++-
>     >   meson_options.txt       |   2 +
>     >   qemu-img-cmds.hx        |   8 ++
>     >   qemu-img.c              | 191
>     ++++++++++++++++++++++++++++++++++++++++
>     >   5 files changed, 232 insertions(+), 1 deletion(-)
>     >
>     > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
>     > index 85a6e05b35..8be9c45cbf 100644
>     > --- a/docs/tools/qemu-img.rst
>     > +++ b/docs/tools/qemu-img.rst
>     > @@ -347,20 +347,42 @@ Command description:
>     >       Check completed, image is corrupted
>     >     3
>     >       Check completed, image has leaked clusters, but is not
>     corrupted
>     >     63
>     >       Checks are not supported by the image format
>     >
>     >     If ``-r`` is specified, exit codes representing the image
>     state refer to the
>     >     state after (the attempt at) repairing it. That is, a
>     successful ``-r all``
>     >     will yield the exit code 0, independently of the image state
>     before.
>     >
>     > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f
>     FMT] [-T SRC_CACHE] [-p] FILENAME
>     > +
>     > +  Print a checksum for image *FILENAME* guest visible content.
>
>     Why not say which kind of checksum it is?
>
>
> Do you mean the algorithm used? This may be confusing, for example we 
> write
>
>    Print a sha256 checksum ...
>
> User will expect to get the same result from "sha256sum disk.img". How 
> about
>
>    Print a blkhash checksum ...
>
> And add a link to the blkhash project?

I did mean sha256, but if it isn’t pure sha256, then a link to any 
description how it is computed would be good, I think.

>
>     >           Images with
>     > +  different format or settings wil have the same checksum.
>
>     s/wil/will/
>
>
> Fixing
>
>
>     > +
>     > +  The format is probed unless you specify it by ``-f``.
>     > +
>     > +  The checksum is computed for guest visible content. Allocated
>     areas full of
>     > +  zeroes, zero clusters, and unallocated areas are read as
>     zeros so they will
>     > +  have the same checksum. Images with single or multiple files
>     or backing files
>     > +  will have the same checksums if the guest will see the same
>     content when
>     > +  reading the image.
>     > +
>     > +  Image metadata that is not visible to the guest such as dirty
>     bitmaps does
>     > +  not affect the checksum.
>     > +
>     > +  Computing a checksum requires a read-only image. You cannot
>     compute a
>     > +  checksum of an active image used by a guest,
>
>     Makes me ask: Why not?  Other subcommands have the -U flag for this.
>
>
> The text is not precise enough, the issue is not active image but 
> having a read only
> image.
>
> What we really want is to take a shared read lock on the image when 
> opening it to ensure
> that qemu or another instance of qemu-img is not holding a write lock 
> and cannot upgrade
> a read lock to write lock on this image.
>
> If the image can be changed while we read from it there is no point in 
> computing
> a checksum.

OK.

> If the -U/--force-share flag has these semantics we can add it, but 
> based on testing it
> works for images open for writing (e.g. qemu-img info, qemu-img measure).

Yes, -U would precisely allow other writers.

>
>     >                                                  but you can
>     compute a checksum
>     > +  of a guest during pull mode incremental backup using NBD URL.
>     > +
>     > +  The checksum is not compatible with other tools such as
>     *sha256sum*.
>
>     Why not?  I can see it differs even for raw images, but why?  I would
>     have very much assumed that this gives me exactly what sha256sum
>     in the
>     guest on the guest device would yield.
>
>
> The blkhash is a construction based on other cryptographic hash 
> functions (e.g. sha256).
> The way the hash is constructed is explained here:
> https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52
>
> We can provide a very slow version using a single thread and no zero 
> optimization
> that will create the same hash as sha256sum for raw image.

Ah, right.  Yes, especially zero optimization is likely to make a huge 
difference.  Thanks for the explanation!

Maybe that could be mentioned here as a side note, though?  E.g. “The 
checksum is not compatible with other tools such as *sha256sum* for 
optimization purposes (to allow multithreading and optimized handling of 
zero areas).”?

> I don't think this is very
> interesting since it is not practical for large images. If we want 
> something like this,
> we can add it later when we add an option to configure the algorithm.
>
> We can also start with a slow version using available hash function 
> and enable
> blkhash only if the library is available in build time.

I mean, if that is necessary; but it feels like that would only 
complicate the code (in qemu-img) for little practical gain.

>
>     [...]
>
>     > diff --git a/qemu-img.c b/qemu-img.c
>     > index 7d4b33b3da..7edcfe4bc8 100644
>     > --- a/qemu-img.c
>     > +++ b/qemu-img.c
>     > @@ -17,20 +17,21 @@
>     >    * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>     MERCHANTABILITY,
>     >    * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>     EVENT SHALL
>     >    * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
>     DAMAGES OR OTHER
>     >    * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>     OTHERWISE, ARISING FROM,
>     >    * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>     OTHER DEALINGS IN
>     >    * THE SOFTWARE.
>     >    */
>     >
>     >   #include "qemu/osdep.h"
>     >   #include <getopt.h>
>     > +#include <blkhash.h>
>
>     This must be guarded by CONFIG_BLKHASH.
>
>
> Right
> [...]
>
>     > +#ifdef CONFIG_BLKHASH
>     > +/*
>     > + * Compute image checksum.
>     > + */
>     > +static int img_checksum(int argc, char **argv)
>     > +{
>     > +    const char *digest_name = "sha256";
>
>     Why don’t we make this configurable?
>
>
> We can, but this is not easy:
> - which hash functions should be supported?
> - how to encode the algorithm used in the hash function?

I think I wrote the question when I still thought that using sha256 
would trivially relate the result to pure sha256 (i.e. sha256sum). Given 
that there is no trivial relation, I no longer think it makes sense to 
have this be configurable either.

We can just add it and think about this later if anyone turns up with a 
need for it.

>
> The blksum tool (part of blkhash) has a --digest and --list-digests 
> options and
> can use any hash function supported by openssl. Currently it does not 
> include
> the algorithm, in the result, so the user has to know 2 values to 
> compare hashes
> of different images.
>
> If we don't support configuration, all these issues are gone, and the 
> tool is
> easier and safer to use.
>
> If we want to make this configurable, I think we need to include the 
> algorithm
> in the result, something like:
>
>     $ qemu-img checksum disk.img
> blk-sha256:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1 
> disk.img
>
>     $ qemu-img checksum --digest sha1 disk.img
>     blk-sha1:6d7e6d55bed36e733bf50127d151f8af7bb5077f disk.img
>
> Or if we want to support standard checksums, maybe:
>
>     $ qemu-img checksum --digest sha256 disk.img
> sha256:49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14 
> disk.img
>
>     $ qemu-img checksum --digest blk-sha256 disk.img
> blk-sha256:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1 
> disk.img
>
>
>     > +    const size_t block_size = 64 * KiB;
>     > +
>     > +    const char *format = NULL;
>     > +    const char *cache = BDRV_DEFAULT_CACHE;
>     > +    const char *filename;
>     > +    BlockBackend *blk;
>     > +    BlockDriverState *bs;
>     > +    uint8_t *buf = NULL;
>     > +    int64_t pnum;
>     > +    bool progress = false;
>     > +    int flags = 0;
>     > +    bool writethrough;
>     > +    int64_t total_size;
>     > +    int64_t offset = 0;
>     > +    int c;
>     > +    bool image_opts = false;
>     > +    struct blkhash *h = NULL;
>     > +    unsigned char digest[64];
>     > +    unsigned int digest_len;
>     > +    int ret = 1;
>     > +    int err;
>     > +
>     > +    for (;;) {
>     > +        static const struct option long_options[] = {
>     > +            {"help", no_argument, 0, 'h'},
>     > +            {"object", required_argument, 0, OPTION_OBJECT},
>     > +            {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
>     > +            {0, 0, 0, 0}
>     > +        };
>     > +        c = getopt_long(argc, argv, ":hf:T:p",
>     > +                        long_options, NULL);
>     > +        if (c == -1) {
>     > +            break;
>     > +        }
>     > +        switch (c) {
>     > +        case ':':
>     > +            missing_argument(argv[optind - 1]);
>     > +            break;
>     > +        case '?':
>     > +            unrecognized_option(argv[optind - 1]);
>     > +            break;
>     > +        case 'h':
>     > +            help();
>     > +            break;
>     > +        case 'f':
>     > +            format = optarg;
>     > +            break;
>     > +        case 'T':
>     > +            cache = optarg;
>     > +            break;
>     > +        case 'p':
>     > +            progress = true;
>     > +            break;
>     > +        case OPTION_OBJECT:
>     > +            {
>     > +                Error *local_err = NULL;
>     > +
>     > +                if (!user_creatable_add_from_str(optarg,
>     &local_err)) {
>     > +                    if (local_err) {
>     > +                        error_report_err(local_err);
>     > +                        exit(2);
>     > +                    } else {
>     > +                        /* Help was printed */
>     > +                        exit(EXIT_SUCCESS);
>     > +                    }
>     > +                }
>
>     How about simply using user_creatable_process_cmdline(optarg)? (which
>     most other subcommands use)
>
>     (I believe img_compare() has to have its own code, because exit
>     code 1
>     is reserved there.  But that isn’t a concern here.)
>
>
> Sure I can use it, I did not know about this.
>
>
>     > +                break;
>     > +            }
>     > +        case OPTION_IMAGE_OPTS:
>     > +            image_opts = true;
>     > +            break;
>     > +        }
>     > +    }
>     > +
>     > +    if (optind != argc - 1) {
>     > +        error_exit("Expecting image file name");
>     > +    }
>     > +
>     > +    filename = argv[optind++];
>     > +
>     > +    err = bdrv_parse_cache_mode(cache, &flags, &writethrough);
>     > +    if (err < 0) {
>     > +        error_report("Invalid source cache option: %s", cache);
>     > +        return ret;
>
>     Personally, I’m not too big of a fan of using `ret` here, because
>     it was
>     set so far above.  Why not just `return 1;`?
>
>     (Same in other error cases below.)
>
>
> ret is set at the start to error value (1) and modified to 0 only if 
> everything
> completed successfully.  This way we can return the same value in all
> the exit paths.
>
>     > +    }
>     > +
>     > +    blk = img_open(image_opts, filename, format, flags,
>     writethrough, false,
>     > +                   false);
>     > +    if (!blk) {
>     > +        return ret;
>     > +    }
>     > +
>     > +    /* Initialize before using goto out. */
>     > +    qemu_progress_init(progress, 2.0);
>     > +
>     > +    bs = blk_bs(blk);
>     > +    buf = blk_blockalign(blk, IO_BUF_SIZE);
>
>     It looks like this macro is kind of specific to `img_compare()`,
>     and is
>     currently somewhere in the middle of its code.  I don’t mind
>     reusing it
>     here, but if so, we might want to move it up to the top of this file,
>     and add a comment that this is the buffer size used for commands like
>     compare or checksum.
>
>
> IO_BUF_SIZE is also used by qemu-img convert. In the next commits
> I'm adding another macro for the checksum code, so this change is not
> needed, but I can move the macro to the top of the file to make it clear
> that this is a default for many commands, and not specific qemu-img 
> compare
> default.

Works for me either way.

(During review I didn’t know yet that the use here would be replaced 
again by patch 3.  Since it is, I no longer really mind what happens to 
IO_BUF_SIZE.)

>
>
>     > +
>     > +    total_size = blk_getlength(blk);
>     > +    if (total_size < 0) {
>     > +        error_report("Can't get size of %s: %s",
>     > +                     filename, strerror(-total_size));
>     > +        goto out;
>
>     I suggest adding `ret = 1;` before such a `goto out;` (not just
>     here),
>     so it is clear exactly what value we are going to return (it isn’t
>     trivial to track that `ret` isn’t used for anything else). But I
>     can see
>     that it’s extra code, so maybe you don’t like that.
>
>     If we keep this as-is, perhaps we could rename `ret` to something
>     more
>     explicit like `exit_status`, though?  That way, it’s pretty clear
>     that
>     we won’t accidentally reuse it for anything else.  (I know this isn’t
>     the first subcommand to use `ret` for the process exist status, but I
>     don’t think those other subcommands are great role models in this
>     regard.)
>
>
> exit_status is much better!
>
> I'll rename ret to exit_status in the next version, and keep the way 
> it is handled
> (initialized to 1, set to 0 on success).
>
>
>     > +    }
>     > +
>     > +    h = blkhash_new(block_size, digest_name);
>
>     Should we somehow make sure that IO_BUF_SIZE is a multiple of
>     block_size?  I mean, it is, but it isn’t obvious at least, and I
>     guess
>     maybe at some point someone might want to make block_size a
>     parameter.
>     Would a static assertion work?  (Would stop working once someone
>     decide
>     to make block_size a parameter, which is good, because it draws
>     attention.)
>
>
> Using a multiple of block_size is no required but it can improve 
> performance by
> minimizing copies in the pending buffer managed by blkhash.
>
> Making block_size configurable is not good way, since it changes the 
> value of the
> hash.

Right.  I didn’t understand this at the time of the review.  Could we 
add a comment above blkhash_new() to that effect?  (E.g. “The value of 
block_size changes how the hash is calculated, so block_size must always 
be the same.”)

> In we do in this way we maybe wan to encode the block size in the result
> so someone can restore the arguments for comparing an image against a 
> hash:
>
> blk-sha256/64k:9b3d2f329b8e1a3a10ac623efa163c12e953dbb5192825b4772dcf0f8905e1b1
>
> This starts to look too complicated to me.

Right, right.

> In blksum I do not allow changing the block size.
>
> I'll add an assert in the next version to keeps this default optimal.

Thanks!  (Static assert should work, right?)

>     > +    if (!h) {
>     > +        error_report("Can't create blkhash: %s", strerror(errno));
>     > +        goto out;
>     > +    }
>     > +
>     > +    qemu_progress_print(0, 100);
>     > +
>     > +    while (offset < total_size) {
>     > +        int status;
>     > +        int64_t chunk;
>     > +
>     > +        status = bdrv_block_status_above(bs, NULL, offset,
>     > +                                         total_size - offset,
>     &pnum, NULL,
>     > +                                         NULL);
>     > +        if (status < 0) {
>     > +            error_report("Error checking status at offset %"
>     PRId64 " for %s",
>     > +                         offset, filename);
>     > +            goto out;
>     > +        }
>     > +
>     > +        assert(pnum);
>     > +        chunk = pnum;
>
>     Can’t we drop `pnum` and replace it by `chunk` in all cases?
>
>
> I did this in the next commit, I can update it in this commit.

Thanks!

>
>     > +
>     > +        if (status & BDRV_BLOCK_ZERO) {
>     > +            chunk = MIN(chunk, SIZE_MAX);
>
>     Itches me a bit to propose rounding SIZE_MAX down to block_size,
>     but I
>     guess given the magnitude of SIZE_MAX on 64-bit systems, it
>     doesn’t matter.
>
>
> We round down to SIZE_MAX, since blkash_zero() accept size_t.
>
> I can also change blkhash to use int64_t instead. Testing show that 
> 16g is a good
> value to ensure timely progress updates.

I mean, 64-bit systems are so widespread these days that size_t should 
generally be 64-bit anyway.

>
>     > +            err = blkhash_zero(h, chunk);
>     > +            if (err) {
>     > +                error_report("Error zeroing hash at offset %"
>     PRId64
>     > +                             " of %s: %s",
>     > +                             offset, filename, strerror(err));
>     > +                goto out;
>     > +            }
>     > +        } else {
>     > +            chunk = MIN(chunk, IO_BUF_SIZE);
>     > +            err = blk_pread(blk, offset, chunk, buf, 0);
>     > +            if (err < 0) {
>     > +                error_report("Error reading at offset %" PRId64
>     " of %s: %s",
>     > +                             offset, filename, strerror(-err));
>     > +                goto out;
>     > +            }
>     > +            err = blkhash_update(h, buf, chunk);
>     > +            if (err) {
>     > +                error_report("Error updating hash at offset %"
>     PRId64
>     > +                             " of %s: %s",
>     > +                             offset, filename, strerror(err));
>     > +                goto out;
>     > +            }
>     > +        }
>     > +
>     > +        offset += chunk;
>     > +        qemu_progress_print(((float) chunk / total_size) * 100,
>     100);
>     > +    }
>     > +
>     > +    err = blkhash_final(h, digest, &digest_len);
>
>     How does this verify that `digest` is sufficiently large?
>
>     I mean, it is, given that we only have sha256 now.  But it still
>     seems
>     rather dangerous to me.
>
>
> In blksum I'm using EVP_MAX_MD_SIZE (from <openssl/evp.h>) which is 64
> (e.g. sha512).
>
> I'll add a similar macro in blkhash.h so users do not have to assume
> anything.

Thanks!

>
>     > +    if (err) {
>     > +        error_report("Error finalizing hash of %s: %s",
>     > +                     filename, strerror(err));
>     > +        goto out;
>     > +    }
>     > +
>     > +    for (unsigned i = 0; i < digest_len; i++) {
>
>     I always such declarations weren’t allowed in qemu, but this isn’t
>     the
>     first place, and I don’t mind.  Good to know. :)
>
>
> meson.build uses:
>
>     'c_std=gnu99',
>
> So the more modern and safer way should be allowed.
>
> Thanks for reviewing, this is very helpful!

Thanks for the patches!

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] qemu-img: Speed up checksum
  2022-10-30 17:38     ` Nir Soffer
  2022-10-30 19:15       ` Nir Soffer
@ 2022-11-07 10:38       ` Hanna Reitz
  1 sibling, 0 replies; 18+ messages in thread
From: Hanna Reitz @ 2022-11-07 10:38 UTC (permalink / raw)
  To: Nir Soffer; +Cc: qemu-devel, qemu-block, Kevin Wolf, Stefan Hajnoczi

On 30.10.22 18:38, Nir Soffer wrote:
> On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz <hreitz@redhat.com> wrote:
>
>     On 01.09.22 16:32, Nir Soffer wrote:
>     > Add coroutine based loop inspired by `qemu-img convert` design.
>     >
>     > Changes compared to `qemu-img convert`:
>     >
>     > - State for the entire image is kept in ImgChecksumState
>     >
>     > - State for single worker coroutine is kept in ImgChecksumworker.
>     >
>     > - "Writes" are always in-order, ensured using a queue.
>     >
>     > - Calling block status once per image extent, when the current
>     extent is
>     >    consumed by the workers.
>     >
>     > - Using 1m buffer size - testings shows that this gives best read
>     >    performance both with buffered and direct I/O.
>
>     Why does patch 1 then choose to use 2 MB?
>
>
> The first patch uses sync I/O, and in this case 2 MB is a little faster.

Interesting.

>     > - Number of coroutines is not configurable. Testing does not show
>     >    improvement when using more than 8 coroutines.
>     >
>     > - Progress include entire image, not only the allocated state.
>     >
>     > Comparing to the simple read loop shows that this version is up
>     to 4.67
>     > times faster when computing a checksum for an image full of
>     zeroes. For
>     > real images it is 1.59 times faster with direct I/O, and with
>     buffered
>     > I/O there is no difference.
>     >
>     > Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
>     >
>     > | image    | size | i/o       | before         | after         |
>     change |
>     >
>     |----------|------|-----------|----------------|----------------|--------|
>     > | zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s
>     |  x4.67 |
>     > | zero     |   6g | direct    | 4.684s ±0.093s | 2.211s ±0.009s
>     |  x2.12 |
>     > | real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s
>     |  x1.02 |
>     > | real     |   6g | direct    | 3.094s ±0.079s | 1.947s ±0.017s
>     |  x1.59 |
>     > | nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s
>     |  x1.36 |
>     > | nbd      |   6g | direct    | 3.540s ±0.020s | 1.749s ±0.018s
>     |  x2.02 |
>     >
>     > [1] raw image full of zeroes
>     > [2] raw fedora 35 image with additional random data, 50% full
>     > [3] image [2] exported by qemu-nbd via unix socket
>     >
>     > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
>     > ---
>     >   qemu-img.c | 343
>     +++++++++++++++++++++++++++++++++++++++++------------
>     >   1 file changed, 270 insertions(+), 73 deletions(-)
>
>     Looks good!
>
>     Just a couple of style comments below.
>
>     > diff --git a/qemu-img.c b/qemu-img.c
>     > index 7edcfe4bc8..bfa8e2862f 100644
>     > --- a/qemu-img.c
>     > +++ b/qemu-img.c
>     > @@ -1613,48 +1613,288 @@ out:
>     >       qemu_vfree(buf2);
>     >       blk_unref(blk2);
>     >   out2:
>     >       blk_unref(blk1);
>     >   out3:
>     >       qemu_progress_end();
>     >       return ret;
>     >   }
>     >
>     >   #ifdef CONFIG_BLKHASH
>     > +
>     > +#define CHECKSUM_COROUTINES 8
>     > +#define CHECKSUM_BUF_SIZE (1 * MiB)
>     > +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
>     > +
>     > +typedef struct ImgChecksumState ImgChecksumState;
>     > +
>     > +typedef struct ImgChecksumWorker {
>     > +    QTAILQ_ENTRY(ImgChecksumWorker) entry;
>     > +    ImgChecksumState *state;
>     > +    Coroutine *co;
>     > +    uint8_t *buf;
>     > +
>     > +    /* The current chunk. */
>     > +    int64_t offset;
>     > +    int64_t length;
>     > +    bool zero;
>     > +
>     > +    /* Always true for zero extent, false for data extent. Set
>     to true
>     > +     * when reading the chunk completes. */
>
>     Qemu codestyle requires /* and */ to be on separate lines for
>     multi-line
>     comments (see checkpatch.pl <http://checkpatch.pl>).
>
>
> I'll change that. Do we have a good way to run checkpatch.pl 
> <http://checkpatch.pl/> when using
> git-publish?
>
> Maybe a way to run checkpatch.pl <http://checkpatch.pl> on all patches 
> generated by git publish
> automatically?

I don’t know myself, I don’t use git-publish.  Stefan should know. ;)

>
>     > +    bool ready;
>     > +} ImgChecksumWorker;
>     > +
>     > +struct ImgChecksumState {
>     > +    const char *filename;
>     > +    BlockBackend *blk;
>     > +    BlockDriverState *bs;
>     > +    int64_t total_size;
>     > +
>     > +    /* Current extent, modified in checksum_co_next. */
>     > +    int64_t offset;
>     > +    int64_t length;
>     > +    bool zero;
>     > +
>     > +    int running_coroutines;
>     > +    CoMutex lock;
>     > +    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
>     > +
>     > +    /* Ensure in-order updates. Update are scheduled at the
>     tail of the
>     > +     * queue and processed from the head of the queue when a
>     worker is
>     > +     * ready. */
>
>     Qemu codestyle requires /* and */ to be on separate lines for
>     multi-line
>     comments.
>
>     > +    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
>     > +
>     > +    struct blkhash *hash;
>     > +    int ret;
>     > +};
>
>     [...]
>
>     > +static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
>     > +{
>     > +    ImgChecksumState *s = w->state;
>     > +
>     > +    qemu_co_mutex_lock(&s->lock);
>     > +
>     > +    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
>     > +        qemu_co_mutex_unlock(&s->lock);
>     > +        return false;
>     > +    }
>     > +
>     > +    if (s->length == 0 && checksum_block_status(s)) {
>
>     I’d prefer `checksum_block_status(s) < 0` so that it is clear that
>     negative values indicate errors.  Otherwise, based on this code
>     alone,
>     it looks to me more like `checksum_block_status()` returned a
>     boolean,
>     where `false` would generally indicate an error, which is confusing.
>
>
> Sure, will change.
>
>
>     Same in other places below.
>
>     > +        qemu_co_mutex_unlock(&s->lock);
>     > +        return false;
>     > +    }
>
>     [...]
>
>     > +/* Enter the next worker coroutine if the worker is ready. */
>     > +static void coroutine_fn
>     checksum_co_enter_next(ImgChecksumWorker *w)
>     > +{
>     > +    ImgChecksumState *s = w->state;
>     > +    ImgChecksumWorker *next;
>     > +
>     > +    if (!QTAILQ_EMPTY(&s->update_queue)) {
>     > +        next = QTAILQ_FIRST(&s->update_queue);
>     > +        if (next->ready)
>     > +            qemu_coroutine_enter(next->co);
>
>     Qemu codestyle requires braces here.
>
>
> Will add in next version.
>
> Before beautifying the first commit, should I squash this commit into it?
>
> The first commit was an attempt to do the simplest possible 
> implementation,
> but since this commit looks good (with some style issues), do we 
> really need the
> first one?

I wouldn’t mind squashing.  Your choice. :)

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] iotests: Test qemu-img checksum
  2022-10-30 17:38     ` Nir Soffer
@ 2022-11-07 11:41       ` Hanna Reitz
  2022-11-28 10:46         ` Nir Soffer
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-11-07 11:41 UTC (permalink / raw)
  To: Nir Soffer; +Cc: qemu-devel, qemu-block, Kevin Wolf

On 30.10.22 18:38, Nir Soffer wrote:
> On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz <hreitz@redhat.com> wrote:
>
>     On 01.09.22 16:32, Nir Soffer wrote:
>     > Add simple tests creating an image with all kinds of extents,
>     different
>     > formats, different backing chain, different protocol, and different
>     > image options. Since all images have the same guest visible
>     content they
>     > must have the same checksum.
>     >
>     > To help debugging in case of failures, the output includes a
>     json map of
>     > every test image.
>     >
>     > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
>     > ---
>     >   tests/qemu-iotests/tests/qemu-img-checksum    | 149
>     ++++++++++++++++++
>     >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +++++++++
>     >   2 files changed, 223 insertions(+)
>     >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
>     >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
>     >
>     > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum
>     b/tests/qemu-iotests/tests/qemu-img-checksum
>     > new file mode 100755
>     > index 0000000000..3a85ba33f2
>     > --- /dev/null
>     > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
>     > @@ -0,0 +1,149 @@
>     > +#!/usr/bin/env python3
>     > +# group: rw auto quick
>     > +#
>     > +# Test cases for qemu-img checksum.
>     > +#
>     > +# Copyright (C) 2022 Red Hat, Inc.
>     > +#
>     > +# This program is free software; you can redistribute it and/or
>     modify
>     > +# it under the terms of the GNU General Public License as
>     published by
>     > +# the Free Software Foundation; either version 2 of the License, or
>     > +# (at your option) any later version.
>     > +#
>     > +# This program is distributed in the hope that it will be useful,
>     > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>     > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>     > +# GNU General Public License for more details.
>     > +#
>     > +# You should have received a copy of the GNU General Public License
>     > +# along with this program.  If not, see
>     <http://www.gnu.org/licenses/>.
>     > +
>     > +import re
>     > +
>     > +import iotests
>     > +
>     > +from iotests import (
>     > +    filter_testfiles,
>     > +    qemu_img,
>     > +    qemu_img_log,
>     > +    qemu_io,
>     > +    qemu_nbd_popen,
>     > +)
>     > +
>     > +
>     > +def checksum_available():
>     > +    out = qemu_img("--help").stdout
>     > +    return re.search(r"\bchecksum .+ filename\b", out) is not None
>     > +
>     > +
>     > +if not checksum_available():
>     > +    iotests.notrun("checksum command not available")
>     > +
>     > +iotests.script_initialize(
>     > +    supported_fmts=["raw", "qcow2"],
>     > +    supported_cache_modes=["none", "writeback"],
>
>     It doesn’t work with writeback, though, because it uses -T none below.
>
>
> Good point
>
>
>     Which by the way is a heavy cost, because I usually run tests in
>     tmpfs,
>     where this won’t work.  Is there any way of not doing the -T none
>     below?
>
>
> Testing using tempfs is problematic since you cannot test -T none.In 
> oVirt
> we alway use /var/tmp which usually uses something that supports 
> direct I/O.
>
> Do we have a way to specify cache mode in the tests, so we can use -T none
> only when the option is set?

`./check` has a `-c` option (e.g. `./check -c none`), which lands in 
`iotests.cachemode`.  That isn’t automatically passed to qemu-img calls, 
but you can do it manually (i.e. `qemu_img_log("checksum", "-T", 
iotests.cachemode, disk_top)` instead of `"-T", "none"`).

>
>     > +    supported_protocols=["file", "nbd"],
>     > +    required_fmts=["raw", "qcow2"],
>     > +)
>     > +
>     > +print()
>     > +print("=== Test images ===")
>     > +print()
>     > +
>     > +disk_raw = iotests.file_path('raw')
>     > +qemu_img("create", "-f", "raw", disk_raw, "10m")
>     > +qemu_io("-f", "raw",
>     > +        "-c", "write -P 0x1 0 2m",      # data
>     > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
>     > +        "-c", "write -z 4m 2m",         # zero allocated
>     > +        "-c", "write -z -u 6m 2m",      # zero hole
>     > +                                        # unallocated
>     > +        disk_raw)
>     > +print(filter_testfiles(disk_raw))
>     > +qemu_img_log("map", "--output", "json", disk_raw)
>     > +
>     > +disk_qcow2 = iotests.file_path('qcow2')
>     > +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
>     > +qemu_io("-f", "qcow2",
>     > +        "-c", "write -P 0x1 0 2m",      # data
>     > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
>     > +        "-c", "write -z 4m 2m",         # zero allocated
>     > +        "-c", "write -z -u 6m 2m",      # zero hole
>     > +                                        # unallocated
>     > +        disk_qcow2)
>     > +print(filter_testfiles(disk_qcow2))
>     > +qemu_img_log("map", "--output", "json", disk_qcow2)
>
>     This isn’t how iotests work, generally.  When run with -qcow2
>     -file, it
>     should only test qcow2 on file, not raw on file, not raw on nbd.
>     Perhaps
>     this way this test could even support other formats than qcow2 and
>     raw.
>
>
> For this type of tests, running the same code with raw, qcow2 (and 
> other formats)
> and using file or nbd is the best way to test this feature - one test 
> verifies all the
> use cases.

Yes, I see, but that’s a general thing in the iotests.  The design is 
such that tests don’t cycle through their test matrix themselves, but 
that they always only test a single combination of format+protocol on 
each run, and the user is responsible for cycling through the desired 
test matrix.

I’m not saying that was definitely the best design decision, but the 
problem now that if a test cycles through its test matrix by itself, it 
must also ensure that it is run only once when the user cycles through 
the same test matrix.  For example, a reasonable run of the iotests 
consists of `./check -raw`, `./check -qcow2`, and `./check -nbd`.  This 
test here would then run in all three configurations, but do the same 
thing every time (specifically, test all of those configurations every 
time).

So there’s a conflict.  Either the test follows the existing design and 
only tests a single configuration, as iotests are expected to do, or we 
have the question of how to deal with the fact that users will run the 
test suite in multiple configurations, but one run of this test would 
already cover them all.

I’m not sternly against cycling through the possible combinations right 
here in the test, but I want to lay out the problems with that approach.

> I can change this to use the current format (raw, qcow2, ...), 
> protocol (file, nbd, ...)
> and cache value (none, writeback), but I'm not sure how this can work 
> with the
> out files. The output from nbd is different from file. Maybe we need 
> different out
> files for file and nbd (qemu-img-checksum.file.out, 
> qemu-img-checksum.nbd.out)?

We already have that for the format (e.g. 178.out.{qcow2,raw}).  If you 
decide to do that, it shouldn’t be too difficult to implement 
(testrunner.py’s `TestRunner.find_reference()`).  Alternatively, it’s 
also possible to basically ignore the reference output and verify the 
expected output right in the test code.

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] qemu-img: Add checksum command
  2022-11-07 10:20       ` Hanna Reitz
@ 2022-11-28 10:37         ` Nir Soffer
  0 siblings, 0 replies; 18+ messages in thread
From: Nir Soffer @ 2022-11-28 10:37 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 3570 bytes --]

On Mon, Nov 7, 2022 at 12:20 PM Hanna Reitz <hreitz@redhat.com> wrote:

> On 30.10.22 18:37, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz <hreitz@redhat.com> wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> [...]
> >     > ---
> >     >   docs/tools/qemu-img.rst |  22 +++++
> >     >   meson.build             |  10 ++-
> >     >   meson_options.txt       |   2 +
> >     >   qemu-img-cmds.hx        |   8 ++
> >     >   qemu-img.c              | 191
> >     ++++++++++++++++++++++++++++++++++++++++
> >     >   5 files changed, 232 insertions(+), 1 deletion(-)
> >     >
> >     > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> >     > index 85a6e05b35..8be9c45cbf 100644
> >     > --- a/docs/tools/qemu-img.rst
> >     > +++ b/docs/tools/qemu-img.rst
> >     > @@ -347,20 +347,42 @@ Command description:
> >     >       Check completed, image is corrupted
> >     >     3
> >     >       Check completed, image has leaked clusters, but is not
> >     corrupted
> >     >     63
> >     >       Checks are not supported by the image format
> >     >
> >     >     If ``-r`` is specified, exit codes representing the image
> >     state refer to the
> >     >     state after (the attempt at) repairing it. That is, a
> >     successful ``-r all``
> >     >     will yield the exit code 0, independently of the image state
> >     before.
> >     >
> >     > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f
> >     FMT] [-T SRC_CACHE] [-p] FILENAME
> >     > +
> >     > +  Print a checksum for image *FILENAME* guest visible content.
> >
> >     Why not say which kind of checksum it is?
> >
> >
> > Do you mean the algorithm used? This may be confusing, for example we
> > write
> >
> >    Print a sha256 checksum ...
> >
> > User will expect to get the same result from "sha256sum disk.img". How
> > about
> >
> >    Print a blkhash checksum ...
> >
> > And add a link to the blkhash project?
>
> I did mean sha256, but if it isn’t pure sha256, then a link to any
> description how it is computed would be good, I think.
>

Ok, will link to https://gitlab.com/nirs/blkhash

[...]

>
> >     > +  The checksum is not compatible with other tools such as
> >     *sha256sum*.
> >
> >     Why not?  I can see it differs even for raw images, but why?  I would
> >     have very much assumed that this gives me exactly what sha256sum
> >     in the
> >     guest on the guest device would yield.
> >
> >
> > The blkhash is a construction based on other cryptographic hash
> > functions (e.g. sha256).
> > The way the hash is constructed is explained here:
> > https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52
> >
> > We can provide a very slow version using a single thread and no zero
> > optimization
> > that will create the same hash as sha256sum for raw image.
>
> Ah, right.  Yes, especially zero optimization is likely to make a huge
> difference.  Thanks for the explanation!
>
> Maybe that could be mentioned here as a side note, though?  E.g. “The
> checksum is not compatible with other tools such as *sha256sum* for
> optimization purposes (to allow multithreading and optimized handling of
> zero areas).”?
>

Ok, I will improve the text in the next version.

[...]

> > In blksum I do not allow changing the block size.
> >
> > I'll add an assert in the next version to keeps this default optimal.
>
> Thanks!  (Static assert should work, right?)
>

I think it should

Nir

[-- Attachment #2: Type: text/html, Size: 5109 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] iotests: Test qemu-img checksum
  2022-11-07 11:41       ` Hanna Reitz
@ 2022-11-28 10:46         ` Nir Soffer
  0 siblings, 0 replies; 18+ messages in thread
From: Nir Soffer @ 2022-11-28 10:46 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 8595 bytes --]

On Mon, Nov 7, 2022 at 1:41 PM Hanna Reitz <hreitz@redhat.com> wrote:

> On 30.10.22 18:38, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz <hreitz@redhat.com> wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> >     > Add simple tests creating an image with all kinds of extents,
> >     different
> >     > formats, different backing chain, different protocol, and different
> >     > image options. Since all images have the same guest visible
> >     content they
> >     > must have the same checksum.
> >     >
> >     > To help debugging in case of failures, the output includes a
> >     json map of
> >     > every test image.
> >     >
> >     > Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> >     > ---
> >     >   tests/qemu-iotests/tests/qemu-img-checksum    | 149
> >     ++++++++++++++++++
> >     >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +++++++++
> >     >   2 files changed, 223 insertions(+)
> >     >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >     >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >     >
> >     > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum
> >     b/tests/qemu-iotests/tests/qemu-img-checksum
> >     > new file mode 100755
> >     > index 0000000000..3a85ba33f2
> >     > --- /dev/null
> >     > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> >     > @@ -0,0 +1,149 @@
> >     > +#!/usr/bin/env python3
> >     > +# group: rw auto quick
> >     > +#
> >     > +# Test cases for qemu-img checksum.
> >     > +#
> >     > +# Copyright (C) 2022 Red Hat, Inc.
> >     > +#
> >     > +# This program is free software; you can redistribute it and/or
> >     modify
> >     > +# it under the terms of the GNU General Public License as
> >     published by
> >     > +# the Free Software Foundation; either version 2 of the License,
> or
> >     > +# (at your option) any later version.
> >     > +#
> >     > +# This program is distributed in the hope that it will be useful,
> >     > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >     > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >     > +# GNU General Public License for more details.
> >     > +#
> >     > +# You should have received a copy of the GNU General Public
> License
> >     > +# along with this program.  If not, see
> >     <http://www.gnu.org/licenses/>.
> >     > +
> >     > +import re
> >     > +
> >     > +import iotests
> >     > +
> >     > +from iotests import (
> >     > +    filter_testfiles,
> >     > +    qemu_img,
> >     > +    qemu_img_log,
> >     > +    qemu_io,
> >     > +    qemu_nbd_popen,
> >     > +)
> >     > +
> >     > +
> >     > +def checksum_available():
> >     > +    out = qemu_img("--help").stdout
> >     > +    return re.search(r"\bchecksum .+ filename\b", out) is not None
> >     > +
> >     > +
> >     > +if not checksum_available():
> >     > +    iotests.notrun("checksum command not available")
> >     > +
> >     > +iotests.script_initialize(
> >     > +    supported_fmts=["raw", "qcow2"],
> >     > +    supported_cache_modes=["none", "writeback"],
> >
> >     It doesn’t work with writeback, though, because it uses -T none
> below.
> >
> >
> > Good point
> >
> >
> >     Which by the way is a heavy cost, because I usually run tests in
> >     tmpfs,
> >     where this won’t work.  Is there any way of not doing the -T none
> >     below?
> >
> >
> > Testing using tempfs is problematic since you cannot test -T none.In
> > oVirt
> > we alway use /var/tmp which usually uses something that supports
> > direct I/O.
> >
> > Do we have a way to specify cache mode in the tests, so we can use -T
> none
> > only when the option is set?
>
> `./check` has a `-c` option (e.g. `./check -c none`), which lands in
> `iotests.cachemode`.  That isn’t automatically passed to qemu-img calls,
> but you can do it manually (i.e. `qemu_img_log("checksum", "-T",
> iotests.cachemode, disk_top)` instead of `"-T", "none"`).
>

Ok, I will change to use the current cache setting.


> >
> >     > +    supported_protocols=["file", "nbd"],
> >     > +    required_fmts=["raw", "qcow2"],
> >     > +)
> >     > +
> >     > +print()
> >     > +print("=== Test images ===")
> >     > +print()
> >     > +
> >     > +disk_raw = iotests.file_path('raw')
> >     > +qemu_img("create", "-f", "raw", disk_raw, "10m")
> >     > +qemu_io("-f", "raw",
> >     > +        "-c", "write -P 0x1 0 2m",      # data
> >     > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> >     > +        "-c", "write -z 4m 2m",         # zero allocated
> >     > +        "-c", "write -z -u 6m 2m",      # zero hole
> >     > +                                        # unallocated
> >     > +        disk_raw)
> >     > +print(filter_testfiles(disk_raw))
> >     > +qemu_img_log("map", "--output", "json", disk_raw)
> >     > +
> >     > +disk_qcow2 = iotests.file_path('qcow2')
> >     > +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
> >     > +qemu_io("-f", "qcow2",
> >     > +        "-c", "write -P 0x1 0 2m",      # data
> >     > +        "-c", "write -P 0x0 2m 2m",     # data with zeroes
> >     > +        "-c", "write -z 4m 2m",         # zero allocated
> >     > +        "-c", "write -z -u 6m 2m",      # zero hole
> >     > +                                        # unallocated
> >     > +        disk_qcow2)
> >     > +print(filter_testfiles(disk_qcow2))
> >     > +qemu_img_log("map", "--output", "json", disk_qcow2)
> >
> >     This isn’t how iotests work, generally.  When run with -qcow2
> >     -file, it
> >     should only test qcow2 on file, not raw on file, not raw on nbd.
> >     Perhaps
> >     this way this test could even support other formats than qcow2 and
> >     raw.
> >
> >
> > For this type of tests, running the same code with raw, qcow2 (and
> > other formats)
> > and using file or nbd is the best way to test this feature - one test
> > verifies all the
> > use cases.
>
> Yes, I see, but that’s a general thing in the iotests.  The design is
> such that tests don’t cycle through their test matrix themselves, but
> that they always only test a single combination of format+protocol on
> each run, and the user is responsible for cycling through the desired
> test matrix.
>
> I’m not saying that was definitely the best design decision, but the
> problem now that if a test cycles through its test matrix by itself, it
> must also ensure that it is run only once when the user cycles through
> the same test matrix.  For example, a reasonable run of the iotests
> consists of `./check -raw`, `./check -qcow2`, and `./check -nbd`.  This
> test here would then run in all three configurations, but do the same
> thing every time (specifically, test all of those configurations every
> time).
>
> So there’s a conflict.  Either the test follows the existing design and
> only tests a single configuration, as iotests are expected to do, or we
> have the question of how to deal with the fact that users will run the
> test suite in multiple configurations, but one run of this test would
> already cover them all.
>
> I’m not sternly against cycling through the possible combinations right
> here in the test, but I want to lay out the problems with that approach.
>
> > I can change this to use the current format (raw, qcow2, ...),
> > protocol (file, nbd, ...)
> > and cache value (none, writeback), but I'm not sure how this can work
> > with the
> > out files. The output from nbd is different from file. Maybe we need
> > different out
> > files for file and nbd (qemu-img-checksum.file.out,
> > qemu-img-checksum.nbd.out)?
>
> We already have that for the format (e.g. 178.out.{qcow2,raw}).  If you
> decide to do that, it shouldn’t be too difficult to implement
> (testrunner.py’s `TestRunner.find_reference()`).  Alternatively, it’s
> also possible to basically ignore the reference output and verify the
> expected output right in the test code.
>

In this kind of test checking the output is the right thing since this is
the actual result of the command.

I'll change to use the current format and cache mode - this should work fine
with file protocol, and will simplify the test a lot.

I'm not sure if this will work for NBD but it is less important and can be
added
later. There is nothing related to NBD in the code.

Nir

[-- Attachment #2: Type: text/html, Size: 11514 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-11-28 10:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-01 14:32 [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
2022-09-01 14:32 ` [PATCH 1/3] qemu-img: Add checksum command Nir Soffer
2022-10-26 13:00   ` Hanna Reitz
2022-10-30 17:37     ` Nir Soffer
2022-11-07 10:20       ` Hanna Reitz
2022-11-28 10:37         ` Nir Soffer
2022-09-01 14:32 ` [PATCH 2/3] iotests: Test qemu-img checksum Nir Soffer
2022-10-26 13:30   ` Hanna Reitz
2022-10-30 17:38     ` Nir Soffer
2022-11-07 11:41       ` Hanna Reitz
2022-11-28 10:46         ` Nir Soffer
2022-09-01 14:32 ` [PATCH 3/3] qemu-img: Speed up checksum Nir Soffer
2022-10-26 13:54   ` Hanna Reitz
2022-10-30 17:38     ` Nir Soffer
2022-10-30 19:15       ` Nir Soffer
2022-11-07 10:38       ` Hanna Reitz
2022-09-18  9:35 ` [PATCH 0/3] Add qemu-img checksum command using blkhash Nir Soffer
2022-10-18 17:49   ` Nir Soffer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.