All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] qcow2: Improve refcount structure rebuilding
@ 2021-03-10 15:59 Max Reitz
  2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Max Reitz @ 2021-03-10 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Hi,

When the qcow2 refcount structure is broken to a point where we cannot
rely on any information from it (because it shows clusters as free that
are in use), “qemu-img check -r all” completely rewrites it.

The new reftable is preferably written into the area covered by the last
refblock for the image, but if that refblock is empty (e.g. because the
image is on a block device and there is just nothing near the end of the
block device), then the reftable will be put after the image’s end.
Which is a problem on block devices, because they can’t easily be
resized (also, resizing wouldn’t really help in this case, because the
reftable would still be written past the new end).

So patch 1 modifies the algorithm to put the reftable into the first
free space in the image, and patch 4 adds a test.  So that we can make
patch 4 a bit nicer, patches 2 and 3 are included.  (In case you don’t
like anything about 2 or 3, I also have a version of this series without
patches 2 and 3, where 4 is correspondingly unnicer.)


Max Reitz (4):
  qcow2: Improve refcount structure rebuilding
  iotests/common.qemu: Add _cleanup_single_qemu
  iotests/common.qemu: Allow using the QSD
  iotests/108: Test new refcount rebuild algorithm

 block/qcow2-refcount.c         | 126 ++++++++--------
 tests/qemu-iotests/108         | 259 +++++++++++++++++++++++++++++++++
 tests/qemu-iotests/108.out     |  80 ++++++++++
 tests/qemu-iotests/common.qemu | 108 ++++++++++----
 4 files changed, 483 insertions(+), 90 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] qcow2: Improve refcount structure rebuilding
  2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
@ 2021-03-10 15:59 ` Max Reitz
  2021-03-26 11:48   ` Vladimir Sementsov-Ogievskiy
  2021-03-10 15:59 ` [PATCH 2/4] iotests/common.qemu: Add _cleanup_single_qemu Max Reitz
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Max Reitz @ 2021-03-10 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

When rebuilding the refcount structures (when qemu-img check -r found
errors with refcount = 0, but reference count > 0), the new refcount
table defaults to being put at the image file end[1].  There is no good
reason for that except that it means we will not have to rewrite any
refblocks we already wrote to disk.

Changing the code to rewrite those refblocks is not too difficult,
though, so let us do that.  That is beneficial for images on block
devices, where we cannot really write beyond the end of the image file.

[1] Unless there is something allocated in the area pointed to by the
    last refblock, so we have to write that refblock.  In that case, we
    try to put the reftable in there.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
 1 file changed, 67 insertions(+), 59 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8e649b008e..162caeeb8e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2352,8 +2352,9 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
                                       int64_t *nb_clusters)
 {
     BDRVQcow2State *s = bs->opaque;
-    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
+    int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
     int64_t refblock_offset, refblock_start, refblock_index;
+    int64_t first_cluster, end_cluster;
     uint32_t reftable_size = 0;
     uint64_t *on_disk_reftable = NULL;
     void *on_disk_refblock;
@@ -2365,8 +2366,11 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
 
     qcow2_cache_empty(bs, s->refcount_block_cache);
 
+    first_cluster = 0;
+    end_cluster = *nb_clusters;
+
 write_refblocks:
-    for (; cluster < *nb_clusters; cluster++) {
+    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
         if (!s->get_refcount(*refcount_table, cluster)) {
             continue;
         }
@@ -2374,65 +2378,68 @@ write_refblocks:
         refblock_index = cluster >> s->refcount_block_bits;
         refblock_start = refblock_index << s->refcount_block_bits;
 
-        /* Don't allocate a cluster in a refblock already written to disk */
-        if (first_free_cluster < refblock_start) {
-            first_free_cluster = refblock_start;
-        }
-        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
-                                              nb_clusters, &first_free_cluster);
-        if (refblock_offset < 0) {
-            fprintf(stderr, "ERROR allocating refblock: %s\n",
-                    strerror(-refblock_offset));
-            res->check_errors++;
-            ret = refblock_offset;
-            goto fail;
-        }
-
-        if (reftable_size <= refblock_index) {
-            uint32_t old_reftable_size = reftable_size;
-            uint64_t *new_on_disk_reftable;
+        if (reftable_size > refblock_index &&
+            on_disk_reftable[refblock_index])
+        {
+            refblock_offset = on_disk_reftable[refblock_index];
+        } else {
+            int64_t refblock_cluster_index;
 
-            reftable_size = ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
-                                     s->cluster_size) / REFTABLE_ENTRY_SIZE;
-            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
-                                                 reftable_size *
-                                                 REFTABLE_ENTRY_SIZE);
-            if (!new_on_disk_reftable) {
+            /* Don't allocate a cluster in a refblock already written to disk */
+            if (first_free_cluster < refblock_start) {
+                first_free_cluster = refblock_start;
+            }
+            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
+                                                  nb_clusters,
+                                                  &first_free_cluster);
+            if (refblock_offset < 0) {
+                fprintf(stderr, "ERROR allocating refblock: %s\n",
+                        strerror(-refblock_offset));
                 res->check_errors++;
-                ret = -ENOMEM;
+                ret = refblock_offset;
                 goto fail;
             }
-            on_disk_reftable = new_on_disk_reftable;
 
-            memset(on_disk_reftable + old_reftable_size, 0,
-                   (reftable_size - old_reftable_size) * REFTABLE_ENTRY_SIZE);
+            refblock_cluster_index = refblock_offset / s->cluster_size;
+            if (refblock_cluster_index >= end_cluster) {
+                /*
+                 * We must write the refblock that holds this refblock's
+                 * refcount
+                 */
+                end_cluster = refblock_cluster_index + 1;
+            }
 
-            /* The offset we have for the reftable is now no longer valid;
-             * this will leak that range, but we can easily fix that by running
-             * a leak-fixing check after this rebuild operation */
-            reftable_offset = -1;
-        } else {
-            assert(on_disk_reftable);
-        }
-        on_disk_reftable[refblock_index] = refblock_offset;
+            if (reftable_size <= refblock_index) {
+                uint32_t old_reftable_size = reftable_size;
+                uint64_t *new_on_disk_reftable;
+
+                reftable_size =
+                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
+                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
+                new_on_disk_reftable =
+                    g_try_realloc(on_disk_reftable,
+                                  reftable_size * REFTABLE_ENTRY_SIZE);
+                if (!new_on_disk_reftable) {
+                    res->check_errors++;
+                    ret = -ENOMEM;
+                    goto fail;
+                }
+                on_disk_reftable = new_on_disk_reftable;
 
-        /* If this is apparently the last refblock (for now), try to squeeze the
-         * reftable in */
-        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
-            reftable_offset < 0)
-        {
-            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
-                                                          REFTABLE_ENTRY_SIZE);
-            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
-                                                  refcount_table, nb_clusters,
-                                                  &first_free_cluster);
-            if (reftable_offset < 0) {
-                fprintf(stderr, "ERROR allocating reftable: %s\n",
-                        strerror(-reftable_offset));
-                res->check_errors++;
-                ret = reftable_offset;
-                goto fail;
+                memset(on_disk_reftable + old_reftable_size, 0,
+                       (reftable_size - old_reftable_size) *
+                       REFTABLE_ENTRY_SIZE);
+
+                /*
+                 * The offset we have for the reftable is now no longer valid;
+                 * this will leak that range, but we can easily fix that by
+                 * running a leak-fixing check after this rebuild operation
+                 */
+                reftable_offset = -1;
+            } else {
+                assert(on_disk_reftable);
             }
+            on_disk_reftable[refblock_index] = refblock_offset;
         }
 
         ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
@@ -2459,15 +2466,12 @@ write_refblocks:
     }
 
     if (reftable_offset < 0) {
-        uint64_t post_refblock_start, reftable_clusters;
+        uint64_t reftable_clusters;
 
-        post_refblock_start = ROUND_UP(*nb_clusters, s->refcount_block_size);
         reftable_clusters =
             size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
-        /* Not pretty but simple */
-        if (first_free_cluster < post_refblock_start) {
-            first_free_cluster = post_refblock_start;
-        }
+
+        first_free_cluster = 0;
         reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
                                               refcount_table, nb_clusters,
                                               &first_free_cluster);
@@ -2479,6 +2483,10 @@ write_refblocks:
             goto fail;
         }
 
+        assert(offset_into_cluster(s, reftable_offset) == 0);
+        first_cluster = reftable_offset / s->cluster_size;
+        end_cluster = first_cluster + reftable_clusters;
+
         goto write_refblocks;
     }
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] iotests/common.qemu: Add _cleanup_single_qemu
  2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
  2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
@ 2021-03-10 15:59 ` Max Reitz
  2021-03-10 15:59 ` [PATCH 3/4] iotests/common.qemu: Allow using the QSD Max Reitz
  2021-03-10 15:59 ` [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm Max Reitz
  3 siblings, 0 replies; 9+ messages in thread
From: Max Reitz @ 2021-03-10 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

_cleanup_qemu cleans up all qemu instances, which sometimes is not very
useful.  Pull out _cleanup_single_qemu, which does the same only for a
single instance.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/common.qemu | 55 +++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0fc52d20d7..56c1ea1bac 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -365,37 +365,50 @@ _launch_qemu()
 }
 
 
-# Silently kills the QEMU process
+# Silently kills all QEMU processes
 #
 # If $wait is set to anything other than the empty string, the process will not
 # be killed but only waited for, and any output will be forwarded to stdout. If
 # $wait is empty, the process will be killed and all output will be suppressed.
+#
+# To cleanup a single process, use _cleanup_single_qemu instead.
 _cleanup_qemu()
 {
     # QEMU_PID[], QEMU_IN[], QEMU_OUT[] all use same indices
     for i in "${!QEMU_OUT[@]}"
     do
-        local QEMU_PID
-        if [ -f "${QEMU_TEST_DIR}/qemu-${i}.pid" ]; then
-            read QEMU_PID < "${QEMU_TEST_DIR}/qemu-${i}.pid"
-            rm -f "${QEMU_TEST_DIR}/qemu-${i}.pid"
-            if [ -z "${wait}" ] && [ -n "${QEMU_PID}" ]; then
-                kill -KILL ${QEMU_PID} 2>/dev/null
-            fi
-            if [ -n "${QEMU_PID}" ]; then
-                wait ${QEMU_PID} 2>/dev/null # silent kill
-            fi
-        fi
+        _cleanup_single_qemu $i
+    done
+}
 
-        if [ -n "${wait}" ]; then
-            cat <&${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \
-                                  | _filter_qemu_io | _filter_qmp | _filter_hmp
+# The same as _cleanup_qemu, but for a single instance.
+#
+# Input parameters:
+# $1: qemu handle
+_cleanup_single_qemu()
+{
+    i=$1
+
+    local QEMU_PID
+    if [ -f "${QEMU_TEST_DIR}/qemu-${i}.pid" ]; then
+        read QEMU_PID < "${QEMU_TEST_DIR}/qemu-${i}.pid"
+        rm -f "${QEMU_TEST_DIR}/qemu-${i}.pid"
+        if [ -z "${wait}" ] && [ -n "${QEMU_PID}" ]; then
+            kill -KILL ${QEMU_PID} 2>/dev/null
         fi
-        rm -f "${QEMU_FIFO_IN}_${i}" "${QEMU_FIFO_OUT}_${i}"
-        eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
-        eval "exec ${QEMU_OUT[$i]}<&-"
+        if [ -n "${QEMU_PID}" ]; then
+            wait ${QEMU_PID} 2>/dev/null # silent kill
+        fi
+    fi
 
-        unset QEMU_IN[$i]
-        unset QEMU_OUT[$i]
-    done
+    if [ -n "${wait}" ]; then
+        cat <&${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \
+                              | _filter_qemu_io | _filter_qmp | _filter_hmp
+    fi
+    rm -f "${QEMU_FIFO_IN}_${i}" "${QEMU_FIFO_OUT}_${i}"
+    eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
+    eval "exec ${QEMU_OUT[$i]}<&-"
+
+    unset QEMU_IN[$i]
+    unset QEMU_OUT[$i]
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] iotests/common.qemu: Allow using the QSD
  2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
  2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
  2021-03-10 15:59 ` [PATCH 2/4] iotests/common.qemu: Add _cleanup_single_qemu Max Reitz
@ 2021-03-10 15:59 ` Max Reitz
  2021-03-10 15:59 ` [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm Max Reitz
  3 siblings, 0 replies; 9+ messages in thread
From: Max Reitz @ 2021-03-10 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

For block things, we often do not need to run all of qemu, so allow
using the qemu-storage-daemon instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/common.qemu | 53 +++++++++++++++++++++++++++-------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 56c1ea1bac..8003ed3586 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -291,6 +291,8 @@ _wait_event()
 # $qmp_pretty: Set this variable to 'y' to enable QMP pretty printing.
 # $keep_stderr: Set this variable to 'y' to keep QEMU's stderr output on stderr.
 #               If this variable is empty, stderr will be redirected to stdout.
+# $qsd: Set this variable to 'y' to use the QSD instead of QEMU.
+#       stdout and stderr are never redirected when using the QSD.
 # Returns:
 # $QEMU_HANDLE: set to a handle value to communicate with this QEMU instance.
 #
@@ -300,18 +302,31 @@ _launch_qemu()
     local fifo_out=
     local fifo_in=
 
+    if [[ $qsd = 'y' ]]; then
+        mon_arg='--monitor'
+    else
+        mon_arg='-mon'
+    fi
+
     if (shopt -s nocasematch; [[ "${qemu_comm_method}" == "monitor" ]])
     then
-        comm="-monitor stdio"
+        comm=(--chardev stdio,id=pipe
+              $mon_arg pipe,mode=readline)
     else
         local qemu_comm_method="qmp"
         if [ "$qmp_pretty" = "y" ]; then
-            comm="-monitor none -qmp-pretty stdio"
+            comm=(--chardev stdio,id=pipe
+                  $mon_arg pipe,mode=control,pretty=on)
         else
-            comm="-monitor none -qmp stdio"
+            comm=(--chardev stdio,id=pipe
+                  $mon_arg pipe,mode=control,pretty=off)
         fi
     fi
 
+    if [[ $qsd != 'y' ]]; then
+        comm=(-monitor none "${comm[@]}")
+    fi
+
     fifo_out=${QEMU_FIFO_OUT}_${_QEMU_HANDLE}
     fifo_in=${QEMU_FIFO_IN}_${_QEMU_HANDLE}
     mkfifo "${fifo_out}"
@@ -322,15 +337,23 @@ _launch_qemu()
         object_options="--object secret,id=keysec0,data=$IMGKEYSECRET"
     fi
 
+    if [[ $qsd = 'y' ]]; then
+        cmd=$QSD
+        args=()
+    else
+        cmd=$QEMU
+        args=(-nographic -serial none)
+    fi
+    args+=(${object_options} "${comm[@]}")
+    args+=("$@")
+
+    # Just set both QEMU_NEED_PID and QSD_NEED_PID, it can't hurt.
     if [ -z "$keep_stderr" ]; then
-        QEMU_NEED_PID='y'\
-        ${QEMU} ${object_options} -nographic -serial none ${comm} "${@}" >"${fifo_out}" \
-                                                       2>&1 \
-                                                       <"${fifo_in}" &
+        QEMU_NEED_PID='y' QSD_NEED_PID='y' $cmd "${args[@]}" \
+            >"$fifo_out" 2>&1 <"$fifo_in" &
     elif [ "$keep_stderr" = "y" ]; then
-        QEMU_NEED_PID='y'\
-        ${QEMU} ${object_options} -nographic -serial none ${comm} "${@}" >"${fifo_out}" \
-                                                       <"${fifo_in}" &
+        QEMU_NEED_PID='y' QSD_NEED_PID='y' $cmd "${args[@]}" \
+            >"$fifo_out" <"$fifo_in" &
     else
         exit 1
     fi
@@ -360,6 +383,16 @@ _launch_qemu()
             silent=yes _timed_wait_for ${_QEMU_HANDLE} "^}"
         fi
     fi
+
+    if [[ $qsd = 'y' ]]; then
+        # Wait for PID file, then move it to where qemu would put it
+        pidfile="$QEMU_TEST_DIR/qemu-storage-daemon.pid"
+        while [[ ! -f $pidfile ]]; do
+            sleep 0.5
+        done
+        mv "$pidfile" "$QEMU_TEST_DIR/qemu-${_QEMU_HANDLE}.pid"
+    fi
+
     QEMU_HANDLE=${_QEMU_HANDLE}
     let _QEMU_HANDLE++
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm
  2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
                   ` (2 preceding siblings ...)
  2021-03-10 15:59 ` [PATCH 3/4] iotests/common.qemu: Allow using the QSD Max Reitz
@ 2021-03-10 15:59 ` Max Reitz
  2021-03-10 16:07   ` Eric Blake
  3 siblings, 1 reply; 9+ messages in thread
From: Max Reitz @ 2021-03-10 15:59 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

One clear problem with how qcow2's refcount structure rebuild algorithm
used to be before "qcow2: Improve refcount structure rebuilding" was
that it is prone to failure for qcow2 images on block devices: There is
generally unused space after the actual image, and if that exceeds what
one refblock covers, the old algorithm would invariably write the
reftable past the block device's end, which cannot work.  The new
algorithm does not have this problem.

Test it with three tests:
(1) Create an image with more empty space at the end than what one
    refblock covers, see whether rebuilding the refcount structures
    results in a change in the image file length.  (It should not.)

(2) Leave precisely enough space somewhere at the beginning of the image
    for the new reftable (and the refblock for that place), see whether
    the new algorithm puts the reftable there.  (It should.)

(3) Test the original problem: Create (something like) a block device
    with a fixed size, then create a qcow2 image in there, write some
    data, and then have qemu-img check rebuild the refcount structures.
    Before HEAD^, the reftable would have been written past the image
    file end, i.e. outside of what the block device provides, which
    cannot work.  HEAD^ should have fixed that.
    ("Something like a block device" means a loop device if we can use
    one ("sudo -n losetup" works), or a FUSE block export with
    growable=false otherwise.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/108     | 259 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/108.out |  80 ++++++++++++
 2 files changed, 339 insertions(+)

diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 8eaef0b8bf..7b1aa08781 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
@@ -37,6 +37,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
+. ./common.qemu
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
@@ -47,6 +48,22 @@ _supported_os Linux
 # files
 _unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' data_file
 
+# This test either needs sudo -n losetup or FUSE exports to work
+if sudo -n losetup &>/dev/null; then
+    loopdev=true
+else
+    loopdev=false
+
+    # QSD --export fuse will either yield "Parameter 'id' is missing"
+    # or "Invalid parameter 'fuse'", depending on whether there is
+    # FUSE support or not.
+    error=$($QSD --export fuse 2>&1)
+    if [[ $error = *"Invalid parameter 'fuse'" ]]; then
+        _notrun 'Passwordless sudo for losetup or FUSE support required, but' \
+                'neither is available'
+    fi
+fi
+
 echo
 echo '=== Repairing an image without any refcount table ==='
 echo
@@ -138,6 +155,248 @@ _make_test_img 64M
 poke_file "$TEST_IMG" $((0x10008)) "\xff\xff\xff\xff\xff\xff\x00\x00"
 _check_test_img -r all
 
+echo
+echo '=== Check rebuilt reftable location ==='
+
+# In an earlier version of the refcount rebuild algorithm, the
+# reftable was generally placed at the image end (unless something was
+# allocated in the area covered by the refblock right before the image
+# file end, then we would try to place the reftable in that refblock).
+# This was later changed so the reftable would be placed in the
+# earliest possible location.  Test this.
+
+echo
+echo '--- Does the image size increase? ---'
+echo
+
+# First test: Just create some image, write some data to it, and
+# resize it so there is free space at the end of the image (enough
+# that it spans at least one full refblock, which for cluster_size=512
+# images, spans 128k).  With the old algorithm, the reftable would
+# have then been placed at the end of the image file, but with the new
+# one, it will be put in that free space.
+# We want to check whether the size of the image file increases due to
+# rebuilding the refcount structures (it should not).
+
+_make_test_img -o 'cluster_size=512' 1M
+# Write something
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# Add free space
+file_len=$(stat -c '%s' "$TEST_IMG")
+truncate -s $((file_len + 256 * 1024)) "$TEST_IMG"
+
+# Corrupt the image by saying the image header were not allocated
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8)
+poke_file "$TEST_IMG" $rb_offset "\x00\x00"
+
+# Check whether rebuilding the refcount structures increases the image
+# file size
+file_len=$(stat -c '%s' "$TEST_IMG")
+echo
+# The only leaks there can be are the old refcount structures that are
+# leaked during rebuilding, no need to clutter the output with them
+_check_test_img -r all | grep -v '^Repairing cluster.*refcount=1 reference=0'
+echo
+post_repair_file_len=$(stat -c '%s' "$TEST_IMG")
+
+if [[ $file_len -eq $post_repair_file_len ]]; then
+    echo 'OK: Image size did not change'
+else
+    echo 'ERROR: Image size differs' \
+        "($file_len before, $post_repair_file_len after)"
+fi
+
+echo
+echo '--- Will the reftable occupy a hole specifically left for it?  ---'
+echo
+
+# Note: With cluster_size=512, every refblock covers 128k.
+# The reftable covers 8M per reftable cluster.
+
+# Create an image that requires two reftable clusters (just because
+# this is more interesting than a single-clustered reftable).
+_make_test_img -o 'cluster_size=512' 9M
+$QEMU_IO -c 'write 0 8M' "$TEST_IMG" | _filter_qemu_io
+
+# Writing 8M will have resized the reftable.  Unfortunately, doing so
+# will leave holes in the file, so we need to fill them up so we can
+# be sure the whole file is allocated.  Do that by writing
+# consecutively smaller chunks starting from 8 MB, until the file
+# length increases even with a chunk size of 512.  Then we must have
+# filled all holes.
+ofs=$((8 * 1024 * 1024))
+block_len=$((16 * 1024))
+while [[ $block_len -ge 512 ]]; do
+    file_len=$(stat -c '%s' "$TEST_IMG")
+    while [[ $(stat -c '%s' "$TEST_IMG") -eq $file_len ]]; do
+        # Do not include this in the reference output, it does not
+        # really matter which qemu-io calls we do here exactly
+        $QEMU_IO -c "write $ofs $block_len" "$TEST_IMG" >/dev/null
+        ofs=$((ofs + block_len))
+    done
+    block_len=$((block_len / 2))
+done
+
+# Fill up to 9M (do not include this in the reference output either,
+# $ofs is random for all we know)
+$QEMU_IO -c "write $ofs $((9 * 1024 * 1024 - ofs))" "$TEST_IMG" >/dev/null
+
+# Make space as follows:
+# - For the first refblock: Right at the beginning of the image (this
+#   refblock is placed in the first place possible),
+# - For the reftable somewhere soon afterwards, still near the
+#   beginning of the image (i.e. covered by the first refblock); the
+#   reftable too is placed in the first place possible, but only after
+#   all refblocks have been placed)
+# No space is needed for the other refblocks, because no refblock is
+# put before the space it covers.  In this test case, we do not mind
+# if they are placed at the image file's end.
+
+# Before we make that space, we have to find out the host offset of
+# the area that belonged to the two data clusters at guest offset 4k,
+# because we expect the reftable to be placed there, and we will have
+# to verify that it is.
+
+l1_offset=$(peek_file_be "$TEST_IMG" 40 8)
+l2_offset=$(peek_file_be "$TEST_IMG" $l1_offset 8)
+l2_offset=$((l2_offset & 0x00fffffffffffe00))
+data_4k_offset=$(peek_file_be "$TEST_IMG" \
+                 $((l2_offset + 4096 / 512 * 8)) 8)
+data_4k_offset=$((data_4k_offset & 0x00fffffffffffe00))
+
+$QEMU_IO -c "discard 0 512" -c "discard 4k 1k" "$TEST_IMG" | _filter_qemu_io
+
+# Corrupt the image by saying the image header were not allocated
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8)
+poke_file "$TEST_IMG" $rb_offset "\x00\x00"
+
+echo
+# The only leaks there can be are the old refcount structures that are
+# leaked during rebuilding, no need to clutter the output with them
+_check_test_img -r all | grep -v '^Repairing cluster.*refcount=1 reference=0'
+echo
+
+# Check whether the reftable was put where we expected
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+if [[ $rt_offset -eq $data_4k_offset ]]; then
+    echo 'OK: Reftable is where we expect it'
+else
+    echo "ERROR: Reftable is at $rt_offset, but was expected at $data_4k_offset"
+fi
+
+echo
+echo '--- Rebuilding refcount structures on block devices ---'
+echo
+
+# A block device cannot really grow, at least not during qemu-img
+# check.  As mentioned in the above cases, rebuilding the refcount
+# structure may lead to new refcount structures being written after
+# the end of the image, and in the past that happened even if there
+# was more than sufficient space in the image.  Such post-EOF writes
+# will not work on block devices, so test that the new algorithm
+# avoids it.
+
+# If we have passwordless sudo and losetup, we can use those to create
+# a block device.  Otherwise, we can resort to qemu's FUSE export to
+# create a file that isn't growable, which effectively tests the same
+# thing.
+
+_cleanup_test_img
+truncate -s $((64 * 1024 * 1024)) "$TEST_IMG"
+
+if $loopdev; then
+    export_mp=$(sudo -n losetup --show -f "$TEST_IMG")
+    export_mp_driver=host_device
+    sudo -n chmod go+rw "$export_mp"
+else
+    # Create non-growable FUSE export that is a bit like an empty
+    # block device
+    export_mp="$TEST_DIR/fuse-export"
+    export_mp_driver=file
+    touch "$export_mp"
+
+    qsd=y _launch_qemu \
+        --blockdev file,node-name=export-node,filename="$TEST_IMG" \
+        --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=off
+    fuse_export_handle=$QEMU_HANDLE
+
+    # Must not appear in the reference output (to stay equal to the
+    # losetup branch).
+    _send_qemu_cmd \
+        $fuse_export_handle \
+        '{ "execute": "qmp_capabilities" }' \
+        'return' \
+        >/dev/null
+fi
+
+# Now create a qcow2 image on the device -- unfortunately, qemu-img
+# create force-creates the file, so we have to resort to the
+# blockdev-create job.
+qsd=y _launch_qemu \
+    --blockdev $export_mp_driver,node-name=file,filename="$export_mp"
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "qmp_capabilities" }' \
+    'return'
+
+# Small cluster size again, so the image needs multiple refblocks
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "blockdev-create",
+       "arguments": {
+           "job-id": "create",
+           "options": {
+               "driver": "qcow2",
+               "file": "file",
+               "size": '$((64 * 1024 * 1024))',
+               "cluster-size": 512
+           } } }' \
+    '"concluded"'
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "job-dismiss", "arguments": { "id": "create" } }' \
+    'return'
+
+_send_qemu_cmd \
+    $QEMU_HANDLE \
+    '{ "execute": "quit" }' \
+    'return'
+
+wait=y _cleanup_single_qemu $QEMU_HANDLE
+echo
+
+# Write some data
+$QEMU_IO -c 'write 0 64k' "$export_mp" | _filter_qemu_io
+
+# Corrupt the image by saying the image header were not allocated
+rt_offset=$(peek_file_be "$export_mp" 48 8)
+rb_offset=$(peek_file_be "$export_mp" $rt_offset 8)
+poke_file "$export_mp" $rb_offset "\x00\x00"
+
+echo
+TEST_IMG="$export_mp" _check_test_img -r all \
+    | grep -v '^Repairing cluster.*refcount=1 reference=0'
+
+if $loopdev; then
+    sudo -n losetup -d "$export_mp"
+else
+    # Again, must not appear in the reference output (to stay equal to
+    # the losetup branch).
+    _send_qemu_cmd \
+        $fuse_export_handle \
+        '{ "execute": "quit" }' \
+        'return' \
+        >/dev/null
+
+    wait=y _cleanup_single_qemu $fuse_export_handle >/dev/null
+    rm -f "$export_mp"
+fi
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
index 75bab8dc84..a4c571df1a 100644
--- a/tests/qemu-iotests/108.out
+++ b/tests/qemu-iotests/108.out
@@ -105,6 +105,86 @@ The following inconsistencies were found and repaired:
     0 leaked clusters
     1 corruptions
 
+Double checking the fixed image now...
+No errors were found on the image.
+
+=== Check rebuilt reftable location ===
+
+--- Does the image size increase? ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+OK: Image size did not change
+
+--- Will the reftable occupy a hole specifically left for it?  ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=9437184
+wrote 8388608/8388608 bytes at offset 0
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 1024/1024 bytes at offset 4096
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+OK: Reftable is where we expect it
+
+--- Rebuilding refcount structures on block devices ---
+
+{ "execute": "qmp_capabilities" }
+{"return": {}}
+{ "execute": "blockdev-create",
+       "arguments": {
+           "job-id": "create",
+           "options": {
+               "driver": "IMGFMT",
+               "file": "file",
+               "size": 67108864,
+               "cluster-size": 512
+           } } }
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "create"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "create"}}
+{ "execute": "job-dismiss", "arguments": { "id": "create" } }
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "create"}}
+{"return": {}}
+{ "execute": "quit" }
+{"return": {}}
+
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+ERROR cluster 0 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
 Double checking the fixed image now...
 No errors were found on the image.
 *** done
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm
  2021-03-10 15:59 ` [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm Max Reitz
@ 2021-03-10 16:07   ` Eric Blake
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2021-03-10 16:07 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 3/10/21 9:59 AM, Max Reitz wrote:
> One clear problem with how qcow2's refcount structure rebuild algorithm
> used to be before "qcow2: Improve refcount structure rebuilding" was
> that it is prone to failure for qcow2 images on block devices: There is
> generally unused space after the actual image, and if that exceeds what
> one refblock covers, the old algorithm would invariably write the
> reftable past the block device's end, which cannot work.  The new
> algorithm does not have this problem.
> 
> Test it with three tests:
> (1) Create an image with more empty space at the end than what one
>     refblock covers, see whether rebuilding the refcount structures
>     results in a change in the image file length.  (It should not.)
> 
> (2) Leave precisely enough space somewhere at the beginning of the image
>     for the new reftable (and the refblock for that place), see whether
>     the new algorithm puts the reftable there.  (It should.)
> 
> (3) Test the original problem: Create (something like) a block device
>     with a fixed size, then create a qcow2 image in there, write some
>     data, and then have qemu-img check rebuild the refcount structures.
>     Before HEAD^, the reftable would have been written past the image
>     file end, i.e. outside of what the block device provides, which
>     cannot work.  HEAD^ should have fixed that.
>     ("Something like a block device" means a loop device if we can use
>     one ("sudo -n losetup" works), or a FUSE block export with
>     growable=false otherwise.)

We could use qemu-nbd as another alternative to create a non-growable
protocol layer.  Then we don't need root access via sudo to run the test.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding
  2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
@ 2021-03-26 11:48   ` Vladimir Sementsov-Ogievskiy
  2021-03-26 13:47     ` Max Reitz
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-26 11:48 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

10.03.2021 18:59, Max Reitz wrote:
> When rebuilding the refcount structures (when qemu-img check -r found
> errors with refcount = 0, but reference count > 0), the new refcount
> table defaults to being put at the image file end[1].  There is no good
> reason for that except that it means we will not have to rewrite any
> refblocks we already wrote to disk.
> 
> Changing the code to rewrite those refblocks is not too difficult,
> though, so let us do that.  That is beneficial for images on block
> devices, where we cannot really write beyond the end of the image file.
> 
> [1] Unless there is something allocated in the area pointed to by the
>      last refblock, so we have to write that refblock.  In that case, we
>      try to put the reftable in there.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
>   1 file changed, 67 insertions(+), 59 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 8e649b008e..162caeeb8e 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -2352,8 +2352,9 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>                                         int64_t *nb_clusters)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
>       int64_t refblock_offset, refblock_start, refblock_index;
> +    int64_t first_cluster, end_cluster;
>       uint32_t reftable_size = 0;
>       uint64_t *on_disk_reftable = NULL;
>       void *on_disk_refblock;
> @@ -2365,8 +2366,11 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>   
>       qcow2_cache_empty(bs, s->refcount_block_cache);
>   
> +    first_cluster = 0;
> +    end_cluster = *nb_clusters;
> +
>   write_refblocks:
> -    for (; cluster < *nb_clusters; cluster++) {
> +    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
>           if (!s->get_refcount(*refcount_table, cluster)) {
>               continue;
>           }
> @@ -2374,65 +2378,68 @@ write_refblocks:
>           refblock_index = cluster >> s->refcount_block_bits;
>           refblock_start = refblock_index << s->refcount_block_bits;
>   
> -        /* Don't allocate a cluster in a refblock already written to disk */
> -        if (first_free_cluster < refblock_start) {
> -            first_free_cluster = refblock_start;
> -        }
> -        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
> -                                              nb_clusters, &first_free_cluster);
> -        if (refblock_offset < 0) {
> -            fprintf(stderr, "ERROR allocating refblock: %s\n",
> -                    strerror(-refblock_offset));
> -            res->check_errors++;
> -            ret = refblock_offset;
> -            goto fail;
> -        }
> -
> -        if (reftable_size <= refblock_index) {
> -            uint32_t old_reftable_size = reftable_size;
> -            uint64_t *new_on_disk_reftable;
> +        if (reftable_size > refblock_index &&
> +            on_disk_reftable[refblock_index])
> +        {
> +            refblock_offset = on_disk_reftable[refblock_index];

In this branch, we assign it to ..

> +        } else {
> +            int64_t refblock_cluster_index;
>   
> -            reftable_size = ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
> -                                     s->cluster_size) / REFTABLE_ENTRY_SIZE;
> -            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
> -                                                 reftable_size *
> -                                                 REFTABLE_ENTRY_SIZE);
> -            if (!new_on_disk_reftable) {
> +            /* Don't allocate a cluster in a refblock already written to disk */
> +            if (first_free_cluster < refblock_start) {
> +                first_free_cluster = refblock_start;
> +            }
> +            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
> +                                                  nb_clusters,
> +                                                  &first_free_cluster);
> +            if (refblock_offset < 0) {
> +                fprintf(stderr, "ERROR allocating refblock: %s\n",
> +                        strerror(-refblock_offset));
>                   res->check_errors++;
> -                ret = -ENOMEM;
> +                ret = refblock_offset;
>                   goto fail;
>               }
> -            on_disk_reftable = new_on_disk_reftable;
>   
> -            memset(on_disk_reftable + old_reftable_size, 0,
> -                   (reftable_size - old_reftable_size) * REFTABLE_ENTRY_SIZE);
> +            refblock_cluster_index = refblock_offset / s->cluster_size;
> +            if (refblock_cluster_index >= end_cluster) {
> +                /*
> +                 * We must write the refblock that holds this refblock's
> +                 * refcount
> +                 */
> +                end_cluster = refblock_cluster_index + 1;
> +            }
>   
> -            /* The offset we have for the reftable is now no longer valid;
> -             * this will leak that range, but we can easily fix that by running
> -             * a leak-fixing check after this rebuild operation */
> -            reftable_offset = -1;
> -        } else {
> -            assert(on_disk_reftable);
> -        }
> -        on_disk_reftable[refblock_index] = refblock_offset;
> +            if (reftable_size <= refblock_index) {
> +                uint32_t old_reftable_size = reftable_size;
> +                uint64_t *new_on_disk_reftable;
> +
> +                reftable_size =
> +                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
> +                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
> +                new_on_disk_reftable =
> +                    g_try_realloc(on_disk_reftable,
> +                                  reftable_size * REFTABLE_ENTRY_SIZE);
> +                if (!new_on_disk_reftable) {
> +                    res->check_errors++;
> +                    ret = -ENOMEM;
> +                    goto fail;
> +                }
> +                on_disk_reftable = new_on_disk_reftable;
>   
> -        /* If this is apparently the last refblock (for now), try to squeeze the
> -         * reftable in */
> -        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> -            reftable_offset < 0)
> -        {
> -            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
> -                                                          REFTABLE_ENTRY_SIZE);
> -            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
> -                                                  refcount_table, nb_clusters,
> -                                                  &first_free_cluster);
> -            if (reftable_offset < 0) {
> -                fprintf(stderr, "ERROR allocating reftable: %s\n",
> -                        strerror(-reftable_offset));
> -                res->check_errors++;
> -                ret = reftable_offset;
> -                goto fail;
> +                memset(on_disk_reftable + old_reftable_size, 0,
> +                       (reftable_size - old_reftable_size) *
> +                       REFTABLE_ENTRY_SIZE);
> +
> +                /*
> +                 * The offset we have for the reftable is now no longer valid;
> +                 * this will leak that range, but we can easily fix that by
> +                 * running a leak-fixing check after this rebuild operation
> +                 */
> +                reftable_offset = -1;
> +            } else {
> +                assert(on_disk_reftable);
>               }
> +            on_disk_reftable[refblock_index] = refblock_offset;

only to write back again ?

>           }
>   
>           ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
> @@ -2459,15 +2466,12 @@ write_refblocks:
>       }
>   
>       if (reftable_offset < 0) {

at this point reftable_offset is always -1 now..

Ah not. and now I am a bit close to understanding all of this logic. this thing with "goto write_refblocks" is not obvious

> -        uint64_t post_refblock_start, reftable_clusters;
> +        uint64_t reftable_clusters;
>   
> -        post_refblock_start = ROUND_UP(*nb_clusters, s->refcount_block_size);
>           reftable_clusters =
>               size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
> -        /* Not pretty but simple */
> -        if (first_free_cluster < post_refblock_start) {
> -            first_free_cluster = post_refblock_start;
> -        }
> +
> +        first_free_cluster = 0;
>           reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>                                                 refcount_table, nb_clusters,
>                                                 &first_free_cluster);
> @@ -2479,6 +2483,10 @@ write_refblocks:
>               goto fail;
>           }
>   
> +        assert(offset_into_cluster(s, reftable_offset) == 0);
> +        first_cluster = reftable_offset / s->cluster_size;
> +        end_cluster = first_cluster + reftable_clusters;
> +
>           goto write_refblocks;

these three lines now looks like a function call in assembler :)

>       }
>   
> 

You didn't ping the series (more than two week old) so, I'm not sure that you are not preparing v2 now.. But I kept it "marked unred" all this time, and several times tried to look at it, and postponed, because I don't familiar with this place of qcow2 driver. And the function looks too difficult. Now finally I think I understand most of the logic that you change. Honestly, I think a bit of refactoring the rebuild_refcount_structure() prior to logic change would make it a lot easier to review..


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding
  2021-03-26 11:48   ` Vladimir Sementsov-Ogievskiy
@ 2021-03-26 13:47     ` Max Reitz
  2021-03-26 14:38       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Max Reitz @ 2021-03-26 13:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: Kevin Wolf, qemu-devel

On 26.03.21 12:48, Vladimir Sementsov-Ogievskiy wrote:
> 10.03.2021 18:59, Max Reitz wrote:
>> When rebuilding the refcount structures (when qemu-img check -r found
>> errors with refcount = 0, but reference count > 0), the new refcount
>> table defaults to being put at the image file end[1].  There is no good
>> reason for that except that it means we will not have to rewrite any
>> refblocks we already wrote to disk.
>>
>> Changing the code to rewrite those refblocks is not too difficult,
>> though, so let us do that.  That is beneficial for images on block
>> devices, where we cannot really write beyond the end of the image file.
>>
>> [1] Unless there is something allocated in the area pointed to by the
>>      last refblock, so we have to write that refblock.  In that case, we
>>      try to put the reftable in there.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
>>   1 file changed, 67 insertions(+), 59 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 8e649b008e..162caeeb8e 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -2352,8 +2352,9 @@ static int 
>> rebuild_refcount_structure(BlockDriverState *bs,
>>                                         int64_t *nb_clusters)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>> -    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
>> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
>>       int64_t refblock_offset, refblock_start, refblock_index;
>> +    int64_t first_cluster, end_cluster;
>>       uint32_t reftable_size = 0;
>>       uint64_t *on_disk_reftable = NULL;
>>       void *on_disk_refblock;
>> @@ -2365,8 +2366,11 @@ static int 
>> rebuild_refcount_structure(BlockDriverState *bs,
>>       qcow2_cache_empty(bs, s->refcount_block_cache);
>> +    first_cluster = 0;
>> +    end_cluster = *nb_clusters;
>> +
>>   write_refblocks:
>> -    for (; cluster < *nb_clusters; cluster++) {
>> +    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
>>           if (!s->get_refcount(*refcount_table, cluster)) {
>>               continue;
>>           }
>> @@ -2374,65 +2378,68 @@ write_refblocks:
>>           refblock_index = cluster >> s->refcount_block_bits;
>>           refblock_start = refblock_index << s->refcount_block_bits;
>> -        /* Don't allocate a cluster in a refblock already written to 
>> disk */
>> -        if (first_free_cluster < refblock_start) {
>> -            first_free_cluster = refblock_start;
>> -        }
>> -        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>> -                                              nb_clusters, 
>> &first_free_cluster);
>> -        if (refblock_offset < 0) {
>> -            fprintf(stderr, "ERROR allocating refblock: %s\n",
>> -                    strerror(-refblock_offset));
>> -            res->check_errors++;
>> -            ret = refblock_offset;
>> -            goto fail;
>> -        }
>> -
>> -        if (reftable_size <= refblock_index) {
>> -            uint32_t old_reftable_size = reftable_size;
>> -            uint64_t *new_on_disk_reftable;
>> +        if (reftable_size > refblock_index &&
>> +            on_disk_reftable[refblock_index])
>> +        {
>> +            refblock_offset = on_disk_reftable[refblock_index];
> 
> In this branch, we assign it to ..
> 
>> +        } else {
>> +            int64_t refblock_cluster_index;
>> -            reftable_size = ROUND_UP((refblock_index + 1) * 
>> REFTABLE_ENTRY_SIZE,
>> -                                     s->cluster_size) / 
>> REFTABLE_ENTRY_SIZE;
>> -            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
>> -                                                 reftable_size *
>> -                                                 REFTABLE_ENTRY_SIZE);
>> -            if (!new_on_disk_reftable) {
>> +            /* Don't allocate a cluster in a refblock already written 
>> to disk */
>> +            if (first_free_cluster < refblock_start) {
>> +                first_free_cluster = refblock_start;
>> +            }
>> +            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>> +                                                  nb_clusters,
>> +                                                  &first_free_cluster);
>> +            if (refblock_offset < 0) {
>> +                fprintf(stderr, "ERROR allocating refblock: %s\n",
>> +                        strerror(-refblock_offset));
>>                   res->check_errors++;
>> -                ret = -ENOMEM;
>> +                ret = refblock_offset;
>>                   goto fail;
>>               }
>> -            on_disk_reftable = new_on_disk_reftable;
>> -            memset(on_disk_reftable + old_reftable_size, 0,
>> -                   (reftable_size - old_reftable_size) * 
>> REFTABLE_ENTRY_SIZE);
>> +            refblock_cluster_index = refblock_offset / s->cluster_size;
>> +            if (refblock_cluster_index >= end_cluster) {
>> +                /*
>> +                 * We must write the refblock that holds this refblock's
>> +                 * refcount
>> +                 */
>> +                end_cluster = refblock_cluster_index + 1;
>> +            }
>> -            /* The offset we have for the reftable is now no longer 
>> valid;
>> -             * this will leak that range, but we can easily fix that 
>> by running
>> -             * a leak-fixing check after this rebuild operation */
>> -            reftable_offset = -1;
>> -        } else {
>> -            assert(on_disk_reftable);
>> -        }
>> -        on_disk_reftable[refblock_index] = refblock_offset;
>> +            if (reftable_size <= refblock_index) {
>> +                uint32_t old_reftable_size = reftable_size;
>> +                uint64_t *new_on_disk_reftable;
>> +
>> +                reftable_size =
>> +                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
>> +                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
>> +                new_on_disk_reftable =
>> +                    g_try_realloc(on_disk_reftable,
>> +                                  reftable_size * REFTABLE_ENTRY_SIZE);
>> +                if (!new_on_disk_reftable) {
>> +                    res->check_errors++;
>> +                    ret = -ENOMEM;
>> +                    goto fail;
>> +                }
>> +                on_disk_reftable = new_on_disk_reftable;
>> -        /* If this is apparently the last refblock (for now), try to 
>> squeeze the
>> -         * reftable in */
>> -        if (refblock_index == (*nb_clusters - 1) >> 
>> s->refcount_block_bits &&
>> -            reftable_offset < 0)
>> -        {
>> -            uint64_t reftable_clusters = size_to_clusters(s, 
>> reftable_size *
>> -                                                          
>> REFTABLE_ENTRY_SIZE);
>> -            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>> -                                                  refcount_table, 
>> nb_clusters,
>> -                                                  &first_free_cluster);
>> -            if (reftable_offset < 0) {
>> -                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> -                        strerror(-reftable_offset));
>> -                res->check_errors++;
>> -                ret = reftable_offset;
>> -                goto fail;
>> +                memset(on_disk_reftable + old_reftable_size, 0,
>> +                       (reftable_size - old_reftable_size) *
>> +                       REFTABLE_ENTRY_SIZE);
>> +
>> +                /*
>> +                 * The offset we have for the reftable is now no 
>> longer valid;
>> +                 * this will leak that range, but we can easily fix 
>> that by
>> +                 * running a leak-fixing check after this rebuild 
>> operation
>> +                 */
>> +                reftable_offset = -1;
>> +            } else {
>> +                assert(on_disk_reftable);
>>               }
>> +            on_disk_reftable[refblock_index] = refblock_offset;
> 
> only to write back again ?

This assignment is on a deeper level, though, isn it?  I.e. it’s inside 
the else branch from above.

>>           }
>>           ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
>> @@ -2459,15 +2466,12 @@ write_refblocks:
>>       }
>>       if (reftable_offset < 0) {
> 
> at this point reftable_offset is always -1 now..
> 
> Ah not. and now I am a bit close to understanding all of this logic. 
> this thing with "goto write_refblocks" is not obvious
> 
>> -        uint64_t post_refblock_start, reftable_clusters;
>> +        uint64_t reftable_clusters;
>> -        post_refblock_start = ROUND_UP(*nb_clusters, 
>> s->refcount_block_size);
>>           reftable_clusters =
>>               size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
>> -        /* Not pretty but simple */
>> -        if (first_free_cluster < post_refblock_start) {
>> -            first_free_cluster = post_refblock_start;
>> -        }
>> +
>> +        first_free_cluster = 0;
>>           reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>>                                                 refcount_table, 
>> nb_clusters,
>>                                                 &first_free_cluster);
>> @@ -2479,6 +2483,10 @@ write_refblocks:
>>               goto fail;
>>           }
>> +        assert(offset_into_cluster(s, reftable_offset) == 0);
>> +        first_cluster = reftable_offset / s->cluster_size;
>> +        end_cluster = first_cluster + reftable_clusters;
>> +
>>           goto write_refblocks;
> 
> these three lines now looks like a function call in assembler :)
> 
>>       }
>>
> 
> You didn't ping the series (more than two week old) so, I'm not sure 
> that you are not preparing v2 now.. But I kept it "marked unred" all 
> this time, and several times tried to look at it, and postponed, because 
> I don't familiar with this place of qcow2 driver. And the function looks 
> too difficult. Now finally I think I understand most of the logic that 
> you change. Honestly, I think a bit of refactoring the 
> rebuild_refcount_structure() prior to logic change would make it a lot 
> easier to review..

Hm, yes, putting the for () loop into its own function would probably be 
a good starting put.  I’ll look into it.

Max



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding
  2021-03-26 13:47     ` Max Reitz
@ 2021-03-26 14:38       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-03-26 14:38 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

26.03.2021 16:47, Max Reitz wrote:
> On 26.03.21 12:48, Vladimir Sementsov-Ogievskiy wrote:
>> 10.03.2021 18:59, Max Reitz wrote:
>>> When rebuilding the refcount structures (when qemu-img check -r found
>>> errors with refcount = 0, but reference count > 0), the new refcount
>>> table defaults to being put at the image file end[1].  There is no good
>>> reason for that except that it means we will not have to rewrite any
>>> refblocks we already wrote to disk.
>>>
>>> Changing the code to rewrite those refblocks is not too difficult,
>>> though, so let us do that.  That is beneficial for images on block
>>> devices, where we cannot really write beyond the end of the image file.
>>>
>>> [1] Unless there is something allocated in the area pointed to by the
>>>      last refblock, so we have to write that refblock.  In that case, we
>>>      try to put the reftable in there.
>>>
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
>>>   1 file changed, 67 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 8e649b008e..162caeeb8e 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -2352,8 +2352,9 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>>>                                         int64_t *nb_clusters)
>>>   {
>>>       BDRVQcow2State *s = bs->opaque;
>>> -    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
>>> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
>>>       int64_t refblock_offset, refblock_start, refblock_index;
>>> +    int64_t first_cluster, end_cluster;
>>>       uint32_t reftable_size = 0;
>>>       uint64_t *on_disk_reftable = NULL;
>>>       void *on_disk_refblock;
>>> @@ -2365,8 +2366,11 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>>>       qcow2_cache_empty(bs, s->refcount_block_cache);
>>> +    first_cluster = 0;
>>> +    end_cluster = *nb_clusters;
>>> +
>>>   write_refblocks:
>>> -    for (; cluster < *nb_clusters; cluster++) {
>>> +    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
>>>           if (!s->get_refcount(*refcount_table, cluster)) {
>>>               continue;
>>>           }
>>> @@ -2374,65 +2378,68 @@ write_refblocks:
>>>           refblock_index = cluster >> s->refcount_block_bits;
>>>           refblock_start = refblock_index << s->refcount_block_bits;
>>> -        /* Don't allocate a cluster in a refblock already written to disk */
>>> -        if (first_free_cluster < refblock_start) {
>>> -            first_free_cluster = refblock_start;
>>> -        }
>>> -        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>>> -                                              nb_clusters, &first_free_cluster);
>>> -        if (refblock_offset < 0) {
>>> -            fprintf(stderr, "ERROR allocating refblock: %s\n",
>>> -                    strerror(-refblock_offset));
>>> -            res->check_errors++;
>>> -            ret = refblock_offset;
>>> -            goto fail;
>>> -        }
>>> -
>>> -        if (reftable_size <= refblock_index) {
>>> -            uint32_t old_reftable_size = reftable_size;
>>> -            uint64_t *new_on_disk_reftable;
>>> +        if (reftable_size > refblock_index &&
>>> +            on_disk_reftable[refblock_index])
>>> +        {
>>> +            refblock_offset = on_disk_reftable[refblock_index];
>>
>> In this branch, we assign it to ..
>>
>>> +        } else {
>>> +            int64_t refblock_cluster_index;
>>> -            reftable_size = ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
>>> -                                     s->cluster_size) / REFTABLE_ENTRY_SIZE;
>>> -            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
>>> -                                                 reftable_size *
>>> -                                                 REFTABLE_ENTRY_SIZE);
>>> -            if (!new_on_disk_reftable) {
>>> +            /* Don't allocate a cluster in a refblock already written to disk */
>>> +            if (first_free_cluster < refblock_start) {
>>> +                first_free_cluster = refblock_start;
>>> +            }
>>> +            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>>> +                                                  nb_clusters,
>>> +                                                  &first_free_cluster);
>>> +            if (refblock_offset < 0) {
>>> +                fprintf(stderr, "ERROR allocating refblock: %s\n",
>>> +                        strerror(-refblock_offset));
>>>                   res->check_errors++;
>>> -                ret = -ENOMEM;
>>> +                ret = refblock_offset;
>>>                   goto fail;
>>>               }
>>> -            on_disk_reftable = new_on_disk_reftable;
>>> -            memset(on_disk_reftable + old_reftable_size, 0,
>>> -                   (reftable_size - old_reftable_size) * REFTABLE_ENTRY_SIZE);
>>> +            refblock_cluster_index = refblock_offset / s->cluster_size;
>>> +            if (refblock_cluster_index >= end_cluster) {
>>> +                /*
>>> +                 * We must write the refblock that holds this refblock's
>>> +                 * refcount
>>> +                 */
>>> +                end_cluster = refblock_cluster_index + 1;
>>> +            }
>>> -            /* The offset we have for the reftable is now no longer valid;
>>> -             * this will leak that range, but we can easily fix that by running
>>> -             * a leak-fixing check after this rebuild operation */
>>> -            reftable_offset = -1;
>>> -        } else {
>>> -            assert(on_disk_reftable);
>>> -        }
>>> -        on_disk_reftable[refblock_index] = refblock_offset;
>>> +            if (reftable_size <= refblock_index) {
>>> +                uint32_t old_reftable_size = reftable_size;
>>> +                uint64_t *new_on_disk_reftable;
>>> +
>>> +                reftable_size =
>>> +                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
>>> +                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
>>> +                new_on_disk_reftable =
>>> +                    g_try_realloc(on_disk_reftable,
>>> +                                  reftable_size * REFTABLE_ENTRY_SIZE);
>>> +                if (!new_on_disk_reftable) {
>>> +                    res->check_errors++;
>>> +                    ret = -ENOMEM;
>>> +                    goto fail;
>>> +                }
>>> +                on_disk_reftable = new_on_disk_reftable;
>>> -        /* If this is apparently the last refblock (for now), try to squeeze the
>>> -         * reftable in */
>>> -        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
>>> -            reftable_offset < 0)
>>> -        {
>>> -            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
>>> - REFTABLE_ENTRY_SIZE);
>>> -            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>>> -                                                  refcount_table, nb_clusters,
>>> -                                                  &first_free_cluster);
>>> -            if (reftable_offset < 0) {
>>> -                fprintf(stderr, "ERROR allocating reftable: %s\n",
>>> -                        strerror(-reftable_offset));
>>> -                res->check_errors++;
>>> -                ret = reftable_offset;
>>> -                goto fail;
>>> +                memset(on_disk_reftable + old_reftable_size, 0,
>>> +                       (reftable_size - old_reftable_size) *
>>> +                       REFTABLE_ENTRY_SIZE);
>>> +
>>> +                /*
>>> +                 * The offset we have for the reftable is now no longer valid;
>>> +                 * this will leak that range, but we can easily fix that by
>>> +                 * running a leak-fixing check after this rebuild operation
>>> +                 */
>>> +                reftable_offset = -1;
>>> +            } else {
>>> +                assert(on_disk_reftable);
>>>               }
>>> +            on_disk_reftable[refblock_index] = refblock_offset;
>>
>> only to write back again ?
> 
> This assignment is on a deeper level, though, isn it?  I.e. it’s inside the else branch from above.

Ahm right.

I looked at it in vimdiff, and modifed pre-patch alignment so that it looks good.. but something went wrong.

> 
>>>           }
>>>           ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
>>> @@ -2459,15 +2466,12 @@ write_refblocks:
>>>       }
>>>       if (reftable_offset < 0) {
>>
>> at this point reftable_offset is always -1 now..
>>
>> Ah not. and now I am a bit close to understanding all of this logic. this thing with "goto write_refblocks" is not obvious
>>
>>> -        uint64_t post_refblock_start, reftable_clusters;
>>> +        uint64_t reftable_clusters;
>>> -        post_refblock_start = ROUND_UP(*nb_clusters, s->refcount_block_size);
>>>           reftable_clusters =
>>>               size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
>>> -        /* Not pretty but simple */
>>> -        if (first_free_cluster < post_refblock_start) {
>>> -            first_free_cluster = post_refblock_start;
>>> -        }
>>> +
>>> +        first_free_cluster = 0;
>>>           reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>>>                                                 refcount_table, nb_clusters,
>>>                                                 &first_free_cluster);
>>> @@ -2479,6 +2483,10 @@ write_refblocks:
>>>               goto fail;
>>>           }
>>> +        assert(offset_into_cluster(s, reftable_offset) == 0);
>>> +        first_cluster = reftable_offset / s->cluster_size;
>>> +        end_cluster = first_cluster + reftable_clusters;
>>> +
>>>           goto write_refblocks;
>>
>> these three lines now looks like a function call in assembler :)
>>
>>>       }
>>>
>>
>> You didn't ping the series (more than two week old) so, I'm not sure that you are not preparing v2 now.. But I kept it "marked unred" all this time, and several times tried to look at it, and postponed, because I don't familiar with this place of qcow2 driver. And the function looks too difficult. Now finally I think I understand most of the logic that you change. Honestly, I think a bit of refactoring the rebuild_refcount_structure() prior to logic change would make it a lot easier to review..
> 
> Hm, yes, putting the for () loop into its own function would probably be a good starting put.  I’ll look into it.
> 

Thanks!


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-26 14:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
2021-03-26 11:48   ` Vladimir Sementsov-Ogievskiy
2021-03-26 13:47     ` Max Reitz
2021-03-26 14:38       ` Vladimir Sementsov-Ogievskiy
2021-03-10 15:59 ` [PATCH 2/4] iotests/common.qemu: Add _cleanup_single_qemu Max Reitz
2021-03-10 15:59 ` [PATCH 3/4] iotests/common.qemu: Allow using the QSD Max Reitz
2021-03-10 15:59 ` [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm Max Reitz
2021-03-10 16:07   ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.