All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing
@ 2014-10-20 14:35 Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

As can be seen in the final patch of this series, there are certain
cases where the current repair implementation of qcow2 actually damages
the image further because it allocates new clusters for the refcount
structure which overlap with existing but according to the on-disk
refcounts (which are assumed to be wrong to begin with) unallocated
clusters.

This series fixes this by completely recreating the refcount structure
based on the in-memory information calculated during the check operation
if the possibility of damaging the image while repairing the refcount
structures in-place exists.


And thanks a lot to the reviewers so far! (and future reviewers as well,
of course)


v6:
- Patch 1: Additional comment about why
  s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3)
  [Benoît]
  (I kept Eric's R-b due to the rather trivial change)
- Patch 8:
  - Commit message: The leak will be dealt with [Eric]
  - Changed comment for alloc_clusters_imrt() according to Eric's
    (second) proposal
  - Changed a comment insode alloc_clusters_imrt() [Eric]
  - Use refblock_offset instead of rb_ofs and so on [Benoît]
  - Don't blindly use strerror(-ret), but the correct variable [Eric]
- Patch 11: Use different test number [Rebase on Kevin's block branch]


git-backport-diff against v5:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/11:[0001] [FC] 'qcow2: Calculate refcount block entry count'
002/11:[----] [--] 'qcow2: Fix leaks in dirty images'
003/11:[----] [--] 'qcow2: Split qcow2_check_refcounts()'
004/11:[----] [--] 'qcow2: Pull check_refblocks() up'
005/11:[----] [--] 'qcow2: Reuse refcount table in calculate_refcounts()'
006/11:[----] [--] 'qcow2: Fix refcount blocks beyond image end'
007/11:[----] [--] 'qcow2: Do not perform potentially damaging repairs'
008/11:[0138] [FC] 'qcow2: Rebuild refcount structure during check'
009/11:[----] [--] 'qcow2: Clean up after refcount rebuild'
010/11:[----] [-C] 'iotests: Fix test outputs'
011/11:[0004] [FC] 'iotests: Add test for potentially damaging repairs'


Max Reitz (11):
  qcow2: Calculate refcount block entry count
  qcow2: Fix leaks in dirty images
  qcow2: Split qcow2_check_refcounts()
  qcow2: Pull check_refblocks() up
  qcow2: Reuse refcount table in calculate_refcounts()
  qcow2: Fix refcount blocks beyond image end
  qcow2: Do not perform potentially damaging repairs
  qcow2: Rebuild refcount structure during check
  qcow2: Clean up after refcount rebuild
  iotests: Fix test outputs
  iotests: Add test for potentially damaging repairs

 block/qcow2-refcount.c     | 687 ++++++++++++++++++++++++++++++++-------------
 block/qcow2.c              |   5 +-
 block/qcow2.h              |   2 +
 tests/qemu-iotests/039.out |  10 +-
 tests/qemu-iotests/060.out |  10 +-
 tests/qemu-iotests/061.out |  18 +-
 tests/qemu-iotests/108     | 141 ++++++++++
 tests/qemu-iotests/108.out | 110 ++++++++
 tests/qemu-iotests/group   |   1 +
 9 files changed, 778 insertions(+), 206 deletions(-)
 create mode 100755 tests/qemu-iotests/108
 create mode 100644 tests/qemu-iotests/108.out

-- 
2.1.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images Max Reitz
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

The size of a refblock entry is (in theory) variable; calculate
therefore the number of entries per refblock and the according bit shift
(1 << x == entry count) when opening an image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 3 +++
 block/qcow2.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 156a219..3c8b881 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -698,6 +698,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
 
     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     s->l2_size = 1 << s->l2_bits;
+    /* 2^(s->refcount_order - 3) is the refcount width in bytes */
+    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
+    s->refcount_block_size = 1 << s->refcount_block_bits;
     bs->total_sectors = header.size / 512;
     s->csize_shift = (62 - (s->cluster_bits - 8));
     s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
diff --git a/block/qcow2.h b/block/qcow2.h
index 7d61e61..47cd4b3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -223,6 +223,8 @@ typedef struct BDRVQcowState {
     int l2_size;
     int l1_size;
     int l1_vm_state_index;
+    int refcount_block_bits;
+    int refcount_block_size;
     int csize_shift;
     int csize_mask;
     uint64_t cluster_offset_mask;
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

When opening dirty images, qcow2's repair function should not only
repair errors but leaks as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3c8b881..7a2c66f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -910,7 +910,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         (s->incompatible_features & QCOW2_INCOMPAT_DIRTY)) {
         BdrvCheckResult result = {0};
 
-        ret = qcow2_check(bs, &result, BDRV_FIX_ERRORS);
+        ret = qcow2_check(bs, &result, BDRV_FIX_ERRORS | BDRV_FIX_LEAKS);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not repair dirty image");
             goto fail;
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts()
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

Put the code for calculating the reference counts and comparing them
during qemu-img check into own functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 153 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 102 insertions(+), 51 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c31d85a..e5f7876 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1535,71 +1535,70 @@ done:
     return new_offset;
 }
 
+static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
+                           BdrvCheckMode fix, uint16_t **refcount_table,
+                           int64_t *nb_clusters);
+
 /*
- * Checks an image for refcount consistency.
- *
- * Returns 0 if no errors are found, the number of errors in case the image is
- * detected as corrupted, and -errno when an internal error occurred.
+ * Calculates an in-memory refcount table.
  */
-int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                          BdrvCheckMode fix)
+static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                               BdrvCheckMode fix, uint16_t **refcount_table,
+                               int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t size, i, highest_cluster, nb_clusters;
-    int refcount1, refcount2;
+    int64_t i;
     QCowSnapshot *sn;
-    uint16_t *refcount_table;
     int ret;
 
-    size = bdrv_getlength(bs->file);
-    if (size < 0) {
-        res->check_errors++;
-        return size;
-    }
-
-    nb_clusters = size_to_clusters(s, size);
-    if (nb_clusters > INT_MAX) {
-        res->check_errors++;
-        return -EFBIG;
-    }
-
-    refcount_table = g_try_new0(uint16_t, nb_clusters);
-    if (nb_clusters && refcount_table == NULL) {
+    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+    if (*nb_clusters && *refcount_table == NULL) {
         res->check_errors++;
         return -ENOMEM;
     }
 
-    res->bfi.total_clusters =
-        size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
-
     /* header */
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         0, s->cluster_size);
 
     /* current L1 table */
-    ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
                              s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
     if (ret < 0) {
-        goto fail;
+        return ret;
     }
 
     /* snapshots */
-    for(i = 0; i < s->nb_snapshots; i++) {
+    for (i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
-        ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
             sn->l1_table_offset, sn->l1_size, 0);
         if (ret < 0) {
-            goto fail;
+            return ret;
         }
     }
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         s->snapshots_offset, s->snapshots_size);
 
     /* refcount data */
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         s->refcount_table_offset,
         s->refcount_table_size * sizeof(uint64_t));
 
+    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+}
+
+/*
+ * Checks consistency of refblocks and accounts for each refblock in
+ * *refcount_table.
+ */
+static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
+                           BdrvCheckMode fix, uint16_t **refcount_table,
+                           int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+
     for(i = 0; i < s->refcount_table_size; i++) {
         uint64_t offset, cluster;
         offset = s->refcount_table[i];
@@ -1613,7 +1612,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             continue;
         }
 
-        if (cluster >= nb_clusters) {
+        if (cluster >= *nb_clusters) {
             fprintf(stderr, "ERROR refcount block %" PRId64
                     " is outside image\n", i);
             res->corruptions++;
@@ -1621,14 +1620,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         if (offset != 0) {
-            inc_refcounts(bs, res, refcount_table, nb_clusters,
+            inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                 offset, s->cluster_size);
-            if (refcount_table[cluster] != 1) {
+            if ((*refcount_table)[cluster] != 1) {
                 fprintf(stderr, "%s refcount block %" PRId64
                     " refcount=%d\n",
                     fix & BDRV_FIX_ERRORS ? "Repairing" :
                                             "ERROR",
-                    i, refcount_table[cluster]);
+                    i, (*refcount_table)[cluster]);
 
                 if (fix & BDRV_FIX_ERRORS) {
                     int64_t new_offset;
@@ -1640,17 +1639,18 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                     }
 
                     /* update refcounts */
-                    if ((new_offset >> s->cluster_bits) >= nb_clusters) {
+                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
                         /* increase refcount_table size if necessary */
-                        int old_nb_clusters = nb_clusters;
-                        nb_clusters = (new_offset >> s->cluster_bits) + 1;
-                        refcount_table = g_renew(uint16_t, refcount_table,
-                                                 nb_clusters);
-                        memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
-                                - old_nb_clusters) * sizeof(uint16_t));
+                        int old_nb_clusters = *nb_clusters;
+                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
+                        *refcount_table = g_renew(uint16_t, *refcount_table,
+                                                  *nb_clusters);
+                        memset(&(*refcount_table)[old_nb_clusters], 0,
+                               (*nb_clusters - old_nb_clusters) *
+                               sizeof(uint16_t));
                     }
-                    refcount_table[cluster]--;
-                    inc_refcounts(bs, res, refcount_table, nb_clusters,
+                    (*refcount_table)[cluster]--;
+                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                             new_offset, s->cluster_size);
 
                     res->corruptions_fixed++;
@@ -1661,8 +1661,22 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         }
     }
 
-    /* compare ref counts */
-    for (i = 0, highest_cluster = 0; i < nb_clusters; i++) {
+    return 0;
+}
+
+/*
+ * Compares the actual reference count for each cluster in the image against the
+ * refcount as reported by the refcount structures on-disk.
+ */
+static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                              BdrvCheckMode fix, int64_t *highest_cluster,
+                              uint16_t *refcount_table, int64_t nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+    int refcount1, refcount2, ret;
+
+    for (i = 0, *highest_cluster = 0; i < nb_clusters; i++) {
         refcount1 = get_refcount(bs, i);
         if (refcount1 < 0) {
             fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
@@ -1674,11 +1688,10 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         refcount2 = refcount_table[i];
 
         if (refcount1 > 0 || refcount2 > 0) {
-            highest_cluster = i;
+            *highest_cluster = i;
         }
 
         if (refcount1 != refcount2) {
-
             /* Check if we're allowed to fix the mismatch */
             int *num_fixed = NULL;
             if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
@@ -1711,6 +1724,44 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             }
         }
     }
+}
+
+/*
+ * Checks an image for refcount consistency.
+ *
+ * Returns 0 if no errors are found, the number of errors in case the image is
+ * detected as corrupted, and -errno when an internal error occurred.
+ */
+int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                          BdrvCheckMode fix)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t size, highest_cluster, nb_clusters;
+    uint16_t *refcount_table;
+    int ret;
+
+    size = bdrv_getlength(bs->file);
+    if (size < 0) {
+        res->check_errors++;
+        return size;
+    }
+
+    nb_clusters = size_to_clusters(s, size);
+    if (nb_clusters > INT_MAX) {
+        res->check_errors++;
+        return -EFBIG;
+    }
+
+    res->bfi.total_clusters =
+        size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
+
+    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
+                      nb_clusters);
 
     /* check OFLAG_COPIED */
     ret = check_oflag_copied(bs, res, fix);
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (2 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 15:07   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

Pull check_refblocks() before calculate_refcounts() so we can drop its
static declaration.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 102 ++++++++++++++++++++++++-------------------------
 1 file changed, 49 insertions(+), 53 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e5f7876..42de4ab 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1535,59 +1535,6 @@ done:
     return new_offset;
 }
 
-static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
-                           BdrvCheckMode fix, uint16_t **refcount_table,
-                           int64_t *nb_clusters);
-
-/*
- * Calculates an in-memory refcount table.
- */
-static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                               BdrvCheckMode fix, uint16_t **refcount_table,
-                               int64_t *nb_clusters)
-{
-    BDRVQcowState *s = bs->opaque;
-    int64_t i;
-    QCowSnapshot *sn;
-    int ret;
-
-    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
-    if (*nb_clusters && *refcount_table == NULL) {
-        res->check_errors++;
-        return -ENOMEM;
-    }
-
-    /* header */
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        0, s->cluster_size);
-
-    /* current L1 table */
-    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
-                             s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
-    if (ret < 0) {
-        return ret;
-    }
-
-    /* snapshots */
-    for (i = 0; i < s->nb_snapshots; i++) {
-        sn = s->snapshots + i;
-        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
-            sn->l1_table_offset, sn->l1_size, 0);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        s->snapshots_offset, s->snapshots_size);
-
-    /* refcount data */
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        s->refcount_table_offset,
-        s->refcount_table_size * sizeof(uint64_t));
-
-    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
-}
-
 /*
  * Checks consistency of refblocks and accounts for each refblock in
  * *refcount_table.
@@ -1665,6 +1612,55 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 }
 
 /*
+ * Calculates an in-memory refcount table.
+ */
+static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                               BdrvCheckMode fix, uint16_t **refcount_table,
+                               int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+    QCowSnapshot *sn;
+    int ret;
+
+    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+    if (*nb_clusters && *refcount_table == NULL) {
+        res->check_errors++;
+        return -ENOMEM;
+    }
+
+    /* header */
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        0, s->cluster_size);
+
+    /* current L1 table */
+    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
+                             s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* snapshots */
+    for (i = 0; i < s->nb_snapshots; i++) {
+        sn = s->snapshots + i;
+        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
+            sn->l1_table_offset, sn->l1_size, 0);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        s->snapshots_offset, s->snapshots_size);
+
+    /* refcount data */
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        s->refcount_table_offset,
+        s->refcount_table_size * sizeof(uint64_t));
+
+    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+}
+
+/*
  * Compares the actual reference count for each cluster in the image against the
  * refcount as reported by the refcount structures on-disk.
  */
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts()
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (3 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 15:09   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

We will later call calculate_refcounts multiple times, so reuse the
refcount table if possible.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 42de4ab..55a539f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1623,10 +1623,12 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     QCowSnapshot *sn;
     int ret;
 
-    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
-    if (*nb_clusters && *refcount_table == NULL) {
-        res->check_errors++;
-        return -ENOMEM;
+    if (!*refcount_table) {
+        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+        if (*nb_clusters && *refcount_table == NULL) {
+            res->check_errors++;
+            return -ENOMEM;
+        }
     }
 
     /* header */
@@ -1733,7 +1735,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 {
     BDRVQcowState *s = bs->opaque;
     int64_t size, highest_cluster, nb_clusters;
-    uint16_t *refcount_table;
+    uint16_t *refcount_table = NULL;
     int ret;
 
     size = bdrv_getlength(bs->file);
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (4 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-20 16:44   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 55a539f..0225769 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1544,7 +1544,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                            int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t i;
+    int64_t i, size;
+    int ret;
 
     for(i = 0; i < s->refcount_table_size; i++) {
         uint64_t offset, cluster;
@@ -1560,9 +1561,62 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         if (cluster >= *nb_clusters) {
-            fprintf(stderr, "ERROR refcount block %" PRId64
-                    " is outside image\n", i);
-            res->corruptions++;
+            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
+                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+
+            if (fix & BDRV_FIX_ERRORS) {
+                int64_t old_nb_clusters = *nb_clusters;
+
+                if (offset + s->cluster_size < offset ||
+                    offset + s->cluster_size > INT64_MAX)
+                {
+                    ret = -EINVAL;
+                    goto resize_fail;
+                }
+
+                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
+                if (ret < 0) {
+                    goto resize_fail;
+                }
+                size = bdrv_getlength(bs->file);
+                if (size < 0) {
+                    ret = size;
+                    goto resize_fail;
+                }
+
+                *nb_clusters = size_to_clusters(s, size);
+                assert(*nb_clusters >= old_nb_clusters);
+
+                *refcount_table = g_try_realloc(*refcount_table,
+                        *nb_clusters * sizeof(uint16_t));
+                if (!*refcount_table) {
+                    res->check_errors++;
+                    return -ENOMEM;
+                }
+
+                memset(*refcount_table + old_nb_clusters, 0,
+                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+
+                if (cluster >= *nb_clusters) {
+                    ret = -EINVAL;
+                    goto resize_fail;
+                }
+
+                res->corruptions_fixed++;
+                inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+                              offset, s->cluster_size);
+                /* No need to check whether the refcount is now greater than 1:
+                 * This area was just allocated and zeroed, so it can only be
+                 * exactly 1 after inc_refcounts() */
+                continue;
+
+resize_fail:
+                res->corruptions++;
+                fprintf(stderr, "ERROR could not resize image: %s\n",
+                        strerror(-ret));
+            } else {
+                res->corruptions++;
+            }
             continue;
         }
 
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (5 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-21  7:52   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.

In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.

To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 185 ++++++++-----------------------------------------
 1 file changed, 27 insertions(+), 158 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0225769..183fc5b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1421,127 +1421,12 @@ fail:
 }
 
 /*
- * Writes one sector of the refcount table to the disk
- */
-#define RT_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))
-static int write_reftable_entry(BlockDriverState *bs, int rt_index)
-{
-    BDRVQcowState *s = bs->opaque;
-    uint64_t buf[RT_ENTRIES_PER_SECTOR];
-    int rt_start_index;
-    int i, ret;
-
-    rt_start_index = rt_index & ~(RT_ENTRIES_PER_SECTOR - 1);
-    for (i = 0; i < RT_ENTRIES_PER_SECTOR; i++) {
-        buf[i] = cpu_to_be64(s->refcount_table[rt_start_index + i]);
-    }
-
-    ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_REFCOUNT_TABLE,
-            s->refcount_table_offset + rt_start_index * sizeof(uint64_t),
-            sizeof(buf));
-    if (ret < 0) {
-        return ret;
-    }
-
-    BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
-    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
-            rt_start_index * sizeof(uint64_t), buf, sizeof(buf));
-    if (ret < 0) {
-        return ret;
-    }
-
-    return 0;
-}
-
-/*
- * Allocates a new cluster for the given refcount block (represented by its
- * offset in the image file) and copies the current content there. This function
- * does _not_ decrement the reference count for the currently occupied cluster.
- *
- * This function prints an informative message to stderr on error (and returns
- * -errno); on success, the offset of the newly allocated cluster is returned.
- */
-static int64_t realloc_refcount_block(BlockDriverState *bs, int reftable_index,
-                                      uint64_t offset)
-{
-    BDRVQcowState *s = bs->opaque;
-    int64_t new_offset = 0;
-    void *refcount_block = NULL;
-    int ret;
-
-    /* allocate new refcount block */
-    new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
-    if (new_offset < 0) {
-        fprintf(stderr, "Could not allocate new cluster: %s\n",
-                strerror(-new_offset));
-        ret = new_offset;
-        goto done;
-    }
-
-    /* fetch current refcount block content */
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, offset, &refcount_block);
-    if (ret < 0) {
-        fprintf(stderr, "Could not fetch refcount block: %s\n", strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    /* new block has not yet been entered into refcount table, therefore it is
-     * no refcount block yet (regarding this check) */
-    ret = qcow2_pre_write_overlap_check(bs, 0, new_offset, s->cluster_size);
-    if (ret < 0) {
-        fprintf(stderr, "Could not write refcount block; metadata overlap "
-                "check failed: %s\n", strerror(-ret));
-        /* the image will be marked corrupt, so don't even attempt on freeing
-         * the cluster */
-        goto done;
-    }
-
-    /* write to new block */
-    ret = bdrv_write(bs->file, new_offset / BDRV_SECTOR_SIZE, refcount_block,
-            s->cluster_sectors);
-    if (ret < 0) {
-        fprintf(stderr, "Could not write refcount block: %s\n", strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    /* update refcount table */
-    assert(!offset_into_cluster(s, new_offset));
-    s->refcount_table[reftable_index] = new_offset;
-    ret = write_reftable_entry(bs, reftable_index);
-    if (ret < 0) {
-        fprintf(stderr, "Could not update refcount table: %s\n",
-                strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    goto done;
-
-fail_free_cluster:
-    qcow2_free_clusters(bs, new_offset, s->cluster_size, QCOW2_DISCARD_OTHER);
-
-done:
-    if (refcount_block) {
-        /* This should never fail, as it would only do so if the given refcount
-         * block cannot be found in the cache. As this is impossible as long as
-         * there are no bugs, assert the success. */
-        int tmp = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
-        assert(tmp == 0);
-    }
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    return new_offset;
-}
-
-/*
  * Checks consistency of refblocks and accounts for each refblock in
  * *refcount_table.
  */
 static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
-                           BdrvCheckMode fix, uint16_t **refcount_table,
-                           int64_t *nb_clusters)
+                           BdrvCheckMode fix, bool *rebuild,
+                           uint16_t **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i, size;
@@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
             fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
                 "cluster aligned; refcount table entry corrupted\n", i);
             res->corruptions++;
+            *rebuild = true;
             continue;
         }
 
@@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 
 resize_fail:
                 res->corruptions++;
+                *rebuild = true;
                 fprintf(stderr, "ERROR could not resize image: %s\n",
                         strerror(-ret));
             } else {
@@ -1624,40 +1511,10 @@ resize_fail:
             inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                 offset, s->cluster_size);
             if ((*refcount_table)[cluster] != 1) {
-                fprintf(stderr, "%s refcount block %" PRId64
-                    " refcount=%d\n",
-                    fix & BDRV_FIX_ERRORS ? "Repairing" :
-                                            "ERROR",
-                    i, (*refcount_table)[cluster]);
-
-                if (fix & BDRV_FIX_ERRORS) {
-                    int64_t new_offset;
-
-                    new_offset = realloc_refcount_block(bs, i, offset);
-                    if (new_offset < 0) {
-                        res->corruptions++;
-                        continue;
-                    }
-
-                    /* update refcounts */
-                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
-                        /* increase refcount_table size if necessary */
-                        int old_nb_clusters = *nb_clusters;
-                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
-                        *refcount_table = g_renew(uint16_t, *refcount_table,
-                                                  *nb_clusters);
-                        memset(&(*refcount_table)[old_nb_clusters], 0,
-                               (*nb_clusters - old_nb_clusters) *
-                               sizeof(uint16_t));
-                    }
-                    (*refcount_table)[cluster]--;
-                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-                            new_offset, s->cluster_size);
-
-                    res->corruptions_fixed++;
-                } else {
-                    res->corruptions++;
-                }
+                fprintf(stderr, "ERROR refcount block %" PRId64
+                        " refcount=%d\n", i, (*refcount_table)[cluster]);
+                res->corruptions++;
+                *rebuild = true;
             }
         }
     }
@@ -1669,8 +1526,8 @@ resize_fail:
  * Calculates an in-memory refcount table.
  */
 static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                               BdrvCheckMode fix, uint16_t **refcount_table,
-                               int64_t *nb_clusters)
+                               BdrvCheckMode fix, bool *rebuild,
+                               uint16_t **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i;
@@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         s->refcount_table_offset,
         s->refcount_table_size * sizeof(uint64_t));
 
-    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
 }
 
 /*
@@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
  * refcount as reported by the refcount structures on-disk.
  */
 static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                              BdrvCheckMode fix, int64_t *highest_cluster,
+                              BdrvCheckMode fix, bool *rebuild,
+                              int64_t *highest_cluster,
                               uint16_t *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             int *num_fixed = NULL;
             if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
                 num_fixed = &res->leaks_fixed;
-            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
+            } else if (refcount1 > 0 && refcount1 < refcount2 &&
+                       (fix & BDRV_FIX_ERRORS)) {
                 num_fixed = &res->corruptions_fixed;
             }
 
+            if (refcount1 == 0) {
+                *rebuild = true;
+            }
+
             fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
                    num_fixed != NULL     ? "Repairing" :
                    refcount1 < refcount2 ? "ERROR" :
@@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     int64_t size, highest_cluster, nb_clusters;
     uint16_t *refcount_table = NULL;
+    bool rebuild = false;
     int ret;
 
     size = bdrv_getlength(bs->file);
@@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     res->bfi.total_clusters =
         size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
 
-    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
+    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
+                              &nb_clusters);
     if (ret < 0) {
         goto fail;
     }
 
-    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
+    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
                       nb_clusters);
 
+    if (rebuild) {
+        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+    }
+
     /* check OFLAG_COPIED */
     ret = check_oflag_copied(bs, res, fix);
     if (ret < 0) {
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (6 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-21  9:31   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.

The old refcount structure will be leaked, however. This leak will be
dealt with in a follow-up commit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 296 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 293 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 183fc5b..75e726b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1642,6 +1642,276 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 }
 
 /*
+ * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
+ * the on-disk refcount structures.
+ *
+ * On input, *first_free_cluster tells where to start looking, and need not
+ * actually be a free cluster; the returned offset will not be before that
+ * cluster.  On output, *first_free_cluster points to the first gap found, even
+ * if that gap was too small to be used as the returned offset.
+ *
+ * Note that *first_free_cluster is a cluster index whereas the return value is
+ * an offset.
+ */
+static int64_t alloc_clusters_imrt(BlockDriverState *bs,
+                                   int cluster_count,
+                                   uint16_t **refcount_table,
+                                   int64_t *nb_clusters,
+                                   int64_t *first_free_cluster)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t cluster = *first_free_cluster, i;
+    bool first_gap = true;
+    int contiguous_free_clusters;
+
+    /* Starting at *first_free_cluster, find a range of at least cluster_count
+     * continuously free clusters */
+    for (contiguous_free_clusters = 0;
+         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
+         cluster++)
+    {
+        if (!(*refcount_table)[cluster]) {
+            contiguous_free_clusters++;
+            if (first_gap) {
+                /* If this is the first free cluster found, update
+                 * *first_free_cluster accordingly */
+                *first_free_cluster = cluster;
+                first_gap = false;
+            }
+        } else if (contiguous_free_clusters) {
+            contiguous_free_clusters = 0;
+        }
+    }
+
+    /* If contiguous_free_clusters is greater than zero, it contains the number
+     * of continuously free clusters until the current cluster; the first free
+     * cluster in the current "gap" is therefore
+     * cluster - contiguous_free_clusters */
+
+    /* If no such range could be found, grow the in-memory refcount table
+     * accordingly to append free clusters at the end of the image */
+    if (contiguous_free_clusters < cluster_count) {
+        int64_t old_nb_clusters = *nb_clusters;
+
+        /* contiguous_free_clusters clusters are already empty at the image end;
+         * we need cluster_count clusters; therefore, we have to allocate
+         * cluster_count - contiguous_free_clusters new clusters at the end of
+         * the image (which is the current value of cluster; note that cluster
+         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
+         * image end) */
+        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
+        *refcount_table = g_try_realloc(*refcount_table,
+                                        *nb_clusters * sizeof(uint16_t));
+        if (!*refcount_table) {
+            return -ENOMEM;
+        }
+
+        memset(*refcount_table + old_nb_clusters, 0,
+               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+    }
+
+    /* Go back to the first free cluster */
+    cluster -= contiguous_free_clusters;
+    for (i = 0; i < cluster_count; i++) {
+        (*refcount_table)[cluster + i] = 1;
+    }
+
+    return cluster << s->cluster_bits;
+}
+
+/*
+ * Creates a new refcount structure based solely on the in-memory information
+ * given through *refcount_table. All necessary allocations will be reflected
+ * in that array.
+ *
+ * On success, the old refcount structure is leaked (it will be covered by the
+ * new refcount structure).
+ */
+static int rebuild_refcount_structure(BlockDriverState *bs,
+                                      BdrvCheckResult *res,
+                                      uint16_t **refcount_table,
+                                      int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
+    int64_t refblock_offset, refblock_start, refblock_index;
+    uint32_t reftable_size = 0;
+    uint64_t *reftable = NULL;
+    uint16_t *on_disk_refblock;
+    int i, ret = 0;
+    struct {
+        uint64_t reftable_offset;
+        uint32_t reftable_clusters;
+    } QEMU_PACKED reftable_offset_and_clusters;
+
+    qcow2_cache_empty(bs, s->refcount_block_cache);
+
+write_refblocks:
+    for (; cluster < *nb_clusters; cluster++) {
+        if (!(*refcount_table)[cluster]) {
+            continue;
+        }
+
+        refblock_index = cluster >> s->refcount_block_bits;
+        refblock_start = refblock_index << s->refcount_block_bits;
+
+        /* Don't allocate a cluster in a refblock already written to disk */
+        if (first_free_cluster < refblock_start) {
+            first_free_cluster = refblock_start;
+        }
+        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
+                                              nb_clusters, &first_free_cluster);
+        if (refblock_offset < 0) {
+            fprintf(stderr, "ERROR allocating refblock: %s\n",
+                    strerror(-refblock_offset));
+            res->check_errors++;
+            ret = refblock_offset;
+            goto fail;
+        }
+
+        if (reftable_size <= refblock_index) {
+            uint32_t old_rt_size = reftable_size;
+            reftable_size = ROUND_UP((refblock_index + 1) * sizeof(uint64_t),
+                                     s->cluster_size) / sizeof(uint64_t);
+            reftable = g_try_realloc(reftable,
+                                     reftable_size * sizeof(uint64_t));
+            if (!reftable) {
+                res->check_errors++;
+                ret = -ENOMEM;
+                goto fail;
+            }
+
+            memset(reftable + old_rt_size, 0,
+                   (reftable_size - old_rt_size) * sizeof(uint64_t));
+
+            /* The offset we have for the reftable is now no longer valid;
+             * this will leak that range, but we can easily fix that by running
+             * a leak-fixing check after this rebuild operation */
+            reftable_offset = -1;
+        }
+        reftable[refblock_index] = refblock_offset;
+
+        /* If this is apparently the last refblock (for now), try to squeeze the
+         * reftable in */
+        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
+            reftable_offset < 0)
+        {
+            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
+                                                          sizeof(uint64_t));
+            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
+                                                  refcount_table, nb_clusters,
+                                                  &first_free_cluster);
+            if (reftable_offset < 0) {
+                fprintf(stderr, "ERROR allocating reftable: %s\n",
+                        strerror(-reftable_offset));
+                res->check_errors++;
+                ret = reftable_offset;
+                goto fail;
+            }
+        }
+
+        ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
+                                            s->cluster_size);
+        if (ret < 0) {
+            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            goto fail;
+        }
+
+        on_disk_refblock = g_malloc0(s->cluster_size);
+        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
+                    refblock_start + i < *nb_clusters; i++)
+        {
+            on_disk_refblock[i] =
+                cpu_to_be16((*refcount_table)[refblock_start + i]);
+        }
+
+        ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
+                         (void *)on_disk_refblock, s->cluster_sectors);
+        g_free(on_disk_refblock);
+        if (ret < 0) {
+            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            goto fail;
+        }
+
+        /* Go to the end of this refblock */
+        cluster = refblock_start + s->cluster_size / sizeof(uint16_t) - 1;
+    }
+
+    if (reftable_offset < 0) {
+        uint64_t post_refblock_start, reftable_clusters;
+
+        post_refblock_start = ROUND_UP(*nb_clusters,
+                                       s->cluster_size / sizeof(uint16_t));
+        reftable_clusters = size_to_clusters(s,
+                                             reftable_size * sizeof(uint64_t));
+        /* Not pretty but simple */
+        if (first_free_cluster < post_refblock_start) {
+            first_free_cluster = post_refblock_start;
+        }
+        reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
+                                              refcount_table, nb_clusters,
+                                              &first_free_cluster);
+        if (reftable_offset < 0) {
+            fprintf(stderr, "ERROR allocating reftable: %s\n",
+                    strerror(-reftable_offset));
+            res->check_errors++;
+            ret = reftable_offset;
+            goto fail;
+        }
+
+        goto write_refblocks;
+    }
+
+    assert(reftable);
+
+    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
+        cpu_to_be64s(&reftable[refblock_index]);
+    }
+
+    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset,
+                                        reftable_size * sizeof(uint64_t));
+    if (ret < 0) {
+        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    ret = bdrv_write(bs->file, reftable_offset / BDRV_SECTOR_SIZE,
+                     (void *)reftable,
+                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
+    if (ret < 0) {
+        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    /* Enter new reftable into the image header */
+    cpu_to_be64w(&reftable_offset_and_clusters.reftable_offset,
+                 reftable_offset);
+    cpu_to_be32w(&reftable_offset_and_clusters.reftable_clusters,
+                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
+    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
+                                              refcount_table_offset),
+                           &reftable_offset_and_clusters,
+                           sizeof(reftable_offset_and_clusters));
+    if (ret < 0) {
+        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
+        be64_to_cpus(&reftable[refblock_index]);
+    }
+    s->refcount_table = reftable;
+    s->refcount_table_offset = reftable_offset;
+    s->refcount_table_size = reftable_size;
+
+    return 0;
+
+fail:
+    g_free(reftable);
+    return ret;
+}
+
+/*
  * Checks an image for refcount consistency.
  *
  * Returns 0 if no errors are found, the number of errors in case the image is
@@ -1651,6 +1921,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix)
 {
     BDRVQcowState *s = bs->opaque;
+    BdrvCheckResult pre_compare_res;
     int64_t size, highest_cluster, nb_clusters;
     uint16_t *refcount_table = NULL;
     bool rebuild = false;
@@ -1677,11 +1948,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         goto fail;
     }
 
-    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
+    /* In case we don't need to rebuild the refcount structure (but want to fix
+     * something), this function is immediately called again, in which case the
+     * result should be ignored */
+    pre_compare_res = *res;
+    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
                       nb_clusters);
 
-    if (rebuild) {
-        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
+        fprintf(stderr, "Rebuilding refcount structure\n");
+        ret = rebuild_refcount_structure(bs, res, &refcount_table,
+                                         &nb_clusters);
+        if (ret < 0) {
+            goto fail;
+        }
+    } else if (fix) {
+        if (rebuild) {
+            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+        }
+
+        if (res->leaks || res->corruptions) {
+            *res = pre_compare_res;
+            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
+                              refcount_table, nb_clusters);
+        }
     }
 
     /* check OFLAG_COPIED */
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (7 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-21  9:59   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

Because the old refcount structure will be leaked after having rebuilt
it, we need to recalculate the refcounts and run a leak-fixing operation
afterwards (if leaks should be fixed at all).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 75e726b..3730be2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1956,12 +1956,47 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                       nb_clusters);
 
     if (rebuild && (fix & BDRV_FIX_ERRORS)) {
+        BdrvCheckResult old_res = *res;
+
         fprintf(stderr, "Rebuilding refcount structure\n");
         ret = rebuild_refcount_structure(bs, res, &refcount_table,
                                          &nb_clusters);
         if (ret < 0) {
             goto fail;
         }
+
+        res->corruptions = 0;
+        res->leaks = 0;
+
+        /* Because the old reftable has been exchanged for a new one the
+         * references have to be recalculated */
+        rebuild = false;
+        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
+        ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
+                                  &nb_clusters);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        if (fix & BDRV_FIX_LEAKS) {
+            /* The old refcount structures are now leaked, fix it; the result
+             * can be ignored */
+            pre_compare_res = *res;
+            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
+                              &highest_cluster, refcount_table, nb_clusters);
+            if (rebuild) {
+                fprintf(stderr, "ERROR rebuilt refcount structure is still "
+                        "broken\n");
+            }
+            *res = pre_compare_res;
+        }
+
+        if (res->corruptions < old_res.corruptions) {
+            res->corruptions_fixed += old_res.corruptions - res->corruptions;
+        }
+        if (res->leaks < old_res.leaks) {
+            res->leaks_fixed += old_res.leaks - res->leaks;
+        }
     } else if (fix) {
         if (rebuild) {
             fprintf(stderr, "ERROR need to rebuild refcount structures\n");
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (8 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-21 13:42   ` Kevin Wolf
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

039, 060 and 061 all create images with referenced clusters having a
refcount of 0. Because previous commits changed handling of such errors,
these tests now have a different output. Fix it.

Furthermore, 060 created a refblock with a refcount greater than one
which now results in having to rebuild the refcount structure as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/039.out | 10 ++++++++--
 tests/qemu-iotests/060.out | 10 ++++++++--
 tests/qemu-iotests/061.out | 18 ++++++++++++------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
index 67e7744..0adf153 100644
--- a/tests/qemu-iotests/039.out
+++ b/tests/qemu-iotests/039.out
@@ -25,7 +25,10 @@ read 512/512 bytes at offset 0
 incompatible_features     0x1
 
 == Repairing the image file must succeed ==
-Repairing cluster 5 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
@@ -45,7 +48,10 @@ wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./039: Aborted                 ( ulimit -c 0; exec "$@" )
 incompatible_features     0x1
-Repairing cluster 5 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 incompatible_features     0x0
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index cd679f9..9419da1 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -36,11 +36,15 @@ incompatible_features     0x0
 qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with refcount block); further corruption events will be suppressed
 write failed: Input/output error
 incompatible_features     0x2
-Repairing refcount block 0 refcount=2
+ERROR refcount block 0 refcount=2
+ERROR cluster 2 refcount=1 reference=2
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=2 reference=1
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
-    1 corruptions
+    2 corruptions
 
 Double checking the fixed image now...
 No errors were found on the image.
@@ -68,6 +72,8 @@ incompatible_features     0x0
 qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with inactive L2 table); further corruption events will be suppressed
 write failed: Input/output error
 incompatible_features     0x2
+ERROR cluster 4 refcount=1 reference=2
+Leaked cluster 9 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=2
 Repairing cluster 9 refcount=1 reference=0
 Repairing OFLAG_COPIED data cluster: l2_entry=8000000000040000 refcount=2
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index e372470..4ae6472 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -76,8 +76,11 @@ autoclear_features        0x0
 refcount_order            4
 header_length             104
 
-Repairing cluster 5 refcount=0 reference=1
-Repairing cluster 6 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+ERROR cluster 6 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 magic                     0x514649fb
 version                   2
 backing_file_offset       0x0
@@ -87,7 +90,7 @@ size                      67108864
 crypt_method              0
 l1_size                   1
 l1_table_offset           0x30000
-refcount_table_offset     0x10000
+refcount_table_offset     0x80000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
@@ -230,8 +233,11 @@ autoclear_features        0x0
 refcount_order            4
 header_length             104
 
-Repairing cluster 5 refcount=0 reference=1
-Repairing cluster 6 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+ERROR cluster 6 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 magic                     0x514649fb
 version                   3
 backing_file_offset       0x0
@@ -241,7 +247,7 @@ size                      67108864
 crypt_method              0
 l1_size                   1
 l1_table_offset           0x30000
-refcount_table_offset     0x10000
+refcount_table_offset     0x80000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs
  2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
                   ` (9 preceding siblings ...)
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
@ 2014-10-20 14:35 ` Max Reitz
  2014-10-21 14:12   ` Kevin Wolf
  10 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-20 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Max Reitz

There are certain cases where repairing a qcow2 image might actually
damage it further (or rather, where repairing it has in fact damaged it
further with the old qcow2 check implementation). This should not
happen, so add a test for these cases.

Furthermore, the repair function now repairs refblocks beyond the image
end by resizing the image accordingly. Add several tests for this as
well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/108     | 141 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/108.out | 110 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 252 insertions(+)
 create mode 100755 tests/qemu-iotests/108
 create mode 100644 tests/qemu-iotests/108.out

diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
new file mode 100755
index 0000000..a2458be
--- /dev/null
+++ b/tests/qemu-iotests/108
@@ -0,0 +1,141 @@
+#!/bin/bash
+#
+# Test case for repairing qcow2 images which cannot be repaired using
+# the on-disk refcount structures
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo '=== Repairing an image without any refcount table ==='
+echo
+
+_make_test_img 64M
+# just write some data
+$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# refcount_table_offset
+poke_file "$TEST_IMG" 48 "\x00\x00\x00\x00\x00\x00\x00\x00"
+# refcount_table_clusters
+poke_file "$TEST_IMG" 56 "\x00\x00\x00\x00"
+
+_check_test_img -r all
+
+$QEMU_IO -c 'read -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Repairing unreferenced data cluster in new refblock area ==='
+echo
+
+IMGOPTS='cluster_size=512' _make_test_img 64M
+# Allocate the first 128 kB in the image (first refblock)
+$QEMU_IO -c 'write 0 111104' "$TEST_IMG" | _filter_qemu_io
+# should be 131072
+stat -c '%s' "$TEST_IMG"
+
+# Enter a cluster at 128 kB (0x20000)
+# XXX: This (0x1ccc8) should be the first free entry in the last L2 table, but
+# we cannot be sure
+poke_file "$TEST_IMG" 117960 "\x80\x00\x00\x00\x00\x02\x00\x00"
+
+# Fill the cluster
+truncate -s 131584 "$TEST_IMG"
+$QEMU_IO -c "open -o driver=raw $TEST_IMG" -c 'write -P 42 128k 512' \
+    | _filter_qemu_io
+
+# The data should now appear at this guest offset
+$QEMU_IO -c 'read -P 42 111104 512' "$TEST_IMG" | _filter_qemu_io
+
+# This cluster is unallocated; fix it
+_check_test_img -r all
+
+# This repair operation must have allocated a new refblock; and that refblock
+# should not overlap with the unallocated data cluster. If it does, the data
+# will be damaged, so check it.
+$QEMU_IO -c 'read -P 42 111104 512' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Repairing refblock beyond the image end ==='
+echo
+
+echo
+echo '--- Otherwise clean ---'
+echo
+
+_make_test_img 64M
+# Normally, qemu doesn't create empty refblocks, so we just have to do it by
+# hand
+# XXX: This (0x10008) should be the entry for the second refblock
+poke_file "$TEST_IMG" 65544 "\x00\x00\x00\x00\x00\x10\x00\x00"
+# Mark that refblock as used
+# XXX: This (0x20020) should be the 17th entry (cluster 16) of the first
+# refblock
+poke_file "$TEST_IMG" 131104 "\x00\x01"
+_check_test_img -r all
+
+echo
+echo '--- Refblock is unallocated ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\x00\x00\x00\x00\x00\x10\x00\x00"
+_check_test_img -r all
+
+echo
+echo '--- Signed overflow after the refblock ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\x7f\xff\xff\xff\xff\xff\x00\x00"
+_check_test_img -r all
+
+echo
+echo '--- Unsigned overflow after the refblock ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\xff\xff\xff\xff\xff\xff\x00\x00"
+_check_test_img -r all
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
new file mode 100644
index 0000000..824d5cf
--- /dev/null
+++ b/tests/qemu-iotests/108.out
@@ -0,0 +1,110 @@
+QA output created by 108
+
+=== Repairing an image without any refcount table ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR cluster 0 refcount=0 reference=1
+ERROR cluster 3 refcount=0 reference=1
+ERROR cluster 4 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    4 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Repairing unreferenced data cluster in new refblock area ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+wrote 111104/111104 bytes at offset 0
+108.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+131072
+wrote 512/512 bytes at offset 131072
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 111104
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR cluster 256 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+read 512/512 bytes at offset 111104
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Repairing refblock beyond the image end ===
+
+
+--- Otherwise clean ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Refblock is unallocated ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR cluster 16 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+Repairing cluster 16 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    2 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Signed overflow after the refblock ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR could not resize image: Invalid argument
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Unsigned overflow after the refblock ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR could not resize image: Invalid argument
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b230996..be2054f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -106,3 +106,4 @@
 103 rw auto quick
 104 rw auto
 105 rw auto quick
+108 rw auto quick
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
@ 2014-10-20 15:07   ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2014-10-20 15:07 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> Pull check_refblocks() before calculate_refcounts() so we can drop its
> static declaration.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts()
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
@ 2014-10-20 15:09   ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2014-10-20 15:09 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> We will later call calculate_refcounts multiple times, so reuse the
> refcount table if possible.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
@ 2014-10-20 16:44   ` Kevin Wolf
  2014-10-21  7:14     ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-20 16:44 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> If the qcow2 check function detects a refcount block located beyond the
> image end, grow the image appropriately. This cannot break anything and
> is the logical fix for such a case.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block/qcow2-refcount.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 58 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 55a539f..0225769 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1544,7 +1544,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>                             int64_t *nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int64_t i;
> +    int64_t i, size;
> +    int ret;
>  
>      for(i = 0; i < s->refcount_table_size; i++) {
>          uint64_t offset, cluster;
> @@ -1560,9 +1561,62 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>          }
>  
>          if (cluster >= *nb_clusters) {
> -            fprintf(stderr, "ERROR refcount block %" PRId64
> -                    " is outside image\n", i);
> -            res->corruptions++;
> +            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
> +                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
> +
> +            if (fix & BDRV_FIX_ERRORS) {
> +                int64_t old_nb_clusters = *nb_clusters;
> +
> +                if (offset + s->cluster_size < offset ||
> +                    offset + s->cluster_size > INT64_MAX)

I _think_ this code is correct because offset is unsigned, but it would
be easier and not rely on overflow semantics like this:

    if (offset > INT64_MAX - s->cluster_size)

> +                {
> +                    ret = -EINVAL;
> +                    goto resize_fail;
> +                }
> +
> +                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
> +                if (ret < 0) {
> +                    goto resize_fail;
> +                }
> +                size = bdrv_getlength(bs->file);
> +                if (size < 0) {
> +                    ret = size;
> +                    goto resize_fail;
> +                }
> +
> +                *nb_clusters = size_to_clusters(s, size);
> +                assert(*nb_clusters >= old_nb_clusters);
> +
> +                *refcount_table = g_try_realloc(*refcount_table,
> +                        *nb_clusters * sizeof(uint16_t));
> +                if (!*refcount_table) {
> +                    res->check_errors++;
> +                    return -ENOMEM;
> +                }
> +
> +                memset(*refcount_table + old_nb_clusters, 0,
> +                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));

Considering the comments you got in previous comments,
sizeof(**refcount_table) might make it more obvious what's going on, and
would also be more robust against later changes of the type.

> +                if (cluster >= *nb_clusters) {
> +                    ret = -EINVAL;
> +                    goto resize_fail;
> +                }
> +
> +                res->corruptions_fixed++;
> +                inc_refcounts(bs, res, *refcount_table, *nb_clusters,
> +                              offset, s->cluster_size);
> +                /* No need to check whether the refcount is now greater than 1:
> +                 * This area was just allocated and zeroed, so it can only be
> +                 * exactly 1 after inc_refcounts() */
> +                continue;
> +
> +resize_fail:
> +                res->corruptions++;
> +                fprintf(stderr, "ERROR could not resize image: %s\n",
> +                        strerror(-ret));
> +            } else {
> +                res->corruptions++;
> +            }
>              continue;
>          }

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end
  2014-10-20 16:44   ` Kevin Wolf
@ 2014-10-21  7:14     ` Max Reitz
  0 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-21  7:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-20 at 18:44, Kevin Wolf wrote:
> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>> If the qcow2 check function detects a refcount block located beyond the
>> image end, grow the image appropriately. This cannot break anything and
>> is the logical fix for such a case.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 58 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 55a539f..0225769 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1544,7 +1544,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>                              int64_t *nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> -    int64_t i;
>> +    int64_t i, size;
>> +    int ret;
>>   
>>       for(i = 0; i < s->refcount_table_size; i++) {
>>           uint64_t offset, cluster;
>> @@ -1560,9 +1561,62 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>           }
>>   
>>           if (cluster >= *nb_clusters) {
>> -            fprintf(stderr, "ERROR refcount block %" PRId64
>> -                    " is outside image\n", i);
>> -            res->corruptions++;
>> +            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
>> +                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
>> +
>> +            if (fix & BDRV_FIX_ERRORS) {
>> +                int64_t old_nb_clusters = *nb_clusters;
>> +
>> +                if (offset + s->cluster_size < offset ||
>> +                    offset + s->cluster_size > INT64_MAX)
> I _think_ this code is correct because offset is unsigned, but it would
> be easier and not rely on overflow semantics like this:
>
>      if (offset > INT64_MAX - s->cluster_size)

Okay.

>> +                {
>> +                    ret = -EINVAL;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
>> +                if (ret < 0) {
>> +                    goto resize_fail;
>> +                }
>> +                size = bdrv_getlength(bs->file);
>> +                if (size < 0) {
>> +                    ret = size;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                *nb_clusters = size_to_clusters(s, size);
>> +                assert(*nb_clusters >= old_nb_clusters);
>> +
>> +                *refcount_table = g_try_realloc(*refcount_table,
>> +                        *nb_clusters * sizeof(uint16_t));
>> +                if (!*refcount_table) {
>> +                    res->check_errors++;
>> +                    return -ENOMEM;
>> +                }
>> +
>> +                memset(*refcount_table + old_nb_clusters, 0,
>> +                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> Considering the comments you got in previous comments,
> sizeof(**refcount_table) might make it more obvious what's going on, and
> would also be more robust against later changes of the type.

I highly doubt I'll change the type so that sizeof(**refcount_table) 
works. If variable refcounts are implemented, I'll probably make 
*refcount_table just a void * and then access its elements through a 
function; sadly, qemu's make doesn't error out against sizeof(void) 
(which would be -Wpointer-arith or just -pedantic), so just using 
sizeof(**refcount_table) so that everything throws an error once it's a 
void ** won't work.

I guess I'll change it in a v7 because sizeof(**refcount_table) is 
easier to grep for than sizeof(uint16_t).

>> +                if (cluster >= *nb_clusters) {
>> +                    ret = -EINVAL;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                res->corruptions_fixed++;
>> +                inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>> +                              offset, s->cluster_size);
>> +                /* No need to check whether the refcount is now greater than 1:
>> +                 * This area was just allocated and zeroed, so it can only be
>> +                 * exactly 1 after inc_refcounts() */
>> +                continue;
>> +
>> +resize_fail:
>> +                res->corruptions++;
>> +                fprintf(stderr, "ERROR could not resize image: %s\n",
>> +                        strerror(-ret));
>> +            } else {
>> +                res->corruptions++;
>> +            }
>>               continue;
>>           }
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Thanks!

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
@ 2014-10-21  7:52   ` Kevin Wolf
  2014-10-21 10:10     ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21  7:52 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> If a referenced cluster has a refcount of 0, increasing its refcount may
> result in clusters being allocated for the refcount structures. This may
> overwrite the referenced cluster, therefore we cannot simply increase
> the refcount then.
> 
> In such cases, we can either try to replicate all the refcount
> operations solely for the check operation, basing the allocations on the
> in-memory refcount table; or we can simply rebuild the whole refcount
> structure based on the in-memory refcount table. Since the latter will
> be much easier, do that.
> 
> To prepare for this, introduce a "rebuild" boolean which should be set
> to true whenever a fix is rather dangerous or too complicated using the
> current refcount structures. Another example for this is refcount blocks
> being referenced more than once.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>              fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>                  "cluster aligned; refcount table entry corrupted\n", i);
>              res->corruptions++;
> +            *rebuild = true;
>              continue;
>          }
>  
> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>  
>  resize_fail:
>                  res->corruptions++;
> +                *rebuild = true;
>                  fprintf(stderr, "ERROR could not resize image: %s\n",
>                          strerror(-ret));
>              } else {

Increasing res->corruptions in this patch looks right because you don't
actually rebuild anything yet. However, at the end of the series this
code still looks the same - shouldn't we somehow convert those the
corruptions caused by wrong refcounts into res->corruptions_fixed?

Hm... It seems that you do this by storing intermediate results in
qcow2_check_refcounts(), which assumes that compare_refcounts() finds
only refcount problems, and no other place can find any. This may be
true, but wouldn't it be more elegant to have a separate counter for
corruptions that will be fixed with a rebuild? We can use some
Qcow2CheckResult instead of BdrvCheckResult internally and add more
fields there.

Also, I don't think all "ERROR" messages correctly say "Repairing" again
at the end of the series.

> @@ -1624,40 +1511,10 @@ resize_fail:
>              inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>                  offset, s->cluster_size);
>              if ((*refcount_table)[cluster] != 1) {
> -                fprintf(stderr, "%s refcount block %" PRId64
> -                    " refcount=%d\n",
> -                    fix & BDRV_FIX_ERRORS ? "Repairing" :
> -                                            "ERROR",
> -                    i, (*refcount_table)[cluster]);
> -
> -                if (fix & BDRV_FIX_ERRORS) {
> -                    int64_t new_offset;
> -
> -                    new_offset = realloc_refcount_block(bs, i, offset);
> -                    if (new_offset < 0) {
> -                        res->corruptions++;
> -                        continue;
> -                    }
> -
> -                    /* update refcounts */
> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
> -                        /* increase refcount_table size if necessary */
> -                        int old_nb_clusters = *nb_clusters;
> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
> -                        *refcount_table = g_renew(uint16_t, *refcount_table,
> -                                                  *nb_clusters);
> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
> -                               (*nb_clusters - old_nb_clusters) *
> -                               sizeof(uint16_t));
> -                    }
> -                    (*refcount_table)[cluster]--;
> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
> -                            new_offset, s->cluster_size);
> -
> -                    res->corruptions_fixed++;
> -                } else {
> -                    res->corruptions++;
> -                }
> +                fprintf(stderr, "ERROR refcount block %" PRId64
> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
> +                res->corruptions++;
> +                *rebuild = true;
>              }
>          }
>      }
> @@ -1669,8 +1526,8 @@ resize_fail:
>   * Calculates an in-memory refcount table.
>   */
>  static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                               BdrvCheckMode fix, uint16_t **refcount_table,
> -                               int64_t *nb_clusters)
> +                               BdrvCheckMode fix, bool *rebuild,
> +                               uint16_t **refcount_table, int64_t *nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
>      int64_t i;
> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          s->refcount_table_offset,
>          s->refcount_table_size * sizeof(uint64_t));
>  
> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
>  }
>  
>  /*
> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>   * refcount as reported by the refcount structures on-disk.
>   */
>  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                              BdrvCheckMode fix, int64_t *highest_cluster,
> +                              BdrvCheckMode fix, bool *rebuild,
> +                              int64_t *highest_cluster,
>                                uint16_t *refcount_table, int64_t nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>              int *num_fixed = NULL;
>              if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>                  num_fixed = &res->leaks_fixed;
> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
> +                       (fix & BDRV_FIX_ERRORS)) {
>                  num_fixed = &res->corruptions_fixed;
>              }
>  
> +            if (refcount1 == 0) {
> +                *rebuild = true;
> +            }

Why isn't this just another else if branch? Possibly even as the first
or second condition, so that you don't have to add 'refcount1 > 0' for
the other branch.

> +
>              fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
>                     num_fixed != NULL     ? "Repairing" :
>                     refcount1 < refcount2 ? "ERROR" :
> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      BDRVQcowState *s = bs->opaque;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
> +    bool rebuild = false;
>      int ret;
>  
>      size = bdrv_getlength(bs->file);
> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      res->bfi.total_clusters =
>          size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>  
> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
> +                              &nb_clusters);
>      if (ret < 0) {
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>                        nb_clusters);
>  
> +    if (rebuild) {
> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    }

Should this increate res->check_errors? (It doesn't survive until the
end of the series anyway, but more places seem to be added later that
print an error message without increasing the error count.)

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
@ 2014-10-21  9:31   ` Kevin Wolf
  2014-10-21  9:52     ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21  9:31 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> The previous commit introduced the "rebuild" variable to qcow2's
> implementation of the image consistency check. Now make use of this by
> adding a function which creates a completely new refcount structure
> based solely on the in-memory information gathered before.
> 
> The old refcount structure will be leaked, however. This leak will be
> dealt with in a follow-up commit.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 296 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 293 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 183fc5b..75e726b 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1642,6 +1642,276 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>  }
>  
>  /*
> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to

s/a cluster/clusters/

> + * the on-disk refcount structures.
> + *
> + * On input, *first_free_cluster tells where to start looking, and need not
> + * actually be a free cluster; the returned offset will not be before that
> + * cluster.  On output, *first_free_cluster points to the first gap found, even
> + * if that gap was too small to be used as the returned offset.
> + *
> + * Note that *first_free_cluster is a cluster index whereas the return value is
> + * an offset.
> + */
> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
> +                                   int cluster_count,
> +                                   uint16_t **refcount_table,
> +                                   int64_t *nb_clusters,

Having cluster_count and nb_clusters at the same time is confusing.

Maybe imrt_nb_clusters for this one?

> +                                   int64_t *first_free_cluster)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t cluster = *first_free_cluster, i;
> +    bool first_gap = true;
> +    int contiguous_free_clusters;
> +
> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
> +     * continuously free clusters */
> +    for (contiguous_free_clusters = 0;
> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
> +         cluster++)
> +    {
> +        if (!(*refcount_table)[cluster]) {
> +            contiguous_free_clusters++;
> +            if (first_gap) {
> +                /* If this is the first free cluster found, update
> +                 * *first_free_cluster accordingly */
> +                *first_free_cluster = cluster;
> +                first_gap = false;
> +            }
> +        } else if (contiguous_free_clusters) {
> +            contiguous_free_clusters = 0;
> +        }
> +    }
> +
> +    /* If contiguous_free_clusters is greater than zero, it contains the number
> +     * of continuously free clusters until the current cluster; the first free
> +     * cluster in the current "gap" is therefore
> +     * cluster - contiguous_free_clusters */
> +
> +    /* If no such range could be found, grow the in-memory refcount table
> +     * accordingly to append free clusters at the end of the image */
> +    if (contiguous_free_clusters < cluster_count) {
> +        int64_t old_nb_clusters = *nb_clusters;
> +
> +        /* contiguous_free_clusters clusters are already empty at the image end;
> +         * we need cluster_count clusters; therefore, we have to allocate
> +         * cluster_count - contiguous_free_clusters new clusters at the end of
> +         * the image (which is the current value of cluster; note that cluster
> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
> +         * image end) */
> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +        *refcount_table = g_try_realloc(*refcount_table,
> +                                        *nb_clusters * sizeof(uint16_t));
> +        if (!*refcount_table) {
> +            return -ENOMEM;

This means that on failure the passed refcount table pointer may be
overwritten by NULL. This is a surprising interface and should at least
be mentioned in the function comment.

But as the original IMRT is leaked here, too, it's probably better to
change the interface to leave *refcount_table alone if the failure case.

> +        }
> +
> +        memset(*refcount_table + old_nb_clusters, 0,
> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> +    }
> +
> +    /* Go back to the first free cluster */
> +    cluster -= contiguous_free_clusters;
> +    for (i = 0; i < cluster_count; i++) {
> +        (*refcount_table)[cluster + i] = 1;
> +    }
> +
> +    return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +                                      BdrvCheckResult *res,
> +                                      uint16_t **refcount_table,
> +                                      int64_t *nb_clusters)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
> +    int64_t refblock_offset, refblock_start, refblock_index;
> +    uint32_t reftable_size = 0;
> +    uint64_t *reftable = NULL;

refcount_table and reftable? Seriously?

> +    uint16_t *on_disk_refblock;
> +    int i, ret = 0;
> +    struct {
> +        uint64_t reftable_offset;
> +        uint32_t reftable_clusters;
> +    } QEMU_PACKED reftable_offset_and_clusters;
> +
> +    qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +    for (; cluster < *nb_clusters; cluster++) {
> +        if (!(*refcount_table)[cluster]) {
> +            continue;
> +        }
> +
> +        refblock_index = cluster >> s->refcount_block_bits;
> +        refblock_start = refblock_index << s->refcount_block_bits;
> +
> +        /* Don't allocate a cluster in a refblock already written to disk */
> +        if (first_free_cluster < refblock_start) {
> +            first_free_cluster = refblock_start;
> +        }
> +        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
> +                                              nb_clusters, &first_free_cluster);
> +        if (refblock_offset < 0) {
> +            fprintf(stderr, "ERROR allocating refblock: %s\n",
> +                    strerror(-refblock_offset));
> +            res->check_errors++;
> +            ret = refblock_offset;
> +            goto fail;
> +        }
> +
> +        if (reftable_size <= refblock_index) {
> +            uint32_t old_rt_size = reftable_size;
> +            reftable_size = ROUND_UP((refblock_index + 1) * sizeof(uint64_t),
> +                                     s->cluster_size) / sizeof(uint64_t);
> +            reftable = g_try_realloc(reftable,
> +                                     reftable_size * sizeof(uint64_t));

Another leak here.

> +            if (!reftable) {
> +                res->check_errors++;
> +                ret = -ENOMEM;
> +                goto fail;
> +            }
> +
> +            memset(reftable + old_rt_size, 0,
> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +            /* The offset we have for the reftable is now no longer valid;
> +             * this will leak that range, but we can easily fix that by running
> +             * a leak-fixing check after this rebuild operation */
> +            reftable_offset = -1;
> +        }
> +        reftable[refblock_index] = refblock_offset;
> +
> +        /* If this is apparently the last refblock (for now), try to squeeze the
> +         * reftable in */
> +        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +            reftable_offset < 0)
> +        {
> +            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
> +                                                          sizeof(uint64_t));
> +            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
> +                                                  refcount_table, nb_clusters,
> +                                                  &first_free_cluster);
> +            if (reftable_offset < 0) {
> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                        strerror(-reftable_offset));
> +                res->check_errors++;
> +                ret = reftable_offset;
> +                goto fail;
> +            }
> +        }
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
> +                                            s->cluster_size);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        on_disk_refblock = g_malloc0(s->cluster_size);

qemu_blockalign?

> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +                    refblock_start + i < *nb_clusters; i++)
> +        {
> +            on_disk_refblock[i] =
> +                cpu_to_be16((*refcount_table)[refblock_start + i]);
> +        }
> +
> +        ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
> +                         (void *)on_disk_refblock, s->cluster_sectors);
> +        g_free(on_disk_refblock);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        /* Go to the end of this refblock */
> +        cluster = refblock_start + s->cluster_size / sizeof(uint16_t) - 1;
> +    }
> +
> +    if (reftable_offset < 0) {
> +        uint64_t post_refblock_start, reftable_clusters;
> +
> +        post_refblock_start = ROUND_UP(*nb_clusters,
> +                                       s->cluster_size / sizeof(uint16_t));
> +        reftable_clusters = size_to_clusters(s,
> +                                             reftable_size * sizeof(uint64_t));
> +        /* Not pretty but simple */
> +        if (first_free_cluster < post_refblock_start) {
> +            first_free_cluster = post_refblock_start;
> +        }
> +        reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
> +                                              refcount_table, nb_clusters,
> +                                              &first_free_cluster);
> +        if (reftable_offset < 0) {
> +            fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                    strerror(-reftable_offset));
> +            res->check_errors++;
> +            ret = reftable_offset;
> +            goto fail;
> +        }
> +
> +        goto write_refblocks;
> +    }

Ouch. :-)

Well, it should work.

> +    assert(reftable);
> +
> +    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
> +        cpu_to_be64s(&reftable[refblock_index]);
> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset,
> +                                        reftable_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    ret = bdrv_write(bs->file, reftable_offset / BDRV_SECTOR_SIZE,
> +                     (void *)reftable,
> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);

Why not bdrv_pwrite when you only have byte offset and length?

> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    /* Enter new reftable into the image header */
> +    cpu_to_be64w(&reftable_offset_and_clusters.reftable_offset,
> +                 reftable_offset);
> +    cpu_to_be32w(&reftable_offset_and_clusters.reftable_clusters,
> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
> +                                              refcount_table_offset),
> +                           &reftable_offset_and_clusters,
> +                           sizeof(reftable_offset_and_clusters));
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
> +        be64_to_cpus(&reftable[refblock_index]);
> +    }
> +    s->refcount_table = reftable;
> +    s->refcount_table_offset = reftable_offset;
> +    s->refcount_table_size = reftable_size;
> +
> +    return 0;
> +
> +fail:
> +    g_free(reftable);
> +    return ret;
> +}
> +
> +/*
>   * Checks an image for refcount consistency.
>   *
>   * Returns 0 if no errors are found, the number of errors in case the image is
> @@ -1651,6 +1921,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>                            BdrvCheckMode fix)
>  {
>      BDRVQcowState *s = bs->opaque;
> +    BdrvCheckResult pre_compare_res;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
>      bool rebuild = false;
> @@ -1677,11 +1948,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
> +    /* In case we don't need to rebuild the refcount structure (but want to fix
> +     * something), this function is immediately called again, in which case the
> +     * result should be ignored */
> +    pre_compare_res = *res;
> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>                        nb_clusters);
>  
> -    if (rebuild) {
> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
> +        fprintf(stderr, "Rebuilding refcount structure\n");
> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
> +                                         &nb_clusters);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    } else if (fix) {
> +        if (rebuild) {
> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");

Is it safe in this case to continue with fixing leaks? Should some error
counter be increased?

> +        }
> +
> +        if (res->leaks || res->corruptions) {
> +            *res = pre_compare_res;
> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
> +                              refcount_table, nb_clusters);
> +        }
>      }

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check
  2014-10-21  9:31   ` Kevin Wolf
@ 2014-10-21  9:52     ` Max Reitz
  2014-10-21 10:12       ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-21  9:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 11:31, Kevin Wolf wrote:
> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>> The previous commit introduced the "rebuild" variable to qcow2's
>> implementation of the image consistency check. Now make use of this by
>> adding a function which creates a completely new refcount structure
>> based solely on the in-memory information gathered before.
>>
>> The old refcount structure will be leaked, however. This leak will be
>> dealt with in a follow-up commit.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 296 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 293 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 183fc5b..75e726b 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1642,6 +1642,276 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>   }
>>   
>>   /*
>> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
> s/a cluster/clusters/
>
>> + * the on-disk refcount structures.
>> + *
>> + * On input, *first_free_cluster tells where to start looking, and need not
>> + * actually be a free cluster; the returned offset will not be before that
>> + * cluster.  On output, *first_free_cluster points to the first gap found, even
>> + * if that gap was too small to be used as the returned offset.
>> + *
>> + * Note that *first_free_cluster is a cluster index whereas the return value is
>> + * an offset.
>> + */
>> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>> +                                   int cluster_count,
>> +                                   uint16_t **refcount_table,
>> +                                   int64_t *nb_clusters,
> Having cluster_count and nb_clusters at the same time is confusing.
>
> Maybe imrt_nb_clusters for this one?

Sounds good.

>> +                                   int64_t *first_free_cluster)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t cluster = *first_free_cluster, i;
>> +    bool first_gap = true;
>> +    int contiguous_free_clusters;
>> +
>> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
>> +     * continuously free clusters */
>> +    for (contiguous_free_clusters = 0;
>> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
>> +         cluster++)
>> +    {
>> +        if (!(*refcount_table)[cluster]) {
>> +            contiguous_free_clusters++;
>> +            if (first_gap) {
>> +                /* If this is the first free cluster found, update
>> +                 * *first_free_cluster accordingly */
>> +                *first_free_cluster = cluster;
>> +                first_gap = false;
>> +            }
>> +        } else if (contiguous_free_clusters) {
>> +            contiguous_free_clusters = 0;
>> +        }
>> +    }
>> +
>> +    /* If contiguous_free_clusters is greater than zero, it contains the number
>> +     * of continuously free clusters until the current cluster; the first free
>> +     * cluster in the current "gap" is therefore
>> +     * cluster - contiguous_free_clusters */
>> +
>> +    /* If no such range could be found, grow the in-memory refcount table
>> +     * accordingly to append free clusters at the end of the image */
>> +    if (contiguous_free_clusters < cluster_count) {
>> +        int64_t old_nb_clusters = *nb_clusters;
>> +
>> +        /* contiguous_free_clusters clusters are already empty at the image end;
>> +         * we need cluster_count clusters; therefore, we have to allocate
>> +         * cluster_count - contiguous_free_clusters new clusters at the end of
>> +         * the image (which is the current value of cluster; note that cluster
>> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
>> +         * image end) */
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
> This means that on failure the passed refcount table pointer may be
> overwritten by NULL. This is a surprising interface and should at least
> be mentioned in the function comment.
>
> But as the original IMRT is leaked here, too, it's probably better to
> change the interface to leave *refcount_table alone if the failure case.

Okay, I had to look it up, but realloc() does not free the given pointer 
on failure (and I assume g_try_realloc() does not either). Therefore, 
yes, we should just leave *refcount_table as it is.

>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +    }
>> +
>> +    /* Go back to the first free cluster */
>> +    cluster -= contiguous_free_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
>> +    int64_t refblock_offset, refblock_start, refblock_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
> refcount_table and reftable? Seriously?

Reviewing would have been too easy otherwise. *cough*

One option is s/refcount_table/imrt/. I like it more than the second 
option, but it simply goes against the current naming convention.

The other option would be s/reftable/on_disk_reftable/, following the 
convention used in the next line.

>> +    uint16_t *on_disk_refblock;
>> +    int i, ret = 0;
>> +    struct {
>> +        uint64_t reftable_offset;
>> +        uint32_t reftable_clusters;
>> +    } QEMU_PACKED reftable_offset_and_clusters;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        refblock_index = cluster >> s->refcount_block_bits;
>> +        refblock_start = refblock_index << s->refcount_block_bits;
>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < refblock_start) {
>> +            first_free_cluster = refblock_start;
>> +        }
>> +        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>> +                                              nb_clusters, &first_free_cluster);
>> +        if (refblock_offset < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n",
>> +                    strerror(-refblock_offset));
>> +            res->check_errors++;
>> +            ret = refblock_offset;
>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= refblock_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((refblock_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
> Another leak here.

Will fix.

>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            reftable_offset = -1;
>> +        }
>> +        reftable[refblock_index] = refblock_offset;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in */
>> +        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
>> +            reftable_offset < 0)
>> +        {
>> +            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
>> +                                                          sizeof(uint64_t));
>> +            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>> +                                                  refcount_table, nb_clusters,
>> +                                                  &first_free_cluster);
>> +            if (reftable_offset < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-reftable_offset));
>> +                res->check_errors++;
>> +                ret = reftable_offset;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
>> +                                            s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        on_disk_refblock = g_malloc0(s->cluster_size);
> qemu_blockalign?

Meh, I think I have to write a qemu_blockalign0() some time...

If I do so, it's one more patch you have to review; if I just use 
qemu_blockalign(), I find the memset() directly afterwards rather ugly, 
but would be fine with it. I guess it's up to you, then.

>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    refblock_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_refblock[i] =
>> +                cpu_to_be16((*refcount_table)[refblock_start + i]);
>> +        }
>> +
>> +        ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
>> +                         (void *)on_disk_refblock, s->cluster_sectors);
>> +        g_free(on_disk_refblock);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        /* Go to the end of this refblock */
>> +        cluster = refblock_start + s->cluster_size / sizeof(uint16_t) - 1;
>> +    }
>> +
>> +    if (reftable_offset < 0) {
>> +        uint64_t post_refblock_start, reftable_clusters;
>> +
>> +        post_refblock_start = ROUND_UP(*nb_clusters,
>> +                                       s->cluster_size / sizeof(uint16_t));
>> +        reftable_clusters = size_to_clusters(s,
>> +                                             reftable_size * sizeof(uint64_t));
>> +        /* Not pretty but simple */
>> +        if (first_free_cluster < post_refblock_start) {
>> +            first_free_cluster = post_refblock_start;
>> +        }
>> +        reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>> +                                              refcount_table, nb_clusters,
>> +                                              &first_free_cluster);
>> +        if (reftable_offset < 0) {
>> +            fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                    strerror(-reftable_offset));
>> +            res->check_errors++;
>> +            ret = reftable_offset;
>> +            goto fail;
>> +        }
>> +
>> +        goto write_refblocks;
>> +    }
> Ouch. :-)
>
> Well, it should work.
>
>> +    assert(reftable);
>> +
>> +    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
>> +        cpu_to_be64s(&reftable[refblock_index]);
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset,
>> +                                        reftable_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_write(bs->file, reftable_offset / BDRV_SECTOR_SIZE,
>> +                     (void *)reftable,
>> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
> Why not bdrv_pwrite when you only have byte offset and length?

I don't like pwrite probably only because it takes an int byte length. 
reftable_size * sizeof(uint64_t) should be well below INT_MAX, but I 
don't see why we should use pwrite if we are writing full sectors to a 
sector-aligned offset.

>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    /* Enter new reftable into the image header */
>> +    cpu_to_be64w(&reftable_offset_and_clusters.reftable_offset,
>> +                 reftable_offset);
>> +    cpu_to_be32w(&reftable_offset_and_clusters.reftable_clusters,
>> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
>> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
>> +                                              refcount_table_offset),
>> +                           &reftable_offset_and_clusters,
>> +                           sizeof(reftable_offset_and_clusters));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    for (refblock_index = 0; refblock_index < reftable_size; refblock_index++) {
>> +        be64_to_cpus(&reftable[refblock_index]);
>> +    }
>> +    s->refcount_table = reftable;
>> +    s->refcount_table_offset = reftable_offset;
>> +    s->refcount_table_size = reftable_size;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    g_free(reftable);
>> +    return ret;
>> +}
>> +
>> +/*
>>    * Checks an image for refcount consistency.
>>    *
>>    * Returns 0 if no errors are found, the number of errors in case the image is
>> @@ -1651,6 +1921,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                             BdrvCheckMode fix)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> +    BdrvCheckResult pre_compare_res;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>>       bool rebuild = false;
>> @@ -1677,11 +1948,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>> +    /* In case we don't need to rebuild the refcount structure (but want to fix
>> +     * something), this function is immediately called again, in which case the
>> +     * result should be ignored */
>> +    pre_compare_res = *res;
>> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> -    if (rebuild) {
>> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>> +        fprintf(stderr, "Rebuilding refcount structure\n");
>> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
>> +                                         &nb_clusters);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    } else if (fix) {
>> +        if (rebuild) {
>> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> Is it safe in this case to continue with fixing leaks? Should some error
> counter be increased?

Good point. We should bail out here. res->corruptions should be enough, 
but incrementing res->check_errors won't hurt.

>> +        }
>> +
>> +        if (res->leaks || res->corruptions) {
>> +            *res = pre_compare_res;
>> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
>> +                              refcount_table, nb_clusters);
>> +        }
>>       }
> Kevin

Thanks for you review!

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
@ 2014-10-21  9:59   ` Kevin Wolf
  2014-10-21 10:16     ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21  9:59 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> Because the old refcount structure will be leaked after having rebuilt
> it, we need to recalculate the refcounts and run a leak-fixing operation
> afterwards (if leaks should be fixed at all).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
> ---
>  block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 75e726b..3730be2 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1956,12 +1956,47 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>                        nb_clusters);
>  
>      if (rebuild && (fix & BDRV_FIX_ERRORS)) {
> +        BdrvCheckResult old_res = *res;
> +
>          fprintf(stderr, "Rebuilding refcount structure\n");
>          ret = rebuild_refcount_structure(bs, res, &refcount_table,
>                                           &nb_clusters);
>          if (ret < 0) {
>              goto fail;
>          }
> +
> +        res->corruptions = 0;
> +        res->leaks = 0;
> +
> +        /* Because the old reftable has been exchanged for a new one the
> +         * references have to be recalculated */
> +        rebuild = false;
> +        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
> +        ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
> +                                  &nb_clusters);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        if (fix & BDRV_FIX_LEAKS) {
> +            /* The old refcount structures are now leaked, fix it; the result
> +             * can be ignored */
> +            pre_compare_res = *res;

I would prefer using another local variable here. At the first sight
it's not quite clear which references to pre_compare_res correspond to
which state.

> +            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
> +                              &highest_cluster, refcount_table, nb_clusters);
> +            if (rebuild) {
> +                fprintf(stderr, "ERROR rebuilt refcount structure is still "
> +                        "broken\n");
> +            }
> +            *res = pre_compare_res;
> +        }
> +
> +        if (res->corruptions < old_res.corruptions) {
> +            res->corruptions_fixed += old_res.corruptions - res->corruptions;
> +        }
> +        if (res->leaks < old_res.leaks) {
> +            res->leaks_fixed += old_res.leaks - res->leaks;
> +        }

For these numbers to be accurate, don't we need to run
compare_refcounts() unconditionally and only make BDRV_FIX_LEAKS
conditional?

Now that I've read the rest of the series, comparing res and old_res
actually makes sense, so maybe it's not necessary to introduce a
Qcow2CheckResult.

>      } else if (fix) {
>          if (rebuild) {
>              fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> -- 
> 2.1.2

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
  2014-10-21  7:52   ` Kevin Wolf
@ 2014-10-21 10:10     ` Max Reitz
  0 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-21 10:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 09:52, Kevin Wolf wrote:
> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>> If a referenced cluster has a refcount of 0, increasing its refcount may
>> result in clusters being allocated for the refcount structures. This may
>> overwrite the referenced cluster, therefore we cannot simply increase
>> the refcount then.
>>
>> In such cases, we can either try to replicate all the refcount
>> operations solely for the check operation, basing the allocations on the
>> in-memory refcount table; or we can simply rebuild the whole refcount
>> structure based on the in-memory refcount table. Since the latter will
>> be much easier, do that.
>>
>> To prepare for this, introduce a "rebuild" boolean which should be set
>> to true whenever a fix is rather dangerous or too complicated using the
>> current refcount structures. Another example for this is refcount blocks
>> being referenced more than once.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
>> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>               fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>>                   "cluster aligned; refcount table entry corrupted\n", i);
>>               res->corruptions++;
>> +            *rebuild = true;
>>               continue;
>>           }
>>   
>> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>   
>>   resize_fail:
>>                   res->corruptions++;
>> +                *rebuild = true;
>>                   fprintf(stderr, "ERROR could not resize image: %s\n",
>>                           strerror(-ret));
>>               } else {
> Increasing res->corruptions in this patch looks right because you don't
> actually rebuild anything yet. However, at the end of the series this
> code still looks the same - shouldn't we somehow convert those the
> corruptions caused by wrong refcounts into res->corruptions_fixed?
>
> Hm... It seems that you do this by storing intermediate results in
> qcow2_check_refcounts(), which assumes that compare_refcounts() finds
> only refcount problems, and no other place can find any. This may be
> true, but wouldn't it be more elegant to have a separate counter for
> corruptions that will be fixed with a rebuild? We can use some
> Qcow2CheckResult instead of BdrvCheckResult internally and add more
> fields there.
>
> Also, I don't think all "ERROR" messages correctly say "Repairing" again
> at the end of the series.

I'll have a look on this, but this hunk for instance should not say 
"Repairing". It's an error and it is not directly repaired. I think the 
output "Rebuilding refcount structure" should be sufficient.

>> @@ -1624,40 +1511,10 @@ resize_fail:
>>               inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>>                   offset, s->cluster_size);
>>               if ((*refcount_table)[cluster] != 1) {
>> -                fprintf(stderr, "%s refcount block %" PRId64
>> -                    " refcount=%d\n",
>> -                    fix & BDRV_FIX_ERRORS ? "Repairing" :
>> -                                            "ERROR",
>> -                    i, (*refcount_table)[cluster]);
>> -
>> -                if (fix & BDRV_FIX_ERRORS) {
>> -                    int64_t new_offset;
>> -
>> -                    new_offset = realloc_refcount_block(bs, i, offset);
>> -                    if (new_offset < 0) {
>> -                        res->corruptions++;
>> -                        continue;
>> -                    }
>> -
>> -                    /* update refcounts */
>> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
>> -                        /* increase refcount_table size if necessary */
>> -                        int old_nb_clusters = *nb_clusters;
>> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
>> -                        *refcount_table = g_renew(uint16_t, *refcount_table,
>> -                                                  *nb_clusters);
>> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
>> -                               (*nb_clusters - old_nb_clusters) *
>> -                               sizeof(uint16_t));
>> -                    }
>> -                    (*refcount_table)[cluster]--;
>> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>> -                            new_offset, s->cluster_size);
>> -
>> -                    res->corruptions_fixed++;
>> -                } else {
>> -                    res->corruptions++;
>> -                }
>> +                fprintf(stderr, "ERROR refcount block %" PRId64
>> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
>> +                res->corruptions++;
>> +                *rebuild = true;
>>               }
>>           }
>>       }
>> @@ -1669,8 +1526,8 @@ resize_fail:
>>    * Calculates an in-memory refcount table.
>>    */
>>   static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> -                               BdrvCheckMode fix, uint16_t **refcount_table,
>> -                               int64_t *nb_clusters)
>> +                               BdrvCheckMode fix, bool *rebuild,
>> +                               uint16_t **refcount_table, int64_t *nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>>       int64_t i;
>> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           s->refcount_table_offset,
>>           s->refcount_table_size * sizeof(uint64_t));
>>   
>> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
>> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
>>   }
>>   
>>   /*
>> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>    * refcount as reported by the refcount structures on-disk.
>>    */
>>   static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> -                              BdrvCheckMode fix, int64_t *highest_cluster,
>> +                              BdrvCheckMode fix, bool *rebuild,
>> +                              int64_t *highest_cluster,
>>                                 uint16_t *refcount_table, int64_t nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>               int *num_fixed = NULL;
>>               if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>>                   num_fixed = &res->leaks_fixed;
>> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
>> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
>> +                       (fix & BDRV_FIX_ERRORS)) {
>>                   num_fixed = &res->corruptions_fixed;
>>               }
>>   
>> +            if (refcount1 == 0) {
>> +                *rebuild = true;
>> +            }
> Why isn't this just another else if branch? Possibly even as the first
> or second condition, so that you don't have to add 'refcount1 > 0' for
> the other branch.

Because it's semantically different. The if-else sets the num_fixed 
pointer, whereas this conditional block forces a rebuild. I can include 
it as the second condition, just at the time I liked it more to have it 
externally.

Now that I'm reading once again over it... Probably using it as the 
second condition is better. I'll see to it.

>> +
>>               fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
>>                      num_fixed != NULL     ? "Repairing" :
>>                      refcount1 < refcount2 ? "ERROR" :
>> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       BDRVQcowState *s = bs->opaque;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>> +    bool rebuild = false;
>>       int ret;
>>   
>>       size = bdrv_getlength(bs->file);
>> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       res->bfi.total_clusters =
>>           size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>>   
>> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
>> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
>> +                              &nb_clusters);
>>       if (ret < 0) {
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
>> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> +    if (rebuild) {
>> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    }
> Should this increate res->check_errors? (It doesn't survive until the
> end of the series anyway, but more places seem to be added later that
> print an error message without increasing the error count.)

Yes, it should. We could even bail out, but it doesn't make much sense 
here (it makes more sense in patch 8).

Thanks,

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check
  2014-10-21  9:52     ` Max Reitz
@ 2014-10-21 10:12       ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21 10:12 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 21.10.2014 um 11:52 hat Max Reitz geschrieben:
> On 2014-10-21 at 11:31, Kevin Wolf wrote:
> >Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> >>The previous commit introduced the "rebuild" variable to qcow2's
> >>implementation of the image consistency check. Now make use of this by
> >>adding a function which creates a completely new refcount structure
> >>based solely on the in-memory information gathered before.
> >>
> >>The old refcount structure will be leaked, however. This leak will be
> >>dealt with in a follow-up commit.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>---
> >>  block/qcow2-refcount.c | 296 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 293 insertions(+), 3 deletions(-)

> >>+/*
> >>+ * Creates a new refcount structure based solely on the in-memory information
> >>+ * given through *refcount_table. All necessary allocations will be reflected
> >>+ * in that array.
> >>+ *
> >>+ * On success, the old refcount structure is leaked (it will be covered by the
> >>+ * new refcount structure).
> >>+ */
> >>+static int rebuild_refcount_structure(BlockDriverState *bs,
> >>+                                      BdrvCheckResult *res,
> >>+                                      uint16_t **refcount_table,
> >>+                                      int64_t *nb_clusters)
> >>+{
> >>+    BDRVQcowState *s = bs->opaque;
> >>+    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
> >>+    int64_t refblock_offset, refblock_start, refblock_index;
> >>+    uint32_t reftable_size = 0;
> >>+    uint64_t *reftable = NULL;
> >refcount_table and reftable? Seriously?
> 
> Reviewing would have been too easy otherwise. *cough*
> 
> One option is s/refcount_table/imrt/. I like it more than the second
> option, but it simply goes against the current naming convention.
> 
> The other option would be s/reftable/on_disk_reftable/, following
> the convention used in the next line.

The second option is probably more consistent with the rest (and less
cryptic).

> >>+        ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
> >>+                                            s->cluster_size);
> >>+        if (ret < 0) {
> >>+            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> >>+            goto fail;
> >>+        }
> >>+
> >>+        on_disk_refblock = g_malloc0(s->cluster_size);
> >qemu_blockalign?
> 
> Meh, I think I have to write a qemu_blockalign0() some time...
> 
> If I do so, it's one more patch you have to review; if I just use
> qemu_blockalign(), I find the memset() directly afterwards rather
> ugly, but would be fine with it. I guess it's up to you, then.

A qemu_blockalign0() sounds easy enough to review. It's not the number
of patches that makes review hard, it's their content.

Anyway, you're the patch author, it's your choice.

> >>+    ret = qcow2_pre_write_overlap_check(bs, 0, reftable_offset,
> >>+                                        reftable_size * sizeof(uint64_t));
> >>+    if (ret < 0) {
> >>+        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
> >>+        goto fail;
> >>+    }
> >>+
> >>+    ret = bdrv_write(bs->file, reftable_offset / BDRV_SECTOR_SIZE,
> >>+                     (void *)reftable,
> >>+                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
> >Why not bdrv_pwrite when you only have byte offset and length?
> 
> I don't like pwrite probably only because it takes an int byte
> length. reftable_size * sizeof(uint64_t) should be well below
> INT_MAX, but I don't see why we should use pwrite if we are writing
> full sectors to a sector-aligned offset.

Good point, we should look into fixing the type of the length argument
sometime.

Other than that, I'd prefer bdrv_pread/pwrite because it doesn't require
unit conversion here, and later in block.c back (bdrv_co_do_pwritev is
the native block layer interface, and it's byte based). bdrv_write()
goes through more emulation layers in block.c.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-21  9:59   ` Kevin Wolf
@ 2014-10-21 10:16     ` Max Reitz
  2014-10-21 14:55       ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-21 10:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 11:59, Kevin Wolf wrote:
> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>> Because the old refcount structure will be leaked after having rebuilt
>> it, we need to recalculate the refcounts and run a leak-fixing operation
>> afterwards (if leaks should be fixed at all).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
>> ---
>>   block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 35 insertions(+)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 75e726b..3730be2 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1956,12 +1956,47 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                         nb_clusters);
>>   
>>       if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>> +        BdrvCheckResult old_res = *res;
>> +
>>           fprintf(stderr, "Rebuilding refcount structure\n");
>>           ret = rebuild_refcount_structure(bs, res, &refcount_table,
>>                                            &nb_clusters);
>>           if (ret < 0) {
>>               goto fail;
>>           }
>> +
>> +        res->corruptions = 0;
>> +        res->leaks = 0;
>> +
>> +        /* Because the old reftable has been exchanged for a new one the
>> +         * references have to be recalculated */
>> +        rebuild = false;
>> +        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>> +        ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
>> +                                  &nb_clusters);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +
>> +        if (fix & BDRV_FIX_LEAKS) {
>> +            /* The old refcount structures are now leaked, fix it; the result
>> +             * can be ignored */
>> +            pre_compare_res = *res;
> I would prefer using another local variable here. At the first sight
> it's not quite clear which references to pre_compare_res correspond to
> which state.

Why not.

>> +            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
>> +                              &highest_cluster, refcount_table, nb_clusters);
>> +            if (rebuild) {
>> +                fprintf(stderr, "ERROR rebuilt refcount structure is still "
>> +                        "broken\n");
>> +            }
>> +            *res = pre_compare_res;
>> +        }
>> +
>> +        if (res->corruptions < old_res.corruptions) {
>> +            res->corruptions_fixed += old_res.corruptions - res->corruptions;
>> +        }
>> +        if (res->leaks < old_res.leaks) {
>> +            res->leaks_fixed += old_res.leaks - res->leaks;
>> +        }
> For these numbers to be accurate, don't we need to run
> compare_refcounts() unconditionally and only make BDRV_FIX_LEAKS
> conditional?

Actually, there is no difference, because at the point of this patch, 
you cannot use BDRV_FIX_ERRORS without BDRV_FIX_LEAKS. But it'd be more 
correct, right.

Thanks,

Max

> Now that I've read the rest of the series, comparing res and old_res
> actually makes sense, so maybe it's not necessary to introduce a
> Qcow2CheckResult.
>
>>       } else if (fix) {
>>           if (rebuild) {
>>               fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> -- 
>> 2.1.2
> Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
@ 2014-10-21 13:42   ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21 13:42 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> 039, 060 and 061 all create images with referenced clusters having a
> refcount of 0. Because previous commits changed handling of such errors,
> these tests now have a different output. Fix it.
> 
> Furthermore, 060 created a refblock with a refcount greater than one
> which now results in having to rebuild the refcount structure as well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Personally, I prefer updating test case results in the patches that
change the output. Anyway, the changes seem to match what I saw in the
previous patches.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs
  2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
@ 2014-10-21 14:12   ` Kevin Wolf
  2014-10-21 14:20     ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21 14:12 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> There are certain cases where repairing a qcow2 image might actually
> damage it further (or rather, where repairing it has in fact damaged it
> further with the old qcow2 check implementation). This should not
> happen, so add a test for these cases.
> 
> Furthermore, the repair function now repairs refblocks beyond the image
> end by resizing the image accordingly. Add several tests for this as
> well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

In case you didn't know: qemu-img handles hex offsets just fine, so
there's no need to comment the hex value and then convert it to decimal
for the real command.

> +--- Refblock is unallocated ---
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
> +Repairing refcount block 1 is outside image
> +ERROR cluster 16 refcount=0 reference=1
> +Rebuilding refcount structure
> +Repairing cluster 1 refcount=1 reference=0
> +Repairing cluster 2 refcount=1 reference=0
> +Repairing cluster 16 refcount=1 reference=0
> +The following inconsistencies were found and repaired:
> +
> +    0 leaked clusters
> +    2 corruptions
> +
> +Double checking the fixed image now...
> +No errors were found on the image.
> +
> +--- Signed overflow after the refblock ---
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
> +Repairing refcount block 1 is outside image
> +ERROR could not resize image: Invalid argument
> +Rebuilding refcount structure
> +Repairing cluster 1 refcount=1 reference=0
> +Repairing cluster 2 refcount=1 reference=0
> +The following inconsistencies were found and repaired:
> +
> +    0 leaked clusters
> +    1 corruptions
> +
> +Double checking the fixed image now...
> +No errors were found on the image.

This looks fishy. Compare this to the output of the previous case. We're
now missing the corruption for the refblock because *nb_clusters wasn't
increased.

Don't we actually run the risk of allocating a clusters during the
refcount rebuild that was outside the image, but couldn't be repaired?
Perhaps a resize failure needs to stop the repair.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs
  2014-10-21 14:12   ` Kevin Wolf
@ 2014-10-21 14:20     ` Max Reitz
  2014-10-21 15:16       ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-21 14:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 16:12, Kevin Wolf wrote:
> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>> There are certain cases where repairing a qcow2 image might actually
>> damage it further (or rather, where repairing it has in fact damaged it
>> further with the old qcow2 check implementation). This should not
>> happen, so add a test for these cases.
>>
>> Furthermore, the repair function now repairs refblocks beyond the image
>> end by resizing the image accordingly. Add several tests for this as
>> well.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> In case you didn't know: qemu-img handles hex offsets just fine, so
> there's no need to comment the hex value and then convert it to decimal
> for the real command.

Aha *g*

I did it that way in 060 and since then I just copied from there...

>> +--- Refblock is unallocated ---
>> +
>> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>> +Repairing refcount block 1 is outside image
>> +ERROR cluster 16 refcount=0 reference=1
>> +Rebuilding refcount structure
>> +Repairing cluster 1 refcount=1 reference=0
>> +Repairing cluster 2 refcount=1 reference=0
>> +Repairing cluster 16 refcount=1 reference=0
>> +The following inconsistencies were found and repaired:
>> +
>> +    0 leaked clusters
>> +    2 corruptions
>> +
>> +Double checking the fixed image now...
>> +No errors were found on the image.
>> +
>> +--- Signed overflow after the refblock ---
>> +
>> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>> +Repairing refcount block 1 is outside image
>> +ERROR could not resize image: Invalid argument
>> +Rebuilding refcount structure
>> +Repairing cluster 1 refcount=1 reference=0
>> +Repairing cluster 2 refcount=1 reference=0
>> +The following inconsistencies were found and repaired:
>> +
>> +    0 leaked clusters
>> +    1 corruptions
>> +
>> +Double checking the fixed image now...
>> +No errors were found on the image.
> This looks fishy. Compare this to the output of the previous case. We're
> now missing the corruption for the refblock because *nb_clusters wasn't
> increased.

I think we're rather missing the corruption for cluster 16 refcount=0. 
And I'd find that completely fine.

> Don't we actually run the risk of allocating a clusters during the
> refcount rebuild that was outside the image, but couldn't be repaired?
> Perhaps a resize failure needs to stop the repair.

The other way would be to unconditionally call inc_refcounts().

But I think it does not matter either way. If the refcount structure is 
rebuilt, all current refblocks are leaked anyway, so overwriting them is 
not an issue, I think.

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-21 10:16     ` Max Reitz
@ 2014-10-21 14:55       ` Max Reitz
  2014-10-21 15:11         ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Max Reitz @ 2014-10-21 14:55 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 12:16, Max Reitz wrote:
> On 2014-10-21 at 11:59, Kevin Wolf wrote:
>> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>>> Because the old refcount structure will be leaked after having rebuilt
>>> it, we need to recalculate the refcounts and run a leak-fixing 
>>> operation
>>> afterwards (if leaks should be fixed at all).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
>>> ---
>>>   block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
>>>   1 file changed, 35 insertions(+)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 75e726b..3730be2 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1956,12 +1956,47 @@ int qcow2_check_refcounts(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                         nb_clusters);
>>>         if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>>> +        BdrvCheckResult old_res = *res;
>>> +
>>>           fprintf(stderr, "Rebuilding refcount structure\n");
>>>           ret = rebuild_refcount_structure(bs, res, &refcount_table,
>>>                                            &nb_clusters);
>>>           if (ret < 0) {
>>>               goto fail;
>>>           }
>>> +
>>> +        res->corruptions = 0;
>>> +        res->leaks = 0;
>>> +
>>> +        /* Because the old reftable has been exchanged for a new 
>>> one the
>>> +         * references have to be recalculated */
>>> +        rebuild = false;
>>> +        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> +        ret = calculate_refcounts(bs, res, 0, &rebuild, 
>>> &refcount_table,
>>> +                                  &nb_clusters);
>>> +        if (ret < 0) {
>>> +            goto fail;
>>> +        }
>>> +
>>> +        if (fix & BDRV_FIX_LEAKS) {
>>> +            /* The old refcount structures are now leaked, fix it; 
>>> the result
>>> +             * can be ignored */
>>> +            pre_compare_res = *res;
>> I would prefer using another local variable here. At the first sight
>> it's not quite clear which references to pre_compare_res correspond to
>> which state.
>
> Why not.
>
>>> +            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
>>> +                              &highest_cluster, refcount_table, 
>>> nb_clusters);
>>> +            if (rebuild) {
>>> +                fprintf(stderr, "ERROR rebuilt refcount structure 
>>> is still "
>>> +                        "broken\n");
>>> +            }
>>> +            *res = pre_compare_res;
>>> +        }
>>> +
>>> +        if (res->corruptions < old_res.corruptions) {
>>> +            res->corruptions_fixed += old_res.corruptions - 
>>> res->corruptions;
>>> +        }
>>> +        if (res->leaks < old_res.leaks) {
>>> +            res->leaks_fixed += old_res.leaks - res->leaks;
>>> +        }
>> For these numbers to be accurate, don't we need to run
>> compare_refcounts() unconditionally and only make BDRV_FIX_LEAKS
>> conditional?
>
> Actually, there is no difference, because at the point of this patch, 
> you cannot use BDRV_FIX_ERRORS without BDRV_FIX_LEAKS. But it'd be 
> more correct, right.

Wait, it would not be more correct. The result of the 
compare_refcounts() call inside of the "if (fix & BDRV_FIX_LEAKS)" 
conditional block is ignored, its only purpose is to fix leaks resulting 
from rebuild_refcount_structure().

So the question is whether we should discard the result of that 
compare_refcounts() call. I think we should. Its sole purpose is to fix 
leaks due to the rebuilt refcount structures, and qemu-img will double 
check anyway.

Max

> Thanks,
>
> Max
>
>> Now that I've read the rest of the series, comparing res and old_res
>> actually makes sense, so maybe it's not necessary to introduce a
>> Qcow2CheckResult.
>>
>>>       } else if (fix) {
>>>           if (rebuild) {
>>>               fprintf(stderr, "ERROR need to rebuild refcount 
>>> structures\n");
>>> -- 
>>> 2.1.2
>> Kevin
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-21 14:55       ` Max Reitz
@ 2014-10-21 15:11         ` Kevin Wolf
  2014-10-21 15:17           ` Max Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21 15:11 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 21.10.2014 um 16:55 hat Max Reitz geschrieben:
> On 2014-10-21 at 12:16, Max Reitz wrote:
> >On 2014-10-21 at 11:59, Kevin Wolf wrote:
> >>Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> >>>Because the old refcount structure will be leaked after having rebuilt
> >>>it, we need to recalculate the refcounts and run a leak-fixing
> >>>operation
> >>>afterwards (if leaks should be fixed at all).
> >>>
> >>>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
> >>>---
> >>>  block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 35 insertions(+)
> >>>
> >>>diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> >>>index 75e726b..3730be2 100644
> >>>--- a/block/qcow2-refcount.c
> >>>+++ b/block/qcow2-refcount.c
> >>>@@ -1956,12 +1956,47 @@ int
> >>>qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult
> >>>*res,
> >>>                        nb_clusters);
> >>>        if (rebuild && (fix & BDRV_FIX_ERRORS)) {
> >>>+        BdrvCheckResult old_res = *res;
> >>>+
> >>>          fprintf(stderr, "Rebuilding refcount structure\n");
> >>>          ret = rebuild_refcount_structure(bs, res, &refcount_table,
> >>>                                           &nb_clusters);
> >>>          if (ret < 0) {
> >>>              goto fail;
> >>>          }
> >>>+
> >>>+        res->corruptions = 0;
> >>>+        res->leaks = 0;
> >>>+
> >>>+        /* Because the old reftable has been exchanged for a
> >>>new one the
> >>>+         * references have to be recalculated */
> >>>+        rebuild = false;
> >>>+        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
> >>>+        ret = calculate_refcounts(bs, res, 0, &rebuild,
> >>>&refcount_table,
> >>>+                                  &nb_clusters);
> >>>+        if (ret < 0) {
> >>>+            goto fail;
> >>>+        }
> >>>+
> >>>+        if (fix & BDRV_FIX_LEAKS) {
> >>>+            /* The old refcount structures are now leaked,
> >>>fix it; the result
> >>>+             * can be ignored */
> >>>+            pre_compare_res = *res;
> >>I would prefer using another local variable here. At the first sight
> >>it's not quite clear which references to pre_compare_res correspond to
> >>which state.
> >
> >Why not.
> >
> >>>+            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
> >>>+                              &highest_cluster,
> >>>refcount_table, nb_clusters);
> >>>+            if (rebuild) {
> >>>+                fprintf(stderr, "ERROR rebuilt refcount
> >>>structure is still "
> >>>+                        "broken\n");
> >>>+            }
> >>>+            *res = pre_compare_res;
> >>>+        }
> >>>+
> >>>+        if (res->corruptions < old_res.corruptions) {
> >>>+            res->corruptions_fixed += old_res.corruptions -
> >>>res->corruptions;
> >>>+        }
> >>>+        if (res->leaks < old_res.leaks) {
> >>>+            res->leaks_fixed += old_res.leaks - res->leaks;
> >>>+        }
> >>For these numbers to be accurate, don't we need to run
> >>compare_refcounts() unconditionally and only make BDRV_FIX_LEAKS
> >>conditional?
> >
> >Actually, there is no difference, because at the point of this
> >patch, you cannot use BDRV_FIX_ERRORS without BDRV_FIX_LEAKS. But
> >it'd be more correct, right.
> 
> Wait, it would not be more correct. The result of the
> compare_refcounts() call inside of the "if (fix & BDRV_FIX_LEAKS)"
> conditional block is ignored, its only purpose is to fix leaks
> resulting from rebuild_refcount_structure().
> 
> So the question is whether we should discard the result of that
> compare_refcounts() call. I think we should. Its sole purpose is to
> fix leaks due to the rebuilt refcount structures, and qemu-img will
> double check anyway.

Right, the other leaks should have been fixed by rebuilding the refcount
structures. So what you're saying is that we could do this:

    if (res->corruptions < old_res.corruptions) {
        res->corruptions_fixed += old_res.corruptions - res->corruptions;
    }
    assert(res->leaks == 0);
    res->leaks_fixed = old_res.leaks;

If this weren't true, we'd ignore leaked clusters even with
BDRV_FIX_LEAKS set.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs
  2014-10-21 14:20     ` Max Reitz
@ 2014-10-21 15:16       ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2014-10-21 15:16 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

Am 21.10.2014 um 16:20 hat Max Reitz geschrieben:
> On 2014-10-21 at 16:12, Kevin Wolf wrote:
> >Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> >>There are certain cases where repairing a qcow2 image might actually
> >>damage it further (or rather, where repairing it has in fact damaged it
> >>further with the old qcow2 check implementation). This should not
> >>happen, so add a test for these cases.
> >>
> >>Furthermore, the repair function now repairs refblocks beyond the image
> >>end by resizing the image accordingly. Add several tests for this as
> >>well.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>Reviewed-by: Eric Blake <eblake@redhat.com>
> >In case you didn't know: qemu-img handles hex offsets just fine, so
> >there's no need to comment the hex value and then convert it to decimal
> >for the real command.
> 
> Aha *g*
> 
> I did it that way in 060 and since then I just copied from there...
> 
> >>+--- Refblock is unallocated ---
> >>+
> >>+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> >>+Repairing refcount block 1 is outside image
> >>+ERROR cluster 16 refcount=0 reference=1
> >>+Rebuilding refcount structure
> >>+Repairing cluster 1 refcount=1 reference=0
> >>+Repairing cluster 2 refcount=1 reference=0
> >>+Repairing cluster 16 refcount=1 reference=0
> >>+The following inconsistencies were found and repaired:
> >>+
> >>+    0 leaked clusters
> >>+    2 corruptions
> >>+
> >>+Double checking the fixed image now...
> >>+No errors were found on the image.
> >>+
> >>+--- Signed overflow after the refblock ---
> >>+
> >>+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> >>+Repairing refcount block 1 is outside image
> >>+ERROR could not resize image: Invalid argument
> >>+Rebuilding refcount structure
> >>+Repairing cluster 1 refcount=1 reference=0
> >>+Repairing cluster 2 refcount=1 reference=0
> >>+The following inconsistencies were found and repaired:
> >>+
> >>+    0 leaked clusters
> >>+    1 corruptions
> >>+
> >>+Double checking the fixed image now...
> >>+No errors were found on the image.
> >This looks fishy. Compare this to the output of the previous case. We're
> >now missing the corruption for the refblock because *nb_clusters wasn't
> >increased.
> 
> I think we're rather missing the corruption for cluster 16
> refcount=0. And I'd find that completely fine.

Wasn't cluster 16 the additional refblock? Technically, it should now be
a corruption for cluster $HUGE_NUMBER, though I'll accept your excuse
below (for refblocks at least).

> >Don't we actually run the risk of allocating a clusters during the
> >refcount rebuild that was outside the image, but couldn't be repaired?
> >Perhaps a resize failure needs to stop the repair.
> 
> The other way would be to unconditionally call inc_refcounts().
> 
> But I think it does not matter either way. If the refcount structure
> is rebuilt, all current refblocks are leaked anyway, so overwriting
> them is not an issue, I think.

True. My concern was more about data blocks, but these are handled in a
different place. Nevertheless, I think inc_refcounts() could ignore them
with just a warning and a refcount rebuild would potentially overwrite
them.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild
  2014-10-21 15:11         ` Kevin Wolf
@ 2014-10-21 15:17           ` Max Reitz
  0 siblings, 0 replies; 30+ messages in thread
From: Max Reitz @ 2014-10-21 15:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Benoît Canet

On 2014-10-21 at 17:11, Kevin Wolf wrote:
> Am 21.10.2014 um 16:55 hat Max Reitz geschrieben:
>> On 2014-10-21 at 12:16, Max Reitz wrote:
>>> On 2014-10-21 at 11:59, Kevin Wolf wrote:
>>>> Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
>>>>> Because the old refcount structure will be leaked after having rebuilt
>>>>> it, we need to recalculate the refcounts and run a leak-fixing
>>>>> operation
>>>>> afterwards (if leaks should be fixed at all).
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
>>>>> ---
>>>>>   block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
>>>>>   1 file changed, 35 insertions(+)
>>>>>
>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>> index 75e726b..3730be2 100644
>>>>> --- a/block/qcow2-refcount.c
>>>>> +++ b/block/qcow2-refcount.c
>>>>> @@ -1956,12 +1956,47 @@ int
>>>>> qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult
>>>>> *res,
>>>>>                         nb_clusters);
>>>>>         if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>>>>> +        BdrvCheckResult old_res = *res;
>>>>> +
>>>>>           fprintf(stderr, "Rebuilding refcount structure\n");
>>>>>           ret = rebuild_refcount_structure(bs, res, &refcount_table,
>>>>>                                            &nb_clusters);
>>>>>           if (ret < 0) {
>>>>>               goto fail;
>>>>>           }
>>>>> +
>>>>> +        res->corruptions = 0;
>>>>> +        res->leaks = 0;
>>>>> +
>>>>> +        /* Because the old reftable has been exchanged for a
>>>>> new one the
>>>>> +         * references have to be recalculated */
>>>>> +        rebuild = false;
>>>>> +        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>>>> +        ret = calculate_refcounts(bs, res, 0, &rebuild,
>>>>> &refcount_table,
>>>>> +                                  &nb_clusters);
>>>>> +        if (ret < 0) {
>>>>> +            goto fail;
>>>>> +        }
>>>>> +
>>>>> +        if (fix & BDRV_FIX_LEAKS) {
>>>>> +            /* The old refcount structures are now leaked,
>>>>> fix it; the result
>>>>> +             * can be ignored */
>>>>> +            pre_compare_res = *res;
>>>> I would prefer using another local variable here. At the first sight
>>>> it's not quite clear which references to pre_compare_res correspond to
>>>> which state.
>>> Why not.
>>>
>>>>> +            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
>>>>> +                              &highest_cluster,
>>>>> refcount_table, nb_clusters);
>>>>> +            if (rebuild) {
>>>>> +                fprintf(stderr, "ERROR rebuilt refcount
>>>>> structure is still "
>>>>> +                        "broken\n");
>>>>> +            }
>>>>> +            *res = pre_compare_res;
>>>>> +        }
>>>>> +
>>>>> +        if (res->corruptions < old_res.corruptions) {
>>>>> +            res->corruptions_fixed += old_res.corruptions -
>>>>> res->corruptions;
>>>>> +        }
>>>>> +        if (res->leaks < old_res.leaks) {
>>>>> +            res->leaks_fixed += old_res.leaks - res->leaks;
>>>>> +        }
>>>> For these numbers to be accurate, don't we need to run
>>>> compare_refcounts() unconditionally and only make BDRV_FIX_LEAKS
>>>> conditional?
>>> Actually, there is no difference, because at the point of this
>>> patch, you cannot use BDRV_FIX_ERRORS without BDRV_FIX_LEAKS. But
>>> it'd be more correct, right.
>> Wait, it would not be more correct. The result of the
>> compare_refcounts() call inside of the "if (fix & BDRV_FIX_LEAKS)"
>> conditional block is ignored, its only purpose is to fix leaks
>> resulting from rebuild_refcount_structure().
>>
>> So the question is whether we should discard the result of that
>> compare_refcounts() call. I think we should. Its sole purpose is to
>> fix leaks due to the rebuilt refcount structures, and qemu-img will
>> double check anyway.
> Right, the other leaks should have been fixed by rebuilding the refcount
> structures. So what you're saying is that we could do this:
>
>      if (res->corruptions < old_res.corruptions) {
>          res->corruptions_fixed += old_res.corruptions - res->corruptions;
>      }
>      assert(res->leaks == 0);
>      res->leaks_fixed = old_res.leaks;
>
> If this weren't true, we'd ignore leaked clusters even with
> BDRV_FIX_LEAKS set.

Okay, so for obvious reasons this is not nice. Well, I don't know what 
will be worse. Writing the code which takes the leak fix result into 
account or having to review it. I think I'll just see how many new leaks 
appeared due to the rebuild and how many of those could not be fixed, 
and then add that result to res->leaks. That should work out without 
being too complicated.

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-10-21 15:17 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
2014-10-20 15:07   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
2014-10-20 15:09   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-10-20 16:44   ` Kevin Wolf
2014-10-21  7:14     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-10-21  7:52   ` Kevin Wolf
2014-10-21 10:10     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
2014-10-21  9:31   ` Kevin Wolf
2014-10-21  9:52     ` Max Reitz
2014-10-21 10:12       ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
2014-10-21  9:59   ` Kevin Wolf
2014-10-21 10:16     ` Max Reitz
2014-10-21 14:55       ` Max Reitz
2014-10-21 15:11         ` Kevin Wolf
2014-10-21 15:17           ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
2014-10-21 13:42   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
2014-10-21 14:12   ` Kevin Wolf
2014-10-21 14:20     ` Max Reitz
2014-10-21 15:16       ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.