All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] qcow2 check improvements
@ 2018-08-17 12:22 Vladimir Sementsov-Ogievskiy
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied Vladimir Sementsov-Ogievskiy
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Hi all!

v2:
02, 06: check bdrv_getlength error return code

v1:

We've faced the following problem: after host fs corruption, vm images
becomes invalid. And which is interesting, starting qemu-img check on
them led to allocating of the whole RAM and then killing qemu-img by
OOM Killer.

This was due to corrupted l2 entries, which referenced clusters far-far
beyond the end of the qcow2 file.
02 is a generic fix for the bug, 01 is unrelated improvement, 03-07 are
additional info and fixing for such corrupted table entries.

Questions on 02, 06 and 07:
1. Should restrictions be more or less strict?
2. Are there valid cases, when such entries should not be considered as
   corrupted?

Vladimir Sementsov-Ogievskiy (7):
  block/qcow2-refcount: fix check_oflag_copied
  block/qcow2-refcount: avoid eating RAM
  block/qcow2-refcount: check_refcounts_l2: refactor compressed case
  block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
  block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
  block/qcow2-refcount: fix out-of-file L1 entries to be zero
  block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero

 block/qcow2-refcount.c | 270 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 219 insertions(+), 51 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 15:28   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Increase corruptions_fixed only after successful fix.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5..615847eb09 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1816,7 +1816,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
     for (i = 0; i < s->l1_size; i++) {
         uint64_t l1_entry = s->l1_table[i];
         uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
-        bool l2_dirty = false;
+        int l2_fixed_entries = 0;
 
         if (!l2_offset) {
             continue;
@@ -1878,8 +1878,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                         l2_table[j] = cpu_to_be64(refcount == 1
                                     ? l2_entry |  QCOW_OFLAG_COPIED
                                     : l2_entry & ~QCOW_OFLAG_COPIED);
-                        l2_dirty = true;
-                        res->corruptions_fixed++;
+                        l2_fixed_entries++;
                     } else {
                         res->corruptions++;
                     }
@@ -1887,7 +1886,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
             }
         }
 
-        if (l2_dirty) {
+        if (l2_fixed_entries > 0) {
             ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2,
                                                 l2_offset, s->cluster_size);
             if (ret < 0) {
@@ -1905,6 +1904,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                 res->check_errors++;
                 goto fail;
             }
+            res->corruptions_fixed += l2_fixed_entries;
         }
     }
 
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 15:31   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
an unpredictable amount of memory on corrupted table entries, which are
referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 615847eb09..566c19fbfa 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1499,12 +1499,26 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t start, last, cluster_offset, k, refcount;
+    int64_t file_len;
     int ret;
 
     if (size <= 0) {
         return 0;
     }
 
+    file_len = bdrv_getlength(bs->file->bs);
+    if (file_len < 0) {
+        return file_len;
+    }
+
+    if (offset + size - file_len > s->cluster_size) {
+        fprintf(stderr, "ERROR: counting reference for region exceeding the "
+                "end of the file by more than one cluster: offset 0x%" PRIx64
+                " size 0x%" PRIx64 "\n", offset, size);
+        res->corruptions++;
+        return 0;
+    }
+
     start = start_of_cluster(s, offset);
     last = start_of_cluster(s, offset + size - 1);
     for(cluster_offset = start; cluster_offset <= last;
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied Vladimir Sementsov-Ogievskiy
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 15:40   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Separate offset and size of compressed cluster.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 566c19fbfa..0ea01e3ee2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1570,7 +1570,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcow2State *s = bs->opaque;
     uint64_t *l2_table, l2_entry;
     uint64_t next_contiguous_offset = 0;
-    int i, l2_size, nb_csectors, ret;
+    int i, l2_size, ret;
 
     /* Read L2 table from disk */
     l2_size = s->l2_size * sizeof(uint64_t);
@@ -1589,6 +1589,9 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
         switch (qcow2_get_cluster_type(l2_entry)) {
         case QCOW2_CLUSTER_COMPRESSED:
+        {
+            int64_t csize, coffset;
+
             /* Compressed clusters don't have QCOW_OFLAG_COPIED */
             if (l2_entry & QCOW_OFLAG_COPIED) {
                 fprintf(stderr, "ERROR: coffset=0x%" PRIx64 ": "
@@ -1599,12 +1602,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             }
 
             /* Mark cluster as used */
-            nb_csectors = ((l2_entry >> s->csize_shift) &
-                           s->csize_mask) + 1;
-            l2_entry &= s->cluster_offset_mask;
+            csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
+                    BDRV_SECTOR_SIZE;
+            coffset = l2_entry & s->cluster_offset_mask &
+                      ~(BDRV_SECTOR_SIZE - 1);
             ret = qcow2_inc_refcounts_imrt(bs, res,
                                            refcount_table, refcount_table_size,
-                                           l2_entry & ~511, nb_csectors * 512);
+                                           coffset, csize);
             if (ret < 0) {
                 goto fail;
             }
@@ -1621,6 +1625,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                 res->bfi.fragmented_clusters++;
             }
             break;
+        }
 
         case QCOW2_CLUSTER_ZERO_ALLOC:
         case QCOW2_CLUSTER_NORMAL:
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 15:44   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Reduce number of structures ignored in overlap check: when checking
active table ignore active tables, when checking inactive table ignore
inactive ones.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0ea01e3ee2..3b057b635d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1565,7 +1565,7 @@ enum {
 static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                               void **refcount_table,
                               int64_t *refcount_table_size, int64_t l2_offset,
-                              int flags, BdrvCheckMode fix)
+                              int flags, BdrvCheckMode fix, bool active)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t *l2_table, l2_entry;
@@ -1654,11 +1654,12 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                     if (fix & BDRV_FIX_ERRORS) {
                         uint64_t l2e_offset =
                             l2_offset + (uint64_t)i * sizeof(uint64_t);
+                        int ign = active ? QCOW2_OL_ACTIVE_L2 :
+                                           QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
                         l2_table[i] = cpu_to_be64(l2_entry);
-                        ret = qcow2_pre_write_overlap_check(bs,
-                                QCOW2_OL_ACTIVE_L2 | QCOW2_OL_INACTIVE_L2,
+                        ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, sizeof(uint64_t));
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Overlap check failed\n");
@@ -1732,7 +1733,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
                               void **refcount_table,
                               int64_t *refcount_table_size,
                               int64_t l1_table_offset, int l1_size,
-                              int flags, BdrvCheckMode fix)
+                              int flags, BdrvCheckMode fix, bool active)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t *l1_table = NULL, l2_offset, l1_size2;
@@ -1788,7 +1789,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Process and check L2 entries */
             ret = check_refcounts_l2(bs, res, refcount_table,
                                      refcount_table_size, l2_offset, flags,
-                                     fix);
+                                     fix, active);
             if (ret < 0) {
                 goto fail;
             }
@@ -2074,7 +2075,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     /* current L1 table */
     ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
                              s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO,
-                             fix);
+                             fix, true);
     if (ret < 0) {
         return ret;
     }
@@ -2097,7 +2098,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             continue;
         }
         ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
-                                 sn->l1_table_offset, sn->l1_size, 0, fix);
+                                 sn->l1_table_offset, sn->l1_size, 0, fix,
+                                 false);
         if (ret < 0) {
             return ret;
         }
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 19:54   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Split entry repairing to separate function, to be reused later.

Note: entry in in-memory l2 table (local variable in
check_refcounts_l2) is not updated after this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 147 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 109 insertions(+), 38 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3b057b635d..d9c8cd666b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1554,6 +1554,99 @@ enum {
     CHECK_FRAG_INFO = 0x2,      /* update BlockFragInfo counters */
 };
 
+/* Update entry in L1 or L2 table
+ *
+ * Returns: -errno if overlap check failed
+ *          0 if write failed
+ *          1 on success
+ */
+static int write_table_entry(BlockDriverState *bs, const char *table_name,
+                             uint64_t table_offset, int entry_index,
+                             uint64_t new_val, int ign)
+{
+    int ret;
+    uint64_t entry_offset =
+            table_offset + (uint64_t)entry_index * sizeof(new_val);
+
+    cpu_to_be64s(&new_val);
+    ret = qcow2_pre_write_overlap_check(bs, ign, entry_offset, sizeof(new_val));
+    if (ret < 0) {
+        fprintf(stderr,
+                "ERROR: Can't write %s table entry: overlap check failed: %s\n",
+                table_name, strerror(-ret));
+        return ret;
+    }
+
+    ret = bdrv_pwrite_sync(bs->file, entry_offset, &new_val, sizeof(new_val));
+    if (ret < 0) {
+        fprintf(stderr, "ERROR: Failed to overwrite %s table entry: %s\n",
+                table_name, strerror(-ret));
+        return 0;
+    }
+
+    return 1;
+}
+
+/* Try to fix (if allowed) entry in L1 or L2 table. Update @res correspondingly.
+ *
+ * Returns: -errno if overlap check failed
+ *          0 if entry was not updated for other reason
+ *            (fixing disabled or write failed)
+ *          1 on success
+ */
+static int fix_table_entry(BlockDriverState *bs, BdrvCheckResult *res,
+                           BdrvCheckMode fix, const char *table_name,
+                           uint64_t table_offset, int entry_index,
+                           uint64_t new_val, int ign,
+                           const char *fmt, va_list args)
+{
+    int ret;
+
+    fprintf(stderr, fix & BDRV_FIX_ERRORS ? "Repairing: " : "ERROR: ");
+    vfprintf(stderr, fmt, args);
+    fprintf(stderr, "\n");
+
+    if (!(fix & BDRV_FIX_ERRORS)) {
+        res->corruptions++;
+        return 0;
+    }
+
+    ret = write_table_entry(bs, table_name, table_offset, entry_index, new_val,
+                            ign);
+
+    if (ret == 1) {
+        res->corruptions_fixed++;
+    } else {
+        res->check_errors++;
+    }
+
+    return ret;
+}
+
+/* Make L2 entry to be QCOW2_CLUSTER_ZERO_PLAIN
+ *
+ * Returns: -errno if overlap check failed
+ *          0 if write failed
+ *          1 on success
+ */
+static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+                                BdrvCheckMode fix, int64_t l2_offset,
+                                int l2_index, bool active,
+                                const char *fmt, ...)
+{
+    int ret;
+    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+    uint64_t l2_entry = QCOW_OFLAG_ZERO;
+    va_list args;
+
+    va_start(args, fmt);
+    ret = fix_table_entry(bs, res, fix, "L2", l2_offset, l2_index, l2_entry,
+                          ign, fmt, args);
+    va_end(args);
+
+    return ret;
+}
+
 /*
  * Increases the refcount in the given refcount table for the all clusters
  * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1646,46 +1739,24 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                 if (qcow2_get_cluster_type(l2_entry) ==
                     QCOW2_CLUSTER_ZERO_ALLOC)
                 {
-                    fprintf(stderr, "%s offset=%" PRIx64 ": Preallocated zero "
-                            "cluster is not properly aligned; L2 entry "
-                            "corrupted.\n",
-                            fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
+                    ret = fix_l2_entry_to_zero(
+                            bs, res, fix, l2_offset, i, active,
+                            "offset=%" PRIx64 ": Preallocated zero cluster is "
+                            "not properly aligned; L2 entry corrupted.",
                             offset);
-                    if (fix & BDRV_FIX_ERRORS) {
-                        uint64_t l2e_offset =
-                            l2_offset + (uint64_t)i * sizeof(uint64_t);
-                        int ign = active ? QCOW2_OL_ACTIVE_L2 :
-                                           QCOW2_OL_INACTIVE_L2;
-
-                        l2_entry = QCOW_OFLAG_ZERO;
-                        l2_table[i] = cpu_to_be64(l2_entry);
-                        ret = qcow2_pre_write_overlap_check(bs, ign,
-                                l2e_offset, sizeof(uint64_t));
-                        if (ret < 0) {
-                            fprintf(stderr, "ERROR: Overlap check failed\n");
-                            res->check_errors++;
-                            /* Something is seriously wrong, so abort checking
-                             * this L2 table */
-                            goto fail;
-                        }
-
-                        ret = bdrv_pwrite_sync(bs->file, l2e_offset,
-                                               &l2_table[i], sizeof(uint64_t));
-                        if (ret < 0) {
-                            fprintf(stderr, "ERROR: Failed to overwrite L2 "
-                                    "table entry: %s\n", strerror(-ret));
-                            res->check_errors++;
-                            /* Do not abort, continue checking the rest of this
-                             * L2 table's entries */
-                        } else {
-                            res->corruptions_fixed++;
-                            /* Skip marking the cluster as used
-                             * (it is unused now) */
-                            continue;
-                        }
-                    } else {
-                        res->corruptions++;
+                    if (ret < 0) {
+                        /* Something is seriously wrong, so abort checking
+                         * this L2 table */
+                        goto fail;
+                    }
+                    if (ret == 1) {
+                        /* Skip marking the cluster as used
+                         * (it is unused now) */
+                        continue;
                     }
+                    /* Entry was not updated, but do not abort, mark cluster
+                     * as used and continue checking the rest of this L2
+                     * table's entries */
                 } else {
                     fprintf(stderr, "ERROR offset=%" PRIx64 ": Data cluster is "
                         "not properly aligned; L2 entry corrupted.\n", offset);
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 20:09   ` Max Reitz
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero Vladimir Sementsov-Ogievskiy
  2018-10-08 15:02 ` [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
  7 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Zero out corrupted L1 table entry, which reference L2 table out of
underlying file.
Zero L1 table entry means that "the L2 table and all clusters described
by this L2 table are unallocated."

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d9c8cd666b..3c004e5bfe 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1647,6 +1647,29 @@ static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
     return ret;
 }
 
+/* Zero out L1 entry
+ *
+ * Returns: -errno if overlap check failed
+ *          0 if write failed
+ *          1 on success
+ */
+static int fix_l1_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
+                                BdrvCheckMode fix, int64_t l1_offset,
+                                int l1_index, bool active,
+                                const char *fmt, ...)
+{
+    int ret;
+    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+    va_list args;
+
+    va_start(args, fmt);
+    ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 0, ign,
+                          fmt, args);
+    va_end(args);
+
+    return ret;
+}
+
 /*
  * Increases the refcount in the given refcount table for the all clusters
  * referenced in the L2 table. While doing so, performs some checks on L2
@@ -1808,6 +1831,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t *l1_table = NULL, l2_offset, l1_size2;
+    int64_t file_len;
     int i, ret;
 
     l1_size2 = l1_size * sizeof(uint64_t);
@@ -1837,12 +1861,32 @@ static int check_refcounts_l1(BlockDriverState *bs,
             be64_to_cpus(&l1_table[i]);
     }
 
+    file_len = bdrv_getlength(bs->file->bs);
+    if (file_len < 0) {
+        ret = file_len;
+        goto fail;
+    }
+
     /* Do the actual checks */
     for(i = 0; i < l1_size; i++) {
         l2_offset = l1_table[i];
         if (l2_offset) {
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
+            if (l2_offset >= file_len) {
+                ret = fix_l1_entry_to_zero(
+                        bs, res, fix, l1_table_offset, i, active,
+                        "l2 table offset out of file: offset 0x%" PRIx64,
+                        l2_offset);
+                if (ret < 0) {
+                    /* Something is seriously wrong, so abort checking
+                     * this L1 table */
+                    goto fail;
+                }
+
+                continue;
+            }
+
             ret = qcow2_inc_refcounts_imrt(bs, res,
                                            refcount_table, refcount_table_size,
                                            l2_offset, s->cluster_size);
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero Vladimir Sementsov-Ogievskiy
@ 2018-08-17 12:22 ` Vladimir Sementsov-Ogievskiy
  2018-10-08 20:51   ` Max Reitz
  2018-10-10 16:39   ` Vladimir Sementsov-Ogievskiy
  2018-10-08 15:02 ` [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
  7 siblings, 2 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-08-17 12:22 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, vsementsov, den

Rewrite corrupted L2 table entry, which reference space out of
underlying file.

Make this L2 table entry read-as-all-zeros without any allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c004e5bfe..3de3768a3c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             /* Mark cluster as used */
             csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
                     BDRV_SECTOR_SIZE;
+            if (csize > s->cluster_size) {
+                ret = fix_l2_entry_to_zero(
+                        bs, res, fix, l2_offset, i, active,
+                        "compressed cluster larger than cluster: size 0x%"
+                        PRIx64, csize);
+                if (ret < 0) {
+                    goto fail;
+                }
+                continue;
+            }
+
             coffset = l2_entry & s->cluster_offset_mask &
                       ~(BDRV_SECTOR_SIZE - 1);
+            if (coffset >= bdrv_getlength(bs->file->bs)) {
+                ret = fix_l2_entry_to_zero(
+                        bs, res, fix, l2_offset, i, active,
+                        "compressed cluster out of file: offset 0x%" PRIx64,
+                        coffset);
+                if (ret < 0) {
+                    goto fail;
+                }
+                continue;
+            }
+
             ret = qcow2_inc_refcounts_imrt(bs, res,
                                            refcount_table, refcount_table_size,
                                            coffset, csize);
@@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
         {
             uint64_t offset = l2_entry & L2E_OFFSET_MASK;
 
+            if (offset >= bdrv_getlength(bs->file->bs)) {
+                ret = fix_l2_entry_to_zero(
+                        bs, res, fix, l2_offset, i, active,
+                        "cluster out of file: offset 0x%" PRIx64, offset);
+                if (ret < 0) {
+                    goto fail;
+                }
+                continue;
+            }
+
             if (flags & CHECK_FRAG_INFO) {
                 res->bfi.allocated_clusters++;
                 if (next_contiguous_offset &&
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] qcow2 check improvements
  2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero Vladimir Sementsov-Ogievskiy
@ 2018-10-08 15:02 ` Vladimir Sementsov-Ogievskiy
  7 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 15:02 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, Denis Lunev

ping

Hi, what about this?

17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> v2:
> 02, 06: check bdrv_getlength error return code
>
> v1:
>
> We've faced the following problem: after host fs corruption, vm images
> becomes invalid. And which is interesting, starting qemu-img check on
> them led to allocating of the whole RAM and then killing qemu-img by
> OOM Killer.
>
> This was due to corrupted l2 entries, which referenced clusters far-far
> beyond the end of the qcow2 file.
> 02 is a generic fix for the bug, 01 is unrelated improvement, 03-07 are
> additional info and fixing for such corrupted table entries.
>
> Questions on 02, 06 and 07:
> 1. Should restrictions be more or less strict?
> 2. Are there valid cases, when such entries should not be considered as
>     corrupted?
>
> Vladimir Sementsov-Ogievskiy (7):
>    block/qcow2-refcount: fix check_oflag_copied
>    block/qcow2-refcount: avoid eating RAM
>    block/qcow2-refcount: check_refcounts_l2: refactor compressed case
>    block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
>    block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
>    block/qcow2-refcount: fix out-of-file L1 entries to be zero
>    block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
>
>   block/qcow2-refcount.c | 270 +++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 219 insertions(+), 51 deletions(-)
>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied Vladimir Sementsov-Ogievskiy
@ 2018-10-08 15:28   ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 15:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Increase corruptions_fixed only after successful fix.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3c539f02e5..615847eb09 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1816,7 +1816,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
>      for (i = 0; i < s->l1_size; i++) {
>          uint64_t l1_entry = s->l1_table[i];
>          uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
> -        bool l2_dirty = false;
> +        int l2_fixed_entries = 0;

I understand that this new variable kind of supersedes the other one,
but I think for the sake of readability it would be better to keep l2_dirty.

Max

>          if (!l2_offset) {
>              continue;
> @@ -1878,8 +1878,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
>                          l2_table[j] = cpu_to_be64(refcount == 1
>                                      ? l2_entry |  QCOW_OFLAG_COPIED
>                                      : l2_entry & ~QCOW_OFLAG_COPIED);
> -                        l2_dirty = true;
> -                        res->corruptions_fixed++;
> +                        l2_fixed_entries++;
>                      } else {
>                          res->corruptions++;
>                      }
> @@ -1887,7 +1886,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
>              }
>          }
>  
> -        if (l2_dirty) {
> +        if (l2_fixed_entries > 0) {
>              ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2,
>                                                  l2_offset, s->cluster_size);
>              if (ret < 0) {
> @@ -1905,6 +1904,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
>                  res->check_errors++;
>                  goto fail;
>              }
> +            res->corruptions_fixed += l2_fixed_entries;
>          }
>      }
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM Vladimir Sementsov-Ogievskiy
@ 2018-10-08 15:31   ` Max Reitz
  2018-10-08 20:17     ` Vladimir Sementsov-Ogievskiy
  2018-10-08 20:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 15:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
> an unpredictable amount of memory on corrupted table entries, which are
> referencing regions far beyond the end of file.
> 
> Prevent this, by skipping such regions from further processing.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 615847eb09..566c19fbfa 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1499,12 +1499,26 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
>  {
>      BDRVQcow2State *s = bs->opaque;
>      uint64_t start, last, cluster_offset, k, refcount;
> +    int64_t file_len;
>      int ret;
>  
>      if (size <= 0) {
>          return 0;
>      }
>  
> +    file_len = bdrv_getlength(bs->file->bs);
> +    if (file_len < 0) {
> +        return file_len;
> +    }

Doesn't this slow things down?  Can we not cache the length somewhere
and update it whenever the image is modified?

> +
> +    if (offset + size - file_len > s->cluster_size) {
> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
> +                "end of the file by more than one cluster: offset 0x%" PRIx64
> +                " size 0x%" PRIx64 "\n", offset, size);

Why is one cluster OK?  Is there a specific case you're trying to catch
here?

Max

> +        res->corruptions++;
> +        return 0;
> +    }
> +
>      start = start_of_cluster(s, offset);
>      last = start_of_cluster(s, offset + size - 1);
>      for(cluster_offset = start; cluster_offset <= last;
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case Vladimir Sementsov-Ogievskiy
@ 2018-10-08 15:40   ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 15:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 2819 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Separate offset and size of compressed cluster.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 566c19fbfa..0ea01e3ee2 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1570,7 +1570,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>      BDRVQcow2State *s = bs->opaque;
>      uint64_t *l2_table, l2_entry;
>      uint64_t next_contiguous_offset = 0;
> -    int i, l2_size, nb_csectors, ret;
> +    int i, l2_size, ret;
>  
>      /* Read L2 table from disk */
>      l2_size = s->l2_size * sizeof(uint64_t);
> @@ -1589,6 +1589,9 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>  
>          switch (qcow2_get_cluster_type(l2_entry)) {
>          case QCOW2_CLUSTER_COMPRESSED:
> +        {
> +            int64_t csize, coffset;
> +
>              /* Compressed clusters don't have QCOW_OFLAG_COPIED */
>              if (l2_entry & QCOW_OFLAG_COPIED) {
>                  fprintf(stderr, "ERROR: coffset=0x%" PRIx64 ": "
> @@ -1599,12 +1602,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>              }
>  
>              /* Mark cluster as used */
> -            nb_csectors = ((l2_entry >> s->csize_shift) &
> -                           s->csize_mask) + 1;
> -            l2_entry &= s->cluster_offset_mask;
> +            csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
> +                    BDRV_SECTOR_SIZE;
> +            coffset = l2_entry & s->cluster_offset_mask &
> +                      ~(BDRV_SECTOR_SIZE - 1);

This should actually be 512 instead of BDRV_SECTOR_SIZE (or the former
may be a shift by 9 bits), because this is about qcow2 and not about the
block layer's sector size.

(Other places in the qcow2 driver handling compressed clusters use 512
or shift by 9, too.)

Max

>              ret = qcow2_inc_refcounts_imrt(bs, res,
>                                             refcount_table, refcount_table_size,
> -                                           l2_entry & ~511, nb_csectors * 512);
> +                                           coffset, csize);
>              if (ret < 0) {
>                  goto fail;
>              }
> @@ -1621,6 +1625,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>                  res->bfi.fragmented_clusters++;
>              }
>              break;
> +        }
>  
>          case QCOW2_CLUSTER_ZERO_ALLOC:
>          case QCOW2_CLUSTER_NORMAL:
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Vladimir Sementsov-Ogievskiy
@ 2018-10-08 15:44   ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 15:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 453 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Reduce number of structures ignored in overlap check: when checking
> active table ignore active tables, when checking inactive table ignore
> inactive ones.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero Vladimir Sementsov-Ogievskiy
@ 2018-10-08 19:54   ` Max Reitz
  2018-10-10 12:25     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Max Reitz @ 2018-10-08 19:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 4797 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Split entry repairing to separate function, to be reused later.
> 
> Note: entry in in-memory l2 table (local variable in
> check_refcounts_l2) is not updated after this patch.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 147 ++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 109 insertions(+), 38 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3b057b635d..d9c8cd666b 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1554,6 +1554,99 @@ enum {
>      CHECK_FRAG_INFO = 0x2,      /* update BlockFragInfo counters */
>  };
>  
> +/* Update entry in L1 or L2 table
> + *
> + * Returns: -errno if overlap check failed
> + *          0 if write failed
> + *          1 on success
> + */
> +static int write_table_entry(BlockDriverState *bs, const char *table_name,
> +                             uint64_t table_offset, int entry_index,
> +                             uint64_t new_val, int ign)
> +{
> +    int ret;
> +    uint64_t entry_offset =
> +            table_offset + (uint64_t)entry_index * sizeof(new_val);
> +
> +    cpu_to_be64s(&new_val);
> +    ret = qcow2_pre_write_overlap_check(bs, ign, entry_offset, sizeof(new_val));
> +    if (ret < 0) {
> +        fprintf(stderr,
> +                "ERROR: Can't write %s table entry: overlap check failed: %s\n",

I recently complained to Berto that I don't like elisions ("can't") in
user interfaces, so I suppose I'll have to complain here, too, that I'd
prefer a "Cannot".

> +                table_name, strerror(-ret));
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite_sync(bs->file, entry_offset, &new_val, sizeof(new_val));
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR: Failed to overwrite %s table entry: %s\n",
> +                table_name, strerror(-ret));
> +        return 0;
> +    }
> +
> +    return 1;
> +}
> +
> +/* Try to fix (if allowed) entry in L1 or L2 table. Update @res correspondingly.
> + *
> + * Returns: -errno if overlap check failed
> + *          0 if entry was not updated for other reason
> + *            (fixing disabled or write failed)
> + *          1 on success
> + */
> +static int fix_table_entry(BlockDriverState *bs, BdrvCheckResult *res,
> +                           BdrvCheckMode fix, const char *table_name,
> +                           uint64_t table_offset, int entry_index,
> +                           uint64_t new_val, int ign,
> +                           const char *fmt, va_list args)
> +{
> +    int ret;
> +
> +    fprintf(stderr, fix & BDRV_FIX_ERRORS ? "Repairing: " : "ERROR: ");
> +    vfprintf(stderr, fmt, args);
> +    fprintf(stderr, "\n");

If you're just going to print this here, right at the start, I think it
would be better to just do it in the caller.

Sure, with this solution the caller safes an fprintf() call, but I find
it a bit over the top to start with vararg handling here when the caller
can just do an additional fprintf().

(Also, I actually find it clearer if you have to call two separate
functions.)

> +
> +    if (!(fix & BDRV_FIX_ERRORS)) {
> +        res->corruptions++;
> +        return 0;
> +    }
> +
> +    ret = write_table_entry(bs, table_name, table_offset, entry_index, new_val,
> +                            ign);
> +
> +    if (ret == 1) {
> +        res->corruptions_fixed++;
> +    } else {
> +        res->check_errors++;
> +    }
> +
> +    return ret;
> +}
> +
> +/* Make L2 entry to be QCOW2_CLUSTER_ZERO_PLAIN
> + *
> + * Returns: -errno if overlap check failed
> + *          0 if write failed
> + *          1 on success
> + */
> +static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
> +                                BdrvCheckMode fix, int64_t l2_offset,
> +                                int l2_index, bool active,
> +                                const char *fmt, ...)
> +{
> +    int ret;
> +    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
> +    uint64_t l2_entry = QCOW_OFLAG_ZERO;
> +    va_list args;
> +
> +    va_start(args, fmt);
> +    ret = fix_table_entry(bs, res, fix, "L2", l2_offset, l2_index, l2_entry,
> +                          ign, fmt, args);
> +    va_end(args);
> +
> +    return ret;
> +}

If you drop the fprintf() from fix_table_entry(), this function will
make less sense as well.  Just calling fix_table_entry() directly will
be just as easy.

(Yes, I see that you use the function in patch 7 again, and yes, you'd
have to use a full line for the "active ?:" ternary, but still.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero Vladimir Sementsov-Ogievskiy
@ 2018-10-08 20:09   ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 20:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Zero out corrupted L1 table entry, which reference L2 table out of
> underlying file.
> Zero L1 table entry means that "the L2 table and all clusters described
> by this L2 table are unallocated."
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)

Hm.  The specification actually says nothing about offsets being allowed
past the end of the file, and I don't think we ever use them (outside of
a very short period during image creation, where we point to refcount
structures beyond the EOF).

So the patch looks OK to me, although I'd still prefer a separate
fprintf() and think it would be fine for check_refcounts_l1() to call
fix_table_entry() directly.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM
  2018-10-08 15:31   ` Max Reitz
@ 2018-10-08 20:17     ` Vladimir Sementsov-Ogievskiy
  2018-10-08 20:22     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 20:17 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev



On 10/08/2018 06:31 PM, Max Reitz wrote:
> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
>> an unpredictable amount of memory on corrupted table entries, which are
>> referencing regions far beyond the end of file.
>>
>> Prevent this, by skipping such regions from further processing.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-refcount.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 615847eb09..566c19fbfa 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1499,12 +1499,26 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>>       uint64_t start, last, cluster_offset, k, refcount;
>> +    int64_t file_len;
>>       int ret;
>>   
>>       if (size <= 0) {
>>           return 0;
>>       }
>>   
>> +    file_len = bdrv_getlength(bs->file->bs);
>> +    if (file_len < 0) {
>> +        return file_len;
>> +    }
> 
> Doesn't this slow things down?  Can we not cache the length somewhere
> and update it whenever the image is modified?


hmm. bdrv_getlength is used everywhere in Qemu, and I don't think it is 
good idea to improve it locally for these series. If we can improve it 
somehow with a cache or something like this, it should be done for all 
users and therefore it is outside of these series..

> 
>> +
>> +    if (offset + size - file_len > s->cluster_size) {
>> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
>> +                "end of the file by more than one cluster: offset 0x%" PRIx64
>> +                " size 0x%" PRIx64 "\n", offset, size);
> 
> Why is one cluster OK?  Is there a specific case you're trying to catch
> here?

raw file under qcow2 may be not aligned in real size to qcow2 cluster, 
as I understand, it's normal for the last cluster to be semi-allocated

> 
> Max
> 
>> +        res->corruptions++;
>> +        return 0;
>> +    }
>> +
>>       start = start_of_cluster(s, offset);
>>       last = start_of_cluster(s, offset + size - 1);
>>       for(cluster_offset = start; cluster_offset <= last;
>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM
  2018-10-08 15:31   ` Max Reitz
  2018-10-08 20:17     ` Vladimir Sementsov-Ogievskiy
@ 2018-10-08 20:22     ` Vladimir Sementsov-Ogievskiy
  2018-10-08 20:39       ` Max Reitz
  1 sibling, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 20:22 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev



On 10/08/2018 06:31 PM, Max Reitz wrote:
> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
>> an unpredictable amount of memory on corrupted table entries, which are
>> referencing regions far beyond the end of file.
>>
>> Prevent this, by skipping such regions from further processing.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-refcount.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 615847eb09..566c19fbfa 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1499,12 +1499,26 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>>       uint64_t start, last, cluster_offset, k, refcount;
>> +    int64_t file_len;
>>       int ret;
>>   
>>       if (size <= 0) {
>>           return 0;
>>       }
>>   
>> +    file_len = bdrv_getlength(bs->file->bs);
>> +    if (file_len < 0) {
>> +        return file_len;
>> +    }
> 
> Doesn't this slow things down?  Can we not cache the length somewhere
> and update it whenever the image is modified?


hmm. bdrv_getlength is used everywhere in Qemu, and I don't think it is 
good idea to improve it locally for these series. If we can improve it 
somehow with a cache or something like this, it should be done for all 
users and therefore it is outside of these series..

> 
>> +
>> +    if (offset + size - file_len > s->cluster_size) {
>> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
>> +                "end of the file by more than one cluster: offset 0x%" PRIx64
>> +                " size 0x%" PRIx64 "\n", offset, size);
> 
> Why is one cluster OK?  Is there a specific case you're trying to catch
> here?

raw file under qcow2 may be not aligned in real size to qcow2 cluster, 
as I understand, it's normal for the last cluster to be semi-allocated

> 
> Max
> 
>> +        res->corruptions++;
>> +        return 0;
>> +    }
>> +
>>       start = start_of_cluster(s, offset);
>>       last = start_of_cluster(s, offset + size - 1);
>>       for(cluster_offset = start; cluster_offset <= last;
>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM
  2018-10-08 20:22     ` Vladimir Sementsov-Ogievskiy
@ 2018-10-08 20:39       ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-08 20:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 3044 bytes --]

On 08.10.18 22:22, Vladimir Sementsov-Ogievskiy wrote:
> 
> 
> On 10/08/2018 06:31 PM, Max Reitz wrote:
>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
>>> an unpredictable amount of memory on corrupted table entries, which are
>>> referencing regions far beyond the end of file.
>>>
>>> Prevent this, by skipping such regions from further processing.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2-refcount.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 615847eb09..566c19fbfa 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1499,12 +1499,26 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
>>>   {
>>>       BDRVQcow2State *s = bs->opaque;
>>>       uint64_t start, last, cluster_offset, k, refcount;
>>> +    int64_t file_len;
>>>       int ret;
>>>   
>>>       if (size <= 0) {
>>>           return 0;
>>>       }
>>>   
>>> +    file_len = bdrv_getlength(bs->file->bs);
>>> +    if (file_len < 0) {
>>> +        return file_len;
>>> +    }
>>
>> Doesn't this slow things down?  Can we not cache the length somewhere
>> and update it whenever the image is modified?
> 
> 
> hmm. bdrv_getlength is used everywhere in Qemu, and I don't think it is 
> good idea to improve it locally for these series. If we can improve it 
> somehow with a cache or something like this, it should be done for all 
> users and therefore it is outside of these series..

I wanted to write: Sure it's used everywhere, but usually that is before
someone performs some I/O, so it isn't too bad.  But this is a function
that's suppose to just increment a couple of values in memory, which is
different.

However, I put the "wanted to write" prefix there, because: I knew that
we already have a central cache for bdrv_getlength(), but it isn't used
when the block driver reports has_variable_length as true.  I thought
file-posix did that.  But it only does so for CD-ROM devices.

So I think it should be OK to call the function here, yes.

>>> +
>>> +    if (offset + size - file_len > s->cluster_size) {
>>> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
>>> +                "end of the file by more than one cluster: offset 0x%" PRIx64
>>> +                " size 0x%" PRIx64 "\n", offset, size);
>>
>> Why is one cluster OK?  Is there a specific case you're trying to catch
>> here?
> 
> raw file under qcow2 may be not aligned in real size to qcow2 cluster, 
> as I understand, it's normal for the last cluster to be semi-allocated

Ah, that's true, thanks.  I'd appreciate a comment here, though, and in
that case I think we don't need to check whether the reference is off by
more than a cluster, but whether it's off by a cluster or more (so >=).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero Vladimir Sementsov-Ogievskiy
@ 2018-10-08 20:51   ` Max Reitz
  2018-10-08 22:02     ` Vladimir Sementsov-Ogievskiy
  2018-10-10 16:39   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 32+ messages in thread
From: Max Reitz @ 2018-10-08 20:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block; +Cc: kwolf, eblake, den

[-- Attachment #1: Type: text/plain, Size: 2964 bytes --]

On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
> Rewrite corrupted L2 table entry, which reference space out of
> underlying file.
> 
> Make this L2 table entry read-as-all-zeros without any allocation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3c004e5bfe..3de3768a3c 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>              /* Mark cluster as used */
>              csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>                      BDRV_SECTOR_SIZE;
> +            if (csize > s->cluster_size) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "compressed cluster larger than cluster: size 0x%"
> +                        PRIx64, csize);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +

This seems recoverable, isn't it?  Can we not try to just limit the
csize, or decompress the cluster with the given csize from the given
offset, disregarding the cluster limit?

>              coffset = l2_entry & s->cluster_offset_mask &
>                        ~(BDRV_SECTOR_SIZE - 1);
> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "compressed cluster out of file: offset 0x%" PRIx64,
> +                        coffset);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +
>              ret = qcow2_inc_refcounts_imrt(bs, res,
>                                             refcount_table, refcount_table_size,
>                                             coffset, csize);
> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>          {
>              uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>  
> +            if (offset >= bdrv_getlength(bs->file->bs)) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "cluster out of file: offset 0x%" PRIx64, offset);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +

These other two look OK, but they have another issue:  If this is a v2
image, you cannot create zero clusters; so you'll have to unallocate the
cluster in that case.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 20:51   ` Max Reitz
@ 2018-10-08 22:02     ` Vladimir Sementsov-Ogievskiy
  2018-10-08 22:08       ` Max Reitz
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 22:02 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev



On 10/08/2018 11:51 PM, Max Reitz wrote:
> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>> Rewrite corrupted L2 table entry, which reference space out of
>> underlying file.
>>
>> Make this L2 table entry read-as-all-zeros without any allocation.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>   1 file changed, 32 insertions(+)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 3c004e5bfe..3de3768a3c 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>               /* Mark cluster as used */
>>               csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>                       BDRV_SECTOR_SIZE;
>> +            if (csize > s->cluster_size) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "compressed cluster larger than cluster: size 0x%"
>> +                        PRIx64, csize);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
> 
> This seems recoverable, isn't it?  Can we not try to just limit the
> csize, or decompress the cluster with the given csize from the given
> offset, disregarding the cluster limit?

Hm, you want to assume that csize is corrupted but coffset may be 
correct? Unlikely, I think.

So, to carefully repair csize, we should decompress one cluster (or one 
cluster - 1 byte) of data, trying to get one cluster of decompressed 
data. If we succeed, we know csize, or we can safely set it to one cluster.

Or we can just set csize = 1 cluster, if it is larger. And leave 
problems to real execution which will lead to EIO in worst case.

> 
>>               coffset = l2_entry & s->cluster_offset_mask &
>>                         ~(BDRV_SECTOR_SIZE - 1);
>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>> +                        coffset);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
>>               ret = qcow2_inc_refcounts_imrt(bs, res,
>>                                              refcount_table, refcount_table_size,
>>                                              coffset, csize);
>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>           {
>>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>   
>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
> 
> These other two look OK, but they have another issue:  If this is a v2
> image, you cannot create zero clusters; so you'll have to unallocate the
> cluster in that case.


Oho, it's a problem. It may be unsafe to discard clusters, making 
backing image available through the holes. What discard do on v2? 
Zeroing or holes?


> 
> Max
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 22:02     ` Vladimir Sementsov-Ogievskiy
@ 2018-10-08 22:08       ` Max Reitz
  2018-10-08 22:14         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Max Reitz @ 2018-10-08 22:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 4363 bytes --]

On 09.10.18 00:02, Vladimir Sementsov-Ogievskiy wrote:
> 
> 
> On 10/08/2018 11:51 PM, Max Reitz wrote:
>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>> Rewrite corrupted L2 table entry, which reference space out of
>>> underlying file.
>>>
>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 32 insertions(+)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 3c004e5bfe..3de3768a3c 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>               /* Mark cluster as used */
>>>               csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>>                       BDRV_SECTOR_SIZE;
>>> +            if (csize > s->cluster_size) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "compressed cluster larger than cluster: size 0x%"
>>> +                        PRIx64, csize);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>
>> This seems recoverable, isn't it?  Can we not try to just limit the
>> csize, or decompress the cluster with the given csize from the given
>> offset, disregarding the cluster limit?
> 
> Hm, you want to assume that csize is corrupted but coffset may be 
> correct? Unlikely, I think.

Better to reconstruct probably garbage data than to definitely garbage
data (all zeroes) is what I think.

> So, to carefully repair csize, we should decompress one cluster (or one 
> cluster - 1 byte) of data, trying to get one cluster of decompressed 
> data. If we succeed, we know csize, or we can safely set it to one cluster.

Yes.

> Or we can just set csize = 1 cluster, if it is larger. And leave 
> problems to real execution which will lead to EIO in worst case.

Or this, yes.

>>>               coffset = l2_entry & s->cluster_offset_mask &
>>>                         ~(BDRV_SECTOR_SIZE - 1);
>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>>> +                        coffset);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>>               ret = qcow2_inc_refcounts_imrt(bs, res,
>>>                                              refcount_table, refcount_table_size,
>>>                                              coffset, csize);
>>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>           {
>>>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>   
>>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>
>> These other two look OK, but they have another issue:  If this is a v2
>> image, you cannot create zero clusters; so you'll have to unallocate the
>> cluster in that case.
> 
> 
> Oho, it's a problem. It may be unsafe to discard clusters, making 
> backing image available through the holes. What discard do on v2? 
> Zeroing or holes?

Oh, right!  discard on v2 punches a hole.  So I see three ways:
(1) You can do the same and point to that bit of code, or
(2) You allocate a data cluster full of zeroes in case of v2, or
(3) You just error out.

(3) doesn't seem like the worst option.  Amending the image to be v3 is
always possible and trivial.  Maybe point the user to that option.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 22:08       ` Max Reitz
@ 2018-10-08 22:14         ` Vladimir Sementsov-Ogievskiy
  2018-10-08 22:21           ` Max Reitz
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 22:14 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev



On 10/09/2018 01:08 AM, Max Reitz wrote:
> On 09.10.18 00:02, Vladimir Sementsov-Ogievskiy wrote:
>>
>>
>> On 10/08/2018 11:51 PM, Max Reitz wrote:
>>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>> underlying file.
>>>>
>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>    1 file changed, 32 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>> index 3c004e5bfe..3de3768a3c 100644
>>>> --- a/block/qcow2-refcount.c
>>>> +++ b/block/qcow2-refcount.c
>>>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>                /* Mark cluster as used */
>>>>                csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>>>                        BDRV_SECTOR_SIZE;
>>>> +            if (csize > s->cluster_size) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "compressed cluster larger than cluster: size 0x%"
>>>> +                        PRIx64, csize);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>
>>> This seems recoverable, isn't it?  Can we not try to just limit the
>>> csize, or decompress the cluster with the given csize from the given
>>> offset, disregarding the cluster limit?
>>
>> Hm, you want to assume that csize is corrupted but coffset may be
>> correct? Unlikely, I think.
> 
> Better to reconstruct probably garbage data than to definitely garbage
> data (all zeroes) is what I think.
> 
>> So, to carefully repair csize, we should decompress one cluster (or one
>> cluster - 1 byte) of data, trying to get one cluster of decompressed
>> data. If we succeed, we know csize, or we can safely set it to one cluster.
> 
> Yes.
> 
>> Or we can just set csize = 1 cluster, if it is larger. And leave
>> problems to real execution which will lead to EIO in worst case.
> 
> Or this, yes.
> 
>>>>                coffset = l2_entry & s->cluster_offset_mask &
>>>>                          ~(BDRV_SECTOR_SIZE - 1);
>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>>>> +                        coffset);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>>                ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>                                               refcount_table, refcount_table_size,
>>>>                                               coffset, csize);
>>>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>            {
>>>>                uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>    
>>>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>
>>> These other two look OK, but they have another issue:  If this is a v2
>>> image, you cannot create zero clusters; so you'll have to unallocate the
>>> cluster in that case.
>>
>>
>> Oho, it's a problem. It may be unsafe to discard clusters, making
>> backing image available through the holes. What discard do on v2?
>> Zeroing or holes?
> 
> Oh, right!  discard on v2 punches a hole.  So I see three ways:
> (1) You can do the same and point to that bit of code, or
> (2) You allocate a data cluster full of zeroes in case of v2, or
> (3) You just error out.
> 
> (3) doesn't seem like the worst option.  

> Amending the image to be v3 is
> always possible and trivial. 

how to do it for corrupted image?

> Maybe point the user to that option.
> 
> Max
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 22:14         ` Vladimir Sementsov-Ogievskiy
@ 2018-10-08 22:21           ` Max Reitz
  2018-10-08 23:14             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Max Reitz @ 2018-10-08 22:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 4823 bytes --]

On 09.10.18 00:14, Vladimir Sementsov-Ogievskiy wrote:
> 
> 
> On 10/09/2018 01:08 AM, Max Reitz wrote:
>> On 09.10.18 00:02, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>
>>> On 10/08/2018 11:51 PM, Max Reitz wrote:
>>>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>>> underlying file.
>>>>>
>>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 32 insertions(+)
>>>>>
>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>> index 3c004e5bfe..3de3768a3c 100644
>>>>> --- a/block/qcow2-refcount.c
>>>>> +++ b/block/qcow2-refcount.c
>>>>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>                /* Mark cluster as used */
>>>>>                csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>>>>                        BDRV_SECTOR_SIZE;
>>>>> +            if (csize > s->cluster_size) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "compressed cluster larger than cluster: size 0x%"
>>>>> +                        PRIx64, csize);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>
>>>> This seems recoverable, isn't it?  Can we not try to just limit the
>>>> csize, or decompress the cluster with the given csize from the given
>>>> offset, disregarding the cluster limit?
>>>
>>> Hm, you want to assume that csize is corrupted but coffset may be
>>> correct? Unlikely, I think.
>>
>> Better to reconstruct probably garbage data than to definitely garbage
>> data (all zeroes) is what I think.
>>
>>> So, to carefully repair csize, we should decompress one cluster (or one
>>> cluster - 1 byte) of data, trying to get one cluster of decompressed
>>> data. If we succeed, we know csize, or we can safely set it to one cluster.
>>
>> Yes.
>>
>>> Or we can just set csize = 1 cluster, if it is larger. And leave
>>> problems to real execution which will lead to EIO in worst case.
>>
>> Or this, yes.
>>
>>>>>                coffset = l2_entry & s->cluster_offset_mask &
>>>>>                          ~(BDRV_SECTOR_SIZE - 1);
>>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>>>>> +                        coffset);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>>                ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>>                                               refcount_table, refcount_table_size,
>>>>>                                               coffset, csize);
>>>>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>            {
>>>>>                uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>>    
>>>>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>
>>>> These other two look OK, but they have another issue:  If this is a v2
>>>> image, you cannot create zero clusters; so you'll have to unallocate the
>>>> cluster in that case.
>>>
>>>
>>> Oho, it's a problem. It may be unsafe to discard clusters, making
>>> backing image available through the holes. What discard do on v2?
>>> Zeroing or holes?
>>
>> Oh, right!  discard on v2 punches a hole.  So I see three ways:
>> (1) You can do the same and point to that bit of code, or
>> (2) You allocate a data cluster full of zeroes in case of v2, or
>> (3) You just error out.
>>
>> (3) doesn't seem like the worst option.  
> 
>> Amending the image to be v3 is
>> always possible and trivial. 
> 
> how to do it for corrupted image?

Oh, yeah, you can't open a corrupted image, can you...  I suppose we
want a way to force-clear the flag anyway. :-)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 22:21           ` Max Reitz
@ 2018-10-08 23:14             ` Vladimir Sementsov-Ogievskiy
  2018-10-13 12:51               ` Max Reitz
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-08 23:14 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev



On 10/09/2018 01:21 AM, Max Reitz wrote:
> On 09.10.18 00:14, Vladimir Sementsov-Ogievskiy wrote:
>>
>>
>> On 10/09/2018 01:08 AM, Max Reitz wrote:
>>> On 09.10.18 00:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>
>>>>
>>>> On 10/08/2018 11:51 PM, Max Reitz wrote:
>>>>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>>>> underlying file.
>>>>>>
>>>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>     block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 32 insertions(+)
>>>>>>
>>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>>> index 3c004e5bfe..3de3768a3c 100644
>>>>>> --- a/block/qcow2-refcount.c
>>>>>> +++ b/block/qcow2-refcount.c
>>>>>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>                 /* Mark cluster as used */
>>>>>>                 csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>>>>>                         BDRV_SECTOR_SIZE;
>>>>>> +            if (csize > s->cluster_size) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "compressed cluster larger than cluster: size 0x%"
>>>>>> +                        PRIx64, csize);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>
>>>>> This seems recoverable, isn't it?  Can we not try to just limit the
>>>>> csize, or decompress the cluster with the given csize from the given
>>>>> offset, disregarding the cluster limit?
>>>>
>>>> Hm, you want to assume that csize is corrupted but coffset may be
>>>> correct? Unlikely, I think.
>>>
>>> Better to reconstruct probably garbage data than to definitely garbage
>>> data (all zeroes) is what I think.
>>>
>>>> So, to carefully repair csize, we should decompress one cluster (or one
>>>> cluster - 1 byte) of data, trying to get one cluster of decompressed
>>>> data. If we succeed, we know csize, or we can safely set it to one cluster.
>>>
>>> Yes.
>>>
>>>> Or we can just set csize = 1 cluster, if it is larger. And leave
>>>> problems to real execution which will lead to EIO in worst case.
>>>
>>> Or this, yes.
>>>
>>>>>>                 coffset = l2_entry & s->cluster_offset_mask &
>>>>>>                           ~(BDRV_SECTOR_SIZE - 1);
>>>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>>>>>> +                        coffset);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>>                 ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>>>                                                refcount_table, refcount_table_size,
>>>>>>                                                coffset, csize);
>>>>>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>             {
>>>>>>                 uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>>>     
>>>>>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>
>>>>> These other two look OK, but they have another issue:  If this is a v2
>>>>> image, you cannot create zero clusters; so you'll have to unallocate the
>>>>> cluster in that case.
>>>>
>>>>
>>>> Oho, it's a problem. It may be unsafe to discard clusters, making
>>>> backing image available through the holes. What discard do on v2?
>>>> Zeroing or holes?
>>>
>>> Oh, right!  discard on v2 punches a hole.  So I see three ways:
>>> (1) You can do the same and point to that bit of code, or
>>> (2) You allocate a data cluster full of zeroes in case of v2, or
>>> (3) You just error out.
>>>
>>> (3) doesn't seem like the worst option.
>>
>>> Amending the image to be v3 is
>>> always possible and trivial.
>>
>> how to do it for corrupted image?
> 
> Oh, yeah, you can't open a corrupted image, can you...  I suppose we
> want a way to force-clear the flag anyway. :-)

am, which flag?

> 
> Max
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero
  2018-10-08 19:54   ` Max Reitz
@ 2018-10-10 12:25     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-10 12:25 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev

08.10.2018 22:54, Max Reitz wrote:
> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>> Split entry repairing to separate function, to be reused later.
>>
>> Note: entry in in-memory l2 table (local variable in
>> check_refcounts_l2) is not updated after this patch.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-refcount.c | 147 ++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 109 insertions(+), 38 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 3b057b635d..d9c8cd666b 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1554,6 +1554,99 @@ enum {
>>       CHECK_FRAG_INFO = 0x2,      /* update BlockFragInfo counters */
>>   };
>>   
>> +/* Update entry in L1 or L2 table
>> + *
>> + * Returns: -errno if overlap check failed
>> + *          0 if write failed
>> + *          1 on success
>> + */
>> +static int write_table_entry(BlockDriverState *bs, const char *table_name,
>> +                             uint64_t table_offset, int entry_index,
>> +                             uint64_t new_val, int ign)
>> +{
>> +    int ret;
>> +    uint64_t entry_offset =
>> +            table_offset + (uint64_t)entry_index * sizeof(new_val);
>> +
>> +    cpu_to_be64s(&new_val);
>> +    ret = qcow2_pre_write_overlap_check(bs, ign, entry_offset, sizeof(new_val));
>> +    if (ret < 0) {
>> +        fprintf(stderr,
>> +                "ERROR: Can't write %s table entry: overlap check failed: %s\n",
> I recently complained to Berto that I don't like elisions ("can't") in
> user interfaces, so I suppose I'll have to complain here, too, that I'd
> prefer a "Cannot".
>
>> +                table_name, strerror(-ret));
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite_sync(bs->file, entry_offset, &new_val, sizeof(new_val));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR: Failed to overwrite %s table entry: %s\n",
>> +                table_name, strerror(-ret));
>> +        return 0;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>> +/* Try to fix (if allowed) entry in L1 or L2 table. Update @res correspondingly.
>> + *
>> + * Returns: -errno if overlap check failed
>> + *          0 if entry was not updated for other reason
>> + *            (fixing disabled or write failed)
>> + *          1 on success
>> + */
>> +static int fix_table_entry(BlockDriverState *bs, BdrvCheckResult *res,
>> +                           BdrvCheckMode fix, const char *table_name,
>> +                           uint64_t table_offset, int entry_index,
>> +                           uint64_t new_val, int ign,
>> +                           const char *fmt, va_list args)
>> +{
>> +    int ret;
>> +
>> +    fprintf(stderr, fix & BDRV_FIX_ERRORS ? "Repairing: " : "ERROR: ");
>> +    vfprintf(stderr, fmt, args);
>> +    fprintf(stderr, "\n");
> If you're just going to print this here, right at the start, I think it
> would be better to just do it in the caller.
>
> Sure, with this solution the caller safes an fprintf() call, but I find
> it a bit over the top to start with vararg handling here when the caller
> can just do an additional fprintf().
>
> (Also, I actually find it clearer if you have to call two separate
> functions.)
>
>> +
>> +    if (!(fix & BDRV_FIX_ERRORS)) {
>> +        res->corruptions++;
>> +        return 0;
>> +    }

hmm, after dropping printfs, this if becomes strange too. it's the only 
use of "fix", and not correspond to function name (correspond to 
comments, but it's a bad reason).

>> +
>> +    ret = write_table_entry(bs, table_name, table_offset, entry_index, new_val,
>> +                            ign);
>> +
>> +    if (ret == 1) {
>> +        res->corruptions_fixed++;
>> +    } else {
>> +        res->check_errors++;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/* Make L2 entry to be QCOW2_CLUSTER_ZERO_PLAIN
>> + *
>> + * Returns: -errno if overlap check failed
>> + *          0 if write failed
>> + *          1 on success
>> + */
>> +static int fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,
>> +                                BdrvCheckMode fix, int64_t l2_offset,
>> +                                int l2_index, bool active,
>> +                                const char *fmt, ...)
>> +{
>> +    int ret;
>> +    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
>> +    uint64_t l2_entry = QCOW_OFLAG_ZERO;
>> +    va_list args;
>> +
>> +    va_start(args, fmt);
>> +    ret = fix_table_entry(bs, res, fix, "L2", l2_offset, l2_index, l2_entry,
>> +                          ign, fmt, args);
>> +    va_end(args);
>> +
>> +    return ret;
>> +}
> If you drop the fprintf() from fix_table_entry(), this function will
> make less sense as well.  Just calling fix_table_entry() directly will
> be just as easy.
>
> (Yes, I see that you use the function in patch 7 again, and yes, you'd
> have to use a full line for the "active ?:" ternary, but still.)
>
> Max
>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero Vladimir Sementsov-Ogievskiy
  2018-10-08 20:51   ` Max Reitz
@ 2018-10-10 16:39   ` Vladimir Sementsov-Ogievskiy
  2018-10-10 16:55     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-10 16:39 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, Denis Lunev

17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
> Rewrite corrupted L2 table entry, which reference space out of
> underlying file.
>
> Make this L2 table entry read-as-all-zeros without any allocation.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
>
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3c004e5bfe..3de3768a3c 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>               /* Mark cluster as used */
>               csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>                       BDRV_SECTOR_SIZE;
> +            if (csize > s->cluster_size) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "compressed cluster larger than cluster: size 0x%"
> +                        PRIx64, csize);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +
>               coffset = l2_entry & s->cluster_offset_mask &
>                         ~(BDRV_SECTOR_SIZE - 1);
> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "compressed cluster out of file: offset 0x%" PRIx64,
> +                        coffset);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +
>               ret = qcow2_inc_refcounts_imrt(bs, res,
>                                              refcount_table, refcount_table_size,
>                                              coffset, csize);
> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>           {
>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>   
> +            if (offset >= bdrv_getlength(bs->file->bs)) {
> +                ret = fix_l2_entry_to_zero(
> +                        bs, res, fix, l2_offset, i, active,
> +                        "cluster out of file: offset 0x%" PRIx64, offset);
> +                if (ret < 0) {
> +                    goto fail;
> +                }
> +                continue;
> +            }
> +
>               if (flags & CHECK_FRAG_INFO) {
>                   res->bfi.allocated_clusters++;
>                   if (next_contiguous_offset &&

hmm, interesting question here: in case of misaligned l2 entry, we zero 
it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal clusters? 
Why? I think it is ok to mark as zero misaligned normal cluster l2 
entry, otherwise we'll have fatal corruption on any operation to this 
cluster.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-10 16:39   ` Vladimir Sementsov-Ogievskiy
@ 2018-10-10 16:55     ` Vladimir Sementsov-Ogievskiy
  2018-10-10 16:59       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-10 16:55 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, Denis Lunev

10.10.2018 19:39, Vladimir Sementsov-Ogievskiy wrote:
> 17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
>> Rewrite corrupted L2 table entry, which reference space out of
>> underlying file.
>>
>> Make this L2 table entry read-as-all-zeros without any allocation.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>   1 file changed, 32 insertions(+)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 3c004e5bfe..3de3768a3c 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState 
>> *bs, BdrvCheckResult *res,
>>               /* Mark cluster as used */
>>               csize = (((l2_entry >> s->csize_shift) & s->csize_mask) 
>> + 1) *
>>                       BDRV_SECTOR_SIZE;
>> +            if (csize > s->cluster_size) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "compressed cluster larger than cluster: 
>> size 0x%"
>> +                        PRIx64, csize);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
>>               coffset = l2_entry & s->cluster_offset_mask &
>>                         ~(BDRV_SECTOR_SIZE - 1);
>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "compressed cluster out of file: offset 0x%" 
>> PRIx64,
>> +                        coffset);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
>>               ret = qcow2_inc_refcounts_imrt(bs, res,
>>                                              refcount_table, 
>> refcount_table_size,
>>                                              coffset, csize);
>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState 
>> *bs, BdrvCheckResult *res,
>>           {
>>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>   +            if (offset >= bdrv_getlength(bs->file->bs)) {
>> +                ret = fix_l2_entry_to_zero(
>> +                        bs, res, fix, l2_offset, i, active,
>> +                        "cluster out of file: offset 0x%" PRIx64, 
>> offset);
>> +                if (ret < 0) {
>> +                    goto fail;
>> +                }
>> +                continue;
>> +            }
>> +
>>               if (flags & CHECK_FRAG_INFO) {
>>                   res->bfi.allocated_clusters++;
>>                   if (next_contiguous_offset &&
>
> hmm, interesting question here: in case of misaligned l2 entry, we 
> zero it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal 
> clusters? Why? I think it is ok to mark as zero misaligned normal 
> cluster l2 entry, otherwise we'll have fatal corruption on any 
> operation to this cluster.
>

or we can just align them down.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-10 16:55     ` Vladimir Sementsov-Ogievskiy
@ 2018-10-10 16:59       ` Vladimir Sementsov-Ogievskiy
  2018-10-13 12:58         ` Max Reitz
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-10-10 16:59 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, eblake, Denis Lunev

10.10.2018 19:55, Vladimir Sementsov-Ogievskiy wrote:
> 10.10.2018 19:39, Vladimir Sementsov-Ogievskiy wrote:
>> 17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
>>> Rewrite corrupted L2 table entry, which reference space out of
>>> underlying file.
>>>
>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 32 insertions(+)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 3c004e5bfe..3de3768a3c 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1720,8 +1720,30 @@ static int 
>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>               /* Mark cluster as used */
>>>               csize = (((l2_entry >> s->csize_shift) & 
>>> s->csize_mask) + 1) *
>>>                       BDRV_SECTOR_SIZE;
>>> +            if (csize > s->cluster_size) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "compressed cluster larger than cluster: 
>>> size 0x%"
>>> +                        PRIx64, csize);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>>               coffset = l2_entry & s->cluster_offset_mask &
>>>                         ~(BDRV_SECTOR_SIZE - 1);
>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "compressed cluster out of file: offset 
>>> 0x%" PRIx64,
>>> +                        coffset);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>>               ret = qcow2_inc_refcounts_imrt(bs, res,
>>>                                              refcount_table, 
>>> refcount_table_size,
>>>                                              coffset, csize);
>>> @@ -1748,6 +1770,16 @@ static int 
>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>           {
>>>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>   +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>> +                ret = fix_l2_entry_to_zero(
>>> +                        bs, res, fix, l2_offset, i, active,
>>> +                        "cluster out of file: offset 0x%" PRIx64, 
>>> offset);
>>> +                if (ret < 0) {
>>> +                    goto fail;
>>> +                }
>>> +                continue;
>>> +            }
>>> +
>>>               if (flags & CHECK_FRAG_INFO) {
>>>                   res->bfi.allocated_clusters++;
>>>                   if (next_contiguous_offset &&
>>
>> hmm, interesting question here: in case of misaligned l2 entry, we 
>> zero it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal 
>> clusters? Why? I think it is ok to mark as zero misaligned normal 
>> cluster l2 entry, otherwise we'll have fatal corruption on any 
>> operation to this cluster.
>>
>
> or we can just align them down.
>

and why do we calculate refcounts for corrupted l2 entry? Is it correct, 
to consider data range referenced by this entry, if we'll never success 
in writing or reading this data?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-08 23:14             ` Vladimir Sementsov-Ogievskiy
@ 2018-10-13 12:51               ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-10-13 12:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 5232 bytes --]

On 09.10.18 01:14, Vladimir Sementsov-Ogievskiy wrote:
> 
> 
> On 10/09/2018 01:21 AM, Max Reitz wrote:
>> On 09.10.18 00:14, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>
>>> On 10/09/2018 01:08 AM, Max Reitz wrote:
>>>> On 09.10.18 00:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>>
>>>>>
>>>>> On 10/08/2018 11:51 PM, Max Reitz wrote:
>>>>>> On 17.08.18 14:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>>>>> underlying file.
>>>>>>>
>>>>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> ---
>>>>>>>     block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 32 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>>>> index 3c004e5bfe..3de3768a3c 100644
>>>>>>> --- a/block/qcow2-refcount.c
>>>>>>> +++ b/block/qcow2-refcount.c
>>>>>>> @@ -1720,8 +1720,30 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>>                 /* Mark cluster as used */
>>>>>>>                 csize = (((l2_entry >> s->csize_shift) & s->csize_mask) + 1) *
>>>>>>>                         BDRV_SECTOR_SIZE;
>>>>>>> +            if (csize > s->cluster_size) {
>>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>>> +                        "compressed cluster larger than cluster: size 0x%"
>>>>>>> +                        PRIx64, csize);
>>>>>>> +                if (ret < 0) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                continue;
>>>>>>> +            }
>>>>>>> +
>>>>>>
>>>>>> This seems recoverable, isn't it?  Can we not try to just limit the
>>>>>> csize, or decompress the cluster with the given csize from the given
>>>>>> offset, disregarding the cluster limit?
>>>>>
>>>>> Hm, you want to assume that csize is corrupted but coffset may be
>>>>> correct? Unlikely, I think.
>>>>
>>>> Better to reconstruct probably garbage data than to definitely garbage
>>>> data (all zeroes) is what I think.
>>>>
>>>>> So, to carefully repair csize, we should decompress one cluster (or one
>>>>> cluster - 1 byte) of data, trying to get one cluster of decompressed
>>>>> data. If we succeed, we know csize, or we can safely set it to one cluster.
>>>>
>>>> Yes.
>>>>
>>>>> Or we can just set csize = 1 cluster, if it is larger. And leave
>>>>> problems to real execution which will lead to EIO in worst case.
>>>>
>>>> Or this, yes.
>>>>
>>>>>>>                 coffset = l2_entry & s->cluster_offset_mask &
>>>>>>>                           ~(BDRV_SECTOR_SIZE - 1);
>>>>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>>> +                        "compressed cluster out of file: offset 0x%" PRIx64,
>>>>>>> +                        coffset);
>>>>>>> +                if (ret < 0) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                continue;
>>>>>>> +            }
>>>>>>> +
>>>>>>>                 ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>>>>                                                refcount_table, refcount_table_size,
>>>>>>>                                                coffset, csize);
>>>>>>> @@ -1748,6 +1770,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>>             {
>>>>>>>                 uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>>>>     
>>>>>>> +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>>> +                        "cluster out of file: offset 0x%" PRIx64, offset);
>>>>>>> +                if (ret < 0) {
>>>>>>> +                    goto fail;
>>>>>>> +                }
>>>>>>> +                continue;
>>>>>>> +            }
>>>>>>> +
>>>>>>
>>>>>> These other two look OK, but they have another issue:  If this is a v2
>>>>>> image, you cannot create zero clusters; so you'll have to unallocate the
>>>>>> cluster in that case.
>>>>>
>>>>>
>>>>> Oho, it's a problem. It may be unsafe to discard clusters, making
>>>>> backing image available through the holes. What discard do on v2?
>>>>> Zeroing or holes?
>>>>
>>>> Oh, right!  discard on v2 punches a hole.  So I see three ways:
>>>> (1) You can do the same and point to that bit of code, or
>>>> (2) You allocate a data cluster full of zeroes in case of v2, or
>>>> (3) You just error out.
>>>>
>>>> (3) doesn't seem like the worst option.
>>>
>>>> Amending the image to be v3 is
>>>> always possible and trivial.
>>>
>>> how to do it for corrupted image?
>>
>> Oh, yeah, you can't open a corrupted image, can you...  I suppose we
>> want a way to force-clear the flag anyway. :-)
> 
> am, which flag?

The corrupt flag in the image header.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-10 16:59       ` Vladimir Sementsov-Ogievskiy
@ 2018-10-13 12:58         ` Max Reitz
  2018-12-12  8:36           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Max Reitz @ 2018-10-13 12:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 5519 bytes --]

On 10.10.18 18:59, Vladimir Sementsov-Ogievskiy wrote:
> 10.10.2018 19:55, Vladimir Sementsov-Ogievskiy wrote:
>> 10.10.2018 19:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>> underlying file.
>>>>
>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>   1 file changed, 32 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>> index 3c004e5bfe..3de3768a3c 100644
>>>> --- a/block/qcow2-refcount.c
>>>> +++ b/block/qcow2-refcount.c
>>>> @@ -1720,8 +1720,30 @@ static int 
>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>               /* Mark cluster as used */
>>>>               csize = (((l2_entry >> s->csize_shift) & 
>>>> s->csize_mask) + 1) *
>>>>                       BDRV_SECTOR_SIZE;
>>>> +            if (csize > s->cluster_size) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "compressed cluster larger than cluster: 
>>>> size 0x%"
>>>> +                        PRIx64, csize);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>>               coffset = l2_entry & s->cluster_offset_mask &
>>>>                         ~(BDRV_SECTOR_SIZE - 1);
>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "compressed cluster out of file: offset 
>>>> 0x%" PRIx64,
>>>> +                        coffset);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>>               ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>                                              refcount_table, 
>>>> refcount_table_size,
>>>>                                              coffset, csize);
>>>> @@ -1748,6 +1770,16 @@ static int 
>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>           {
>>>>               uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>   +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>> +                ret = fix_l2_entry_to_zero(
>>>> +                        bs, res, fix, l2_offset, i, active,
>>>> +                        "cluster out of file: offset 0x%" PRIx64, 
>>>> offset);
>>>> +                if (ret < 0) {
>>>> +                    goto fail;
>>>> +                }
>>>> +                continue;
>>>> +            }
>>>> +
>>>>               if (flags & CHECK_FRAG_INFO) {
>>>>                   res->bfi.allocated_clusters++;
>>>>                   if (next_contiguous_offset &&
>>>
>>> hmm, interesting question here: in case of misaligned l2 entry, we 
>>> zero it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal 
>>> clusters? Why? I think it is ok to mark as zero misaligned normal 
>>> cluster l2 entry, otherwise we'll have fatal corruption on any 
>>> operation to this cluster.

Because for zero clusters the solution is clear.  We just throw away the
obviously wrong preallocation information, but the cluster data stays
the same (zero).  So there is no data loss.

For normal clusters, you definitely destroy the data by zeroing them out.

>> or we can just align them down.

Which would destroy the data as well.

You can argue that if the value is misaligned, it is extremely likely to
be just garbage as a whole, though.  But in any case, it is not obvious
what to do and always means data loss (which is different from zero
clusters, where you can just keep them zero).

The clearest and most obvious solution would be to allocate a new
cluster and copy the unaligned data there.  Maybe that doesn't make
sense because the data is probably garbage anyway, but it definitely
won't harm.

> and why do we calculate refcounts for corrupted l2 entry? Is it correct, 
> to consider data range referenced by this entry, if we'll never success 
> in writing or reading this data?

It's definitely better to mark something wrongly as referenced than
wrongly as free.

The only difference it makes is that maybe we could save some space, but
if there are any such corruptions, saving space really is the least of
the users issues.

MAx


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-10-13 12:58         ` Max Reitz
@ 2018-12-12  8:36           ` Vladimir Sementsov-Ogievskiy
  2018-12-12 12:49             ` Max Reitz
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-12  8:36 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, eblake, Denis Lunev

13.10.2018 15:58, Max Reitz wrote:
> On 10.10.18 18:59, Vladimir Sementsov-Ogievskiy wrote:
>> 10.10.2018 19:55, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.10.2018 19:39, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>>> underlying file.
>>>>>
>>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 32 insertions(+)
>>>>>
>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>> index 3c004e5bfe..3de3768a3c 100644
>>>>> --- a/block/qcow2-refcount.c
>>>>> +++ b/block/qcow2-refcount.c
>>>>> @@ -1720,8 +1720,30 @@ static int
>>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>                /* Mark cluster as used */
>>>>>                csize = (((l2_entry >> s->csize_shift) &
>>>>> s->csize_mask) + 1) *
>>>>>                        BDRV_SECTOR_SIZE;
>>>>> +            if (csize > s->cluster_size) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "compressed cluster larger than cluster:
>>>>> size 0x%"
>>>>> +                        PRIx64, csize);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>>                coffset = l2_entry & s->cluster_offset_mask &
>>>>>                          ~(BDRV_SECTOR_SIZE - 1);
>>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "compressed cluster out of file: offset
>>>>> 0x%" PRIx64,
>>>>> +                        coffset);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>>                ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>>                                               refcount_table,
>>>>> refcount_table_size,
>>>>>                                               coffset, csize);
>>>>> @@ -1748,6 +1770,16 @@ static int
>>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>            {
>>>>>                uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>>    +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>>> +                ret = fix_l2_entry_to_zero(
>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>> +                        "cluster out of file: offset 0x%" PRIx64,
>>>>> offset);
>>>>> +                if (ret < 0) {
>>>>> +                    goto fail;
>>>>> +                }
>>>>> +                continue;
>>>>> +            }
>>>>> +
>>>>>                if (flags & CHECK_FRAG_INFO) {
>>>>>                    res->bfi.allocated_clusters++;
>>>>>                    if (next_contiguous_offset &&
>>>>
>>>> hmm, interesting question here: in case of misaligned l2 entry, we
>>>> zero it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal
>>>> clusters? Why? I think it is ok to mark as zero misaligned normal
>>>> cluster l2 entry, otherwise we'll have fatal corruption on any
>>>> operation to this cluster.
> 
> Because for zero clusters the solution is clear.  We just throw away the
> obviously wrong preallocation information, but the cluster data stays
> the same (zero).  So there is no data loss.
> 
> For normal clusters, you definitely destroy the data by zeroing them out.
> 
>>> or we can just align them down.
> 
> Which would destroy the data as well.
> 
> You can argue that if the value is misaligned, it is extremely likely to
> be just garbage as a whole, though.  But in any case, it is not obvious
> what to do and always means data loss (which is different from zero
> clusters, where you can just keep them zero).
> 
> The clearest and most obvious solution would be to allocate a new
> cluster and copy the unaligned data there.  Maybe that doesn't make
> sense because the data is probably garbage anyway, but it definitely
> won't harm.


but what to copy? I think, it is mostly impossible that there is a misaligned
data cluster. More probable is just partly wrong l2 entry. So, in your way we
will lose this data (as we lose l2 entry, our last hope). Finally, what to do with
misaligned cluster on check? We definitely should do something, as trying to
access such cluster corrupts qcow2 in qemu.

What about an additional flag like "-align-misaligned-clusters-down"?

> 
>> and why do we calculate refcounts for corrupted l2 entry? Is it correct,
>> to consider data range referenced by this entry, if we'll never success
>> in writing or reading this data?
> 
> It's definitely better to mark something wrongly as referenced than
> wrongly as free.
> 
> The only difference it makes is that maybe we could save some space, but
> if there are any such corruptions, saving space really is the least of
> the users issues.
> 
> MAx
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero
  2018-12-12  8:36           ` Vladimir Sementsov-Ogievskiy
@ 2018-12-12 12:49             ` Max Reitz
  0 siblings, 0 replies; 32+ messages in thread
From: Max Reitz @ 2018-12-12 12:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, eblake, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 7414 bytes --]

On 12.12.18 09:36, Vladimir Sementsov-Ogievskiy wrote:
> 13.10.2018 15:58, Max Reitz wrote:
>> On 10.10.18 18:59, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.10.2018 19:55, Vladimir Sementsov-Ogievskiy wrote:
>>>> 10.10.2018 19:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 17.08.2018 15:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Rewrite corrupted L2 table entry, which reference space out of
>>>>>> underlying file.
>>>>>>
>>>>>> Make this L2 table entry read-as-all-zeros without any allocation.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>    block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++++
>>>>>>    1 file changed, 32 insertions(+)
>>>>>>
>>>>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>>>>> index 3c004e5bfe..3de3768a3c 100644
>>>>>> --- a/block/qcow2-refcount.c
>>>>>> +++ b/block/qcow2-refcount.c
>>>>>> @@ -1720,8 +1720,30 @@ static int
>>>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>                /* Mark cluster as used */
>>>>>>                csize = (((l2_entry >> s->csize_shift) &
>>>>>> s->csize_mask) + 1) *
>>>>>>                        BDRV_SECTOR_SIZE;
>>>>>> +            if (csize > s->cluster_size) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "compressed cluster larger than cluster:
>>>>>> size 0x%"
>>>>>> +                        PRIx64, csize);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>>                coffset = l2_entry & s->cluster_offset_mask &
>>>>>>                          ~(BDRV_SECTOR_SIZE - 1);
>>>>>> +            if (coffset >= bdrv_getlength(bs->file->bs)) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "compressed cluster out of file: offset
>>>>>> 0x%" PRIx64,
>>>>>> +                        coffset);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>>                ret = qcow2_inc_refcounts_imrt(bs, res,
>>>>>>                                               refcount_table,
>>>>>> refcount_table_size,
>>>>>>                                               coffset, csize);
>>>>>> @@ -1748,6 +1770,16 @@ static int
>>>>>> check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>>>>>            {
>>>>>>                uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>>>>>>    +            if (offset >= bdrv_getlength(bs->file->bs)) {
>>>>>> +                ret = fix_l2_entry_to_zero(
>>>>>> +                        bs, res, fix, l2_offset, i, active,
>>>>>> +                        "cluster out of file: offset 0x%" PRIx64,
>>>>>> offset);
>>>>>> +                if (ret < 0) {
>>>>>> +                    goto fail;
>>>>>> +                }
>>>>>> +                continue;
>>>>>> +            }
>>>>>> +
>>>>>>                if (flags & CHECK_FRAG_INFO) {
>>>>>>                    res->bfi.allocated_clusters++;
>>>>>>                    if (next_contiguous_offset &&
>>>>>
>>>>> hmm, interesting question here: in case of misaligned l2 entry, we
>>>>> zero it out only for QCOW2_CLUSTER_ZERO_ALLOC, but not for normal
>>>>> clusters? Why? I think it is ok to mark as zero misaligned normal
>>>>> cluster l2 entry, otherwise we'll have fatal corruption on any
>>>>> operation to this cluster.
>>
>> Because for zero clusters the solution is clear.  We just throw away the
>> obviously wrong preallocation information, but the cluster data stays
>> the same (zero).  So there is no data loss.
>>
>> For normal clusters, you definitely destroy the data by zeroing them out.
>>
>>>> or we can just align them down.
>>
>> Which would destroy the data as well.
>>
>> You can argue that if the value is misaligned, it is extremely likely to
>> be just garbage as a whole, though.  But in any case, it is not obvious
>> what to do and always means data loss (which is different from zero
>> clusters, where you can just keep them zero).
>>
>> The clearest and most obvious solution would be to allocate a new
>> cluster and copy the unaligned data there.  Maybe that doesn't make
>> sense because the data is probably garbage anyway, but it definitely
>> won't harm.
> 
> 
> but what to copy? I think, it is mostly impossible that there is a misaligned
> data cluster. More probable is just partly wrong l2 entry.

What do you mean by "partly"?  I think having eight bytes "partly" wrong
is not very probable either.

I do agree that it's more likely that the L2 information is just garbage
than that the cluster base really is misaligned.  But I think it would
be garbage as a whole.

> So, in your way we will lose this data (as we lose l2 entry, our last hope).

So you think we should set the zero bit and leave the rest of the
cluster as it is?  But the resulting image would not be correct (because
the preallocation offset is wrong), so I don't see that as a good way of
repairing.

On one hand I think we want some repair option to explicitly acknowledge
data loss.  Like invalid bitmaps being removed or invalid L2 entries
being set to some value that is valid.

On the other, I would imagine that one usually runs qemu-img check
without -r on a broken image first to see what's up; at least if they
intent to have a deep look into it at all.  I think people should be
aware that -r all may destroy these kinds of leads.

But in any case, since I think the chances of the L2 entry only being
partly wrong are very small, I think it doesn't bring much to keep that
data around anyway.  I only find it useful in finding out why the
corruption occurred in the first place (by seeing what kind of data it
was overwritten with).

> Finally, what to do with
> misaligned cluster on check? We definitely should do something, as trying to
> access such cluster corrupts qcow2 in qemu.

Well, I gave a description of what I think should be done; which is to
allocate a new cluster, copy the unaligned data there, and then make the
entry point to that new cluster.

> What about an additional flag like "-align-misaligned-clusters-down"?

It would probably make more sense to add flags to the qemu-img check
infrastructure than adding a new -r mode, yes.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-12-12 12:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-17 12:22 [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 1/7] block/qcow2-refcount: fix check_oflag_copied Vladimir Sementsov-Ogievskiy
2018-10-08 15:28   ` Max Reitz
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 2/7] block/qcow2-refcount: avoid eating RAM Vladimir Sementsov-Ogievskiy
2018-10-08 15:31   ` Max Reitz
2018-10-08 20:17     ` Vladimir Sementsov-Ogievskiy
2018-10-08 20:22     ` Vladimir Sementsov-Ogievskiy
2018-10-08 20:39       ` Max Reitz
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case Vladimir Sementsov-Ogievskiy
2018-10-08 15:40   ` Max Reitz
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 4/7] block/qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Vladimir Sementsov-Ogievskiy
2018-10-08 15:44   ` Max Reitz
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 5/7] block/qcow2-refcount: check_refcounts_l2: split fix_l2_entry_to_zero Vladimir Sementsov-Ogievskiy
2018-10-08 19:54   ` Max Reitz
2018-10-10 12:25     ` Vladimir Sementsov-Ogievskiy
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero Vladimir Sementsov-Ogievskiy
2018-10-08 20:09   ` Max Reitz
2018-08-17 12:22 ` [Qemu-devel] [PATCH v2 7/7] block/qcow2-refcount: fix out-of-file L2 entries to be read-as-zero Vladimir Sementsov-Ogievskiy
2018-10-08 20:51   ` Max Reitz
2018-10-08 22:02     ` Vladimir Sementsov-Ogievskiy
2018-10-08 22:08       ` Max Reitz
2018-10-08 22:14         ` Vladimir Sementsov-Ogievskiy
2018-10-08 22:21           ` Max Reitz
2018-10-08 23:14             ` Vladimir Sementsov-Ogievskiy
2018-10-13 12:51               ` Max Reitz
2018-10-10 16:39   ` Vladimir Sementsov-Ogievskiy
2018-10-10 16:55     ` Vladimir Sementsov-Ogievskiy
2018-10-10 16:59       ` Vladimir Sementsov-Ogievskiy
2018-10-13 12:58         ` Max Reitz
2018-12-12  8:36           ` Vladimir Sementsov-Ogievskiy
2018-12-12 12:49             ` Max Reitz
2018-10-08 15:02 ` [Qemu-devel] [PATCH 0/7] qcow2 check improvements Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.