QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits
@ 2019-07-30 17:24 Max Reitz
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots() Max Reitz
                   ` (13 more replies)
  0 siblings, 14 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Hi,

As Eric reports in https://bugzilla.redhat.com/show_bug.cgi?id=1727347,
qemu-img amend has a bug when it comes to converting qcow2 v2 images to
v3: In v3, every snapshot table entry requires at least 16 bytes of
extra metadata to be present, which isn’t the case for v2 images.
Currently, qemu-img amend doesn’t take care of updating the snapshot
table, so the image is a bit corrupt afterwards (luckily, qemu doesn’t
take notice, though).

This yields the following patches:
- Patch 3: Helper patch
- Patch 4: Helper patch, so we can actually do more than just to bump up
  the version number when upgrading a qcow2 image from v2 to v3
- Patch 5: The fix

Eric also points out that qemu-img check does not see any problem with
such images and doesn’t fix them, so:
- Patch 11: Makes qemu-img check report if a snapshot table entry has
  too little extra data, and repair it with -r all
  (Patches 6 and 7 add the necessary infrastructure so we can check and
  repair the snapshot table at all.)

Then I got the glorious idea of “Hey, if I want to see how much extra
data a snapshot table entry has outside of qcow2_read_snapshots(), I
should add a field that reports that value to QCowSnapshot.  And if I do
that, I might as well make the qcow2 driver interpret the specification
a bit more literally, namely it should ignore all unknown extra data,
that is (as I interpret it), keep it in memory and write it back when
updating the snapshot table.”

That led to patch 2.  Maybe you find that stupid, in which case we can
totally drop patch 2 (with some changes to other patches).

Anyway.  Because of this, qcow2_read_snapshots() suddenly got more error
case, so I thought now would be a good time to give it an Error **
parameter.  Cue patch 1.

At this point:
(1) I had infrastructure for repairing a snapshot table in
    qemu-img check -r all, and
(2) I had added a new error case if a snapshot table entry has a
    suspiciously large amount of extra data.

I decided that this should be repairable, too.  This is done by patch 8.
(If we drop patch 2, this will go, too.)

Now I was really into it, so I decided even more things needed fixing!
Namely the final two reasons why we would reject a snapshot table:
(1) It has too many snapshots (patch 10),
(2) It is too long overall (patch 9).


Finally, patch 13 adds an overly complicated test (using the new
peek_file* functions added in patch 12).


Max Reitz (13):
  qcow2: Add Error ** to qcow2_read_snapshots()
  qcow2: Keep unknown extra snapshot data
  qcow2: Make qcow2_write_snapshots() public
  qcow2: Put qcow2_upgrade() into an own function
  qcow2: Write v3-compliant snapshot list on upgrade
  qcow2: Separate qcow2_check_read_snapshot_table()
  qcow2: Add qcow2_check_fix_snapshot_table()
  qcow2: Fix broken snapshot table entries
  qcow2: Fix overly long snapshot tables
  qcow2: Repair snapshot table with too many entries
  qcow2: Fix v3 snapshot table entry compliancy
  iotests: Add peek_file* functions
  iotests: Test qcow2's snapshot table handling

 block/qcow2.h                |  15 +-
 block/qcow2-snapshot.c       | 302 +++++++++++++++++++++--
 block/qcow2.c                | 152 ++++++++++--
 tests/qemu-iotests/261       | 449 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/261.out   | 321 +++++++++++++++++++++++++
 tests/qemu-iotests/common.rc |  20 ++
 tests/qemu-iotests/group     |   1 +
 7 files changed, 1219 insertions(+), 41 deletions(-)
 create mode 100755 tests/qemu-iotests/261
 create mode 100644 tests/qemu-iotests/261.out

-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots()
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
@ 2019-07-30 17:24 ` Max Reitz
  2019-07-30 17:41   ` Eric Blake
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data Max Reitz
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          | 2 +-
 block/qcow2-snapshot.c | 7 ++++++-
 block/qcow2.c          | 3 +--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index fc1b0d3c1e..175708cee0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -704,7 +704,7 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
                             Error **errp);
 
 void qcow2_free_snapshots(BlockDriverState *bs);
-int qcow2_read_snapshots(BlockDriverState *bs);
+int qcow2_read_snapshots(BlockDriverState *bs, Error **errp);
 
 /* qcow2-cache.c functions */
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index d0e7fa9311..6295ba4c9c 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -43,7 +43,7 @@ void qcow2_free_snapshots(BlockDriverState *bs)
     s->nb_snapshots = 0;
 }
 
-int qcow2_read_snapshots(BlockDriverState *bs)
+int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
     QCowSnapshotHeader h;
@@ -68,6 +68,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
         offset = ROUND_UP(offset, 8);
         ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
         if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
         }
 
@@ -88,6 +89,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
         ret = bdrv_pread(bs->file, offset, &extra,
                          MIN(sizeof(extra), extra_data_size));
         if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
         }
         offset += extra_data_size;
@@ -106,6 +108,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
         sn->id_str = g_malloc(id_str_size + 1);
         ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
         if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
         }
         offset += id_str_size;
@@ -115,6 +118,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
         sn->name = g_malloc(name_size + 1);
         ret = bdrv_pread(bs->file, offset, sn->name, name_size);
         if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
         }
         offset += name_size;
@@ -122,6 +126,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 
         if (offset - s->snapshots_offset > QCOW_MAX_SNAPSHOTS_SIZE) {
             ret = -EFBIG;
+            error_setg(errp, "Snapshot table is too big");
             goto fail;
         }
     }
diff --git a/block/qcow2.c b/block/qcow2.c
index 039bdc2f7e..0336deaf2c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1577,9 +1577,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     s->snapshots_offset = header.snapshots_offset;
     s->nb_snapshots = header.nb_snapshots;
 
-    ret = qcow2_read_snapshots(bs);
+    ret = qcow2_read_snapshots(bs, errp);
     if (ret < 0) {
-        error_setg_errno(errp, -ret, "Could not read snapshots");
         goto fail;
     }
 
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots() Max Reitz
@ 2019-07-30 17:24 ` Max Reitz
  2019-07-30 17:56   ` Eric Blake
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public Max Reitz
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

The qcow2 specification says to ignore unknown extra data fields in
snapshot table entries.  Currently, we discard it whenever we update the
image, which is a bit different from "ignore".

This patch makes the qcow2 driver keep all unknown extra data fields
when updating an image's snapshot table.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          |  5 ++++
 block/qcow2-snapshot.c | 59 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 175708cee0..290a48b77e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -61,6 +61,9 @@
  * space for snapshot names and IDs */
 #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
 
+/* Maximum amount of extra data per snapshot table entry to accept */
+#define QCOW_MAX_SNAPSHOT_EXTRA_DATA 1024
+
 /* Bitmap header extension constraints */
 #define QCOW2_MAX_BITMAPS 65535
 #define QCOW2_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW2_MAX_BITMAPS)
@@ -178,6 +181,8 @@ typedef struct QCowSnapshot {
     uint32_t date_sec;
     uint32_t date_nsec;
     uint64_t vm_clock_nsec;
+    uint32_t extra_data_size;
+    void *unknown_extra_data; /* Extra data past QCowSnapshotExtraData */
 } QCowSnapshot;
 
 struct Qcow2Cache;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 6295ba4c9c..b037c6e5fd 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -37,6 +37,7 @@ void qcow2_free_snapshots(BlockDriverState *bs)
     for(i = 0; i < s->nb_snapshots; i++) {
         g_free(s->snapshots[i].name);
         g_free(s->snapshots[i].id_str);
+        g_free(s->snapshots[i].unknown_extra_data);
     }
     g_free(s->snapshots);
     s->snapshots = NULL;
@@ -51,7 +52,6 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
     QCowSnapshot *sn;
     int i, id_str_size, name_size;
     int64_t offset;
-    uint32_t extra_data_size;
     int ret;
 
     if (!s->nb_snapshots) {
@@ -80,30 +80,52 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
         sn->date_sec = be32_to_cpu(h.date_sec);
         sn->date_nsec = be32_to_cpu(h.date_nsec);
         sn->vm_clock_nsec = be64_to_cpu(h.vm_clock_nsec);
-        extra_data_size = be32_to_cpu(h.extra_data_size);
+        sn->extra_data_size = be32_to_cpu(h.extra_data_size);
 
         id_str_size = be16_to_cpu(h.id_str_size);
         name_size = be16_to_cpu(h.name_size);
 
-        /* Read extra data */
+        if (sn->extra_data_size > QCOW_MAX_SNAPSHOT_EXTRA_DATA) {
+            ret = -EFBIG;
+            error_setg(errp, "Too much extra metadata in snapshot table "
+                       "entry %i", i);
+            goto fail;
+        }
+
+        /* Read known extra data */
         ret = bdrv_pread(bs->file, offset, &extra,
-                         MIN(sizeof(extra), extra_data_size));
+                         MIN(sizeof(extra), sn->extra_data_size));
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Failed to read snapshot table");
             goto fail;
         }
-        offset += extra_data_size;
+        offset += MIN(sizeof(extra), sn->extra_data_size);
 
-        if (extra_data_size >= 8) {
+        if (sn->extra_data_size >= 8) {
             sn->vm_state_size = be64_to_cpu(extra.vm_state_size_large);
         }
 
-        if (extra_data_size >= 16) {
+        if (sn->extra_data_size >= 16) {
             sn->disk_size = be64_to_cpu(extra.disk_size);
         } else {
             sn->disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
         }
 
+        if (sn->extra_data_size > sizeof(extra)) {
+            /* Store unknown extra data */
+            size_t unknown_extra_data_size =
+                sn->extra_data_size - sizeof(extra);
+
+            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
+            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
+                             unknown_extra_data_size);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to read snapshot table");
+                goto fail;
+            }
+            offset += unknown_extra_data_size;
+        }
+
         /* Read snapshot ID */
         sn->id_str = g_malloc(id_str_size + 1);
         ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
@@ -161,7 +183,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
         sn = s->snapshots + i;
         offset = ROUND_UP(offset, 8);
         offset += sizeof(h);
-        offset += sizeof(extra);
+        offset += MAX(sizeof(extra), sn->extra_data_size);
         offset += strlen(sn->id_str);
         offset += strlen(sn->name);
 
@@ -208,7 +230,8 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
         h.date_sec = cpu_to_be32(sn->date_sec);
         h.date_nsec = cpu_to_be32(sn->date_nsec);
         h.vm_clock_nsec = cpu_to_be64(sn->vm_clock_nsec);
-        h.extra_data_size = cpu_to_be32(sizeof(extra));
+        h.extra_data_size = cpu_to_be32(MAX(sizeof(extra),
+                                            sn->extra_data_size));
 
         memset(&extra, 0, sizeof(extra));
         extra.vm_state_size_large = cpu_to_be64(sn->vm_state_size);
@@ -233,6 +256,22 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
         }
         offset += sizeof(extra);
 
+        if (sn->extra_data_size > sizeof(extra)) {
+            size_t unknown_extra_data_size =
+                sn->extra_data_size - sizeof(extra);
+
+            /* qcow2_read_snapshots() ensures no unbounded allocation */
+            assert(unknown_extra_data_size <= INT_MAX);
+            assert(sn->unknown_extra_data);
+
+            ret = bdrv_pwrite(bs->file, offset, sn->unknown_extra_data,
+                              unknown_extra_data_size);
+            if (ret < 0) {
+                goto fail;
+            }
+            offset += unknown_extra_data_size;
+        }
+
         ret = bdrv_pwrite(bs->file, offset, sn->id_str, id_str_size);
         if (ret < 0) {
             goto fail;
@@ -375,6 +414,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     sn->date_sec = sn_info->date_sec;
     sn->date_nsec = sn_info->date_nsec;
     sn->vm_clock_nsec = sn_info->vm_clock_nsec;
+    sn->extra_data_size = sizeof(QCowSnapshotExtraData);
 
     /* Allocate the L1 table of the snapshot and copy the current one there. */
     l1_table_offset = qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
@@ -646,6 +686,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
      * The snapshot is now unused, clean up. If we fail after this point, we
      * won't recover but just leak clusters.
      */
+    g_free(sn.unknown_extra_data);
     g_free(sn.id_str);
     g_free(sn.name);
 
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots() Max Reitz
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data Max Reitz
@ 2019-07-30 17:24 ` Max Reitz
  2019-07-30 17:57   ` Eric Blake
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function Max Reitz
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Updating the snapshot list will be useful when upgrading a v2 image to
v3, so we will need to call this function in qcow2.c.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          | 1 +
 block/qcow2-snapshot.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 290a48b77e..77586d81b9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -710,6 +710,7 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 
 void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs, Error **errp);
+int qcow2_write_snapshots(BlockDriverState *bs);
 
 /* qcow2-cache.c functions */
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index b037c6e5fd..3d097b92d7 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -163,7 +163,7 @@ fail:
 }
 
 /* add at the end of the file a new list of snapshots */
-static int qcow2_write_snapshots(BlockDriverState *bs)
+int qcow2_write_snapshots(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
     QCowSnapshot *sn;
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (2 preceding siblings ...)
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public Max Reitz
@ 2019-07-30 17:24 ` Max Reitz
  2019-07-30 18:00   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade Max Reitz
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:24 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

This does not make sense right now, but it will make sense once we need
to do more than to just update s->qcow_version.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0336deaf2c..99e9dad798 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4721,12 +4721,46 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     return 0;
 }
 
+/*
+ * Upgrades an image's version.  While newer versions encompass all
+ * features of older versions, some things may have to be presented
+ * differently.
+ */
+static int qcow2_upgrade(BlockDriverState *bs, int target_version,
+                         BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
+                         Error **errp)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int current_version = s->qcow_version;
+    int ret;
+
+    /* This is qcow2_upgrade(), not qcow2_downgrade() */
+    assert(target_version > current_version);
+
+    /* There are no other versions (yet) that you can upgrade to */
+    assert(target_version == 3);
+
+    status_cb(bs, 0, 1, cb_opaque);
+
+    s->qcow_version = target_version;
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        s->qcow_version = current_version;
+        error_setg_errno(errp, -ret, "Failed to update the image header");
+        return ret;
+    }
+    status_cb(bs, 1, 1, cb_opaque);
+
+    return 0;
+}
+
 typedef enum Qcow2AmendOperation {
     /* This is the value Qcow2AmendHelperCBInfo::last_operation will be
      * statically initialized to so that the helper CB can discern the first
      * invocation from an operation change */
     QCOW2_NO_OPERATION = 0,
 
+    QCOW2_UPGRADING,
     QCOW2_CHANGING_REFCOUNT_ORDER,
     QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
@@ -4909,17 +4943,16 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     helper_cb_info = (Qcow2AmendHelperCBInfo){
         .original_status_cb = status_cb,
         .original_cb_opaque = cb_opaque,
-        .total_operations = (new_version < old_version)
+        .total_operations = (new_version != old_version)
                           + (s->refcount_bits != refcount_bits)
     };
 
     /* Upgrade first (some features may require compat=1.1) */
     if (new_version > old_version) {
-        s->qcow_version = new_version;
-        ret = qcow2_update_header(bs);
+        helper_cb_info.current_operation = QCOW2_UPGRADING;
+        ret = qcow2_upgrade(bs, new_version, &qcow2_amend_helper_cb,
+                            &helper_cb_info, errp);
         if (ret < 0) {
-            s->qcow_version = old_version;
-            error_setg_errno(errp, -ret, "Failed to update the image header");
             return ret;
         }
     }
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (3 preceding siblings ...)
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 18:10   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table() Max Reitz
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

qcow2 v3 requires every snapshot table entry to have two extra data
fields: The 64-bit VM state size, and the virtual disk size.  Both are
optional for v2 images, so they may not be present.

qcow2_upgrade() therefore should update the snapshot table to ensure all
entries have these extra data fields.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 99e9dad798..75d41683a1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4731,7 +4731,9 @@ static int qcow2_upgrade(BlockDriverState *bs, int target_version,
                          Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
+    bool need_snapshot_update;
     int current_version = s->qcow_version;
+    int i;
     int ret;
 
     /* This is qcow2_upgrade(), not qcow2_downgrade() */
@@ -4740,7 +4742,30 @@ static int qcow2_upgrade(BlockDriverState *bs, int target_version,
     /* There are no other versions (yet) that you can upgrade to */
     assert(target_version == 3);
 
-    status_cb(bs, 0, 1, cb_opaque);
+    status_cb(bs, 0, 2, cb_opaque);
+
+    /*
+     * In v2, snapshots do not need to have extra data.  v3 requires
+     * the 64-bit VM state size and the virtual disk size to be
+     * present.
+     * qcow2_write_snapshots() will always write the list in the
+     * v3-compliant format.
+     */
+    need_snapshot_update = false;
+    for (i = 0; i < s->nb_snapshots; i++) {
+        if (s->snapshots[i].extra_data_size < 16) {
+            need_snapshot_update = true;
+            break;
+        }
+    }
+    if (need_snapshot_update) {
+        ret = qcow2_write_snapshots(bs);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to update the snapshot table");
+            return ret;
+        }
+    }
+    status_cb(bs, 1, 2, cb_opaque);
 
     s->qcow_version = target_version;
     ret = qcow2_update_header(bs);
@@ -4749,7 +4774,7 @@ static int qcow2_upgrade(BlockDriverState *bs, int target_version,
         error_setg_errno(errp, -ret, "Failed to update the image header");
         return ret;
     }
-    status_cb(bs, 1, 1, cb_opaque);
+    status_cb(bs, 2, 2, cb_opaque);
 
     return 0;
 }
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table()
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (4 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 18:53   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table() Max Reitz
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Reading the snapshot table can fail.  That is a problem when we want to
repair the image.

Therefore, stop reading the snapshot table in qcow2_do_open() in check
mode.  Instead, add a new function qcow2_check_read_snapshot_table()
that reads the snapshot table at a later point.  In the future, we want
to handle errors here and fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          |  4 +++
 block/qcow2-snapshot.c | 58 ++++++++++++++++++++++++++++++++
 block/qcow2.c          | 76 ++++++++++++++++++++++++++++++++----------
 3 files changed, 120 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 77586d81b9..50c7eefb5b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -712,6 +712,10 @@ void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs, Error **errp);
 int qcow2_write_snapshots(BlockDriverState *bs);
 
+int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
+                                                 BdrvCheckResult *result,
+                                                 BdrvCheckMode fix);
+
 /* qcow2-cache.c functions */
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
                                unsigned table_size);
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 3d097b92d7..cbc6d699b6 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -321,6 +321,64 @@ fail:
     return ret;
 }
 
+int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
+                                                 BdrvCheckResult *result,
+                                                 BdrvCheckMode fix)
+{
+    BDRVQcow2State *s = bs->opaque;
+    Error *local_err = NULL;
+    int ret;
+    struct {
+        uint32_t nb_snapshots;
+        uint64_t snapshots_offset;
+    } QEMU_PACKED snapshot_table_pointer;
+
+    /* qcow2_do_open() discards this information in check mode */
+    ret = bdrv_pread(bs->file, 60, &snapshot_table_pointer,
+                     sizeof(snapshot_table_pointer));
+    if (ret < 0) {
+        result->check_errors++;
+        fprintf(stderr, "ERROR failed to read the snapshot table pointer from "
+                "the image header: %s\n", strerror(-ret));
+        return ret;
+    }
+
+    s->snapshots_offset = be64_to_cpu(snapshot_table_pointer.snapshots_offset);
+    s->nb_snapshots = be32_to_cpu(snapshot_table_pointer.nb_snapshots);
+
+    ret = qcow2_validate_table(bs, s->snapshots_offset, s->nb_snapshots,
+                               sizeof(QCowSnapshotHeader),
+                               sizeof(QCowSnapshotHeader) * QCOW_MAX_SNAPSHOTS,
+                               "snapshot table", &local_err);
+    if (ret < 0) {
+        result->check_errors++;
+        error_reportf_err(local_err, "ERROR ");
+
+        /* We did not read the snapshot table, so invalidate this information */
+        s->snapshots_offset = 0;
+        s->nb_snapshots = 0;
+
+        return ret;
+    }
+
+    qemu_co_mutex_unlock(&s->lock);
+    ret = qcow2_read_snapshots(bs, &local_err);
+    qemu_co_mutex_lock(&s->lock);
+    if (ret < 0) {
+        result->check_errors++;
+        error_reportf_err(local_err,
+                          "ERROR failed to read the snapshot table: ");
+
+        /* We did not read the snapshot table, so invalidate this information */
+        s->snapshots_offset = 0;
+        s->nb_snapshots = 0;
+
+        return ret;
+    }
+
+    return 0;
+}
+
 static void find_new_snapshot_id(BlockDriverState *bs,
                                  char *id_str, int id_str_size)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 75d41683a1..2983a7795a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -567,11 +567,40 @@ int qcow2_mark_consistent(BlockDriverState *bs)
     return 0;
 }
 
+static void qcow2_add_check_result(BdrvCheckResult *out,
+                                   const BdrvCheckResult *src,
+                                   bool set_allocation_info)
+{
+    out->corruptions += src->corruptions;
+    out->leaks += src->leaks;
+    out->check_errors += src->check_errors;
+    out->corruptions_fixed += src->corruptions_fixed;
+    out->leaks_fixed += src->leaks_fixed;
+
+    if (set_allocation_info) {
+        out->image_end_offset = src->image_end_offset;
+        out->bfi = src->bfi;
+    }
+}
+
 static int coroutine_fn qcow2_co_check_locked(BlockDriverState *bs,
                                               BdrvCheckResult *result,
                                               BdrvCheckMode fix)
 {
-    int ret = qcow2_check_refcounts(bs, result, fix);
+    BdrvCheckResult snapshot_res = {};
+    BdrvCheckResult refcount_res = {};
+    int ret;
+
+    memset(result, 0, sizeof(*result));
+
+    ret = qcow2_check_read_snapshot_table(bs, &snapshot_res, fix);
+    qcow2_add_check_result(result, &snapshot_res, false);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = qcow2_check_refcounts(bs, &refcount_res, fix);
+    qcow2_add_check_result(result, &refcount_res, true);
     if (ret < 0) {
         return ret;
     }
@@ -1403,17 +1432,22 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         goto fail;
     }
 
-    /* The total size in bytes of the snapshot table is checked in
-     * qcow2_read_snapshots() because the size of each snapshot is
-     * variable and we don't know it yet.
-     * Here we only check the offset and number of snapshots. */
-    ret = qcow2_validate_table(bs, header.snapshots_offset,
-                               header.nb_snapshots,
-                               sizeof(QCowSnapshotHeader),
-                               sizeof(QCowSnapshotHeader) * QCOW_MAX_SNAPSHOTS,
-                               "Snapshot table", errp);
-    if (ret < 0) {
-        goto fail;
+    if (!(flags & BDRV_O_CHECK)) {
+        /*
+         * The total size in bytes of the snapshot table is checked in
+         * qcow2_read_snapshots() because the size of each snapshot is
+         * variable and we don't know it yet.
+         * Here we only check the offset and number of snapshots.
+         */
+        ret = qcow2_validate_table(bs, header.snapshots_offset,
+                                   header.nb_snapshots,
+                                   sizeof(QCowSnapshotHeader),
+                                   sizeof(QCowSnapshotHeader) *
+                                       QCOW_MAX_SNAPSHOTS,
+                                   "Snapshot table", errp);
+        if (ret < 0) {
+            goto fail;
+        }
     }
 
     /* read the level 1 table */
@@ -1573,13 +1607,19 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         s->image_backing_file = g_strdup(bs->auto_backing_file);
     }
 
-    /* Internal snapshots */
-    s->snapshots_offset = header.snapshots_offset;
-    s->nb_snapshots = header.nb_snapshots;
+    /*
+     * Internal snapshots; skip reading them in check mode, because
+     * we do not need them then, and we do not want to abort because
+     * of a broken table.
+     */
+    if (!(flags & BDRV_O_CHECK)) {
+        s->snapshots_offset = header.snapshots_offset;
+        s->nb_snapshots = header.nb_snapshots;
 
-    ret = qcow2_read_snapshots(bs, errp);
-    if (ret < 0) {
-        goto fail;
+        ret = qcow2_read_snapshots(bs, errp);
+        if (ret < 0) {
+            goto fail;
+        }
     }
 
     /* Clear unknown autoclear feature bits */
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table()
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (5 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table() Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 18:54   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries Max Reitz
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

qcow2_check_read_snapshot_table() can perform consistency checks, but it
cannot fix everything.  Specifically, it cannot allocate new clusters,
because that should wait until the refcount structures are known to be
consistent (i.e., after qcow2_check_refcounts()).  Thus, it cannot call
qcow2_write_snapshots().

Do that in qcow2_check_fix_snapshot_table(), which is called after
qcow2_check_refcounts().

Currently, there is nothing that would set result->corruptions, so this
is a no-op.  A follow-up patch will change that.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          |  3 +++
 block/qcow2-snapshot.c | 25 +++++++++++++++++++++++++
 block/qcow2.c          |  9 ++++++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 50c7eefb5b..6823c37e82 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -715,6 +715,9 @@ int qcow2_write_snapshots(BlockDriverState *bs);
 int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
                                                  BdrvCheckResult *result,
                                                  BdrvCheckMode fix);
+int coroutine_fn qcow2_check_fix_snapshot_table(BlockDriverState *bs,
+                                                BdrvCheckResult *result,
+                                                BdrvCheckMode fix);
 
 /* qcow2-cache.c functions */
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index cbc6d699b6..58609b6ea4 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -379,6 +379,31 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
     return 0;
 }
 
+int coroutine_fn qcow2_check_fix_snapshot_table(BlockDriverState *bs,
+                                                BdrvCheckResult *result,
+                                                BdrvCheckMode fix)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+
+    if (result->corruptions && (fix & BDRV_FIX_ERRORS)) {
+        qemu_co_mutex_unlock(&s->lock);
+        ret = qcow2_write_snapshots(bs);
+        qemu_co_mutex_lock(&s->lock);
+        if (ret < 0) {
+            result->check_errors++;
+            fprintf(stderr, "ERROR failed to update snapshot table: %s\n",
+                    strerror(-ret));
+            return ret;
+        }
+
+        result->corruptions_fixed += result->corruptions;
+        result->corruptions = 0;
+    }
+
+    return 0;
+}
+
 static void find_new_snapshot_id(BlockDriverState *bs,
                                  char *id_str, int id_str_size)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 2983a7795a..ac94cc7807 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -594,13 +594,20 @@ static int coroutine_fn qcow2_co_check_locked(BlockDriverState *bs,
     memset(result, 0, sizeof(*result));
 
     ret = qcow2_check_read_snapshot_table(bs, &snapshot_res, fix);
-    qcow2_add_check_result(result, &snapshot_res, false);
     if (ret < 0) {
+        qcow2_add_check_result(result, &snapshot_res, false);
         return ret;
     }
 
     ret = qcow2_check_refcounts(bs, &refcount_res, fix);
     qcow2_add_check_result(result, &refcount_res, true);
+    if (ret < 0) {
+        qcow2_add_check_result(result, &snapshot_res, false);
+        return ret;
+    }
+
+    ret = qcow2_check_fix_snapshot_table(bs, &snapshot_res, fix);
+    qcow2_add_check_result(result, &snapshot_res, false);
     if (ret < 0) {
         return ret;
     }
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (6 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table() Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:02   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables Max Reitz
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

The only case where we currently reject snapshot table entries is when
they have too much extra data.  Fix them with qemu-img check -r all by
counting it as a corruption, reducing their extra_data_size, and then
letting qcow2_check_fix_snapshot_table() do the rest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 69 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 58609b6ea4..9956c32964 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -44,7 +44,23 @@ void qcow2_free_snapshots(BlockDriverState *bs)
     s->nb_snapshots = 0;
 }
 
-int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
+/*
+ * If @repair is true, try to repair a broken snapshot table instead
+ * of just returning an error:
+ *
+ * - If there were snapshots with too much extra metadata, increment
+ *   *extra_data_dropped for each.
+ *   This requires the caller to eventually rewrite the whole snapshot
+ *   table, which requires cluster allocation.  Therefore, this should
+ *   be done only after qcow2_check_refcounts() made sure the refcount
+ *   structures are valid.
+ *   (In the meantime, the image is still valid because
+ *   qcow2_check_refcounts() does not do anything with snapshots'
+ *   extra data.)
+ */
+static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
+                                   int *extra_data_dropped,
+                                   Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
     QCowSnapshotHeader h;
@@ -64,6 +80,8 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
     s->snapshots = g_new0(QCowSnapshot, s->nb_snapshots);
 
     for(i = 0; i < s->nb_snapshots; i++) {
+        bool discard_unknown_extra_data = false;
+
         /* Read statically sized part of the snapshot header */
         offset = ROUND_UP(offset, 8);
         ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
@@ -86,10 +104,21 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
         name_size = be16_to_cpu(h.name_size);
 
         if (sn->extra_data_size > QCOW_MAX_SNAPSHOT_EXTRA_DATA) {
-            ret = -EFBIG;
-            error_setg(errp, "Too much extra metadata in snapshot table "
-                       "entry %i", i);
-            goto fail;
+            if (!repair) {
+                ret = -EFBIG;
+                error_setg(errp, "Too much extra metadata in snapshot table "
+                           "entry %i", i);
+                error_append_hint(errp, "You can force-remove this extra "
+                                  "metadata with qemu-img check -r all\n");
+                goto fail;
+            }
+
+            fprintf(stderr, "Discarding too much extra metadata in snapshot "
+                    "table entry %i (%" PRIu32 " > %u)\n",
+                    i, sn->extra_data_size, QCOW_MAX_SNAPSHOT_EXTRA_DATA);
+
+            (*extra_data_dropped)++;
+            discard_unknown_extra_data = true;
         }
 
         /* Read known extra data */
@@ -112,16 +141,22 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
         }
 
         if (sn->extra_data_size > sizeof(extra)) {
-            /* Store unknown extra data */
             size_t unknown_extra_data_size =
                 sn->extra_data_size - sizeof(extra);
 
-            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
-            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
-                             unknown_extra_data_size);
-            if (ret < 0) {
-                error_setg_errno(errp, -ret, "Failed to read snapshot table");
-                goto fail;
+            if (discard_unknown_extra_data) {
+                /* Discard unknown extra data */
+                sn->extra_data_size = sizeof(extra);
+            } else {
+                /* Store unknown extra data */
+                sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
+                ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
+                                 unknown_extra_data_size);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret,
+                                     "Failed to read snapshot table");
+                    goto fail;
+                }
             }
             offset += unknown_extra_data_size;
         }
@@ -162,6 +197,11 @@ fail:
     return ret;
 }
 
+int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
+{
+    return qcow2_do_read_snapshots(bs, false, NULL, errp);
+}
+
 /* add at the end of the file a new list of snapshots */
 int qcow2_write_snapshots(BlockDriverState *bs)
 {
@@ -327,6 +367,7 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 {
     BDRVQcow2State *s = bs->opaque;
     Error *local_err = NULL;
+    int extra_data_dropped = 0;
     int ret;
     struct {
         uint32_t nb_snapshots;
@@ -362,7 +403,8 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
     }
 
     qemu_co_mutex_unlock(&s->lock);
-    ret = qcow2_read_snapshots(bs, &local_err);
+    ret = qcow2_do_read_snapshots(bs, fix & BDRV_FIX_ERRORS,
+                                  &extra_data_dropped, &local_err);
     qemu_co_mutex_lock(&s->lock);
     if (ret < 0) {
         result->check_errors++;
@@ -375,6 +417,7 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 
         return ret;
     }
+    result->corruptions += extra_data_dropped;
 
     return 0;
 }
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (7 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:08   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries Max Reitz
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

We currently refuse to open qcow2 images with overly long snapshot
tables.  This patch makes qemu-img check -r all drop all offending
entries past what we deem acceptable.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 9956c32964..bd8e56a99e 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -29,15 +29,24 @@
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
 
+static void qcow2_free_single_snapshot(BlockDriverState *bs, int i)
+{
+    BDRVQcow2State *s = bs->opaque;
+
+    assert(i >= 0 && i < s->nb_snapshots);
+    g_free(s->snapshots[i].name);
+    g_free(s->snapshots[i].id_str);
+    g_free(s->snapshots[i].unknown_extra_data);
+    memset(&s->snapshots[i], 0, sizeof(s->snapshots[i]));
+}
+
 void qcow2_free_snapshots(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
     int i;
 
     for(i = 0; i < s->nb_snapshots; i++) {
-        g_free(s->snapshots[i].name);
-        g_free(s->snapshots[i].id_str);
-        g_free(s->snapshots[i].unknown_extra_data);
+        qcow2_free_single_snapshot(bs, i);
     }
     g_free(s->snapshots);
     s->snapshots = NULL;
@@ -48,6 +57,14 @@ void qcow2_free_snapshots(BlockDriverState *bs)
  * If @repair is true, try to repair a broken snapshot table instead
  * of just returning an error:
  *
+ * - If the snapshot table was too long, set *nb_clusters_reduced to
+ *   the number of snapshots removed off the end.
+ *   The caller will update the on-disk nb_snapshots accordingly;
+ *   this leaks clusters, but is safe.
+ *   (The on-disk information must be updated before
+ *   qcow2_check_refcounts(), because that function relies on
+ *   s->nb_snapshots to reflect the on-disk value.)
+ *
  * - If there were snapshots with too much extra metadata, increment
  *   *extra_data_dropped for each.
  *   This requires the caller to eventually rewrite the whole snapshot
@@ -59,6 +76,7 @@ void qcow2_free_snapshots(BlockDriverState *bs)
  *   extra data.)
  */
 static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
+                                   int *nb_clusters_reduced,
                                    int *extra_data_dropped,
                                    Error **errp)
 {
@@ -67,7 +85,7 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
     QCowSnapshotExtraData extra;
     QCowSnapshot *sn;
     int i, id_str_size, name_size;
-    int64_t offset;
+    int64_t offset, pre_sn_offset;
     int ret;
 
     if (!s->nb_snapshots) {
@@ -82,6 +100,8 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
     for(i = 0; i < s->nb_snapshots; i++) {
         bool discard_unknown_extra_data = false;
 
+        pre_sn_offset = offset;
+
         /* Read statically sized part of the snapshot header */
         offset = ROUND_UP(offset, 8);
         ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
@@ -182,9 +202,31 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         sn->name[name_size] = '\0';
 
         if (offset - s->snapshots_offset > QCOW_MAX_SNAPSHOTS_SIZE) {
-            ret = -EFBIG;
-            error_setg(errp, "Snapshot table is too big");
-            goto fail;
+            if (!repair) {
+                ret = -EFBIG;
+                error_setg(errp, "Snapshot table is too big");
+                error_append_hint(errp, "You can force-remove all %u "
+                                  "overhanging snapshots with qemu-img check "
+                                  "-r all\n", s->nb_snapshots - i);
+                goto fail;
+            }
+
+            fprintf(stderr, "Discarding %u overhanging snapshots (snapshot "
+                    "table is too big)\n", s->nb_snapshots - i);
+
+            *nb_clusters_reduced += (s->nb_snapshots - i);
+
+            /* Discard current snapshot also */
+            qcow2_free_single_snapshot(bs, i);
+
+            /*
+             * This leaks all the rest of the snapshot table and the
+             * snapshots' clusters, but we run in check -r all mode,
+             * so qcow2_check_refcounts() will take care of it.
+             */
+            s->nb_snapshots = i;
+            offset = pre_sn_offset;
+            break;
         }
     }
 
@@ -199,7 +241,7 @@ fail:
 
 int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
 {
-    return qcow2_do_read_snapshots(bs, false, NULL, errp);
+    return qcow2_do_read_snapshots(bs, false, NULL, NULL, errp);
 }
 
 /* add at the end of the file a new list of snapshots */
@@ -367,6 +409,7 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 {
     BDRVQcow2State *s = bs->opaque;
     Error *local_err = NULL;
+    int nb_clusters_reduced = 0;
     int extra_data_dropped = 0;
     int ret;
     struct {
@@ -404,7 +447,8 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 
     qemu_co_mutex_unlock(&s->lock);
     ret = qcow2_do_read_snapshots(bs, fix & BDRV_FIX_ERRORS,
-                                  &extra_data_dropped, &local_err);
+                                  &nb_clusters_reduced, &extra_data_dropped,
+                                  &local_err);
     qemu_co_mutex_lock(&s->lock);
     if (ret < 0) {
         result->check_errors++;
@@ -417,7 +461,32 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 
         return ret;
     }
-    result->corruptions += extra_data_dropped;
+    result->corruptions += nb_clusters_reduced + extra_data_dropped;
+
+    if (nb_clusters_reduced) {
+        /*
+         * Update image header now, because:
+         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
+         *     the same as what the image header says,
+         * (2) this leaks clusters, but qcow2_check_refcounts() will
+         *     fix that.
+         */
+        assert(fix & BDRV_FIX_ERRORS);
+
+        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
+        ret = bdrv_pwrite_sync(bs->file, 60,
+                               &snapshot_table_pointer.nb_snapshots,
+                               sizeof(snapshot_table_pointer.nb_snapshots));
+        if (ret < 0) {
+            result->check_errors++;
+            fprintf(stderr, "ERROR failed to update the snapshot count in the "
+                    "image header: %s\n", strerror(-ret));
+            return ret;
+        }
+
+        result->corruptions_fixed += nb_clusters_reduced;
+        result->corruptions -= nb_clusters_reduced;
+    }
 
     return 0;
 }
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (8 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:10   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy Max Reitz
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index bd8e56a99e..9e8c7c1f7f 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -430,6 +430,14 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
     s->snapshots_offset = be64_to_cpu(snapshot_table_pointer.snapshots_offset);
     s->nb_snapshots = be32_to_cpu(snapshot_table_pointer.nb_snapshots);
 
+    if (s->nb_snapshots > QCOW_MAX_SNAPSHOTS && (fix & BDRV_FIX_ERRORS)) {
+        fprintf(stderr, "Discarding %u overhanging snapshots\n",
+                s->nb_snapshots - QCOW_MAX_SNAPSHOTS);
+
+        nb_clusters_reduced += s->nb_snapshots - QCOW_MAX_SNAPSHOTS;
+        s->nb_snapshots = QCOW_MAX_SNAPSHOTS;
+    }
+
     ret = qcow2_validate_table(bs, s->snapshots_offset, s->nb_snapshots,
                                sizeof(QCowSnapshotHeader),
                                sizeof(QCowSnapshotHeader) * QCOW_MAX_SNAPSHOTS,
@@ -438,6 +446,12 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
         result->check_errors++;
         error_reportf_err(local_err, "ERROR ");
 
+        if (s->nb_snapshots > QCOW_MAX_SNAPSHOTS) {
+            fprintf(stderr, "You can force-remove all %u overhanging snapshots "
+                    "with qemu-img check -r all\n",
+                    s->nb_snapshots - QCOW_MAX_SNAPSHOTS);
+        }
+
         /* We did not read the snapshot table, so invalidate this information */
         s->snapshots_offset = 0;
         s->nb_snapshots = 0;
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (9 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:12   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions Max Reitz
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

qcow2 v3 images require every snapshot table entry to have at least 16
bytes of extra data.  If they do not, let qemu-img check -r all fix it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 9e8c7c1f7f..e22c964b2a 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -502,6 +502,21 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
         result->corruptions -= nb_clusters_reduced;
     }
 
+    /*
+     * All of v3 images' snapshot table entries need to have at least
+     * 16 bytes of extra data.
+     */
+    if (s->qcow_version >= 3) {
+        int i;
+        for (i = 0; i < s->nb_snapshots; i++) {
+            if (s->snapshots[i].extra_data_size < 16) {
+                result->corruptions++;
+                fprintf(stderr, "%s snapshot table entry %i is incomplete\n",
+                        fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+            }
+        }
+    }
+
     return 0;
 }
 
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (10 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:22   ` Eric Blake
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling Max Reitz
  2019-07-30 17:39 ` [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Eric Blake
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/common.rc | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 5502c3da2f..78decfd5d5 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -53,6 +53,26 @@ poke_file()
     printf "$3" | dd "of=$1" bs=1 "seek=$2" conv=notrunc &>/dev/null
 }
 
+# peek_file_le 'test.img' 512 2 => 65534
+peek_file_le()
+{
+    # Wrap in echo $() to strip spaces
+    echo $(od -j"$2" -N"$3" --endian=little -An -vtu"$3" "$1")
+}
+
+# peek_file_be 'test.img' 512 2 => 65279
+peek_file_be()
+{
+    # Wrap in echo $() to strip spaces
+    echo $(od -j"$2" -N"$3" --endian=big -An -vtu"$3" "$1")
+}
+
+# peek_file_raw 'test.img' 512 2 => '\xff\xfe'
+peek_file_raw()
+{
+    dd if="$1" bs=1 skip="$2" count="$3" status=none
+}
+
 
 if ! . ./common.config
     then
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (11 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions Max Reitz
@ 2019-07-30 17:25 ` Max Reitz
  2019-07-30 19:56   ` Eric Blake
  2019-07-30 17:39 ` [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Eric Blake
  13 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-30 17:25 UTC (permalink / raw)
  To: qemu-block; +Cc: Kevin Wolf, qemu-devel, Max Reitz

Add a test how our qcow2 driver handles extra data in snapshot table
entries, and how it repairs overly long snapshot tables.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/261     | 449 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/261.out | 321 ++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 771 insertions(+)
 create mode 100755 tests/qemu-iotests/261
 create mode 100644 tests/qemu-iotests/261.out

diff --git a/tests/qemu-iotests/261 b/tests/qemu-iotests/261
new file mode 100755
index 0000000000..c7c4a4742a
--- /dev/null
+++ b/tests/qemu-iotests/261
@@ -0,0 +1,449 @@
+#!/usr/bin/env bash
+#
+# Test case for qcow2's handling of extra data in snapshot table entries
+# (and more generally, how certain cases of broken snapshot tables are
+# handled)
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    rm -f "$TEST_IMG".v{2,3}.orig
+    rm -f "$TEST_DIR"/sn{0,1,2}{,-pre,-extra,-post}
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+# (1) We create a v2 image that supports nothing but refcount_bits=16
+# (2) We do some refcount management on our own which expects
+#     refcount_bits=16
+_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
+
+# Parameters:
+#   $1: image filename
+#   $2: snapshot table entry offset in the image
+snapshot_table_entry_size()
+{
+    id_len=$(peek_file_be "$1" $(($2 + 12)) 2)
+    name_len=$(peek_file_be "$1" $(($2 + 14)) 2)
+    extra_len=$(peek_file_be "$1" $(($2 + 36)) 4)
+
+    full_len=$((40 + extra_len + id_len + name_len))
+    if [ $((full_len % 8)) = 0 ]; then
+        echo $full_len
+    else
+        echo $((full_len + 8 - full_len % 8))
+    fi
+}
+
+# Parameter:
+#   $1: image filename
+print_snapshot_table()
+{
+    nb_entries=$(peek_file_be "$1" 60 4)
+    offset=$(peek_file_be "$1" 64 8)
+
+    echo "Snapshots in $1:" | _filter_testdir | _filter_imgfmt
+
+    for ((i = 0; i < nb_entries; i++)); do
+        id_len=$(peek_file_be "$1" $((offset + 12)) 2)
+        name_len=$(peek_file_be "$1" $((offset + 14)) 2)
+        extra_len=$(peek_file_be "$1" $((offset + 36)) 4)
+
+        extra_ofs=$((offset + 40))
+        id_ofs=$((extra_ofs + extra_len))
+        name_ofs=$((id_ofs + id_len))
+
+        echo "  [$i]"
+        echo "    ID: $(peek_file_raw "$1" $id_ofs $id_len)"
+        echo "    Name: $(peek_file_raw "$1" $name_ofs $name_len)"
+        echo "    Extra data size: $extra_len"
+        if [ $extra_len -ge 8 ]; then
+            echo "    VM state size: $(peek_file_be "$1" $extra_ofs 8)"
+        fi
+        if [ $extra_len -ge 16 ]; then
+            echo "    Disk size: $(peek_file_be "$1" $((extra_ofs + 8)) 8)"
+        fi
+        if [ $extra_len -gt 16 ]; then
+            echo '    Unknown extra data:' \
+                "$(peek_file_raw "$1" $((extra_ofs + 16)) $((extra_len - 16)) \
+                   | tr -d '\0')"
+        fi
+
+        offset=$((offset + $(snapshot_table_entry_size "$1" $offset)))
+    done
+}
+
+# Mark clusters as allocated; works only in refblock 0 (i.e. before
+# cluster #32768).
+# Parameters:
+#   $1: Start offset of what to allocate
+#   $2: End offset (exclusive)
+refblock0_allocate()
+{
+    reftable_ofs=$(peek_file_be "$TEST_IMG" 48 8)
+    refblock_ofs=$(peek_file_be "$TEST_IMG" $reftable_ofs 8)
+
+    cluster=$(($1 / 65536))
+    ecluster=$((($2 + 65535) / 65536))
+
+    while [ $cluster -lt $ecluster ]; do
+        if [ $cluster -ge 32768 ]; then
+            echo "*** Abort: Cluster $cluster exceeds refblock 0 ***"
+            exit 1
+        fi
+        poke_file "$TEST_IMG" $((refblock_ofs + cluster * 2)) '\x00\x01'
+        cluster=$((cluster + 1))
+    done
+}
+
+
+echo
+echo '=== Create v2 template ==='
+echo
+
+# Create v2 image with a snapshot table with three entries:
+# [0]: No extra data (valid with v2, not valid with v3)
+# [1]: Has extra data unknown to qemu
+# [2]: Has the 64-bit VM state size, but not the disk size (again,
+#      valid with v2, not valid with v3)
+
+TEST_IMG="$TEST_IMG.v2.orig" IMGOPTS='compat=0.10' _make_test_img 64M
+$QEMU_IMG snapshot -c sn0 "$TEST_IMG.v2.orig"
+$QEMU_IMG snapshot -c sn1 "$TEST_IMG.v2.orig"
+$QEMU_IMG snapshot -c sn2 "$TEST_IMG.v2.orig"
+
+# Copy out all existing snapshot table entries
+sn_table_ofs=$(peek_file_be "$TEST_IMG.v2.orig" 64 8)
+
+# ofs: Snapshot table entry offset
+# eds: Extra data size
+# ids: Name + ID size
+# len: Total entry length
+sn0_ofs=$sn_table_ofs
+sn0_eds=$(peek_file_be "$TEST_IMG.v2.orig" $((sn0_ofs + 36)) 4)
+sn0_ids=$(($(peek_file_be "$TEST_IMG.v2.orig" $((sn0_ofs + 12)) 2) +
+           $(peek_file_be "$TEST_IMG.v2.orig" $((sn0_ofs + 14)) 2)))
+sn0_len=$(snapshot_table_entry_size "$TEST_IMG.v2.orig" $sn0_ofs)
+sn1_ofs=$((sn0_ofs + sn0_len))
+sn1_eds=$(peek_file_be "$TEST_IMG.v2.orig" $((sn1_ofs + 36)) 4)
+sn1_ids=$(($(peek_file_be "$TEST_IMG.v2.orig" $((sn1_ofs + 12)) 2) +
+           $(peek_file_be "$TEST_IMG.v2.orig" $((sn1_ofs + 14)) 2)))
+sn1_len=$(snapshot_table_entry_size "$TEST_IMG.v2.orig" $sn1_ofs)
+sn2_ofs=$((sn1_ofs + sn1_len))
+sn2_eds=$(peek_file_be "$TEST_IMG.v2.orig" $((sn2_ofs + 36)) 4)
+sn2_ids=$(($(peek_file_be "$TEST_IMG.v2.orig" $((sn2_ofs + 12)) 2) +
+           $(peek_file_be "$TEST_IMG.v2.orig" $((sn2_ofs + 14)) 2)))
+sn2_len=$(snapshot_table_entry_size "$TEST_IMG.v2.orig" $sn2_ofs)
+
+# Data before extra data
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn0-pre" bs=1 skip=$sn0_ofs count=40 \
+    &>/dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn1-pre" bs=1 skip=$sn1_ofs count=40 \
+    &>/dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn2-pre" bs=1 skip=$sn2_ofs count=40 \
+    &>/dev/null
+
+# Extra data
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn0-extra" bs=1 \
+    skip=$((sn0_ofs + 40)) count=$sn0_eds &>/dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn1-extra" bs=1 \
+    skip=$((sn1_ofs + 40)) count=$sn1_eds &>/dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn2-extra" bs=1 \
+    skip=$((sn2_ofs + 40)) count=$sn2_eds &>/dev/null
+
+# Data after extra data
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn0-post" bs=1 \
+    skip=$((sn0_ofs + 40 + sn0_eds)) count=$sn0_ids \
+    &> /dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn1-post" bs=1 \
+    skip=$((sn1_ofs + 40 + sn1_eds)) count=$sn1_ids \
+    &> /dev/null
+dd if="$TEST_IMG.v2.orig" of="$TEST_DIR/sn2-post" bs=1 \
+    skip=$((sn2_ofs + 40 + sn2_eds)) count=$sn2_ids \
+    &> /dev/null
+
+# Amend them, one by one
+# Set sn0's extra data size to 0
+poke_file "$TEST_DIR/sn0-pre" 36 '\x00\x00\x00\x00'
+truncate -s 0 "$TEST_DIR/sn0-extra"
+# Grow sn0-post to pad
+truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn0-pre") - 40)) \
+    "$TEST_DIR/sn0-post"
+
+# Set sn1's extra data size to 42
+poke_file "$TEST_DIR/sn1-pre" 36 '\x00\x00\x00\x2a'
+truncate -s 42 "$TEST_DIR/sn1-extra"
+poke_file "$TEST_DIR/sn1-extra" 16 'very important data'
+# Grow sn1-post to pad
+truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn1-pre") - 82)) \
+    "$TEST_DIR/sn1-post"
+
+# Set sn2's extra data size to 8
+poke_file "$TEST_DIR/sn2-pre" 36 '\x00\x00\x00\x08'
+truncate -s 8 "$TEST_DIR/sn2-extra"
+# Grow sn2-post to pad
+truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn2-pre") - 48)) \
+    "$TEST_DIR/sn2-post"
+
+# Construct snapshot table
+cat "$TEST_DIR"/sn0-{pre,extra,post} \
+    "$TEST_DIR"/sn1-{pre,extra,post} \
+    "$TEST_DIR"/sn2-{pre,extra,post} \
+    | dd of="$TEST_IMG.v2.orig" bs=1 seek=$sn_table_ofs conv=notrunc \
+          &> /dev/null
+
+# Done!
+TEST_IMG="$TEST_IMG.v2.orig" _check_test_img
+print_snapshot_table "$TEST_IMG.v2.orig"
+
+echo
+echo '=== Upgrade to v3 ==='
+echo
+
+cp "$TEST_IMG.v2.orig" "$TEST_IMG.v3.orig"
+$QEMU_IMG amend -o compat=1.1 "$TEST_IMG.v3.orig"
+TEST_IMG="$TEST_IMG.v3.orig" _check_test_img
+print_snapshot_table "$TEST_IMG.v3.orig"
+
+echo
+echo '=== Repair botched v3 ==='
+echo
+
+# Force the v2 file to be v3.  v3 requires each snapshot table entry
+# to have at least 16 bytes of extra data, so it will not comply to
+# the qcow2 v3 specification; but we can fix that.
+cp "$TEST_IMG.v2.orig" "$TEST_IMG"
+
+# Set version
+poke_file "$TEST_IMG" 4 '\x00\x00\x00\x03'
+# Increase header length (necessary for v3)
+poke_file "$TEST_IMG" 100 '\x00\x00\x00\x68'
+# Set refcount order (necessary for v3)
+poke_file "$TEST_IMG" 96 '\x00\x00\x00\x04'
+
+_check_test_img -r all
+print_snapshot_table "$TEST_IMG"
+
+
+# From now on, just test the qcow2 version we are supposed to test.
+# (v3 by default, v2 by choice through $IMGOPTS.)
+# That works because we always write all known extra data when
+# updating the snapshot table, independent of the version.
+
+if echo "$IMGOPTS" | grep -q 'compat=\(0\.10\|v2\)' 2> /dev/null; then
+    subver=v2
+else
+    subver=v3
+fi
+
+echo
+echo '=== Add new snapshot ==='
+echo
+
+cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
+$QEMU_IMG snapshot -c sn3 "$TEST_IMG"
+_check_test_img
+print_snapshot_table "$TEST_IMG"
+
+echo
+echo '=== Remove different snapshots ==='
+
+for sn in sn0 sn1 sn2; do
+    echo
+    echo "--- $sn ---"
+
+    cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
+    $QEMU_IMG snapshot -d $sn "$TEST_IMG"
+    _check_test_img
+    print_snapshot_table "$TEST_IMG"
+done
+
+echo
+echo '=== Reject too much unknown extra data ==='
+echo
+
+cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
+$QEMU_IMG snapshot -c sn3 "$TEST_IMG"
+
+sn_table_ofs=$(peek_file_be "$TEST_IMG" 64 8)
+sn0_ofs=$sn_table_ofs
+sn1_ofs=$((sn0_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn0_ofs)))
+sn2_ofs=$((sn1_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn1_ofs)))
+sn3_ofs=$((sn2_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn2_ofs)))
+
+# 64 kB of extra data should be rejected
+# (Note that this also induces a refcount error, because it spills
+# over to the next cluster.  That's a good way to test that we can
+# handle simultaneous snapshot table and refcount errors.)
+poke_file "$TEST_IMG" $((sn3_ofs + 36)) '\x00\x01\x00\x00'
+
+# Print error
+_img_info
+echo
+_check_test_img
+echo
+
+# Should be repairable
+_check_test_img -r all
+
+echo
+echo '=== Snapshot table too big ==='
+echo
+
+sn_table_ofs=$(peek_file_be "$TEST_IMG.v3.orig" 64 8)
+
+# Fill a snapshot with 1 kB of extra data, a 65535-char ID, and a
+# 65535-char name, and repeat it as many times as necessary to fill
+# 64 MB (the maximum supported by qemu)
+
+touch "$TEST_DIR/sn0"
+
+# Full size (fixed + extra + ID + name + padding)
+sn_size=$((40 + 1024 + 65535 + 65535 + 2))
+
+# We only need the fixed part, though.
+truncate -s 40 "$TEST_DIR/sn0"
+
+# 65535-char ID string
+poke_file "$TEST_DIR/sn0" 12 '\xff\xff'
+# 65535-char name
+poke_file "$TEST_DIR/sn0" 14 '\xff\xff'
+# 1 kB of extra data
+poke_file "$TEST_DIR/sn0" 36 '\x00\x00\x04\x00'
+
+# Create test image
+_make_test_img 64M
+
+# Hook up snapshot table somewhere safe (at 1 MB)
+poke_file "$TEST_IMG" 64 '\x00\x00\x00\x00\x00\x10\x00\x00'
+
+offset=1048576
+size_written=0
+sn_count=0
+while [ $size_written -le $((64 * 1048576)) ]; do
+    dd if="$TEST_DIR/sn0" of="$TEST_IMG" bs=1 seek=$offset conv=notrunc \
+        &> /dev/null
+    offset=$((offset + sn_size))
+    size_written=$((size_written + sn_size))
+    sn_count=$((sn_count + 1))
+done
+
+# Give the last snapshot (the one to be removed) an L1 table so we can
+# see how that is handled when repairing the image
+# (Put it two clusters before 1 MB, and one L2 table one cluster
+# before 1 MB)
+poke_file "$TEST_IMG" $((offset - sn-size + 0)) \
+    '\x00\x00\x00\x00\x00\x0e\x00\x00'
+poke_file "$TEST_IMG" $((offset - sn-size + 8)) \
+    '\x00\x00\x00\x01'
+
+# Hook up the L2 table
+poke_file "$TEST_IMG" $((1048576 - 2 * 65536)) \
+    '\x80\x00\x00\x00\x00\x0f\x00\x00'
+
+# Make sure all of the clusters we just hooked up are allocated:
+# - The snapshot table
+# - The last snapshot's L1 and L2 table
+refblock0_allocate $((1048576 - 2 * 65536)) $offset
+
+poke_file "$TEST_IMG" 60 \
+    "$(printf '%08x' $sn_count | sed -e 's/\(..\)/\\x\1/g')"
+
+# Print error
+_img_info
+echo
+_check_test_img
+echo
+
+# Should be repairable
+_check_test_img -r all
+
+echo
+echo "$((sn_count - 1)) snapshots should remain:"
+echo "  qemu-img info reports $(_img_info | grep -c '^ \{34\}') snapshots"
+echo "  Image header reports $(peek_file_be "$TEST_IMG" 60 4) snapshots"
+
+echo
+echo '=== Too many snapshots ==='
+echo
+
+# Create a v2 image, for speeds' sake: All-zero snapshot table entries
+# are only valid in v2.
+IMGOPTS='compat=0.10' _make_test_img 64M
+
+# Hook up snapshot table somewhere safe (at 1 MB)
+poke_file "$TEST_IMG" 64 '\x00\x00\x00\x00\x00\x10\x00\x00'
+# "Create" more than 65536 snapshots (twice that many here)
+poke_file "$TEST_IMG" 60 '\x00\x02\x00\x00'
+
+# 40-byte all-zero snapshot table entries are valid snapshots, but
+# only in v2 (v3 needs 16 bytes of extra data, so we would have to
+# write 131072x '\x10').
+truncate -s $((1048576 + 40 * 131072)) "$TEST_IMG"
+
+# But let us give one of the snapshots to be removed an L1 table so
+# we can see how that is handled when repairing the image.
+# (Put it two clusters before 1 MB, and one L2 table one cluster
+# before 1 MB)
+poke_file "$TEST_IMG" $((1048576 + 40 * 65536 + 0)) \
+    '\x00\x00\x00\x00\x00\x0e\x00\x00'
+poke_file "$TEST_IMG" $((1048576 + 40 * 65536 + 8)) \
+    '\x00\x00\x00\x01'
+
+# Hook up the L2 table
+poke_file "$TEST_IMG" $((1048576 - 2 * 65536)) \
+    '\x80\x00\x00\x00\x00\x0f\x00\x00'
+
+# Make sure all of the clusters we just hooked up are allocated:
+# - The snapshot table
+# - The last snapshot's L1 and L2 table
+refblock0_allocate $((1048576 - 2 * 65536)) $((1048576 + 40 * 131072))
+
+# Print error
+_img_info
+echo
+_check_test_img
+echo
+
+# Should be repairable
+_check_test_img -r all
+
+echo
+echo '65536 snapshots should remain:'
+echo "  qemu-img info reports $(_img_info | grep -c '^ \{34\}') snapshots"
+echo "  Image header reports $(peek_file_be "$TEST_IMG" 60 4) snapshots"
+
+# success, all done
+echo "*** done"
+status=0
diff --git a/tests/qemu-iotests/261.out b/tests/qemu-iotests/261.out
new file mode 100644
index 0000000000..75af8c98ac
--- /dev/null
+++ b/tests/qemu-iotests/261.out
@@ -0,0 +1,321 @@
+QA output created by 261
+
+=== Create v2 template ===
+
+Formatting 'TEST_DIR/t.IMGFMT.v2.orig', fmt=IMGFMT size=67108864
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT.v2.orig:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 0
+  [1]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+  [2]
+    ID: 3
+    Name: sn2
+    Extra data size: 8
+    VM state size: 0
+
+=== Upgrade to v3 ===
+
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT.v3.orig:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [1]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+  [2]
+    ID: 3
+    Name: sn2
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+
+=== Repair botched v3 ===
+
+Repairing snapshot table entry 0 is incomplete
+Repairing snapshot table entry 2 is incomplete
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    2 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [1]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+  [2]
+    ID: 3
+    Name: sn2
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+
+=== Add new snapshot ===
+
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [1]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+  [2]
+    ID: 3
+    Name: sn2
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [3]
+    ID: 4
+    Name: sn3
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+
+=== Remove different snapshots ===
+
+--- sn0 ---
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT:
+  [0]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+  [1]
+    ID: 3
+    Name: sn2
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+
+--- sn1 ---
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [1]
+    ID: 3
+    Name: sn2
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+
+--- sn2 ---
+No errors were found on the image.
+Snapshots in TEST_DIR/t.IMGFMT:
+  [0]
+    ID: 1
+    Name: sn0
+    Extra data size: 16
+    VM state size: 0
+    Disk size: 67108864
+  [1]
+    ID: 2
+    Name: sn1
+    Extra data size: 42
+    VM state size: 0
+    Disk size: 67108864
+    Unknown extra data: very important data
+
+=== Reject too much unknown extra data ===
+
+qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Too much extra metadata in snapshot table entry 3
+You can force-remove this extra metadata with qemu-img check -r all
+
+qemu-img: ERROR failed to read the snapshot table: Too much extra metadata in snapshot table entry 3
+You can force-remove this extra metadata with qemu-img check -r all
+qemu-img: Check failed: File too large
+
+Discarding too much extra metadata in snapshot table entry 3 (65536 > 1024)
+ERROR cluster 10 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    2 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+=== Snapshot table too big ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Snapshot table is too big
+You can force-remove all 1 overhanging snapshots with qemu-img check -r all
+
+qemu-img: ERROR failed to read the snapshot table: Snapshot table is too big
+You can force-remove all 1 overhanging snapshots with qemu-img check -r all
+qemu-img: Check failed: File too large
+
+Discarding 1 overhanging snapshots (snapshot table is too big)
+Leaked cluster 14 refcount=1 reference=0
+Leaked cluster 15 refcount=1 reference=0
+Leaked cluster 1039 refcount=1 reference=0
+Leaked cluster 1040 refcount=1 reference=0
+Repairing cluster 14 refcount=1 reference=0
+Repairing cluster 15 refcount=1 reference=0
+Repairing cluster 1039 refcount=1 reference=0
+Repairing cluster 1040 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    4 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+507 snapshots should remain:
+  qemu-img info reports 507 snapshots
+  Image header reports 507 snapshots
+
+=== Too many snapshots ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Snapshot table too large
+
+qemu-img: ERROR snapshot table too large
+You can force-remove all 65536 overhanging snapshots with qemu-img check -r all
+qemu-img: Check failed: File too large
+
+Discarding 65536 overhanging snapshots
+Leaked cluster 14 refcount=1 reference=0
+Leaked cluster 15 refcount=1 reference=0
+Leaked cluster 56 refcount=1 reference=0
+Leaked cluster 57 refcount=1 reference=0
+Leaked cluster 58 refcount=1 reference=0
+Leaked cluster 59 refcount=1 reference=0
+Leaked cluster 60 refcount=1 reference=0
+Leaked cluster 61 refcount=1 reference=0
+Leaked cluster 62 refcount=1 reference=0
+Leaked cluster 63 refcount=1 reference=0
+Leaked cluster 64 refcount=1 reference=0
+Leaked cluster 65 refcount=1 reference=0
+Leaked cluster 66 refcount=1 reference=0
+Leaked cluster 67 refcount=1 reference=0
+Leaked cluster 68 refcount=1 reference=0
+Leaked cluster 69 refcount=1 reference=0
+Leaked cluster 70 refcount=1 reference=0
+Leaked cluster 71 refcount=1 reference=0
+Leaked cluster 72 refcount=1 reference=0
+Leaked cluster 73 refcount=1 reference=0
+Leaked cluster 74 refcount=1 reference=0
+Leaked cluster 75 refcount=1 reference=0
+Leaked cluster 76 refcount=1 reference=0
+Leaked cluster 77 refcount=1 reference=0
+Leaked cluster 78 refcount=1 reference=0
+Leaked cluster 79 refcount=1 reference=0
+Leaked cluster 80 refcount=1 reference=0
+Leaked cluster 81 refcount=1 reference=0
+Leaked cluster 82 refcount=1 reference=0
+Leaked cluster 83 refcount=1 reference=0
+Leaked cluster 84 refcount=1 reference=0
+Leaked cluster 85 refcount=1 reference=0
+Leaked cluster 86 refcount=1 reference=0
+Leaked cluster 87 refcount=1 reference=0
+Leaked cluster 88 refcount=1 reference=0
+Leaked cluster 89 refcount=1 reference=0
+Leaked cluster 90 refcount=1 reference=0
+Leaked cluster 91 refcount=1 reference=0
+Leaked cluster 92 refcount=1 reference=0
+Leaked cluster 93 refcount=1 reference=0
+Leaked cluster 94 refcount=1 reference=0
+Leaked cluster 95 refcount=1 reference=0
+Repairing cluster 14 refcount=1 reference=0
+Repairing cluster 15 refcount=1 reference=0
+Repairing cluster 56 refcount=1 reference=0
+Repairing cluster 57 refcount=1 reference=0
+Repairing cluster 58 refcount=1 reference=0
+Repairing cluster 59 refcount=1 reference=0
+Repairing cluster 60 refcount=1 reference=0
+Repairing cluster 61 refcount=1 reference=0
+Repairing cluster 62 refcount=1 reference=0
+Repairing cluster 63 refcount=1 reference=0
+Repairing cluster 64 refcount=1 reference=0
+Repairing cluster 65 refcount=1 reference=0
+Repairing cluster 66 refcount=1 reference=0
+Repairing cluster 67 refcount=1 reference=0
+Repairing cluster 68 refcount=1 reference=0
+Repairing cluster 69 refcount=1 reference=0
+Repairing cluster 70 refcount=1 reference=0
+Repairing cluster 71 refcount=1 reference=0
+Repairing cluster 72 refcount=1 reference=0
+Repairing cluster 73 refcount=1 reference=0
+Repairing cluster 74 refcount=1 reference=0
+Repairing cluster 75 refcount=1 reference=0
+Repairing cluster 76 refcount=1 reference=0
+Repairing cluster 77 refcount=1 reference=0
+Repairing cluster 78 refcount=1 reference=0
+Repairing cluster 79 refcount=1 reference=0
+Repairing cluster 80 refcount=1 reference=0
+Repairing cluster 81 refcount=1 reference=0
+Repairing cluster 82 refcount=1 reference=0
+Repairing cluster 83 refcount=1 reference=0
+Repairing cluster 84 refcount=1 reference=0
+Repairing cluster 85 refcount=1 reference=0
+Repairing cluster 86 refcount=1 reference=0
+Repairing cluster 87 refcount=1 reference=0
+Repairing cluster 88 refcount=1 reference=0
+Repairing cluster 89 refcount=1 reference=0
+Repairing cluster 90 refcount=1 reference=0
+Repairing cluster 91 refcount=1 reference=0
+Repairing cluster 92 refcount=1 reference=0
+Repairing cluster 93 refcount=1 reference=0
+Repairing cluster 94 refcount=1 reference=0
+Repairing cluster 95 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    42 leaked clusters
+    65536 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+65536 snapshots should remain:
+  qemu-img info reports 65536 snapshots
+  Image header reports 65536 snapshots
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index f13e5f2e23..d33ab9238e 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -271,3 +271,4 @@
 254 rw backing quick
 255 rw quick
 256 rw quick
+261 rw
-- 
2.21.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits
  2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
                   ` (12 preceding siblings ...)
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling Max Reitz
@ 2019-07-30 17:39 ` Eric Blake
  13 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 17:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 2553 bytes --]

On 7/30/19 12:24 PM, Max Reitz wrote:
> Hi,
> 
> As Eric reports in https://bugzilla.redhat.com/show_bug.cgi?id=1727347,
> qemu-img amend has a bug when it comes to converting qcow2 v2 images to
> v3: In v3, every snapshot table entry requires at least 16 bytes of
> extra metadata to be present, which isn’t the case for v2 images.
> Currently, qemu-img amend doesn’t take care of updating the snapshot
> table, so the image is a bit corrupt afterwards (luckily, qemu doesn’t
> take notice, though).

And if anyone wants to work on an obvious followup series after this: we
should allow 'qemu-img resize' to work on v3 images with internal
snapshots, while still continuing to forbid it on v2 images (with v2,
ALL internal snapshots are assumed to have the same size as the image
itself, so you can't resize the image without also resizing internal
snapshots, but that doesn't preserve proper guest history; but with v3,
since all internal snapshots have their own proper size recorded,
changing the image size doesn't impact the snapshots).

> 
> This yields the following patches:
> - Patch 3: Helper patch
> - Patch 4: Helper patch, so we can actually do more than just to bump up
>   the version number when upgrading a qcow2 image from v2 to v3
> - Patch 5: The fix

Quite this history of how the series came to be. Thanks for tackling it!


> Then I got the glorious idea of “Hey, if I want to see how much extra
> data a snapshot table entry has outside of qcow2_read_snapshots(), I
> should add a field that reports that value to QCowSnapshot.  And if I do
> that, I might as well make the qcow2 driver interpret the specification
> a bit more literally, namely it should ignore all unknown extra data,
> that is (as I interpret it), keep it in memory and write it back when
> updating the snapshot table.”

Indeed. If the extra data ever becomes essential to correct
interpretation of the image, then we would add an incompatible feature
bit in the qcow2 header (older actors would refuse to open the image
because they don't understand the incompatible feature, newer actors
will know to use the longer extra data as mandatory).  Otherwise, extra
data should be preserved intact insofar as is possible, but can safely
be ignored or truncated by older actors without breaking the proper
image interpretation in the newer actor that wrote that extra data.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots()
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots() Max Reitz
@ 2019-07-30 17:41   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 17:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 434 bytes --]

On 7/30/19 12:24 PM, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.h          | 2 +-
>  block/qcow2-snapshot.c | 7 ++++++-
>  block/qcow2.c          | 3 +--
>  3 files changed, 8 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data Max Reitz
@ 2019-07-30 17:56   ` Eric Blake
  2019-07-31  8:54     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 17:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 6866 bytes --]

On 7/30/19 12:24 PM, Max Reitz wrote:
> The qcow2 specification says to ignore unknown extra data fields in
> snapshot table entries.  Currently, we discard it whenever we update the
> image, which is a bit different from "ignore".
> 
> This patch makes the qcow2 driver keep all unknown extra data fields
> when updating an image's snapshot table.

The cover letter questioned whether we want this, but I think we do.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.h          |  5 ++++
>  block/qcow2-snapshot.c | 59 +++++++++++++++++++++++++++++++++++-------
>  2 files changed, 55 insertions(+), 9 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 175708cee0..290a48b77e 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -61,6 +61,9 @@
>   * space for snapshot names and IDs */
>  #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>  
> +/* Maximum amount of extra data per snapshot table entry to accept */
> +#define QCOW_MAX_SNAPSHOT_EXTRA_DATA 1024
> +
>  /* Bitmap header extension constraints */
>  #define QCOW2_MAX_BITMAPS 65535
>  #define QCOW2_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW2_MAX_BITMAPS)
> @@ -178,6 +181,8 @@ typedef struct QCowSnapshot {
>      uint32_t date_sec;
>      uint32_t date_nsec;
>      uint64_t vm_clock_nsec;
> +    uint32_t extra_data_size;
> +    void *unknown_extra_data; /* Extra data past QCowSnapshotExtraData */

Is char* going to be any easier to use than void*?

> +++ b/block/qcow2-snapshot.c

> @@ -80,30 +80,52 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)

> +
> +        /* Read known extra data */
>          ret = bdrv_pread(bs->file, offset, &extra,
> -                         MIN(sizeof(extra), extra_data_size));
> +                         MIN(sizeof(extra), sn->extra_data_size));
>          if (ret < 0) {
>              error_setg_errno(errp, -ret, "Failed to read snapshot table");
>              goto fail;
>          }
> -        offset += extra_data_size;
> +        offset += MIN(sizeof(extra), sn->extra_data_size);
>  
> -        if (extra_data_size >= 8) {
> +        if (sn->extra_data_size >= 8) {

While touching this, is it worth spelling it:
if (sn->extra_data_size >= sizeof(extra.vm_state_size_large)) {

>              sn->vm_state_size = be64_to_cpu(extra.vm_state_size_large);
>          }
>  
> -        if (extra_data_size >= 16) {
> +        if (sn->extra_data_size >= 16) {

and a similar use of sizeof() instead of hard-coded 16 here?

>              sn->disk_size = be64_to_cpu(extra.disk_size);
>          } else {
>              sn->disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
>          }
>  
> +        if (sn->extra_data_size > sizeof(extra)) {
> +            /* Store unknown extra data */
> +            size_t unknown_extra_data_size =
> +                sn->extra_data_size - sizeof(extra);
> +
> +            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
> +            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
> +                             unknown_extra_data_size);

We're doing two separate bdrv_pread()s. Would it be better to do a
single bdrv_preadv into a vector composed of &extra and
&unknown_extra_data, for less I/O?  (Then again, this micro-optimization
is probably in the noise in the long run)

> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to read snapshot table");
> +                goto fail;
> +            }
> +            offset += unknown_extra_data_size;
> +        }
> +
>          /* Read snapshot ID */
>          sn->id_str = g_malloc(id_str_size + 1);
>          ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
> @@ -161,7 +183,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>          sn = s->snapshots + i;
>          offset = ROUND_UP(offset, 8);
>          offset += sizeof(h);
> -        offset += sizeof(extra);
> +        offset += MAX(sizeof(extra), sn->extra_data_size);
>          offset += strlen(sn->id_str);
>          offset += strlen(sn->name);
>  
> @@ -208,7 +230,8 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>          h.date_sec = cpu_to_be32(sn->date_sec);
>          h.date_nsec = cpu_to_be32(sn->date_nsec);
>          h.vm_clock_nsec = cpu_to_be64(sn->vm_clock_nsec);
> -        h.extra_data_size = cpu_to_be32(sizeof(extra));
> +        h.extra_data_size = cpu_to_be32(MAX(sizeof(extra),
> +                                            sn->extra_data_size));
>  
>          memset(&extra, 0, sizeof(extra));
>          extra.vm_state_size_large = cpu_to_be64(sn->vm_state_size);
> @@ -233,6 +256,22 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>          }
>          offset += sizeof(extra);
>  
> +        if (sn->extra_data_size > sizeof(extra)) {
> +            size_t unknown_extra_data_size =
> +                sn->extra_data_size - sizeof(extra);
> +
> +            /* qcow2_read_snapshots() ensures no unbounded allocation */
> +            assert(unknown_extra_data_size <= INT_MAX);

Should we make this tighter:
 assert(unknown_extra_data_size < QCOW_MAX_SNAPSHOT_EXTRA_DATA);
or even
 assert(unknown_extra_data_size <=
        QCOW_MAX_SNAPSHOT_EXTRA_DATA - sizeof(extra));

> +            assert(sn->unknown_extra_data);
> +
> +            ret = bdrv_pwrite(bs->file, offset, sn->unknown_extra_data,
> +                              unknown_extra_data_size);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            offset += unknown_extra_data_size;
> +        }
> +
>          ret = bdrv_pwrite(bs->file, offset, sn->id_str, id_str_size);
>          if (ret < 0) {
>              goto fail;
> @@ -375,6 +414,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
>      sn->date_sec = sn_info->date_sec;
>      sn->date_nsec = sn_info->date_nsec;
>      sn->vm_clock_nsec = sn_info->vm_clock_nsec;
> +    sn->extra_data_size = sizeof(QCowSnapshotExtraData);
>  
>      /* Allocate the L1 table of the snapshot and copy the current one there. */
>      l1_table_offset = qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
> @@ -646,6 +686,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
>       * The snapshot is now unused, clean up. If we fail after this point, we
>       * won't recover but just leak clusters.
>       */
> +    g_free(sn.unknown_extra_data);
>      g_free(sn.id_str);
>      g_free(sn.name);
>  
> 

Tweaks to assertions are minor enough that I'm okay with:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public Max Reitz
@ 2019-07-30 17:57   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 17:57 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 531 bytes --]

On 7/30/19 12:24 PM, Max Reitz wrote:
> Updating the snapshot list will be useful when upgrading a v2 image to
> v3, so we will need to call this function in qcow2.c.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.h          | 1 +
>  block/qcow2-snapshot.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function
  2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function Max Reitz
@ 2019-07-30 18:00   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 18:00 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 580 bytes --]

On 7/30/19 12:24 PM, Max Reitz wrote:

Subject line sounds funny; I'd probably s/an/its/

> This does not make sense right now, but it will make sense once we need
> to do more than to just update s->qcow_version.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade Max Reitz
@ 2019-07-30 18:10   ` Eric Blake
  2019-07-31  8:56     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 18:10 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1325 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> qcow2 v3 requires every snapshot table entry to have two extra data
> fields: The 64-bit VM state size, and the virtual disk size.  Both are
> optional for v2 images, so they may not be present.
> 
> qcow2_upgrade() therefore should update the snapshot table to ensure all
> entries have these extra data fields.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347
> Reported-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 

> +
> +    /*
> +     * In v2, snapshots do not need to have extra data.  v3 requires
> +     * the 64-bit VM state size and the virtual disk size to be
> +     * present.
> +     * qcow2_write_snapshots() will always write the list in the
> +     * v3-compliant format.
> +     */
> +    need_snapshot_update = false;
> +    for (i = 0; i < s->nb_snapshots; i++) {
> +        if (s->snapshots[i].extra_data_size < 16) {

s/16/sizeof(extra)/ looks a bit nicer, but doesn't change semantics.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table()
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table() Max Reitz
@ 2019-07-30 18:53   ` Eric Blake
  2019-07-31  8:59     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 18:53 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1746 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> Reading the snapshot table can fail.  That is a problem when we want to
> repair the image.
> 
> Therefore, stop reading the snapshot table in qcow2_do_open() in check
> mode.  Instead, add a new function qcow2_check_read_snapshot_table()
> that reads the snapshot table at a later point.  In the future, we want
> to handle errors here and fix them.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.h          |  4 +++
>  block/qcow2-snapshot.c | 58 ++++++++++++++++++++++++++++++++
>  block/qcow2.c          | 76 ++++++++++++++++++++++++++++++++----------
>  3 files changed, 120 insertions(+), 18 deletions(-)
> 

> +++ b/block/qcow2-snapshot.c
> @@ -321,6 +321,64 @@ fail:
>      return ret;
>  }
>  
> +int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
> +                                                 BdrvCheckResult *result,
> +                                                 BdrvCheckMode fix)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    Error *local_err = NULL;
> +    int ret;
> +    struct {
> +        uint32_t nb_snapshots;
> +        uint64_t snapshots_offset;
> +    } QEMU_PACKED snapshot_table_pointer;
> +
> +    /* qcow2_do_open() discards this information in check mode */
> +    ret = bdrv_pread(bs->file, 60, &snapshot_table_pointer,
> +                     sizeof(snapshot_table_pointer));

Should that '60' be a named constant or offsetof() expression?  (I know,
you just copied this instance from elsewhere)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table()
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table() Max Reitz
@ 2019-07-30 18:54   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 18:54 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 837 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> qcow2_check_read_snapshot_table() can perform consistency checks, but it
> cannot fix everything.  Specifically, it cannot allocate new clusters,
> because that should wait until the refcount structures are known to be
> consistent (i.e., after qcow2_check_refcounts()).  Thus, it cannot call
> qcow2_write_snapshots().
> 
> Do that in qcow2_check_fix_snapshot_table(), which is called after
> qcow2_check_refcounts().
> 
> Currently, there is nothing that would set result->corruptions, so this
> is a no-op.  A follow-up patch will change that.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries Max Reitz
@ 2019-07-30 19:02   ` Eric Blake
  2019-07-31  9:06     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:02 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1805 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> The only case where we currently reject snapshot table entries is when
> they have too much extra data.  Fix them with qemu-img check -r all by
> counting it as a corruption, reducing their extra_data_size, and then
> letting qcow2_check_fix_snapshot_table() do the rest.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 69 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 56 insertions(+), 13 deletions(-)
> 

> @@ -112,16 +141,22 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
>          }
>  
>          if (sn->extra_data_size > sizeof(extra)) {
> -            /* Store unknown extra data */
>              size_t unknown_extra_data_size =
>                  sn->extra_data_size - sizeof(extra);
>  
> -            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
> -            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
> -                             unknown_extra_data_size);
> -            if (ret < 0) {
> -                error_setg_errno(errp, -ret, "Failed to read snapshot table");
> -                goto fail;
> +            if (discard_unknown_extra_data) {
> +                /* Discard unknown extra data */
> +                sn->extra_data_size = sizeof(extra);

This truncates it down to just the data we know. Should it instead
truncate down to the 1024 bytes of QCOW_MAX_SNAPSHOT_EXTRA_DATA defined
in 2/13?  (We can't keep all of the user's extra stuff, but we can at
least try to preserve as much as possible)

Otherwise, looks good.
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables Max Reitz
@ 2019-07-30 19:08   ` Eric Blake
  2019-07-31  9:22     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 2193 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> We currently refuse to open qcow2 images with overly long snapshot
> tables.  This patch makes qemu-img check -r all drop all offending
> entries past what we deem acceptable.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 79 insertions(+), 10 deletions(-)

I'm less sure about this one.  8/13 should have no semantic effect (if
the user _depended_ on that much extra data, they should have set an
incompatible feature flag bit, at which point we'd leave their data
alone because we don't recognize the feature bit; so it is safe to
assume the user did not depend on the data and that we can thus nuke it
with impunity).  But here, we are throwing away the user's internal
snapshots, and not even giving them a say in which ones to throw away
(more likely, by trimming from the end, we are destroying the most
recent snapshots in favor of the older ones - but I could argue that
throwing away the oldest also has its uses).


> @@ -417,7 +461,32 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>  
>          return ret;
>      }
> -    result->corruptions += extra_data_dropped;
> +    result->corruptions += nb_clusters_reduced + extra_data_dropped;
> +
> +    if (nb_clusters_reduced) {
> +        /*
> +         * Update image header now, because:
> +         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
> +         *     the same as what the image header says,
> +         * (2) this leaks clusters, but qcow2_check_refcounts() will
> +         *     fix that.
> +         */
> +        assert(fix & BDRV_FIX_ERRORS);
> +
> +        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
> +        ret = bdrv_pwrite_sync(bs->file, 60,

That '60' needs a name; it keeps popping up.

If we like the patch, I didn't spot major coding problems.  But because
I'm not sure we want this patch, I'll skip R-b for now.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries Max Reitz
@ 2019-07-30 19:10   ` Eric Blake
  2019-07-31  9:25     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:10 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 640 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 

Same problem as for 9/13 - should we really be throwing away the user's
data like this?  (9/13 hits if the user has a small number of snapshots,
but each has enough extra data, that the overall table is bigger than we
like; 10/13 hits if the user has more snapshots than we like, but
otherwise they do the same thing).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy Max Reitz
@ 2019-07-30 19:12   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:12 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1460 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> qcow2 v3 images require every snapshot table entry to have at least 16
> bytes of extra data.  If they do not, let qemu-img check -r all fix it.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index 9e8c7c1f7f..e22c964b2a 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -502,6 +502,21 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>          result->corruptions -= nb_clusters_reduced;
>      }
>  
> +    /*
> +     * All of v3 images' snapshot table entries need to have at least
> +     * 16 bytes of extra data.
> +     */
> +    if (s->qcow_version >= 3) {
> +        int i;
> +        for (i = 0; i < s->nb_snapshots; i++) {
> +            if (s->snapshots[i].extra_data_size < 16) {

s/16/sizeof(extra)/ may be nicer here.

> +                result->corruptions++;
> +                fprintf(stderr, "%s snapshot table entry %i is incomplete\n",
> +                        fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
> +            }
> +        }
> +    }
> +
>      return 0;

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions Max Reitz
@ 2019-07-30 19:22   ` Eric Blake
  2019-07-31  9:27     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:22 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1993 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/common.rc | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> index 5502c3da2f..78decfd5d5 100644
> --- a/tests/qemu-iotests/common.rc
> +++ b/tests/qemu-iotests/common.rc
> @@ -53,6 +53,26 @@ poke_file()
>      printf "$3" | dd "of=$1" bs=1 "seek=$2" conv=notrunc &>/dev/null
>  }
>  
> +# peek_file_le 'test.img' 512 2 => 65534
> +peek_file_le()
> +{
> +    # Wrap in echo $() to strip spaces
> +    echo $(od -j"$2" -N"$3" --endian=little -An -vtu"$3" "$1")

Requires coreutils' od, but we can patch that later if it proves to be a
problem on other hosts.

I used to do something similar in nbdkit (prior to qemu-nbd --list
making my life a lot easier; see nbdkit commit b228cb40); but there, I
read a 16-bit value in 2 8-bit chunks and pieced it together myself
rather than relying on --endian:

-    # Protocol is big endian, we want native endian.
-    # xargs trick to trim whitespace from
-    # https://stackoverflow.com/a/12973694
-    eflags_hi=$(od -An -N1     -tx1 eflags.out | xargs)
-    eflags_lo=$(od -An -N1 -j1 -tx1 eflags.out | xargs)
-    eflags=$(( 0x$eflags_hi << 8 | 0x$eflags_lo ))

But as long as we are using --endian, your version nicely handles 1, 2,
4, and 8-byte reads.

> +
> +# peek_file_raw 'test.img' 512 2 => '\xff\xfe'
> +peek_file_raw()
> +{
> +    dd if="$1" bs=1 skip="$2" count="$3" status=none
> +}

Of course, calling $(peek_file_raw ...) is a bad idea, because it might
eat a trailing byte that happened to be a newline; it also doesn't
handle NUL bytes very well.  Is it worth documenting caveats for using
this one?

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling
  2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling Max Reitz
@ 2019-07-30 19:56   ` Eric Blake
  2019-07-31  9:36     ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-07-30 19:56 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 14024 bytes --]

On 7/30/19 12:25 PM, Max Reitz wrote:
> Add a test how our qcow2 driver handles extra data in snapshot table
> entries, and how it repairs overly long snapshot tables.

May need tweaking if we drop 9 and 10.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/261     | 449 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/261.out | 321 ++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 771 insertions(+)
>  create mode 100755 tests/qemu-iotests/261
>  create mode 100644 tests/qemu-iotests/261.out
> 
> +
> +# Parameters:
> +#   $1: image filename
> +#   $2: snapshot table entry offset in the image
> +snapshot_table_entry_size()
> +{
> +    id_len=$(peek_file_be "$1" $(($2 + 12)) 2)
> +    name_len=$(peek_file_be "$1" $(($2 + 14)) 2)
> +    extra_len=$(peek_file_be "$1" $(($2 + 36)) 4)
> +
> +    full_len=$((40 + extra_len + id_len + name_len))
> +    if [ $((full_len % 8)) = 0 ]; then
> +        echo $full_len
> +    else
> +        echo $((full_len + 8 - full_len % 8))

Could replace the entire if with:
 echo $(( (full_len + 7) / 8 * 8 ))
but what you have works.

> +    fi
> +}
> +
> +# Parameter:
> +#   $1: image filename
> +print_snapshot_table()
> +{
> +    nb_entries=$(peek_file_be "$1" 60 4)
> +    offset=$(peek_file_be "$1" 64 8)
> +
> +    echo "Snapshots in $1:" | _filter_testdir | _filter_imgfmt

Should a separate patch add support in 'qemu-img info'/'qemu-img
snapshot -l' for letting users know how much extra info is in each
snapshot?  It seems useful enough without having to recode this
low-level iotest introspection.

> +
> +    for ((i = 0; i < nb_entries; i++)); do
> +        id_len=$(peek_file_be "$1" $((offset + 12)) 2)
> +        name_len=$(peek_file_be "$1" $((offset + 14)) 2)
> +        extra_len=$(peek_file_be "$1" $((offset + 36)) 4)
> +
> +        extra_ofs=$((offset + 40))
> +        id_ofs=$((extra_ofs + extra_len))
> +        name_ofs=$((id_ofs + id_len))
> +
> +        echo "  [$i]"
> +        echo "    ID: $(peek_file_raw "$1" $id_ofs $id_len)"
> +        echo "    Name: $(peek_file_raw "$1" $name_ofs $name_len)"

We're relying on the files having sane strings at those offsets - but
that's fine for the iotest.

> +        echo "    Extra data size: $extra_len"
> +        if [ $extra_len -ge 8 ]; then
> +            echo "    VM state size: $(peek_file_be "$1" $extra_ofs 8)"
> +        fi
> +        if [ $extra_len -ge 16 ]; then
> +            echo "    Disk size: $(peek_file_be "$1" $((extra_ofs + 8)) 8)"
> +        fi
> +        if [ $extra_len -gt 16 ]; then
> +            echo '    Unknown extra data:' \
> +                "$(peek_file_raw "$1" $((extra_ofs + 16)) $((extra_len - 16)) \
> +                   | tr -d '\0')"

Printing the unknown extra data seems fishy, especially if you are going
to sanitize out the NUL bytes.  An od dump of every byte might be more
useful, but I'd also be happy with just printing the number of unknown
bytes without actually worrying about printing the contents of those bytes.


> +echo
> +echo '=== Create v2 template ==='
> +echo
> +
> +# Create v2 image with a snapshot table with three entries:
> +# [0]: No extra data (valid with v2, not valid with v3)
> +# [1]: Has extra data unknown to qemu
> +# [2]: Has the 64-bit VM state size, but not the disk size (again,
> +#      valid with v2, not valid with v3)

Cool - and covers cases that are not possible to do with current
'qemu-img create -f qcow2 -o compat=v2' (if you have a CentOS 6 machine
hanging around, you can create v2 images like entry [0]; but these days,
modern qemu-img creates v2 images with the 16 extra bytes as they will
be used by v3).

> +
> +TEST_IMG="$TEST_IMG.v2.orig" IMGOPTS='compat=0.10' _make_test_img 64M
> +$QEMU_IMG snapshot -c sn0 "$TEST_IMG.v2.orig"
> +$QEMU_IMG snapshot -c sn1 "$TEST_IMG.v2.orig"
> +$QEMU_IMG snapshot -c sn2 "$TEST_IMG.v2.orig"
> +
> +# Copy out all existing snapshot table entries

> +
> +# Amend them, one by one
> +# Set sn0's extra data size to 0
> +poke_file "$TEST_DIR/sn0-pre" 36 '\x00\x00\x00\x00'
> +truncate -s 0 "$TEST_DIR/sn0-extra"
> +# Grow sn0-post to pad
> +truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn0-pre") - 40)) \
> +    "$TEST_DIR/sn0-post"
> +
> +# Set sn1's extra data size to 42

Nice - not a multiple of 8, to test our code handling of rounding :)

> +poke_file "$TEST_DIR/sn1-pre" 36 '\x00\x00\x00\x2a'
> +truncate -s 42 "$TEST_DIR/sn1-extra"
> +poke_file "$TEST_DIR/sn1-extra" 16 'very important data'
> +# Grow sn1-post to pad
> +truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn1-pre") - 82)) \
> +    "$TEST_DIR/sn1-post"
> +
> +# Set sn2's extra data size to 8
> +poke_file "$TEST_DIR/sn2-pre" 36 '\x00\x00\x00\x08'
> +truncate -s 8 "$TEST_DIR/sn2-extra"
> +# Grow sn2-post to pad
> +truncate -s $(($(snapshot_table_entry_size "$TEST_DIR/sn2-pre") - 48)) \
> +    "$TEST_DIR/sn2-post"
> +
> +# Construct snapshot table
> +cat "$TEST_DIR"/sn0-{pre,extra,post} \
> +    "$TEST_DIR"/sn1-{pre,extra,post} \
> +    "$TEST_DIR"/sn2-{pre,extra,post} \
> +    | dd of="$TEST_IMG.v2.orig" bs=1 seek=$sn_table_ofs conv=notrunc \
> +          &> /dev/null
> +
> +# Done!

Hairy, but looks like a valid v2 image.

> +TEST_IMG="$TEST_IMG.v2.orig" _check_test_img
> +print_snapshot_table "$TEST_IMG.v2.orig"
> +
> +echo
> +echo '=== Upgrade to v3 ==='
> +echo
> +
> +cp "$TEST_IMG.v2.orig" "$TEST_IMG.v3.orig"
> +$QEMU_IMG amend -o compat=1.1 "$TEST_IMG.v3.orig"
> +TEST_IMG="$TEST_IMG.v3.orig" _check_test_img
> +print_snapshot_table "$TEST_IMG.v3.orig"
> +
> +echo
> +echo '=== Repair botched v3 ==='
> +echo
> +
> +# Force the v2 file to be v3.  v3 requires each snapshot table entry
> +# to have at least 16 bytes of extra data, so it will not comply to
> +# the qcow2 v3 specification; but we can fix that.
> +cp "$TEST_IMG.v2.orig" "$TEST_IMG"
> +
> +# Set version
> +poke_file "$TEST_IMG" 4 '\x00\x00\x00\x03'
> +# Increase header length (necessary for v3)
> +poke_file "$TEST_IMG" 100 '\x00\x00\x00\x68'
> +# Set refcount order (necessary for v3)
> +poke_file "$TEST_IMG" 96 '\x00\x00\x00\x04'
> +
> +_check_test_img -r all
> +print_snapshot_table "$TEST_IMG"

Good - you're testing that we can now flag and fix images that we were
previously silently using in spite of their non-compliance.

> +
> +
> +# From now on, just test the qcow2 version we are supposed to test.
> +# (v3 by default, v2 by choice through $IMGOPTS.)
> +# That works because we always write all known extra data when
> +# updating the snapshot table, independent of the version.
> +
> +if echo "$IMGOPTS" | grep -q 'compat=\(0\.10\|v2\)' 2> /dev/null; then
> +    subver=v2
> +else
> +    subver=v3
> +fi
> +
> +echo
> +echo '=== Add new snapshot ==='
> +echo
> +
> +cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
> +$QEMU_IMG snapshot -c sn3 "$TEST_IMG"
> +_check_test_img
> +print_snapshot_table "$TEST_IMG"
> +
> +echo
> +echo '=== Remove different snapshots ==='
> +
> +for sn in sn0 sn1 sn2; do
> +    echo
> +    echo "--- $sn ---"
> +
> +    cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
> +    $QEMU_IMG snapshot -d $sn "$TEST_IMG"
> +    _check_test_img
> +    print_snapshot_table "$TEST_IMG"
> +done
> +
> +echo
> +echo '=== Reject too much unknown extra data ==='
> +echo
> +
> +cp "$TEST_IMG.$subver.orig" "$TEST_IMG"
> +$QEMU_IMG snapshot -c sn3 "$TEST_IMG"
> +
> +sn_table_ofs=$(peek_file_be "$TEST_IMG" 64 8)
> +sn0_ofs=$sn_table_ofs
> +sn1_ofs=$((sn0_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn0_ofs)))
> +sn2_ofs=$((sn1_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn1_ofs)))
> +sn3_ofs=$((sn2_ofs + $(snapshot_table_entry_size "$TEST_IMG" $sn2_ofs)))
> +
> +# 64 kB of extra data should be rejected
> +# (Note that this also induces a refcount error, because it spills
> +# over to the next cluster.  That's a good way to test that we can
> +# handle simultaneous snapshot table and refcount errors.)
> +poke_file "$TEST_IMG" $((sn3_ofs + 36)) '\x00\x01\x00\x00'
> +
> +# Print error
> +_img_info
> +echo
> +_check_test_img
> +echo
> +
> +# Should be repairable
> +_check_test_img -r all

Nice.

> +
> +echo
> +echo '=== Snapshot table too big ==='
> +echo
> +
> +sn_table_ofs=$(peek_file_be "$TEST_IMG.v3.orig" 64 8)
> +
> +# Fill a snapshot with 1 kB of extra data, a 65535-char ID, and a
> +# 65535-char name, and repeat it as many times as necessary to fill
> +# 64 MB (the maximum supported by qemu)
> +
> +touch "$TEST_DIR/sn0"
> +
> +# Full size (fixed + extra + ID + name + padding)
> +sn_size=$((40 + 1024 + 65535 + 65535 + 2))
> +
> +# We only need the fixed part, though.
> +truncate -s 40 "$TEST_DIR/sn0"
> +
> +# 65535-char ID string
> +poke_file "$TEST_DIR/sn0" 12 '\xff\xff'
> +# 65535-char name
> +poke_file "$TEST_DIR/sn0" 14 '\xff\xff'

Do we care that there are NUL bytes in the id and name?  (The spec is
clear that id and name are not NUL-terminated, but does not actually
seem to forbid the use of arbitrary binary values as names...)

> +# 1 kB of extra data
> +poke_file "$TEST_DIR/sn0" 36 '\x00\x00\x04\x00'
> +
> +# Create test image
> +_make_test_img 64M
> +
> +# Hook up snapshot table somewhere safe (at 1 MB)
> +poke_file "$TEST_IMG" 64 '\x00\x00\x00\x00\x00\x10\x00\x00'
> +
> +offset=1048576
> +size_written=0
> +sn_count=0
> +while [ $size_written -le $((64 * 1048576)) ]; do
> +    dd if="$TEST_DIR/sn0" of="$TEST_IMG" bs=1 seek=$offset conv=notrunc \
> +        &> /dev/null
> +    offset=$((offset + sn_size))
> +    size_written=$((size_written + sn_size))
> +    sn_count=$((sn_count + 1))
> +done
> +
> +# Give the last snapshot (the one to be removed) an L1 table so we can
> +# see how that is handled when repairing the image
> +# (Put it two clusters before 1 MB, and one L2 table one cluster
> +# before 1 MB)
> +poke_file "$TEST_IMG" $((offset - sn-size + 0)) \
> +    '\x00\x00\x00\x00\x00\x0e\x00\x00'
> +poke_file "$TEST_IMG" $((offset - sn-size + 8)) \
> +    '\x00\x00\x00\x01'
> +
> +# Hook up the L2 table
> +poke_file "$TEST_IMG" $((1048576 - 2 * 65536)) \
> +    '\x80\x00\x00\x00\x00\x0f\x00\x00'
> +
> +# Make sure all of the clusters we just hooked up are allocated:
> +# - The snapshot table
> +# - The last snapshot's L1 and L2 table
> +refblock0_allocate $((1048576 - 2 * 65536)) $offset
> +
> +poke_file "$TEST_IMG" 60 \
> +    "$(printf '%08x' $sn_count | sed -e 's/\(..\)/\\x\1/g')"

Wow - you went to a lot of work to produce this image :)

> +
> +# Print error
> +_img_info
> +echo
> +_check_test_img
> +echo
> +
> +# Should be repairable
> +_check_test_img -r all

Depending on our decision on 9 and 10.

> +
> +echo
> +echo "$((sn_count - 1)) snapshots should remain:"
> +echo "  qemu-img info reports $(_img_info | grep -c '^ \{34\}') snapshots"
> +echo "  Image header reports $(peek_file_be "$TEST_IMG" 60 4) snapshots"
> +
> +echo
> +echo '=== Too many snapshots ==='
> +echo
> +
> +# Create a v2 image, for speeds' sake: All-zero snapshot table entries
> +# are only valid in v2.
> +IMGOPTS='compat=0.10' _make_test_img 64M
> +
> +# Hook up snapshot table somewhere safe (at 1 MB)
> +poke_file "$TEST_IMG" 64 '\x00\x00\x00\x00\x00\x10\x00\x00'
> +# "Create" more than 65536 snapshots (twice that many here)
> +poke_file "$TEST_IMG" 60 '\x00\x02\x00\x00'
> +
> +# 40-byte all-zero snapshot table entries are valid snapshots, but
> +# only in v2 (v3 needs 16 bytes of extra data, so we would have to
> +# write 131072x '\x10').
> +truncate -s $((1048576 + 40 * 131072)) "$TEST_IMG"
> +
> +# But let us give one of the snapshots to be removed an L1 table so
> +# we can see how that is handled when repairing the image.
> +# (Put it two clusters before 1 MB, and one L2 table one cluster
> +# before 1 MB)
> +poke_file "$TEST_IMG" $((1048576 + 40 * 65536 + 0)) \
> +    '\x00\x00\x00\x00\x00\x0e\x00\x00'
> +poke_file "$TEST_IMG" $((1048576 + 40 * 65536 + 8)) \
> +    '\x00\x00\x00\x01'
> +
> +# Hook up the L2 table
> +poke_file "$TEST_IMG" $((1048576 - 2 * 65536)) \
> +    '\x80\x00\x00\x00\x00\x0f\x00\x00'
> +
> +# Make sure all of the clusters we just hooked up are allocated:
> +# - The snapshot table
> +# - The last snapshot's L1 and L2 table
> +refblock0_allocate $((1048576 - 2 * 65536)) $((1048576 + 40 * 131072))
> +

Another fun image creation.

> +# Print error
> +_img_info
> +echo
> +_check_test_img
> +echo
> +
> +# Should be repairable
> +_check_test_img -r all
> +
> +echo
> +echo '65536 snapshots should remain:'
> +echo "  qemu-img info reports $(_img_info | grep -c '^ \{34\}') snapshots"
> +echo "  Image header reports $(peek_file_be "$TEST_IMG" 60 4) snapshots"
> +
> +# success, all done
> +echo "*** done"
> +status=0

Overall, looks like a nice test.  I'm comfortable giving:

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/tests/qemu-iotests/261.out
> @@ -0,0 +1,321 @@
> +QA output created by 261
> +
> +=== Create v2 template ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT.v2.orig', fmt=IMGFMT size=67108864
> +No errors were found on the image.
> +Snapshots in TEST_DIR/t.IMGFMT.v2.orig:
> +  [0]
> +    ID: 1
> +    Name: sn0
> +    Extra data size: 0
> +  [1]
> +    ID: 2
> +    Name: sn1
> +    Extra data size: 42
> +    VM state size: 0
> +    Disk size: 67108864
> +    Unknown extra data: very important data

Again, if qemu-img is improved to output more information natively
instead of you hand-scraping it, I don't think that printing the
contents of extra data is viable in that context.  But it looks fun here
in this test.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data
  2019-07-30 17:56   ` Eric Blake
@ 2019-07-31  8:54     ` Max Reitz
  2019-08-16  2:09       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-31  8:54 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 7793 bytes --]

On 30.07.19 19:56, Eric Blake wrote:
> On 7/30/19 12:24 PM, Max Reitz wrote:
>> The qcow2 specification says to ignore unknown extra data fields in
>> snapshot table entries.  Currently, we discard it whenever we update the
>> image, which is a bit different from "ignore".
>>
>> This patch makes the qcow2 driver keep all unknown extra data fields
>> when updating an image's snapshot table.
> 
> The cover letter questioned whether we want this, but I think we do.
> 
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2.h          |  5 ++++
>>  block/qcow2-snapshot.c | 59 +++++++++++++++++++++++++++++++++++-------
>>  2 files changed, 55 insertions(+), 9 deletions(-)
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 175708cee0..290a48b77e 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -61,6 +61,9 @@
>>   * space for snapshot names and IDs */
>>  #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>>  
>> +/* Maximum amount of extra data per snapshot table entry to accept */
>> +#define QCOW_MAX_SNAPSHOT_EXTRA_DATA 1024
>> +
>>  /* Bitmap header extension constraints */
>>  #define QCOW2_MAX_BITMAPS 65535
>>  #define QCOW2_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW2_MAX_BITMAPS)
>> @@ -178,6 +181,8 @@ typedef struct QCowSnapshot {
>>      uint32_t date_sec;
>>      uint32_t date_nsec;
>>      uint64_t vm_clock_nsec;
>> +    uint32_t extra_data_size;
>> +    void *unknown_extra_data; /* Extra data past QCowSnapshotExtraData */
> 
> Is char* going to be any easier to use than void*?
> 
>> +++ b/block/qcow2-snapshot.c
> 
>> @@ -80,30 +80,52 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
> 
>> +
>> +        /* Read known extra data */
>>          ret = bdrv_pread(bs->file, offset, &extra,
>> -                         MIN(sizeof(extra), extra_data_size));
>> +                         MIN(sizeof(extra), sn->extra_data_size));
>>          if (ret < 0) {
>>              error_setg_errno(errp, -ret, "Failed to read snapshot table");
>>              goto fail;
>>          }
>> -        offset += extra_data_size;
>> +        offset += MIN(sizeof(extra), sn->extra_data_size);
>>  
>> -        if (extra_data_size >= 8) {
>> +        if (sn->extra_data_size >= 8) {
> 
> While touching this, is it worth spelling it:
> if (sn->extra_data_size >= sizeof(extra.vm_state_size_large)) {
> 
>>              sn->vm_state_size = be64_to_cpu(extra.vm_state_size_large);
>>          }
>>  
>> -        if (extra_data_size >= 16) {
>> +        if (sn->extra_data_size >= 16) {
> 
> and a similar use of sizeof() instead of hard-coded 16 here?

Well, the most verbose spelling would be offsetof() + sizeof() both
times.  Hm.  I’ll see how it looks.  I think it is obvious enough what
it means as it is, and the full offsetof() + sizeof() might actually be
less obvious just because it’s longer and thus takes longer to read.

>>              sn->disk_size = be64_to_cpu(extra.disk_size);
>>          } else {
>>              sn->disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
>>          }
>>  
>> +        if (sn->extra_data_size > sizeof(extra)) {
>> +            /* Store unknown extra data */
>> +            size_t unknown_extra_data_size =
>> +                sn->extra_data_size - sizeof(extra);
>> +
>> +            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
>> +            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
>> +                             unknown_extra_data_size);
> 
> We're doing two separate bdrv_pread()s. Would it be better to do a
> single bdrv_preadv into a vector composed of &extra and
> &unknown_extra_data, for less I/O?  (Then again, this micro-optimization
> is probably in the noise in the long run)

Interesting idea, we could even add the ID and name string into that
vector.  But I’m not sure whether it’s really useful.

(I’ll take a look anyway, because it sounds interesting.)

>> +            if (ret < 0) {
>> +                error_setg_errno(errp, -ret, "Failed to read snapshot table");
>> +                goto fail;
>> +            }
>> +            offset += unknown_extra_data_size;
>> +        }
>> +
>>          /* Read snapshot ID */
>>          sn->id_str = g_malloc(id_str_size + 1);
>>          ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
>> @@ -161,7 +183,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>>          sn = s->snapshots + i;
>>          offset = ROUND_UP(offset, 8);
>>          offset += sizeof(h);
>> -        offset += sizeof(extra);
>> +        offset += MAX(sizeof(extra), sn->extra_data_size);
>>          offset += strlen(sn->id_str);
>>          offset += strlen(sn->name);
>>  
>> @@ -208,7 +230,8 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>>          h.date_sec = cpu_to_be32(sn->date_sec);
>>          h.date_nsec = cpu_to_be32(sn->date_nsec);
>>          h.vm_clock_nsec = cpu_to_be64(sn->vm_clock_nsec);
>> -        h.extra_data_size = cpu_to_be32(sizeof(extra));
>> +        h.extra_data_size = cpu_to_be32(MAX(sizeof(extra),
>> +                                            sn->extra_data_size));
>>  
>>          memset(&extra, 0, sizeof(extra));
>>          extra.vm_state_size_large = cpu_to_be64(sn->vm_state_size);
>> @@ -233,6 +256,22 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
>>          }
>>          offset += sizeof(extra);
>>  
>> +        if (sn->extra_data_size > sizeof(extra)) {
>> +            size_t unknown_extra_data_size =
>> +                sn->extra_data_size - sizeof(extra);
>> +
>> +            /* qcow2_read_snapshots() ensures no unbounded allocation */
>> +            assert(unknown_extra_data_size <= INT_MAX);
> 
> Should we make this tighter:
>  assert(unknown_extra_data_size < QCOW_MAX_SNAPSHOT_EXTRA_DATA);
> or even
>  assert(unknown_extra_data_size <=
>         QCOW_MAX_SNAPSHOT_EXTRA_DATA - sizeof(extra));

Hm, I wanted to signal that it can fit into the int taken by
bdrv_pwrite().  But then I should use BDRV_REQUEST_MAX_BYTES, actually.

I’m not sure.  It doesn’t really matter that it’s less than
QCOW_MAX_SNAPSHOT_EXTRA_DATA here.  What does matter is that it doesn’t
exceed BDRV_REQUEST_MAX_BYTES.

>> +            assert(sn->unknown_extra_data);
>> +
>> +            ret = bdrv_pwrite(bs->file, offset, sn->unknown_extra_data,
>> +                              unknown_extra_data_size);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            offset += unknown_extra_data_size;
>> +        }
>> +
>>          ret = bdrv_pwrite(bs->file, offset, sn->id_str, id_str_size);
>>          if (ret < 0) {
>>              goto fail;
>> @@ -375,6 +414,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
>>      sn->date_sec = sn_info->date_sec;
>>      sn->date_nsec = sn_info->date_nsec;
>>      sn->vm_clock_nsec = sn_info->vm_clock_nsec;
>> +    sn->extra_data_size = sizeof(QCowSnapshotExtraData);
>>  
>>      /* Allocate the L1 table of the snapshot and copy the current one there. */
>>      l1_table_offset = qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
>> @@ -646,6 +686,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
>>       * The snapshot is now unused, clean up. If we fail after this point, we
>>       * won't recover but just leak clusters.
>>       */
>> +    g_free(sn.unknown_extra_data);
>>      g_free(sn.id_str);
>>      g_free(sn.name);
>>  
>>
> 
> Tweaks to assertions are minor enough that I'm okay with:
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks for reviewing!

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade
  2019-07-30 18:10   ` Eric Blake
@ 2019-07-31  8:56     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  8:56 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1530 bytes --]

On 30.07.19 20:10, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> qcow2 v3 requires every snapshot table entry to have two extra data
>> fields: The 64-bit VM state size, and the virtual disk size.  Both are
>> optional for v2 images, so they may not be present.
>>
>> qcow2_upgrade() therefore should update the snapshot table to ensure all
>> entries have these extra data fields.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347
>> Reported-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2.c | 29 +++++++++++++++++++++++++++--
>>  1 file changed, 27 insertions(+), 2 deletions(-)
>>
> 
>> +
>> +    /*
>> +     * In v2, snapshots do not need to have extra data.  v3 requires
>> +     * the 64-bit VM state size and the virtual disk size to be
>> +     * present.
>> +     * qcow2_write_snapshots() will always write the list in the
>> +     * v3-compliant format.
>> +     */
>> +    need_snapshot_update = false;
>> +    for (i = 0; i < s->nb_snapshots; i++) {
>> +        if (s->snapshots[i].extra_data_size < 16) {
> 
> s/16/sizeof(extra)/ looks a bit nicer, but doesn't change semantics.

Hm, but it’s not quite right.  I mean, right now it is, but if we were
to add a new field to snapshot metadata, it wouldn’t be.  It should be 16.

I can make it something like sizeof(extra.vm_state_size_large) +
sizeof(extra.disk_size), though.

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table()
  2019-07-30 18:53   ` Eric Blake
@ 2019-07-31  8:59     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  8:59 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1956 bytes --]

On 30.07.19 20:53, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> Reading the snapshot table can fail.  That is a problem when we want to
>> repair the image.
>>
>> Therefore, stop reading the snapshot table in qcow2_do_open() in check
>> mode.  Instead, add a new function qcow2_check_read_snapshot_table()
>> that reads the snapshot table at a later point.  In the future, we want
>> to handle errors here and fix them.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2.h          |  4 +++
>>  block/qcow2-snapshot.c | 58 ++++++++++++++++++++++++++++++++
>>  block/qcow2.c          | 76 ++++++++++++++++++++++++++++++++----------
>>  3 files changed, 120 insertions(+), 18 deletions(-)
>>
> 
>> +++ b/block/qcow2-snapshot.c
>> @@ -321,6 +321,64 @@ fail:
>>      return ret;
>>  }
>>  
>> +int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>> +                                                 BdrvCheckResult *result,
>> +                                                 BdrvCheckMode fix)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    Error *local_err = NULL;
>> +    int ret;
>> +    struct {
>> +        uint32_t nb_snapshots;
>> +        uint64_t snapshots_offset;
>> +    } QEMU_PACKED snapshot_table_pointer;
>> +
>> +    /* qcow2_do_open() discards this information in check mode */
>> +    ret = bdrv_pread(bs->file, 60, &snapshot_table_pointer,
>> +                     sizeof(snapshot_table_pointer));
> 
> Should that '60' be a named constant or offsetof() expression?  (I know,
> you just copied this instance from elsewhere)

Well, I copied it from the specification. O:-)

You’re completely right.  It should be offsetof(QCowHeader, nb_snapshots).

(I blame the fact that I had started writing the test by this point, so
I was already immersed in so many magic numbers.)

> Reviewed-by: Eric Blake <eblake@redhat.com>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries
  2019-07-30 19:02   ` Eric Blake
@ 2019-07-31  9:06     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  9:06 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 2083 bytes --]

On 30.07.19 21:02, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> The only case where we currently reject snapshot table entries is when
>> they have too much extra data.  Fix them with qemu-img check -r all by
>> counting it as a corruption, reducing their extra_data_size, and then
>> letting qcow2_check_fix_snapshot_table() do the rest.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2-snapshot.c | 69 ++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 56 insertions(+), 13 deletions(-)
>>
> 
>> @@ -112,16 +141,22 @@ int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
>>          }
>>  
>>          if (sn->extra_data_size > sizeof(extra)) {
>> -            /* Store unknown extra data */
>>              size_t unknown_extra_data_size =
>>                  sn->extra_data_size - sizeof(extra);
>>  
>> -            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
>> -            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
>> -                             unknown_extra_data_size);
>> -            if (ret < 0) {
>> -                error_setg_errno(errp, -ret, "Failed to read snapshot table");
>> -                goto fail;
>> +            if (discard_unknown_extra_data) {
>> +                /* Discard unknown extra data */
>> +                sn->extra_data_size = sizeof(extra);
> 
> This truncates it down to just the data we know. Should it instead
> truncate down to the 1024 bytes of QCOW_MAX_SNAPSHOT_EXTRA_DATA defined
> in 2/13?  (We can't keep all of the user's extra stuff, but we can at
> least try to preserve as much as possible)

On one hand, potentially cutting unknown data in half sounds like not
such a good idea to me.

On the other, a field can only be considered present if it is fully
present.  So cutting any optional data in half shouldn’t have any
negative impact.

So, yes, truncating it down to 1024 bytes sounds good.

Max

> Otherwise, looks good.
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables
  2019-07-30 19:08   ` Eric Blake
@ 2019-07-31  9:22     ` Max Reitz
  2019-08-16 18:06       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-07-31  9:22 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 4026 bytes --]

On 30.07.19 21:08, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> We currently refuse to open qcow2 images with overly long snapshot
>> tables.  This patch makes qemu-img check -r all drop all offending
>> entries past what we deem acceptable.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 79 insertions(+), 10 deletions(-)
> 
> I'm less sure about this one.  8/13 should have no semantic effect (if
> the user _depended_ on that much extra data, they should have set an
> incompatible feature flag bit, at which point we'd leave their data
> alone because we don't recognize the feature bit; so it is safe to
> assume the user did not depend on the data and that we can thus nuke it
> with impunity).  But here, we are throwing away the user's internal
> snapshots, and not even giving them a say in which ones to throw away
> (more likely, by trimming from the end, we are destroying the most
> recent snapshots in favor of the older ones - but I could argue that
> throwing away the oldest also has its uses).

First, I don’t think there really is a legitimate use case for having an
overly long snapshot table.  In fact, I think our limit is too high as
it is and we just introduced it this way because we didn’t have any
repair functionality, and so just had to pick some limit that nobody
could ever reasonably reach.

(As the test shows, you need more than 500 snapshots with 64 kB names
and ID strings, and 1 kB of extra data to reach this limit.)

So the only likely cause to reach this number of snapshots is
corruption.  OK, so maybe we don’t need to be able to fix it, then,
because the image is corrupted anyway.

But I think we do want to be able to fix it, because otherwise you just
can’t open the image at all and thus not even read the active layer.


This gets me to: Second, it doesn’t make things worse.  Right now, we
just refuse to open such images in all cases.  I’d personally prefer
discarding some data on my image over losing it all.


And third, I wonder what interface you have in mind.  I think adding an
interface to qemu-img check to properly address this problem (letting
the user discard individual snapshots) is hard.  I could imagine two things:

(A) Making qemu-img snapshot sometimes set BDRV_O_CHECK, too, or
something.  For qemu-img snapshot -d, you don’t need to read the whole
table into memory, and thus we don’t need to impose any limit.  But that
seems pretty hackish to me.

(B) Maybe the proper solution would be to add an interactive interface
to bdrv_check().  I can imagine that in the future, we may get more
cases where we want interaction with the user on what data to delete and
so on.  But that's hard...  (I’ll try.  Good thing stdio is already the
standard interface in bdrv_check(), so I won’t have to feel bad if I go
down that route even further.)

Max

>> @@ -417,7 +461,32 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>>  
>>          return ret;
>>      }
>> -    result->corruptions += extra_data_dropped;
>> +    result->corruptions += nb_clusters_reduced + extra_data_dropped;
>> +
>> +    if (nb_clusters_reduced) {
>> +        /*
>> +         * Update image header now, because:
>> +         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
>> +         *     the same as what the image header says,
>> +         * (2) this leaks clusters, but qcow2_check_refcounts() will
>> +         *     fix that.
>> +         */
>> +        assert(fix & BDRV_FIX_ERRORS);
>> +
>> +        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
>> +        ret = bdrv_pwrite_sync(bs->file, 60,
> 
> That '60' needs a name; it keeps popping up.
> 
> If we like the patch, I didn't spot major coding problems.  But because
> I'm not sure we want this patch, I'll skip R-b for now.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries
  2019-07-30 19:10   ` Eric Blake
@ 2019-07-31  9:25     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  9:25 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 735 bytes --]

On 30.07.19 21:10, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2-snapshot.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
> 
> Same problem as for 9/13 - should we really be throwing away the user's
> data like this?  (9/13 hits if the user has a small number of snapshots,
> but each has enough extra data, that the overall table is bigger than we
> like; 10/13 hits if the user has more snapshots than we like, but
> otherwise they do the same thing).

The same arguments apply (though the “it must be a corruption” argument
applies even more, because having more than 65536 snapshots just isn’t
right.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions
  2019-07-30 19:22   ` Eric Blake
@ 2019-07-31  9:27     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  9:27 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 2121 bytes --]

On 30.07.19 21:22, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  tests/qemu-iotests/common.rc | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
>> index 5502c3da2f..78decfd5d5 100644
>> --- a/tests/qemu-iotests/common.rc
>> +++ b/tests/qemu-iotests/common.rc
>> @@ -53,6 +53,26 @@ poke_file()
>>      printf "$3" | dd "of=$1" bs=1 "seek=$2" conv=notrunc &>/dev/null
>>  }
>>  
>> +# peek_file_le 'test.img' 512 2 => 65534
>> +peek_file_le()
>> +{
>> +    # Wrap in echo $() to strip spaces
>> +    echo $(od -j"$2" -N"$3" --endian=little -An -vtu"$3" "$1")
> 
> Requires coreutils' od, but we can patch that later if it proves to be a
> problem on other hosts.
> 
> I used to do something similar in nbdkit (prior to qemu-nbd --list
> making my life a lot easier; see nbdkit commit b228cb40); but there, I
> read a 16-bit value in 2 8-bit chunks and pieced it together myself
> rather than relying on --endian:
> 
> -    # Protocol is big endian, we want native endian.
> -    # xargs trick to trim whitespace from
> -    # https://stackoverflow.com/a/12973694
> -    eflags_hi=$(od -An -N1     -tx1 eflags.out | xargs)
> -    eflags_lo=$(od -An -N1 -j1 -tx1 eflags.out | xargs)
> -    eflags=$(( 0x$eflags_hi << 8 | 0x$eflags_lo ))
> 
> But as long as we are using --endian, your version nicely handles 1, 2,
> 4, and 8-byte reads.
> 
>> +
>> +# peek_file_raw 'test.img' 512 2 => '\xff\xfe'
>> +peek_file_raw()
>> +{
>> +    dd if="$1" bs=1 skip="$2" count="$3" status=none
>> +}
> 
> Of course, calling $(peek_file_raw ...) is a bad idea, because it might
> eat a trailing byte that happened to be a newline; it also doesn't
> handle NUL bytes very well.  Is it worth documenting caveats for using
> this one?

In my experience, it handled NUL bytes so well that I had to tr -d them
away. :-)

I mean, isn’t the problem in the caller, then?

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling
  2019-07-30 19:56   ` Eric Blake
@ 2019-07-31  9:36     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-07-31  9:36 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 4581 bytes --]

On 30.07.19 21:56, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> Add a test how our qcow2 driver handles extra data in snapshot table
>> entries, and how it repairs overly long snapshot tables.
> 
> May need tweaking if we drop 9 and 10.
> 
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  tests/qemu-iotests/261     | 449 +++++++++++++++++++++++++++++++++++++
>>  tests/qemu-iotests/261.out | 321 ++++++++++++++++++++++++++
>>  tests/qemu-iotests/group   |   1 +
>>  3 files changed, 771 insertions(+)
>>  create mode 100755 tests/qemu-iotests/261
>>  create mode 100644 tests/qemu-iotests/261.out
>>
>> +
>> +# Parameters:
>> +#   $1: image filename
>> +#   $2: snapshot table entry offset in the image
>> +snapshot_table_entry_size()
>> +{
>> +    id_len=$(peek_file_be "$1" $(($2 + 12)) 2)
>> +    name_len=$(peek_file_be "$1" $(($2 + 14)) 2)
>> +    extra_len=$(peek_file_be "$1" $(($2 + 36)) 4)
>> +
>> +    full_len=$((40 + extra_len + id_len + name_len))
>> +    if [ $((full_len % 8)) = 0 ]; then
>> +        echo $full_len
>> +    else
>> +        echo $((full_len + 8 - full_len % 8))
> 
> Could replace the entire if with:
>  echo $(( (full_len + 7) / 8 * 8 ))
> but what you have works.

Ah, sure.

>> +    fi
>> +}
>> +
>> +# Parameter:
>> +#   $1: image filename
>> +print_snapshot_table()
>> +{
>> +    nb_entries=$(peek_file_be "$1" 60 4)
>> +    offset=$(peek_file_be "$1" 64 8)
>> +
>> +    echo "Snapshots in $1:" | _filter_testdir | _filter_imgfmt
> 
> Should a separate patch add support in 'qemu-img info'/'qemu-img
> snapshot -l' for letting users know how much extra info is in each
> snapshot?  It seems useful enough without having to recode this
> low-level iotest introspection.

To me, it doesn’t seem really useful right now, as all qemu-created
images (past 1.1) will have the same 16 bytes.

>> +
>> +    for ((i = 0; i < nb_entries; i++)); do
>> +        id_len=$(peek_file_be "$1" $((offset + 12)) 2)
>> +        name_len=$(peek_file_be "$1" $((offset + 14)) 2)
>> +        extra_len=$(peek_file_be "$1" $((offset + 36)) 4)
>> +
>> +        extra_ofs=$((offset + 40))
>> +        id_ofs=$((extra_ofs + extra_len))
>> +        name_ofs=$((id_ofs + id_len))
>> +
>> +        echo "  [$i]"
>> +        echo "    ID: $(peek_file_raw "$1" $id_ofs $id_len)"
>> +        echo "    Name: $(peek_file_raw "$1" $name_ofs $name_len)"
> 
> We're relying on the files having sane strings at those offsets - but
> that's fine for the iotest.
> 
>> +        echo "    Extra data size: $extra_len"
>> +        if [ $extra_len -ge 8 ]; then
>> +            echo "    VM state size: $(peek_file_be "$1" $extra_ofs 8)"
>> +        fi
>> +        if [ $extra_len -ge 16 ]; then
>> +            echo "    Disk size: $(peek_file_be "$1" $((extra_ofs + 8)) 8)"
>> +        fi
>> +        if [ $extra_len -gt 16 ]; then
>> +            echo '    Unknown extra data:' \
>> +                "$(peek_file_raw "$1" $((extra_ofs + 16)) $((extra_len - 16)) \
>> +                   | tr -d '\0')"
> 
> Printing the unknown extra data seems fishy, especially if you are going
> to sanitize out the NUL bytes.  An od dump of every byte might be more
> useful, but I'd also be happy with just printing the number of unknown
> bytes without actually worrying about printing the contents of those bytes.

It’s a test, I know exactly what the extra data is (supposed to be).

(namely “very important data\0\0\0\0\0\0\0”)

[...]

>> +# We only need the fixed part, though.
>> +truncate -s 40 "$TEST_DIR/sn0"
>> +
>> +# 65535-char ID string
>> +poke_file "$TEST_DIR/sn0" 12 '\xff\xff'
>> +# 65535-char name
>> +poke_file "$TEST_DIR/sn0" 14 '\xff\xff'
> 
> Do we care that there are NUL bytes in the id and name?  (The spec is
> clear that id and name are not NUL-terminated, but does not actually
> seem to forbid the use of arbitrary binary values as names...)

Right now we don’t care.  Which is good for me, because anything else
would make this test even slower than it already is (writing a different
name and ID into every snapshot would be a pain).

(It’s even worse for the next case.  There is a reason I do it for v2
only, where fully-zero snapshot table entries are valid.  It takes a
long time just to write a '16' into every one of >65536 entries.)

Max

[...]

> Overall, looks like a nice test.  I'm comfortable giving:
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Again, thanks for reviewing!


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data
  2019-07-31  8:54     ` Max Reitz
@ 2019-08-16  2:09       ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-16  2:09 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1479 bytes --]

On 31.07.19 10:54, Max Reitz wrote:
> On 30.07.19 19:56, Eric Blake wrote:
>> On 7/30/19 12:24 PM, Max Reitz wrote:

[...]

>>> +        if (sn->extra_data_size > sizeof(extra)) {
>>> +            /* Store unknown extra data */
>>> +            size_t unknown_extra_data_size =
>>> +                sn->extra_data_size - sizeof(extra);
>>> +
>>> +            sn->unknown_extra_data = g_malloc(unknown_extra_data_size);
>>> +            ret = bdrv_pread(bs->file, offset, sn->unknown_extra_data,
>>> +                             unknown_extra_data_size);
>>
>> We're doing two separate bdrv_pread()s. Would it be better to do a
>> single bdrv_preadv into a vector composed of &extra and
>> &unknown_extra_data, for less I/O?  (Then again, this micro-optimization
>> is probably in the noise in the long run)
> 
> Interesting idea, we could even add the ID and name string into that
> vector.  But I’m not sure whether it’s really useful.
> 
> (I’ll take a look anyway, because it sounds interesting.)

I did, and it was actually really nice.  I liked it.  (I don’t hink the
performance is important, but it was actually just simpler.)

But then it turned out that it won’t work with patch 8, because after
that we may want to skip parts of the unknown extra data – and skipping
doesn’t work with bdrv_preadv()...

(So all the simplicity is lost, and that’s what I was interested in.)

But it was indeed quite cool. :-/

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables
  2019-07-31  9:22     ` Max Reitz
@ 2019-08-16 18:06       ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-16 18:06 UTC (permalink / raw)
  To: Eric Blake, qemu-block; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 3524 bytes --]

On 31.07.19 11:22, Max Reitz wrote:
> On 30.07.19 21:08, Eric Blake wrote:
>> On 7/30/19 12:25 PM, Max Reitz wrote:
>>> We currently refuse to open qcow2 images with overly long snapshot
>>> tables.  This patch makes qemu-img check -r all drop all offending
>>> entries past what we deem acceptable.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 79 insertions(+), 10 deletions(-)
>>
>> I'm less sure about this one.  8/13 should have no semantic effect (if
>> the user _depended_ on that much extra data, they should have set an
>> incompatible feature flag bit, at which point we'd leave their data
>> alone because we don't recognize the feature bit; so it is safe to
>> assume the user did not depend on the data and that we can thus nuke it
>> with impunity).  But here, we are throwing away the user's internal
>> snapshots, and not even giving them a say in which ones to throw away
>> (more likely, by trimming from the end, we are destroying the most
>> recent snapshots in favor of the older ones - but I could argue that
>> throwing away the oldest also has its uses).
> 
> First, I don’t think there really is a legitimate use case for having an
> overly long snapshot table.  In fact, I think our limit is too high as
> it is and we just introduced it this way because we didn’t have any
> repair functionality, and so just had to pick some limit that nobody
> could ever reasonably reach.
> 
> (As the test shows, you need more than 500 snapshots with 64 kB names
> and ID strings, and 1 kB of extra data to reach this limit.)
> 
> So the only likely cause to reach this number of snapshots is
> corruption.  OK, so maybe we don’t need to be able to fix it, then,
> because the image is corrupted anyway.
> 
> But I think we do want to be able to fix it, because otherwise you just
> can’t open the image at all and thus not even read the active layer.
> 
> 
> This gets me to: Second, it doesn’t make things worse.  Right now, we
> just refuse to open such images in all cases.  I’d personally prefer
> discarding some data on my image over losing it all.
> 
> 
> And third, I wonder what interface you have in mind.  I think adding an
> interface to qemu-img check to properly address this problem (letting
> the user discard individual snapshots) is hard.  I could imagine two things:
> 
> (A) Making qemu-img snapshot sometimes set BDRV_O_CHECK, too, or
> something.  For qemu-img snapshot -d, you don’t need to read the whole
> table into memory, and thus we don’t need to impose any limit.  But that
> seems pretty hackish to me.
> 
> (B) Maybe the proper solution would be to add an interactive interface
> to bdrv_check().  I can imagine that in the future, we may get more
> cases where we want interaction with the user on what data to delete and
> so on.  But that's hard...  (I’ll try.  Good thing stdio is already the
> standard interface in bdrv_check(), so I won’t have to feel bad if I go
> down that route even further.)

After some fiddling around, I don’t think this is worth it.  As I said,
this is an extremely rare case anyway, so the main goal should be to
just being able to access the active layer to copy at least that data
off the image.

The other side is that this would introduce quite complex code that
basically cannot be tested reasonably.  I’d rather not do that.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, back to index

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30 17:24 [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Max Reitz
2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 01/13] qcow2: Add Error ** to qcow2_read_snapshots() Max Reitz
2019-07-30 17:41   ` Eric Blake
2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 02/13] qcow2: Keep unknown extra snapshot data Max Reitz
2019-07-30 17:56   ` Eric Blake
2019-07-31  8:54     ` Max Reitz
2019-08-16  2:09       ` Max Reitz
2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 03/13] qcow2: Make qcow2_write_snapshots() public Max Reitz
2019-07-30 17:57   ` Eric Blake
2019-07-30 17:24 ` [Qemu-devel] [PATCH for-4.2 04/13] qcow2: Put qcow2_upgrade() into an own function Max Reitz
2019-07-30 18:00   ` Eric Blake
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 05/13] qcow2: Write v3-compliant snapshot list on upgrade Max Reitz
2019-07-30 18:10   ` Eric Blake
2019-07-31  8:56     ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 06/13] qcow2: Separate qcow2_check_read_snapshot_table() Max Reitz
2019-07-30 18:53   ` Eric Blake
2019-07-31  8:59     ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 07/13] qcow2: Add qcow2_check_fix_snapshot_table() Max Reitz
2019-07-30 18:54   ` Eric Blake
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 08/13] qcow2: Fix broken snapshot table entries Max Reitz
2019-07-30 19:02   ` Eric Blake
2019-07-31  9:06     ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables Max Reitz
2019-07-30 19:08   ` Eric Blake
2019-07-31  9:22     ` Max Reitz
2019-08-16 18:06       ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 10/13] qcow2: Repair snapshot table with too many entries Max Reitz
2019-07-30 19:10   ` Eric Blake
2019-07-31  9:25     ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 11/13] qcow2: Fix v3 snapshot table entry compliancy Max Reitz
2019-07-30 19:12   ` Eric Blake
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 12/13] iotests: Add peek_file* functions Max Reitz
2019-07-30 19:22   ` Eric Blake
2019-07-31  9:27     ` Max Reitz
2019-07-30 17:25 ` [Qemu-devel] [PATCH for-4.2 13/13] iotests: Test qcow2's snapshot table handling Max Reitz
2019-07-30 19:56   ` Eric Blake
2019-07-31  9:36     ` Max Reitz
2019-07-30 17:39 ` [Qemu-devel] [PATCH for-4.2 00/13] qcow2: Let check -r all repair some snapshot bits Eric Blake

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git