[Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements
@ 2018-11-13 23:03 Eric Blake
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints Eric Blake
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-13 23:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: berto, qemu-block, kwolf, mreitz

As the added iotests shows, we have a (corner case) data corruption
that is user triggerable, therefore, this is still appropriate for
inclusion in 3.1.

v6 was here:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08497.html

since then:
- don't reduce constraints on reftable [Kevin]
- rebase to recent iotests changes
- drop patches that might conflict with Vladimir's work now on Kevin's
block-next branch

001/3:[0042] [FC] 'qcow2: Document some maximum size constraints'
002/3:[0003] [FC] 'qcow2: Don't allow overflow during cluster allocation'
003/3:[0003] [FC] 'iotests: Add new test 220 for max compressed cluster offset'

Eric Blake (3):
  qcow2: Document some maximum size constraints
  qcow2: Don't allow overflow during cluster allocation
  iotests: Add new test 220 for max compressed cluster offset

 docs/interop/qcow2.txt     | 30 +++++++++++-
 block/qcow2.h              |  6 +++
 block/qcow2-refcount.c     | 20 +++++---
 tests/qemu-iotests/220     | 96 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/220.out | 54 +++++++++++++++++++++
 tests/qemu-iotests/group   |  1 +
 6 files changed, 198 insertions(+), 9 deletions(-)
 create mode 100755 tests/qemu-iotests/220
 create mode 100644 tests/qemu-iotests/220.out

-- 
2.17.2

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints
  2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
@ 2018-11-13 23:03 ` Eric Blake
  2018-11-15 15:17   ` Alberto Garcia
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 2/3] qcow2: Don't allow overflow during cluster allocation Eric Blake
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2018-11-13 23:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: berto, qemu-block, kwolf, mreitz

Although off_t permits up to 63 bits (8EB) of file offsets, in
practice, we're going to hit other limits first.  Document some
of those limits in the qcow2 spec, and how choice of cluster size
can influence some of the limits.

While we cannot map any virtual cluster to any address higher than
64 PB (56 bits) (due to the current L1/L2 field encoding stopping
at bit 55), the refcount table can currently be sized larger.  For
comparison, ext4 with 4k blocks caps files at 16PB.

Another interesting limit: for compressed clusters, the L2 layout
requires an ever-smaller maximum host offset as cluster size gets
larger, down to a 512 TB maximum with 2M clusters.

Signed-off-by: Eric Blake <eblake@redhat.com>

--
v8: don't try and limit refcount (R-b dropped)
v5: even more wording tweaks
v4: more wording tweaks
v3: new patch
---
 docs/interop/qcow2.txt | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 845d40a086d..89faf7b99f3 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,16 @@ The first cluster of a qcow2 image contains the file header:
                     with larger cluster sizes.

          24 - 31:   size
-                    Virtual disk size in bytes
+                    Virtual disk size in bytes.
+
+                    Note: with a 2 MB cluster size, the maximum
+                    virtual size is 2 EB (61 bits) for a fully sparse
+                    file; however, L1/L2 table layouts limit an image
+                    to no more than 64 PB (56 bits) of populated
+                    clusters, and an image may hit other limits first
+                    (such as a file system's maximum size).  With a
+                    512 byte cluster size, the maximum virtual size
+                    drops to 128 GB (37 bits).

          32 - 35:   crypt_method
                     0 for no encryption
@@ -326,6 +335,11 @@ in the image file.
 It contains pointers to the second level structures which are called refcount
 blocks and are exactly one cluster in size.

+Although the refcount table can reserve clusters past 64 PB (56 bits)
+(assuming the underlying protocol can even be sized that large), note
+that some qcow2 metadata such as L1/L2 tables must point to clusters
+prior to that point.
+
 Given an offset into the image file, the refcount of its cluster can be
 obtained as follows:

@@ -365,6 +379,16 @@ The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
 exactly one cluster in size.

+The L1 and L2 tables have implications on the maximum virtual file
+size; a larger cluster size is required for the guest to have access
+to more space.  Furthermore, a virtual cluster must currently map to a
+host offset below 64 PB (56 bits) (although this limit could be
+relaxed by putting reserved bits into use).  Additionally, as cluster
+size increases, the maximum host offset for a compressed cluster is
+reduced (a 2M cluster size requires compressed clusters to reside
+below 512 TB (49 bits), and this limit cannot be relaxed without an
+incompatible layout change).
+
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:

@@ -427,7 +451,9 @@ Standard Cluster Descriptor:
 Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

     Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
-                    cluster or sector boundary!
+                    cluster or sector boundary!  If cluster_bits is
+                    small enough that this field includes bits beyond
+                    55, those upper bits must be set to 0.

          x - 61:    Number of additional 512-byte sectors used for the
                     compressed data, beyond the sector containing the offset
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] qcow2: Don't allow overflow during cluster allocation
  2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints Eric Blake
@ 2018-11-13 23:03 ` Eric Blake
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 3/3] iotests: Add new test 220 for max compressed cluster offset Eric Blake
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-13 23:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: berto, qemu-block, kwolf, mreitz

Our code was already checking that we did not attempt to
allocate more clusters than what would fit in an INT64 (the
physical maximimum if we can access a full off_t's worth of
data).  But this does not catch smaller limits enforced by
various spots in the qcow2 image description: L1 and normal
clusters of L2 are documented as having bits 63-56 reserved
for other purposes, capping our maximum offset at 64PB (bit
55 is the maximum bit set).  And for compressed images with
2M clusters, the cap drops the maximum offset to bit 48, or
a maximum offset of 512TB.  If we overflow that offset, we
would write compressed data into one place, but try to
decompress from another, which won't work.

It's actually possible to prove that overflow can cause image
corruption without this patch; I'll add the iotests separately
in the next commit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>

---
v8: don't artificially cap reftable allocations [Kevin]
v6: improve commit message, now that iotests is possible [Max]
v3: use s->cluster_offset_mask instead of open-coding it [Berto],
use MIN() to clamp offset on small cluster size, add spec patch
first to justify clamping even on refcount allocations
---
 block/qcow2.h          |  6 ++++++
 block/qcow2-refcount.c | 20 +++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 29c98d87a07..8662b685753 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -42,6 +42,12 @@
 #define QCOW_MAX_CRYPT_CLUSTERS 32
 #define QCOW_MAX_SNAPSHOTS 65536

+/* Field widths in qcow2 mean normal cluster offsets cannot reach
+ * 64PB; depending on cluster size, compressed clusters can have a
+ * smaller limit (64PB for up to 16k clusters, then ramps down to
+ * 512TB for 2M clusters).  */
+#define QCOW_MAX_CLUSTER_OFFSET ((1ULL << 56) - 1)
+
 /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
 #define QCOW_MAX_REFTABLE_SIZE S_8MiB
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 46082aeac1d..1c63ac244ac 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,7 +31,8 @@
 #include "qemu/bswap.h"
 #include "qemu/cutils.h"

-static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
+static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
+                                    uint64_t max);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length, uint64_t addend,
                             bool decrease, enum qcow2_discard_type type);
@@ -362,7 +363,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     }

     /* Allocate the refcount block itself and mark it as used */
-    int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
+    int64_t new_block = alloc_clusters_noref(bs, s->cluster_size, INT64_MAX);
     if (new_block < 0) {
         return new_block;
     }
@@ -954,7 +955,8 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,


 /* return < 0 if error */
-static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size)
+static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size,
+                                    uint64_t max)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t i, nb_clusters, refcount;
@@ -979,9 +981,9 @@ retry:
     }

     /* Make sure that all offsets in the "allocated" range are representable
-     * in an int64_t */
+     * in the requested max */
     if (s->free_cluster_index > 0 &&
-        s->free_cluster_index - 1 > (INT64_MAX >> s->cluster_bits))
+        s->free_cluster_index - 1 > (max >> s->cluster_bits))
     {
         return -EFBIG;
     }
@@ -1001,7 +1003,7 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, uint64_t size)

     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC);
     do {
-        offset = alloc_clusters_noref(bs, size);
+        offset = alloc_clusters_noref(bs, size, QCOW_MAX_CLUSTER_OFFSET);
         if (offset < 0) {
             return offset;
         }
@@ -1083,7 +1085,11 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
     free_in_cluster = s->cluster_size - offset_into_cluster(s, offset);
     do {
         if (!offset || free_in_cluster < size) {
-            int64_t new_cluster = alloc_clusters_noref(bs, s->cluster_size);
+            int64_t new_cluster;
+
+            new_cluster = alloc_clusters_noref(bs, s->cluster_size,
+                                               MIN(s->cluster_offset_mask,
+                                                   QCOW_MAX_CLUSTER_OFFSET));
             if (new_cluster < 0) {
                 return new_cluster;
             }
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] iotests: Add new test 220 for max compressed cluster offset
  2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints Eric Blake
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 2/3] qcow2: Don't allow overflow during cluster allocation Eric Blake
@ 2018-11-13 23:03 ` Eric Blake
  2018-11-14 14:35 ` [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Kevin Wolf
  2018-11-14 14:48 ` Eric Blake
  4 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-13 23:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: berto, qemu-block, kwolf, mreitz

If you have a capable file system (tmpfs is good, ext4 not so much;
run ./check with TEST_DIR pointing to a good location so as not
to skip the test), it's actually possible to create a qcow2 file
that expands to a sparse 512T image with just over 38M of content.
The test is not the world's fastest (qemu crawling through 256M
bits of refcount table to find the next cluster to allocate takes
several seconds, as does qemu-img check reporting millions of
leaked clusters); but it DOES catch the problem that the previous
patch just fixed where writing a compressed cluster to a full
image ended up overwriting the wrong cluster.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>

---
v8: prefer $() over ``
v7: s/214/220/
v6: new patch; took over 90 seconds to run on my setup, using tmpfs
---
 tests/qemu-iotests/220     | 96 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/220.out | 54 +++++++++++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 151 insertions(+)
 create mode 100755 tests/qemu-iotests/220
 create mode 100644 tests/qemu-iotests/220.out

diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
new file mode 100755
index 00000000000..0c5682bda07
--- /dev/null
+++ b/tests/qemu-iotests/220
@@ -0,0 +1,96 @@
+#!/bin/bash
+#
+# max limits on compression in huge qcow2 files
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1    # failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo "== Creating huge file =="
+
+# Sanity check: We require a file system that permits the creation
+# of a HUGE (but very sparse) file.  tmpfs works, ext4 does not.
+if ! truncate --size=513T "$TEST_IMG"; then
+    _notrun "file system on $TEST_DIR does not support large enough files"
+fi
+rm "$TEST_IMG"
+IMGOPTS='cluster_size=2M,refcount_bits=1' _make_test_img 513T
+
+echo "== Populating refcounts =="
+# We want an image with 256M refcounts * 2M clusters = 512T referenced.
+# Each 2M cluster holds 16M refcounts; the refcount table initially uses
+# 1 refblock, so we need to add 15 more.  The refcount table lives at 2M,
+# first refblock at 4M, L2 at 6M, so our remaining additions start at 8M.
+# Then, for each refblock, mark it as fully populated.
+to_hex() {
+    printf %016x\\n $1 | sed 's/\(..\)/\\x\1/g'
+}
+truncate --size=38m "$TEST_IMG"
+entry=$((0x200000))
+$QEMU_IO_PROG -f raw -c "w -P 0xff 4m 2m" "$TEST_IMG" | _filter_qemu_io
+for i in {1..15}; do
+    offs=$((0x600000 + i*0x200000))
+    poke_file "$TEST_IMG" $((i*8 + entry)) $(to_hex $offs)
+    $QEMU_IO_PROG -f raw -c "w -P 0xff $offs 2m" "$TEST_IMG" | _filter_qemu_io
+done
+
+echo "== Checking file before =="
+# FIXME: 'qemu-img check' doesn't diagnose refcounts beyond the end of
+# the file as leaked clusters
+_check_test_img 2>&1 | sed '/^Leaked cluster/d'
+stat -c 'image size %s' "$TEST_IMG"
+
+echo "== Trying to write compressed cluster =="
+# Given our file size, the next available cluster at 512T lies beyond the
+# maximum offset that a compressed 2M cluster can reside in
+$QEMU_IO_PROG -c 'w -c 0 2m' "$TEST_IMG" | _filter_qemu_io
+# The attempt failed, but ended up allocating a new refblock
+stat -c 'image size %s' "$TEST_IMG"
+
+echo "== Writing normal cluster =="
+# The failed write should not corrupt the image, so a normal write succeeds
+$QEMU_IO_PROG -c 'w 0 2m' "$TEST_IMG" | _filter_qemu_io
+
+echo "== Checking file after =="
+# qemu-img now sees the millions of leaked clusters, thanks to the allocations
+# at 512T.  Undo many of our faked references to speed up the check.
+$QEMU_IO_PROG -f raw -c "w -z 5m 1m" -c "w -z 8m 30m" "$TEST_IMG" |
+    _filter_qemu_io
+_check_test_img 2>&1 | sed '/^Leaked cluster/d'
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/220.out b/tests/qemu-iotests/220.out
new file mode 100644
index 00000000000..af3021fd883
--- /dev/null
+++ b/tests/qemu-iotests/220.out
@@ -0,0 +1,54 @@
+QA output created by 220
+== Creating huge file ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=564049465049088
+== Populating refcounts ==
+wrote 2097152/2097152 bytes at offset 4194304
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 8388608
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 10485760
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 12582912
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 14680064
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 16777216
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 18874368
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 23068672
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 25165824
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 27262976
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 29360128
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 31457280
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 33554432
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 35651584
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 37748736
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== Checking file before ==
+No errors were found on the image.
+image size 39845888
+== Trying to write compressed cluster ==
+write failed: Input/output error
+image size 562949957615616
+== Writing normal cluster ==
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== Checking file after ==
+wrote 1048576/1048576 bytes at offset 5242880
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 31457280/31457280 bytes at offset 8388608
+30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+8388589 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ebe4fe78bc3..4d194716f28 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -219,6 +219,7 @@
 217 rw auto quick
 218 rw auto quick
 219 rw auto
+220 rw auto
 221 rw auto quick
 222 rw auto quick
 223 rw auto quick
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements
  2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
                   ` (2 preceding siblings ...)
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 3/3] iotests: Add new test 220 for max compressed cluster offset Eric Blake
@ 2018-11-14 14:35 ` Kevin Wolf
  2018-11-15 18:36   ` Eric Blake
  2018-11-14 14:48 ` Eric Blake
  4 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2018-11-14 14:35 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, berto, qemu-block, mreitz

Am 14.11.2018 um 00:03 hat Eric Blake geschrieben:
> As the added iotests shows, we have a (corner case) data corruption
> that is user triggerable, therefore, this is still appropriate for
> inclusion in 3.1.

Thanks, applied to the block branch.

Kevin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements
  2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
                   ` (3 preceding siblings ...)
  2018-11-14 14:35 ` [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Kevin Wolf
@ 2018-11-14 14:48 ` Eric Blake
  4 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-14 14:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, berto, qemu-block, mreitz

On 11/13/18 5:03 PM, Eric Blake wrote:
> As the added iotests shows, we have a (corner case) data corruption
> that is user triggerable, therefore, this is still appropriate for
> inclusion in 3.1.
> 
> v6 was here:
> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08497.html

Wow, I can't count. That was v7, and this email should be titled v8, not 
v3 ;)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints
  2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints Eric Blake
@ 2018-11-15 15:17   ` Alberto Garcia
  2018-11-15 16:24     ` Eric Blake
  0 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2018-11-15 15:17 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, mreitz

On Wed 14 Nov 2018 12:03:17 AM CET, Eric Blake wrote:
> @@ -427,7 +451,9 @@ Standard Cluster Descriptor:
>  Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
>
>      Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
> -                    cluster or sector boundary!
> +                    cluster or sector boundary!  If cluster_bits is
> +                    small enough that this field includes bits beyond
> +                    55, those upper bits must be set to 0.

While I think that it's good to have a 56 bits upper limit for both
compressed and uncompressed clusters, I'm wondering: is it theoretically
possible to have data clusters above 64PB if they're all compressed?

Berto

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints
  2018-11-15 15:17   ` Alberto Garcia
@ 2018-11-15 16:24     ` Eric Blake
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-15 16:24 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: qemu-block, kwolf, mreitz

On 11/15/18 9:17 AM, Alberto Garcia wrote:
> On Wed 14 Nov 2018 12:03:17 AM CET, Eric Blake wrote:
>> @@ -427,7 +451,9 @@ Standard Cluster Descriptor:
>>   Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
>>
>>       Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
>> -                    cluster or sector boundary!
>> +                    cluster or sector boundary!  If cluster_bits is
>> +                    small enough that this field includes bits beyond
>> +                    55, those upper bits must be set to 0.
> 
> While I think that it's good to have a 56 bits upper limit for both
> compressed and uncompressed clusters, I'm wondering: is it theoretically
> possible to have data clusters above 64PB if they're all compressed?

The question is only applicable for cluster sizes of 8k and smaller. 
With an 8k cluster image and the qcow2.h limit of a 32MiB L1 table (4096 
clusters, each of which holds 1024 L2 entries, and each L2 table holds 
1024 cluster entries), you can have up to 4k * 1k * 1k * 8k == 32T of 
guest size.  You'd need a LOT of metadata (for example, over 2000 
internal snapshots) before the host file would reach 64PB to even need 
to store compressed clusters at a host offset that large.  At the same 
time, qemu would limit you to an 8MiB refcount table (1024 clusters, 
each of which holds 1024 refblocks, which in turn hold a default of 4096 
refcounts, but with a refcount_order of 0 could hold 64k refcounts), 
which results in qemu allowing your maximum host offset to be 1k * 1k * 
64k * 8k == 512T, which means qemu will refuse to generate or open such 
an image in the first place.  So if you have an image that tries to 
store a compressed data cluster above host offset 64PB, qemu is unable 
to process that image.

But your question does mean this other part of my patch:

 >
 >            24 - 31:   size
 > -                    Virtual disk size in bytes
 > +                    Virtual disk size in bytes.
 > +
 > +                    Note: with a 2 MB cluster size, the maximum
 > +                    virtual size is 2 EB (61 bits) for a fully sparse
 > +                    file; however, L1/L2 table layouts limit an image
 > +                    to no more than 64 PB (56 bits) of populated
 > +                    clusters, and an image may hit other limits first
 > +                    (such as a file system's maximum size).  With a
 > +                    512 byte cluster size, the maximum virtual size
 > +                    drops to 128 GB (37 bits).

is misleading.  Elsewhere, we mention for cluster_bits:

                     Note: qemu as of today has an implementation limit 
of 2 MB
                     as the maximum cluster size and won't be able to 
open images
                     with larger cluster sizes.

and looking at the code in qcow2.h, the 2EB limits on maximum virtual 
size is NOT an inherent limit in the qcow2 file format, but rather a 
result of qemu's implementation refusing to size the L1 table larger 
than 32MiB.  If you allow a larger L1 table, you can get to larger 
virtual addresses.  So I need to fix this patch [again] to add in 
wording about this being a qemu limit.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements
  2018-11-14 14:35 ` [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Kevin Wolf
@ 2018-11-15 18:36   ` Eric Blake
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Blake @ 2018-11-15 18:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, berto, qemu-block, mreitz

On 11/14/18 8:35 AM, Kevin Wolf wrote:
> Am 14.11.2018 um 00:03 hat Eric Blake geschrieben:
>> As the added iotests shows, we have a (corner case) data corruption
>> that is user triggerable, therefore, this is still appropriate for
>> inclusion in 3.1.
> 
> Thanks, applied to the block branch.

Patch 2 and 3 are fine; but Berto raised questions about patch 1 so I 
posted a v9 of that patch, in case you want to replace it in your queue 
and/or defer it out of 3.1 and instead save it for 4.0.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-11-15 18:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-13 23:03 [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Eric Blake
2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 1/3] qcow2: Document some maximum size constraints Eric Blake
2018-11-15 15:17   ` Alberto Garcia
2018-11-15 16:24     ` Eric Blake
2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 2/3] qcow2: Don't allow overflow during cluster allocation Eric Blake
2018-11-13 23:03 ` [Qemu-devel] [PATCH v3 3/3] iotests: Add new test 220 for max compressed cluster offset Eric Blake
2018-11-14 14:35 ` [Qemu-devel] [PATCH for 3.1 v3 0/3] minor qcow2 compression improvements Kevin Wolf
2018-11-15 18:36   ` Eric Blake
2018-11-14 14:48 ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.